On the Limitations of Dataset Balancing: The Lost Battle Against Spurious Correlations

Link: https://arxiv.org/abs/2204.12708

PDF: https://aclanthology.org/2022.findings-naacl.168.pdf

Findings of the Association for Computational Linguistics: NAACL 2022, pages 2182 – 2194
July 10-15, 2022



Recent work has shown that deep learning models in NLP are highly sensitive to low-level correlations between simple features and specific output labels, leading to overfitting and lack of generalization. To mitigate this problem, a common practice is to balance datasets by adding new instances or by filtering out “easy” instances (Sakaguchi et al., 2020), culminating in a recent proposal to eliminate single-word correlations altogether (Gardner et al., 2021). In this opinion paper, we identify that despite these efforts, increasingly-powerful models keep exploiting ever-smaller spurious correlations, and as a result even balancing all single-word features is insufficient for mitigating all of these correlations. In parallel, a truly balanced dataset may be bound to “throw the baby out with the bathwater” and miss important signal encoding common sense and world knowledge. We highlight several alternatives to dataset balancing, focusing on enhancing datasets with richer contexts, allowing models to abstain and interact with users, and turning from large-scale fine-tuning to zero- or few-shot setups.

Author(s): Roy Schwartz, Gabriel Stanovsky

Publication Date: July 2022

Publication Site: arXiv

Modern Portfolio Theory Faces Issues As Correlations Turn Positive

Link: https://www.thewealthadvisor.com/article/modern-portfolio-theory-faces-issues-correlations-turn-positive?mkt_tok=NDQ2LVVIUy0wMTMAAAF_mViiBJO-qrg7D4DudMyxmY2hssLidn3lEOlX-kAIh3R_yylYhWdr5_fo6QtLbdN1_nODniHhefsm5_gZSApaxmU5Rf8Kz5XOyKg-v1SmPwQe



However, Modern Portfolio Theory may have a problem going forward. Don’t worry, we are not going to hack on bonds based on a fear that yields may rise in the future, creating a portfolio drag. There are already enough bond haters out there. The issue we are seeing goes beyond just the bond argument – correlations have been rising just about everywhere. In today’s world, correlations have been changing, with more and more asset classes becoming increasingly correlated. The problem: when the correlations between investments are higher, it becomes harder to diversify risk in a portfolio.
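The link between correlation and diversification can be made concrete with the standard two-asset portfolio variance formula, w_a^2 s_a^2 + w_b^2 s_b^2 + 2 w_a w_b rho s_a s_b. The sketch below is a minimal illustration; the weights and volatilities are hypothetical numbers chosen for the example, not figures from the article.

```python
import math


def portfolio_volatility(weight_a, vol_a, vol_b, rho):
    """Volatility of a two-asset portfolio.

    weight_a: fraction invested in asset A (the rest goes to asset B)
    vol_a, vol_b: standard deviations of the two assets' returns
    rho: correlation between the two assets' returns, in [-1, 1]
    """
    weight_b = 1.0 - weight_a
    variance = (
        (weight_a * vol_a) ** 2
        + (weight_b * vol_b) ** 2
        + 2.0 * weight_a * weight_b * rho * vol_a * vol_b
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Hypothetical 60/40 mix: equities at 15% volatility, bonds at 5%.
for rho in (-0.5, 0.0, 0.5):
    vol = portfolio_volatility(0.6, 0.15, 0.05, rho)
    print(f"correlation {rho:+.1f} -> portfolio volatility {vol:.4f}")
```

With the weights and individual volatilities held fixed, portfolio volatility rises as the correlation rises, which is the sense in which higher correlations make risk harder to diversify.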

Let’s start with the big one, global bonds and global equities. Combining equities and bonds has benefitted from a generally negative correlation for much of the past few decades. However, this correlation has turned positive of late (chart 1), implying reduced diversification benefits when combining bonds and equities. This isn’t too much of a concern, given that the long-term average is slightly positive.

But don’t throw out your bonds just yet. This correlation tends to return to be strongly negative during risk-off periods in the equity markets. This reflex action during corrections helps maintain bonds in portfolios, even if they experience periods of low or even negative performance.

Publication Date: 15 Sept 2021

Publication Site: The Wealth Advisor

Music Sentiment and Stock Returns Around the World

Link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3776071



This paper introduces a real-time, continuous measure of national sentiment that is language-free and thus comparable globally: the positivity of songs that individuals choose to listen to. This is a direct measure of mood that does not pre-specify certain mood-affecting events nor assume the extent of their impact on investors. We validate our music-based sentiment measure by correlating it with mood swings induced by seasonal factors, weather conditions, and COVID-related restrictions. We find that music sentiment is positively correlated with same-week equity market returns and negatively correlated with next-week returns, consistent with sentiment-induced temporary mispricing. Results also hold under a daily analysis and are stronger when trading restrictions limit arbitrage. Music sentiment also predicts increases in net mutual fund flows, and absolute sentiment precedes a rise in stock market volatility. It is negatively associated with government bond returns, consistent with a flight to safety.


Alex Edmans
London Business School – Institute of Finance and Accounting; European Corporate Governance Institute (ECGI); Centre for Economic Policy Research (CEPR)

Adrian Fernandez-Perez
Auckland University of Technology

Alexandre Garel
Audencia Business School

Ivan Indriawan
Auckland University of Technology – Department of Finance

Publication Date: 14 Aug 2021

Publication Site: SSRN, Journal of Financial Economics (forthcoming)

An Alternative to the Correlation Coefficient That Works For Numeric and Categorical Variables

Link: https://rviews.rstudio.com/2021/04/15/an-alternative-to-the-correlation-coefficient-that-works-for-numeric-and-categorical-variables/



Using an insight from Information Theory, we devised a new metric – the x2y metric – that quantifies the strength of the association between pairs of variables.

The x2y metric has several advantages:

It works for all types of variable pairs (continuous-continuous, continuous-categorical, categorical-continuous and categorical-categorical)

It captures linear and non-linear relationships

Perhaps best of all, it is easy to understand and use.

I hope you give it a try in your work.
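The excerpt does not spell out how the metric is computed. The sketch below assumes one reading of it: the percentage reduction in prediction error for y when x is used as a predictor, compared with a baseline that ignores x. It covers only the categorical-categorical case and substitutes a simple per-group majority vote for whatever predictive model the post uses, so it is a simplified stand-in rather than the author's implementation. The name `x2y_categorical` is hypothetical.

```python
from collections import Counter, defaultdict


def x2y_categorical(x, y):
    """Percent reduction in misclassification error from using x to predict y.

    Baseline: always predict the most frequent value of y.
    With x:   predict the most frequent y within each x group (in-sample).
    Both x and y are treated as categorical.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not y:
        raise ValueError("x and y must not be empty")

    baseline_hits = Counter(y).most_common(1)[0][1]
    baseline_errors = len(y) - baseline_hits
    if baseline_errors == 0:
        # y is constant, so there is no error left for x to reduce.
        return 0.0

    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    model_errors = 0
    for labels in groups.values():
        majority_hits = Counter(labels).most_common(1)[0][1]
        model_errors += len(labels) - majority_hits

    return 100.0 * (baseline_errors - model_errors) / baseline_errors


# Hypothetical data: x determines y exactly, but y only narrows down x.
x = [1, 2, 3, 4]
y = ["u", "u", "v", "v"]
forward = x2y_categorical(x, y)
backward = x2y_categorical(y, x)
```

Under this reading the measure is not symmetric: here `forward` is larger than `backward`, because knowing x pins down y while knowing y leaves two candidates for x.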

Author(s): Rama Ramakrishnan

Publication Date: 15 April 2021

Publication Site: R Views

Ahead of the curve: Modelling the unmodellable

Link: https://www.ipe.com/home/ahead-of-the-curve-modelling-the-unmodellable/10051869.article



In his youth, the economist Kenneth Arrow analysed weather forecasts for the US Army. When he found that the predictions were as reliable as historical averages, he suggested reallocating manpower. The response from the army general’s office? “The general is well aware that your division’s forecasts are worthless. However, they are required for planning purposes.”

Even before COVID-19, many shared that scepticism of forecasts. The failure to foresee the 2008-09 financial crisis started a debate on economic modelling. Over the past year, the performance of epidemiological models has not resolved this quandary.

Investors have long known that “all models are wrong, but some are useful,” to use the statistician George Box’s pithy idiom. But, there are modellers who use this defence to preserve models beyond usefulness. Meanwhile, there are unrealistic expectations from consumers of models including investors, policymakers and society. They assume that complex issues are easy to forecast, when some things are just unknowable. This gap begs the question of what investors should do.

Author(s): Sahil Mahtani

Publication Date: April 2021

Publication Site: Investments & Pensions Europe

Associations Between Governor Political Affiliation and COVID-19 Cases, Deaths, and Testing in the U.S.

Link: https://www.ajpmonline.org/article/S0749-3797(21)00135-5/fulltext

DOI: https://doi.org/10.1016/j.amepre.2021.01.034



Results: From March to early June, Republican-led states had lower COVID-19 incidence rates compared with Democratic-led states. On June 3, the association reversed, and Republican-led states had higher incidence (RR=1.10, 95% PI=1.01, 1.18). This trend persisted through early December. For death rates, Republican-led states had lower rates early in the pandemic, but higher rates from July 4 (RR=1.18, 95% PI=1.02, 1.31) through mid-December. Republican-led states had higher test positivity rates starting on May 30 (RR=1.70, 95% PI=1.66, 1.73) and lower testing rates by September 30 (RR=0.95, 95% PI=0.90, 0.98).

Author(s): Brian Neelon, PhD; Fedelis Mutiso, MS; Noel T. Mueller, PhD, MPH; John L. Pearce, PhD; Sara E. Benjamin-Neelon, PhD, JD, MPH

Publication Date: 9 March 2021

Publication Site: American Journal of Preventive Medicine