Text is maybe the most underrated element in any data visualization. There’s a lot of text in any chart or map — titles, descriptions, notes, sources, bylines, logos, annotations, labels, color keys, tooltips, axis labels — but often, it’s an afterthought in the design process. This article explains how to use text to make your visualizations easier to read and nicer to look at.
To quote Andy Kirk, “we can look at data, but we cannot really see it. To see data, we need to represent it in a different, visual form.” So, in an attempt to make data more accessible, you create visual representations – dots, lines, shapes, and colours. These building blocks combine to create all sorts of charts and pictures that help readers understand numbers.
Although the purpose of visualising data is clear (and universal), the reasons for doing so vary. Your reason for visualising data will help you determine the appropriate visual.
In my case, the graphs I made looked just fine—it’s just that I didn’t understand how copy/pasting graphs between Excel and Word worked (at the time). This was in the mid-2000s, when memory wasn’t quite so plentiful, so many corporate email accounts had memory quotas. If you hit that quota, you would be locked out of your email account. You had to call IT and actually talk to a person!
I was a lowly entry-level person at a financial services company and had done some Monte Carlo modeling involving 1,000,000 scenarios. We were developing a new mutual fund project, based on changing allocations over time as people moved towards retirement, and the company wanted me to model outcomes for different allocation trajectories. After a “full” model run of one million scenarios, I made diagnostic graphs showing the distribution of key metrics (such as the annual accumulation of the fund, how many times the fund decreased while the owner was in retirement, and whether – and when – the money in the fund ran out) so that we could analyze different potential fund strategies. The graphs themselves were fairly simple.
I’m often looking at distributions and wanting to communicate something about how those distributions change over time, or how they compare to one another. Often, I simply pick out key percentiles in those distributions, or key summary statistics such as the mean and standard deviation.
But why not graph all the points in one’s sample directly, if one has them?
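When the full sample is in hand, the percentile-and-summary reduction described above is cheap to compute before deciding whether to plot everything. A minimal sketch (the lognormal shape and the particular percentiles are purely illustrative, not taken from the original analysis):

```python
import random
import statistics

# Simulated outcomes: a stand-in for the Monte Carlo scenarios
# described above (the lognormal shape is purely illustrative).
random.seed(42)
outcomes = [random.lognormvariate(0.0, 0.5) for _ in range(100_000)]

# Reduce the full distribution to the handful of numbers usually
# reported: key percentiles plus mean and standard deviation.
cuts = statistics.quantiles(outcomes, n=100)  # 99 cut points: p1..p99
summary = {
    "p5": cuts[4],
    "median": cuts[49],
    "p95": cuts[94],
    "mean": statistics.fmean(outcomes),
    "stdev": statistics.stdev(outcomes),
}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Whether these five numbers suffice, or whether all 100,000 points belong on the chart, is exactly the question the text raises.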
The numbers of expected deaths are estimated using statistical models and are based on the previous five years’ (2015 to 2019) mortality rates. Weekly monitoring of excess mortality from all causes throughout the COVID-19 pandemic provides an objective and comparable measure of the scale of the pandemic [reference 1]. Measuring excess mortality from all causes, instead of focusing solely on mortality from COVID-19, overcomes the issues of variation in testing and differential coding of cause of death between individuals and over time [reference 1].
In the weekly reports, estimates of excess deaths are presented by week of registration at national and subnational level, for subgroups of the population (age groups, sex, deprivation groups, ethnic groups) and by cause of death and place of death.
Author(s): Office for Health Improvement and Disparities
For anti-racist dataviz, our most effective tool is context. The way that data is framed can make a very real impact on how it’s interpreted. For example, this case study from the New York Times shows two different framings of the same economic data and how, depending on where the author starts the x-axis, it can tell two very different — but both accurate — stories about the subject.
As Pieta previously highlighted, dataviz in spaces that address race/ethnicity is sensitive to “deficit framing.” That is, when data is presented in a way that over-emphasizes differences between groups (while hiding the diversity of outcomes within groups), it promotes deficit thinking (see below) and can reinforce stereotypes about the (often minoritized) groups in focus.
In a follow-up study, Eli and Cindy Xiong (of UMass’ HCI-VIS Lab) confirmed Pieta’s arguments, showing that even “neutral” data visualizations of outcome disparities can lead to deficit thinking (and therefore stereotyping) and that the way visualizations are designed can significantly impact these harmful tendencies.
Ignoring or deemphasizing uncertainty in dataviz can create false impressions of group homogeneity (low outcome variance). If stereotypes stem from false impressions of group homogeneity, then the way visualizations represent uncertainty (or choose to ignore it) could exacerbate these false impressions of homogeneity and mislead viewers toward stereotyping.
If this is the case, then social-outcome-disparity visualizations that hide within-group variability (e.g. a bar chart without error bars) would elicit more harmful stereotyping than visualizations that emphasize within-group variance (e.g. a jitter plot).
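The hypothesis can be made concrete with a toy example (all numbers below are invented): two groups whose mean outcomes differ slightly but whose within-group spread dwarfs that difference. A bars-only chart reports just the two means; the overlap statistic at the end is the kind of within-group variability a jitter plot would make visible:

```python
import random
import statistics

# Two hypothetical groups: identical spread, small mean gap.
random.seed(0)
group_a = [random.gauss(50, 15) for _ in range(1_000)]
group_b = [random.gauss(55, 15) for _ in range(1_000)]

mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
sd_a, sd_b = statistics.stdev(group_a), statistics.stdev(group_b)

# Share of group A scoring above group B's mean: a rough measure of
# the within-group overlap that a bar chart of means alone hides.
overlap = sum(x > mean_b for x in group_a) / len(group_a)

print(f"means: {mean_a:.1f} vs {mean_b:.1f}")
print(f"std devs: {sd_a:.1f} vs {sd_b:.1f}")
print(f"share of A above B's mean: {overlap:.0%}")
```

Here the within-group standard deviation (≈15) is several times the between-group gap (≈5), so a large fraction of the “lower” group outscores the “higher” group’s average — precisely the variance that error bars or jittered points would surface.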
According to forecasting by Reason Foundation’s Pension Integrity Project, when the fiscal year 2022 pension financial reports roll in, the unfunded liabilities of the 118 state public pension plans are expected to again exceed $1 trillion. After a record-breaking year of investment returns in 2021, which helped reduce a lot of longstanding pension debt, the experience of public pension assets has swung drastically in the other direction over the last 12 months. Early indicators point to investment returns averaging around -6% for the 2022 fiscal year, which ended on June 30, 2022, for many public pension systems.
Based on a -6% return for fiscal 2022, the aggregate unfunded liability of state-run public pension plans will be $1.3 trillion, up from $783 billion in 2021, the Pension Integrity Project finds. With a -6% return in 2022, the aggregate funded ratio for these state pension plans would fall from 85% funded in 2021 to 75% funded in 2022.
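The two headline figures are linked by simple accounting: funded ratio = assets / liabilities, and unfunded liability = liabilities − assets. The asset/liability split below is inferred for illustration only (a 75% funded ratio with $1.3 trillion unfunded implies roughly $5.2T in liabilities); the report itself does not state these totals:

```python
# Illustrative totals implied by the figures above: a 75% funded ratio
# with $1.3T unfunded means liabilities of 1.3 / (1 - 0.75) = $5.2T.
liabilities = 5.2  # trillions of dollars (inferred, for illustration)
assets = 3.9       # trillions of dollars (inferred, for illustration)

funded_ratio = assets / liabilities          # share of promises funded
unfunded_liability = liabilities - assets    # the pension "debt"

print(f"funded ratio: {funded_ratio:.0%}")
print(f"unfunded liability: ${unfunded_liability:.1f}T")
```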
Author(s): Truong Bui, Jordan Campbell, Zachary Christensen
Monetary inflation occurs when the U.S. money supply increases over time. The money supply represents both physical and digital money circulating in the economy, including cash, checking accounts, and money market mutual funds.
The U.S. central bank typically influences the money supply by printing money, buying bonds, or changing bank reserve requirements. It manages the money supply in order to boost the economy or to tame inflation and keep prices stable.
Between 2020 and 2021, the money supply increased roughly 25% — a historic record — in response to the COVID-19 crisis. Since then, the Federal Reserve has begun tapering its bond purchases as the economy showed signs of strength.
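As a quick arithmetic check on the headline figure, a growth rate is just (end − start) / start. The dollar levels below are round numbers chosen purely to reproduce the ~25% figure, not official money-supply series values:

```python
# Purely illustrative levels, not official M2 data.
start = 16.0  # trillions of dollars
end = 20.0    # trillions of dollars

# Percent change: (end - start) / start.
growth = (end - start) / start
print(f"money supply growth: {growth:.0%}")
```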
We report an estimate of the Earth’s average land surface temperature for the period 1753 to 2011. To address issues of potential station selection bias, we used a larger sampling of stations than had prior studies. For the period post 1880, our estimate is similar to those previously reported by other groups, although we report smaller uncertainties. The land temperature rise from the 1950s decade to the 2000s decade is 0.90 ± 0.05°C (95% confidence). Both maximum and minimum temperatures have increased during the last century. Diurnal variations decreased from 1900 to 1987 and then increased; this increase is significant but not understood. The period of 1753 to 1850 is marked by sudden drops in land surface temperature that are coincident with known volcanism; the response function is approximately 1.5 ± 0.5°C per 100 Tg of atmospheric sulfate. This volcanism, combined with a simple proxy for anthropogenic effects (logarithm of the CO2 concentration), reproduces much of the variation in the land surface temperature record; the fit is not improved by the addition of a solar forcing term. Thus, for this very simple model, solar forcing does not appear to contribute to the observed global warming of the past 250 years; the entire change can be modeled by a sum of volcanism and a single anthropogenic proxy. The residual variations include interannual and multi-decadal variability very similar to that of the Atlantic Multidecadal Oscillation (AMO).
A global land–ocean temperature record has been created by combining the Berkeley Earth monthly land temperature field with a spatially kriged version of the HadSST3 dataset. This combined product spans the period from 1850 to present and covers the majority of the Earth’s surface: approximately 57 % in 1850, 75 % in 1880, 95 % in 1960, and 99.9 % by 2015. It includes average temperatures in 1° × 1° lat–long grid cells for each month when available. It provides a global mean temperature record quite similar to records from Hadley’s HadCRUT4, NASA’s GISTEMP, NOAA’s GlobalTemp, and Cowtan and Way, and provides a spatially complete and homogeneous temperature field. Two versions of the record are provided, treating areas with sea ice cover as either air temperature over sea ice or sea surface temperature under sea ice, the former being preferred for most applications. The choice of how to assess the temperature of areas with sea ice coverage has a notable impact on global anomalies over past decades due to rapid warming of air temperatures in the Arctic. Accounting for rapid warming of Arctic air suggests ∼0.1 °C of additional global-average temperature rise since the 19th century compared with temperature series that do not capture the changes in the Arctic. Updated versions of this dataset will be presented each month at the Berkeley Earth website (http://berkeleyearth.org/data/, last access: November 2020), and a convenience copy of the version discussed in this paper has been archived and is freely available at https://doi.org/10.5281/zenodo.3634713 (Rohde and Hausfather, 2020).
Author(s): Robert A. Rohde and Zeke Hausfather
Citation: Rohde, R. A. and Hausfather, Z.: The Berkeley Earth Land/Ocean Temperature Record, Earth Syst. Sci. Data, 12, 3469–3479, https://doi.org/10.5194/essd-12-3469-2020, 2020.