Child Mortality Rate, under age five – doc v11

Link: https://www.gapminder.org/data/documentation/gd005/

Graphic:

Excerpt:

Documentation — version 11

This page describes how Gapminder has combined data from multiple sources into one long coherent dataset with Child mortality under age 5, for all countries for all years from 1800 to 2100.

Data » Online spreadsheet with data for countries, regions and global total — v11

SUMMARY DOCUMENTATION OF V11

Sources

— 1800 to 1950: Gapminder v7 (In some cases this is also used for years after 1950, see below.) This was compiled and documented by Klara Johansson and Mattias Lindgren from many sources, mainly www.mortality.org and the series of books called International Historical Statistics by Brian R Mitchell, which often have historic estimates of Infant mortality rate that were converted to Child mortality through regression. See detailed documentation of v7 below.

— 1950 to 2016: UNIGME is a data collaboration project between UNICEF, WHO, UN Population Division and the World Bank. They released new estimates of child mortality for countries and a global estimate on September 19, 2019, and the data is available at www.childmortality.org. In this dataset, 70% of all countries have estimates between 1970 and 2018, while roughly half the countries also reach back to 1960 and 17% reach back to 1950.

— 1950 to 2100: UN POP World Population Prospects 2019 provides annual data for Child mortality rate for all countries in the annually interpolated demographic indicators, called WPP2019_INT_F01_ANNUAL_DEMOGRAPHIC_INDICATORS.xlsx, accessed on January 12, 2020.
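
The combination of sources listed above can be pictured as splicing several year-indexed series into one. The following is only a minimal sketch under stated assumptions: the priority order (UNIGME, then WPP, then v7), the function name `splice`, and all values are hypothetical and are not taken from the Gapminder documentation.

```python
def splice(sources):
    """Merge {year: value} series; earlier entries in `sources` win on overlap."""
    merged = {}
    # Apply the lowest-priority series first so higher-priority ones overwrite it.
    for series in reversed(sources):
        merged.update(series)
    return dict(sorted(merged.items()))

# Hypothetical values (deaths per 1,000 live births), for illustration only.
unigme = {1950: 210.0, 1951: 205.0}
wpp = {1950: 215.0, 1951: 208.0, 2100: 5.0}
v7 = {1800: 400.0, 1950: 220.0}

# Assumed priority order: UNIGME, then WPP, then v7.
combined = splice([unigme, wpp, v7])
```

In this toy example the year 1950 appears in all three series and the UNIGME value is kept, while 1800 and 2100 are filled from the only series that covers them.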

Publication Date: accessed 22 March 2023

Publication Site: Gapminder

Insurtech Regs, ‘Dark Pattern’ Spotting on NAIC’s To-Do List

Link: https://www.thinkadvisor.com/2022/12/16/insurtech-regs-dark-pattern-spottting-on-naics-to-do-list/

Excerpt:

In August [2022], Birny Birnbaum, the executive director of the Center for Economic Justice, asked the [NAIC] Market Regulation committee to train analysts to detect “dark patterns” and to define dark patterns as an unfair and deceptive trade practice.

The term “dark patterns” refers to techniques an online service can use to get consumers to do things they would otherwise not do, according to draft August meeting notes included in the committee’s fall national meeting packet.

Dark pattern techniques include nagging; efforts to keep users from understanding and comparing prices; obscuring important information; and the “roach motel” strategy, which makes signing up for an online service much easier than canceling it.

Author(s): Allison Bell

Publication Date: 16 Dec 2022

Publication Site: Think Advisor

Bring ChatGPT INSIDE Excel to Solve ANY Problem Lightning FAST

Link: https://www.youtube.com/watch?v=kQPUWryXwag&ab_channel=LeilaGharani

Video:

Description:

OpenAI inside Excel? How can you use an API key to connect to an AI model from Excel? This video shows you how. You can download the files from the GitHub link above. Wouldn’t it be great to have a search box in Excel you can use to ask any question? Like to create dummy data, create a formula or ask about the cast of The Sopranos. And then artificial intelligence provides the information directly in Excel – without any copy and pasting! In this video you’ll learn how to set up an API connection from Microsoft Excel to OpenAI’s ChatGPT (GPT-3) by using Office Scripts. As a bonus I’ll show you how you can parse the result if the answer from GPT-3 is in more than 1 line. This makes it easier to use the information in Excel.

Author(s): Leila Gharani

Publication Date: 6 Feb 2023

Publication Site: YouTube

On the Limitations of Dataset Balancing: The Lost Battle Against Spurious Correlations

Link: https://arxiv.org/abs/2204.12708

PDF: https://aclanthology.org/2022.findings-naacl.168.pdf

Findings of the Association for Computational Linguistics: NAACL 2022, pages 2182 – 2194
July 10-15, 2022

Graphic:

Abstract:

Recent work has shown that deep learning models in NLP are highly sensitive to low-level correlations between simple features and specific output labels, leading to overfitting and lack of generalization. To mitigate this problem, a common practice is to balance datasets by adding new instances or by filtering out “easy” instances (Sakaguchi et al., 2020), culminating in a recent proposal to eliminate single-word correlations altogether (Gardner et al., 2021). In this opinion paper, we identify that despite these efforts, increasingly-powerful models keep exploiting ever-smaller spurious correlations, and as a result even balancing all single-word features is insufficient for mitigating all of these correlations. In parallel, a truly balanced dataset may be bound to “throw the baby out with the bathwater” and miss important signal encoding common sense and world knowledge. We highlight several alternatives to dataset balancing, focusing on enhancing datasets with richer contexts, allowing models to abstain and interact with users, and turning from large-scale fine-tuning to zero- or few-shot setups.
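
To make the idea of a "single-word correlation" concrete, here is a minimal sketch. It is not the authors' method: it simply estimates p(label | word) on a tiny made-up dataset, so that words with a lopsided label distribution stand out. The function name `word_label_rates` and the toy examples are hypothetical.

```python
from collections import Counter, defaultdict

def word_label_rates(examples):
    """Return {word: {label: p(label | word)}} from (text, label) pairs."""
    counts = defaultdict(Counter)
    for text, label in examples:
        # Count each word once per example so long texts do not dominate.
        for word in set(text.lower().split()):
            counts[word][label] += 1
    rates = {}
    for word, label_counts in counts.items():
        total = sum(label_counts.values())
        rates[word] = {label: n / total for label, n in label_counts.items()}
    return rates

# Hypothetical toy sentiment data, for illustration only.
toy = [
    ("the movie was great", "pos"),
    ("a great cast", "pos"),
    ("the plot was dull", "neg"),
    ("not great at all", "neg"),
    ("the ending was dull", "neg"),
]

rates = word_label_rates(toy)
```

Here "dull" appears only with the "neg" label, which is the kind of shortcut a model can exploit. Balancing would aim to push every word's rates toward the overall label distribution, which is the step the abstract argues is both insufficient and costly.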

Author(s): Roy Schwartz, Gabriel Stanovsky

Publication Date: July 2022

Publication Site: arXiv

Data Challenges in Building a Facial Recognition Model and How to Mitigate Them

Link: https://www.soa.org/resources/research-reports/2023/data-facial-rec/

PDF: https://www.soa.org/49022b/globalassets/assets/files/resources/research-report/2023/dei107-facial-recognition-challenges.pdf

Graphic:

Excerpt:

This paper is an introduction to AI technology designed for actuaries to understand how the technology works, the potential risks it could introduce, and how to mitigate risks. The author focuses on data bias as it is one of the main concerns of facial recognition technology. This research project was jointly sponsored by the Diversity Equity and Inclusion Research and the Actuarial Innovation and Technology Strategic Research Programs.

Author(s): Victoria Zhang, FSA, FCIA

Publication Date: Jan 2023

Publication Site: SOA Research Institute

More and Better Uses Ahead for Governments’ Financial Data

Link: https://www.governing.com/finance/more-and-better-uses-ahead-for-governments-financial-data

Excerpt:

In its lame duck session last month, Congress tucked a sleeper section into its 4,000-page omnibus spending bill. The controversial Financial Data Transparency Act (FDTA) swiftly came out of nowhere to become federal law over the vocal but powerless objections of the state and local government finance community. Its impact on thousands of cities, counties and school districts will be a buzzy topic at conferences all this year and beyond. Meanwhile, software companies will be staking claims in a digital land rush.

The central idea behind the FDTA is that public-sector organizations’ financial data should be readily available for online search and standardized downloading, using common file formats. Think of it as “an http protocol for financial data” that enables an investor, analyst, taxpayer watchdog, constituent or journalist to quickly retrieve key financial information and compare it with other numbers using common data fields. Presently, online users of state and local government financial data must rely primarily on text documents, often in PDF format, that don’t lend themselves to convenient data analysis and comparisons. Financial statements are typically published long after the fiscal year’s end, and the widespread online availability of current and timely data is still a faraway concept.

…..

So far, so good. But the devil is in the details. The first question is just what kind of information will be required in this new system, and when. Most would agree that a complete download of every byte of data now formatted in voluminous governmental financial reports and their notes is overwhelming, unnecessary and burdensome. Thus, a far more incremental and focused approach is a wiser path. For starters, it may be helpful to keep the initial data requirements skeletal and focus initially on a dozen or more vital fiscal data points that are most important to financial statement users. Then, after that foundation is laid, the public finance industry can build out. Of course, this will require that regulators buy into a sensible implementation plan.

The debate over information content requirements should focus first on “decision-useful information.” Having served briefly two decades ago as a voting member of the Governmental Accounting Standards Board (GASB), contributing my professional background as a chartered financial analyst, I can attest that almost every one of their meetings included a board member reminding others that required financial statement information should be decision-useful. A key question, of course, is “useful to whom?”

Author(s): Girard Miller

Publication Date: 17 Jan 2023

Publication Site: Governing

Government Financial Reporting – Data Standards and the Financial Data Transparency Act

Link: https://xbrl.us/events/230124/

Date and Time of upcoming event: 3:00 PM ET Tuesday, January 24, 2023 (60 Minutes)

Description:

The U.S. Congress passed legislation on December 15, 2022 that includes requirements for the Securities and Exchange Commission to adopt data standards related to municipal securities. The Financial Data Transparency Act (FDTA) aims to improve transparency in government reporting, while minimizing disruptive changes and requiring no new disclosures. The University of Michigan’s Center for Local, State and Urban Policy (CLOSUP) has partnered with XBRL US to develop open, nonproprietary financial data standards that represent government financial reporting which could be freely leveraged to support the FDTA. The Annual Comprehensive Financial Reporting (ACFR) Taxonomy today represents general purpose governments, as well as some special districts, and can be expanded upon to address all types of governments that issue debt securities. CLOSUP has also conducted pilots with local entities including the City of Flint.

Attend this 60-minute session to explore government data standards, find out how governments can create their own machine-readable financial statements, and discover what impact this legislation could have on government entities. Most importantly, discover how machine-readable data standards can benefit state and local government entities by reducing costs and increasing access to time-sensitive information for policy making.

Presenters:

  • Marc Joffe, Public Policy Analyst, Public Sector Credit
  • Stephanie Leiser, Fiscal Health Project Lead, Center for Local, State and Urban Policy (CLOSUP), University of Michigan’s Ford School of Public Policy
  • Campbell Pryde, President and CEO, XBRL US
  • Robert Widigan, Chief Financial Officer, City of Flint

Publication Site: XBRL.us

The most common restaurant cuisine in every state, and a chain-restaurant mystery

Link: https://www.washingtonpost.com/business/2022/09/29/chain-restaurant-capitals/

Graphic:

Excerpt:

The places that drive the most tend to have the same high share of chain restaurants regardless of whether they voted for Trump or Biden. As car commuting decreases, chain restaurants decrease at roughly the same rate, no matter which candidate most residents supported.

If the link between cars and chains transcends partisanship, why does it look like Trump counties have more chain restaurants? It’s at least in part because he won more of the places with the most car commuters!

About 83 percent of workers commute by car nationally, but only 80 percent of folks in Biden counties do so, compared with 90 percent of workers in Trump counties. The share of car commuters ranges from 55 percent in the deep-blue New York City metro area to 96 percent around bright red Decatur, Ala.
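
The three percentages above also imply how workers are split between the two groups of counties, since the national figure is a worker-weighted average of the other two. A minimal sketch, assuming every worker lives in either a Biden or a Trump county and treating the rounded figures as exact; the function name `implied_weight` is hypothetical.

```python
def implied_weight(overall, group_a, group_b):
    """Share of the population in group A implied by a two-group weighted average.

    Solves overall = w * group_a + (1 - w) * group_b for w.
    """
    return (group_b - overall) / (group_b - group_a)

# Figures quoted in the excerpt: 83% nationally, 80% Biden counties, 90% Trump counties.
biden_share_of_workers = implied_weight(0.83, 0.80, 0.90)
```

Under these assumptions roughly 70 percent of workers live in Biden counties, which is consistent with the national rate sitting closer to 80 than to 90 percent.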

Author(s): Andrew Van Dam

Publication Date: 1 Oct 2022

Publication Site: WaPo

The amazing power of “machine eyes”

Link: https://erictopol.substack.com/p/the-amazing-power-of-machine-eyes

Graphic:

Excerpt:

Today’s report on AI of retinal vessel images to help predict the risk of heart attack and stroke, from over 65,000 UK Biobank participants, reinforces a growing body of evidence that deep neural networks can be trained to “interpret” medical images far beyond what was anticipated. Add that finding to last week’s multinational study of deep learning of retinal photos to detect Alzheimer’s disease with good accuracy. In this post I am going to briefly review what has already been gleaned from 2 classic medical images—the retina and the electrocardiogram (ECG)—as representative for the exciting capability of machine vision to “see” well beyond human limits. Obviously, machines aren’t really seeing or interpreting and don’t have eyes in the human sense, but they sure can be trained from hundreds of thousands (or millions) of images to come up with outputs that are extraordinary. I hope when you’ve read this you’ll agree this is a particularly striking advance, which has not yet been actualized in medical practice, but has enormous potential.

Author(s): Eric Topol

Publication Date: 4 Oct 2022

Publication Site: Eric Topol’s substack, Ground Truths

Using First Name Information to Improve Race and Ethnicity Classification

Link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2763826

Graphic:

Abstract:

This paper uses a recent first name list to improve on a previous Bayesian classifier, the Bayesian Improved Surname Geocoding (BISG) method, which combines surname and geography information to impute missing race and ethnicity. The proposed approach is validated using a large mortgage lending dataset for whom race and ethnicity are reported. The new approach results in improvements in accuracy and in coverage over BISG for all major ethno-racial categories. The largest improvements occur for non-Hispanic Blacks, a group for which the BISG performance is weakest. Additionally, when estimating disparities in mortgage pricing and underwriting among ethno-racial groups with regression models, the disparity estimates based on either BIFSG or BISG proxies are remarkably close to those based on actual race and ethnicity. Following evaluation, I demonstrate the application of BIFSG to the imputation of missing race and ethnicity in the Home Mortgage Disclosure Act (HMDA) data, and in the process, offer novel evidence that race and ethnicity are somewhat correlated with the incidence of missing race/ethnicity information.
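
The Bayesian update behind BISG and BIFSG is commonly described as a naive Bayes product: the surname-based prior p(race | surname) is multiplied by p(first name | race) and p(geography | race), then renormalized. The sketch below assumes that form and uses made-up probabilities for three placeholder groups; the function name and all numbers are hypothetical and are not taken from the paper.

```python
def bifsg_posterior(p_race_given_surname, p_first_given_race, p_geo_given_race):
    """Combine surname, first-name and geography evidence, naive Bayes style.

    posterior(r) is proportional to
    p(r | surname) * p(first name | r) * p(geography | r).
    """
    unnormalized = {
        race: p_race_given_surname[race]
        * p_first_given_race[race]
        * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}

# Hypothetical inputs for one person, for illustration only.
surname_prior = {"group_a": 0.6, "group_b": 0.3, "group_c": 0.1}
first_name_likelihood = {"group_a": 0.002, "group_b": 0.004, "group_c": 0.001}
geography_likelihood = {"group_a": 0.01, "group_b": 0.02, "group_c": 0.01}

posterior = bifsg_posterior(surname_prior, first_name_likelihood, geography_likelihood)
```

In this toy case the surname alone favors group_a, but the first-name and geography evidence shift the posterior toward group_b. Setting every first-name likelihood to 1 reduces the calculation to the surname-plus-geography (BISG) case.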

Author(s):

Ioan Voicu
Office of the Comptroller of the Currency (OCC)

Publication Date: February 22, 2016

Publication Site: SSRN

Suggested Citation:

Voicu, Ioan, Using First Name Information to Improve Race and Ethnicity Classification (February 22, 2016). Available at SSRN: https://ssrn.com/abstract=2763826 or http://dx.doi.org/10.2139/ssrn.2763826

Embedded Bias: How Medical Records Sow Discrimination

Link: https://khn.org/news/article/electronic-medical-records-doctor-bias-open-notes-treatment-discrimination/

Excerpt:

Narrow or prejudiced thinking is simple to write down and easy to copy and paste over and over. Descriptions such as “difficult” and “disruptive” can become hard to escape. Once so labeled, patients can experience “downstream effects,” said Dr. Hardeep Singh, an expert in misdiagnosis who works at the Michael E. DeBakey Veterans Affairs Medical Center in Houston. He estimates misdiagnosis affects 12 million patients a year.

Conveying bias can be as simple as a pair of quotation marks. One team of researchers found that Black patients, in particular, were quoted in their records more frequently than other patients when physicians were characterizing their symptoms or health issues. The quotation mark patterns detected by researchers could be a sign of disrespect, used to communicate irony or sarcasm to future clinical readers. Among the types of phrases the researchers spotlighted were colloquial language or statements made in Black or ethnic slang.

“Black patients may be subject to systematic bias in physicians’ perceptions of their credibility,” the authors of the paper wrote.

That’s just one study in an incoming tide focused on the variations in the language that clinicians use to describe patients of different races and genders. In many ways, the research is just catching up to what patients and doctors knew already, that discrimination can be conveyed and furthered by partial accounts.

Author(s): Darius Tahir

Publication Date: 26 Sept 2022

Publication Site: Kaiser Health News

An Actuarial View of Correlation and Causation—From Interpretation to Practice to Implications

Link: https://www.actuary.org/sites/default/files/2022-07/Correlation.IB_.6.22_final.pdf

Graphic:

Excerpt:

Examine the quality of the theory behind the correlated variables. Is there good reason to believe, as validated by research, the variables would occur together? If such validation does not exist, then the relationship may be spurious. For example, is there any validation to the relationship between the number of driver deaths in railway collisions by year (the horizontal axis), and the annual imports of Norwegian crude oil by the U.S., as depicted below? This is an example of a spurious correlation. It is not clear what a rational explanation would be for this relationship.
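
Two series that merely share a trend can show a high correlation with no causal link at all. The sketch below illustrates this with made-up numbers (not the railway or crude oil data): the raw series correlate strongly, while their year-over-year changes do not. Comparing first differences is one common sanity check, not something the issue brief prescribes, and the helper names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def first_differences(values):
    """Year-over-year changes, which remove a shared linear trend."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Hypothetical series: both decline over ten years for unrelated reasons.
years = range(10)
series_a = [100 - 5 * t + (3 if t % 2 else -3) for t in years]
series_b = [40 - 2 * t + (1 if t % 3 == 0 else -1) for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(first_differences(series_a), first_differences(series_b))
```

With these toy numbers the correlation of the levels is well above 0.9, while the correlation of the yearly changes is close to zero, which is the pattern one would expect when a shared trend rather than a real relationship drives the result.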

Author(s): Data Science and Analytics Committee

Publication Date: July 2022

Publication Site: American Academy of Actuaries