Underdispersion in the reported Covid-19 case and death numbers may suggest data manipulations

Link: https://www.medrxiv.org/content/10.1101/2022.02.11.22270841v1

doi: https://doi.org/10.1101/2022.02.11.22270841

Abstract:

We suggest a statistical test for underdispersion in the reported Covid-19 case and death numbers, compared to the variance expected under the Poisson distribution. Screening all countries in the World Health Organization (WHO) dataset for evidence of underdispersion yields 21 countries with statistically significant underdispersion. Most of the countries in this list are known, based on the excess mortality data, to strongly undercount Covid deaths. We argue that Poisson underdispersion provides a simple and useful test to detect reporting anomalies and highlight unreliable data.
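To make the idea of Poisson underdispersion concrete, here is a minimal sketch. It is not the paper's procedure: it assumes a short window of daily counts with a roughly constant mean, uses the variance-to-mean ratio as the statistic, and estimates a one-sided p-value by simulating Poisson samples. The function names, the example counts, and the simulation settings are all hypothetical.

```python
import math
import random
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return variance(counts) / mean(counts)


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def underdispersion_p_value(counts, n_sim=1000, seed=0):
    """Monte Carlo estimate of P(index <= observed) under a Poisson model.

    Assumes len(counts) >= 2 and a positive mean. A small value means the
    counts vary less than Poisson noise alone would produce.
    """
    rng = random.Random(seed)
    lam = mean(counts)
    observed = dispersion_index(counts)
    hits = 0
    for _ in range(n_sim):
        sim = [sample_poisson(lam, rng) for _ in counts]
        if mean(sim) > 0 and dispersion_index(sim) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sim + 1)


# Hypothetical week of reported deaths that is suspiciously smooth.
daily_counts = [100, 101, 99, 100, 102, 98, 100]
print(dispersion_index(daily_counts))
print(underdispersion_p_value(daily_counts))
```

Real series trend up and down, which inflates the variance, so a check like this is conservative: a trend can hide underdispersion but is unlikely to create it.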

Author(s): Dmitry Kobak

Publication Date: 13 Feb 2022

Publication Site: medRxiv

Coffee Chat – “Data & Science”

Link: https://www.youtube.com/watch?v=S5GHsjgSl1o&ab_channel=DataScienceWithSam

Excerpt:

The inaugural coffee chat of my YouTube channel features two research scholars from the scientific community who share their perspectives on the crucial role data plays in research.

By watching this video you will gather information on the following topics:

a) the importance of data in scientific research,

b) valuable insights about the data handling practices in research areas related to molecular biology, genetics, organic chemistry, radiology and biomedical imaging,

c) the future of AI and machine learning in scientific research.

Author(s): Efrosini Tsouko, PhD from Baylor College of Medicine; Mausam Kalita, PhD from Stanford University; Soumava Dey

Publication Date: 26 Sept 2021

Publication Site: Data Science with Sam at YouTube

Predictably inaccurate: The prevalence and perils of bad big data

Link: https://www2.deloitte.com/us/en/insights/deloitte-review/issue-21/analytics-bad-data-quality.html

Excerpt:

More than two-thirds of survey respondents stated that the third-party data about them was only 0 to 50 percent correct as a whole. One-third of respondents perceived the information to be 0 to 25 percent correct.

Whether individuals were born in the United States tended to determine whether they were able to locate their data within the data broker’s portal. Of those not born in the United States, 33 percent could not locate their data; conversely, of those born in the United States, only 5 percent had missing information. Further, no respondents born outside the United States and residing in the country for less than three years could locate their data.

The type of data on individuals that was most available was demographic information; the least available was home data. However, even if demographic information was available, it was not all that accurate and was often incomplete, with 59 percent of respondents judging their demographic data to be only 0 to 50 percent correct. Even seemingly easily available data types (such as date of birth, marital status, and number of adults in the household) had wide variances in accuracy.

Author(s): John Lucker, Susan K. Hogan, Trevor Bischoff

Publication Date: 31 July 2017

Publication Site: Deloitte

Excel autocorrect errors still plague genetic research

Link: https://cosmosmagazine.com/science/biology/excel-autocorrect-errors-still-plague-genetic-research/

Excerpt:

Earlier this year we repeated our analysis. This time we expanded it to cover a wider selection of open access journals, anticipating researchers and journals would be taking steps to prevent such errors appearing in their supplementary data files.

We were shocked to find in the period 2014 to 2020 that 3,436 articles, around 31% of our sample, contained gene name errors. It seems the problem has not gone away, and is actually getting worse.
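The excerpt does not say how the errors were detected. As a rough illustration only, the sketch below assumes the well-known failure mode in which a spreadsheet turns gene symbols such as SEPT2 or MARCH1 into dates like 2-Sep or 1-Mar, and scans a column of gene names for entries of that shape. The pattern and the function name are hypothetical, and the check would miss other corrupted forms, such as symbols converted to numeric date serials.

```python
import re

# Assumed shape of a corrupted symbol: day number, hyphen, month abbreviation.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$",
    re.IGNORECASE,
)


def find_date_like(values):
    """Return the entries that look like spreadsheet dates, in input order."""
    return [value for value in values if DATE_LIKE.match(value.strip())]


# Hypothetical gene-name column from a supplementary file.
gene_column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "DEC1"]
print(find_date_like(gene_column))
```

Intact symbols such as DEC1 pass through untouched; only entries already rewritten into a date form are reported.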

Author(s): Mark Ziemann, Deakin University and Mandhri Abeysooriya, Deakin University

Publication Date: 27 August 2021

Publication Site: Cosmos magazine

Rebekah Jones’s Lies about Florida COVID Data Keep Piling Up

Link: https://www.nationalreview.com/2021/06/rebekah-joness-lies-about-florida-covid-data-keep-piling-up/

Excerpt:

One of the most persistent falsehoods of the COVID pandemic has been the claim that Florida has been “hiding” data. This idea has been advanced primarily by Rebekah Jones, a former Florida Department of Health employee, who, having at first expressed only some modest political disagreements with the way in which Florida responded to COVID, has over time become a fountain of misinformation.

…..

To understand what is happening here, one needs to go back to the beginning. Over the past 15 months, Florida has published a truly remarkable amount of COVID-related data. At the heart of this trove has been a well-maintained list of literally every documented case of COVID — listed by county, age, and gender, and replete with information about whether the patient had recently traveled, had visited the ER, had been hospitalized, and had had any known contact with other Floridians. To my knowledge, Florida has been the only state in the union that has published this kind of data.

…..

To this day, you can download Florida’s case-line data and see 21 cases of COVID that, despite having been identified between March 2020 and December 2020, feature a December 2019 “Event Date.” To anyone who understands data, these results are clearly the product of the system having assigned a non-null default value when no data has been entered. To the Miami Herald, however, these results hinted at scandal. Even now, when its reporters know beyond any doubt that their initial instincts were wrong, the Herald continues to tell its readers that these entries serve as “evidence of community spread potentially months earlier than previously reported.” This is not true.
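A generic sanity check for this kind of anomaly can be sketched as follows. It flags records whose event date precedes the case date by more than a chosen threshold, so that a person can review them. The field names, the 60-day threshold, and the sample records are assumptions for illustration and are not taken from Florida's case-line file. The check cannot tell a default value from a genuinely early event; it only narrows down which rows to inspect.

```python
from datetime import date

MAX_GAP_DAYS = 60  # assumed review threshold, not from the source


def flag_suspect_event_dates(records, max_gap_days=MAX_GAP_DAYS):
    """Return ids of records whose event date is implausibly early.

    Each record is a dict with hypothetical keys "id", "case_date",
    and "event_date".
    """
    flagged = []
    for record in records:
        gap = (record["case_date"] - record["event_date"]).days
        if gap > max_gap_days:
            flagged.append(record["id"])
    return flagged


# Made-up records; the second mimics a placeholder-style event date.
records = [
    {"id": 1, "case_date": date(2020, 7, 14), "event_date": date(2020, 7, 10)},
    {"id": 2, "case_date": date(2020, 9, 2), "event_date": date(2019, 12, 31)},
    {"id": 3, "case_date": date(2020, 11, 20), "event_date": date(2020, 11, 18)},
]
print(flag_suspect_event_dates(records))
```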

Author(s): Matt Shapiro

Publication Date: 8 June 2021

Publication Site: National Review

Alameda County Updates COVID-19 Death Calculation to Align with State Definitions

Link: https://covid-19.acgov.org/covid19-assets/docs/press/press-release-2021.06.04.pdf

Excerpt:

Today, June 4, Alameda County’s COVID-19 dashboard will be updated to reflect the total number of COVID-19 deaths using the State’s death reporting definition. Alameda County previously included any person who died while infected with the virus in the total COVID-19 deaths for the County. Aligning with the State’s definition will require Alameda County to report as COVID-19 deaths only those people who died as a direct result of COVID-19, with COVID-19 as a contributing cause of death, or in whom death caused by COVID-19 could not be ruled out. Based on data available as of May 23, 2021, this update will decrease the overall number of deaths from 1,634 to 1,223.

….

This update does not disproportionally impact reported deaths for any specific race or ethnic group or zip code.

Close observers of Alameda County’s dashboard may have noticed a substantial increase in the COVID-19 death totals prior to this update, during the week of May 17. This increase was due to a separate quality assurance process intended to correct previously incomplete data; adjustments were made based on additional information that became available regarding date of death and county of residence. These corrections are unrelated to the current alignment with the State’s definition of death due to COVID-19, and some of the deaths will be removed from the updated totals because COVID-19 was not a contributing cause.

Author(s): Neetu Balram

Publication Date: 4 June 2021

Publication Site: Alameda County Health Care Services Agency