Just looking at these dots, we see that for engine size between 60 and 200, there is a linear increase in the weight. However, after an engine size of 200, the weight does not increase linearly but is leveling. So, this means that the relation between engine size and weight is not strictly linear.
We can also confirm the non-linear nature by performing a linear curve fit as shown below with a blue line. You will observe that the points marked in the red circle are completely off the straight line indicating that a linear line does not correctly capture the pattern.
We started by looking at the color of the cell which indicated a strong correlation. However, we concluded that it is not true when we looked at the scatter plot. So where is the catch?
The problem is in the name of the technique. As it is titled a correlation matrix, we tend to use it to interpret all types of correlation. The technique is based on Pearson correlation, which is strictly measuring only linear correlation. So the more appropriate name of the technique should be linear correlation matrix.
Nomograms are a trending term in evidence-based medicine, and COVID-19 research is no exception. In this context, a nomogram is usually a web-based tool, a graphic interface, or an on-line calculator in which patient data on several variables is entered as input, and a single summary statistic is calculated as output, such as the likelihood of successful response to treatment. Many medical researchers and data scientists have put forward nomograms derived from multivariate clinical progression models, to assist in decisions about COVID-19 triage.
Is this enthusiasm for reducing complex clinical decisions to the use of multivariate calculators a leap forward in personalized medicine, enabled by modern computing? There is a sketchy “black box” side to all this, to say nothing of the risk of incorporating statistical design errors or untenable inferential claims into a nomogram being rolled out for immediate, untested use in the middle of pandemic. So let’s treat the history of the “number needed to treat” as a “teachable moment” in the history of nomograms in medicine. What have we learned so far?
One problem may be the way we teach statistics to data scientists and public health professionals. Multivariable regression is often mistaken for a silver bullet that magically controls away confounding for all variables at once, as long as no confounder is left out. This is what statisticians call the “Table 2 fallacy,” because the adjusted effect sizes in a multivariable model are so often reported in Table 2. Many medical professionals learn to read research articles critically for understanding without ever having been introduced to the Table 2 fallacy.
Confounding is often taught as a purely mathematical concept, but that misses the point. Throwing a large set of variously interrelated variables into a big stepwise regression model might be expected to work, if all you know about confounding is that you should “never leave a confounder out” of your analysis.