More than two-thirds of survey respondents stated that the third-party data about them was only 0 to 50 percent correct as a whole. One-third of respondents perceived the information to be 0 to 25 percent correct.
Whether individuals were born in the United States tended to determine whether they were able to locate their data within the data broker’s portal. Of those not born in the United States, 33 percent could not locate their data; conversely, of those born in the United States, only 5 percent had missing information. Further, no respondents born outside the United States and residing in the country for less than three years could locate their data.
The type of data on individuals that was most available was demographic information; the least available was home data. However, even if demographic information was available, it was not all that accurate and was often incomplete, with 59 percent of respondents judging their demographic data to be only 0 to 50 percent correct. Even seemingly easily available data types (such as date of birth, marital status, and number of adults in the household) had wide variances in accuracy.
Author(s): John Lucker, Susan K. Hogan, Trevor Bischoff
Understanding why a model makes a certain prediction can be as crucial as the prediction’s accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
Author(s): Scott M. Lundberg, Su-In Lee
Publication Date: 2017
Publication Site: Conference on Neural Information Processing Systems