Somewhere in PHE’s data pipeline, someone had used the wrong Excel file format, XLS rather than the more recent XLSX. And XLS spreadsheets simply don’t have that many rows: 2 to the power of 16, or 65,536, the figure often rounded to “64K”. This meant that during some automated process, cases had vanished off the bottom of the spreadsheet, and nobody had noticed.
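The arithmetic behind that limit is easy to check. What follows is only an illustrative sketch in Python, not a reconstruction of PHE's actual process: the function names and the 70,000-row figure are hypothetical, chosen to show how a pipeline could refuse to write a sheet rather than quietly drop whatever does not fit.

```python
# Legacy XLS worksheets hold at most 2**16 rows per sheet.
XLS_MAX_ROWS = 2 ** 16  # 65,536


def rows_lost(total_rows: int, max_rows: int = XLS_MAX_ROWS) -> int:
    """Return how many rows would fall off the bottom of a single sheet."""
    return max(0, total_rows - max_rows)


def check_fits(total_rows: int, max_rows: int = XLS_MAX_ROWS) -> None:
    """Raise an error instead of silently truncating (hypothetical guard)."""
    lost = rows_lost(total_rows, max_rows)
    if lost:
        raise ValueError(
            f"{total_rows} rows exceed the {max_rows}-row limit; "
            f"{lost} rows would be lost"
        )


# Hypothetical example: a 70,000-row file written to one XLS sheet.
print(XLS_MAX_ROWS)       # 65536
print(rows_lost(70_000))  # 4464
```

A guard of this kind turns a silent loss into a loud failure, which is the sort of double-checking the episode lacked.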
The idea of simply running out of space to put the numbers was darkly amusing. A few weeks after the data-loss scandal, I found myself able to ask Bill Gates himself about what had happened. Gates no longer runs Microsoft, and I was interviewing him about vaccines for a BBC program called How to Vaccinate The World. But the opportunity to have a bit of fun quizzing him about XLS and XLSX was too good to pass up.
I expressed the question in the nerdiest way possible, and Gates’s response was so strait-laced I had to smile: “I guess… they overran the 64,000 limit, which is not there in the new format, so…” Well, indeed. Gates then added, “It’s good to have people double-check things, and I’m sorry that happened.”
Exactly how the outdated XLS format came to be used is unclear. PHE sent me an explanation, but it was rather vague. I didn’t understand it, so I showed it to some members of Eusprig, the European Spreadsheet Risks Group. They spend their lives analyzing what happens when spreadsheets go rogue. They’re my kind of people. But they didn’t understand what PHE had told me, either. It was all a little light on detail.
Author(s): Tim Harford
Publication Date: 29 June 2021
Publication Site: Financial Post