The perils of metadata and privacy

October 14, 2014 – 4:47 pm by Alistair | Posted in Perambulation | Tagged data, LGBT, metadata, privacy, suicide

Many modern approaches to data analysis rely on crunching metadata, without peering into the actual content of a customer’s record. For example, you might look at whether the person provided their zip code, without actually looking at the zip code, and use that as an input into some kind of mathematical model.
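To make that concrete, here's a minimal sketch of metadata-only feature extraction. The field names and the sample record are hypothetical, and the point is simply that the analysis records whether data is present or consistent, never the values themselves:

```python
# A metadata-only view of a record: we note *whether* fields are present or
# consistent, never their contents. Field names and the record are made up.
def metadata_features(record):
    return {
        "has_zip": record.get("zip") is not None,
        "has_phone": record.get("phone") is not None,
        "first_name_matches_legal": record.get("first_name") == record.get("legal_first_name"),
    }

applicant = {"first_name": "Sam", "legal_first_name": "Samuel", "zip": None, "phone": "555-0100"}
print(metadata_features(applicant))
# {'has_zip': False, 'has_phone': True, 'first_name_matches_legal': False}
```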

There are risks even in this level of analysis, however. Here's an example, based on rough math; estimates vary by gender and location.

In an Ontario survey, 45 percent of transgendered respondents said they had attempted suicide. In Ontario, 7.86 men in 100,000 commit suicide, and the World Health Organization estimates that completed suicides account for only about a twentieth of all attempts. That makes 157.2 attempts per 100,000 people, or roughly 0.16% of people. By that math, a transgendered person is more than 286 times as likely to attempt suicide.
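Spelled out, that back-of-the-envelope arithmetic looks like this (using only the figures cited above):

```python
# Back-of-the-envelope math using the figures cited above.
completed_per_100k = 7.86           # Ontario male suicide rate, per 100,000
attempts_per_completed = 20         # WHO estimate: ~20 attempts per completed suicide
general_attempt_rate = completed_per_100k * attempts_per_completed / 100_000  # ≈ 0.00157

surveyed_attempt_rate = 0.45        # Ontario survey: 45% reported an attempt
print(round(surveyed_attempt_rate / general_attempt_rate))  # ≈ 286
```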

Putting aside the horrible human cost of this for a moment, it turns out that the financial cost of treating an attempted suicide is high, too. According to the NIH, a suicide attempt carries medical costs of US$13,536.

Now imagine it’s your job to identify insurance applicants who might cost your company more money. A transgendered person often changes their name, or has a first name that doesn’t match legal documentation.

Set an algorithm loose on the metadata, and it could find a correlation between people whose first names have been changed in a database—or differ from their names in official records—and those who will cost more to insure.
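As a toy illustration, with entirely fabricated data, a classifier trained only on a "name changed" flag can still end up encoding the sensitive attribute as a proxy:

```python
# Toy example: the model never sees anyone's gender identity, only a metadata
# flag, yet that flag becomes a proxy for it. All data below is fabricated.
from sklearn.linear_model import LogisticRegression

X = [[1], [1], [1], [1], [0], [0], [0], [0]]  # 1 = first name changed in records
y = [1, 1, 1, 0, 0, 0, 0, 1]                  # 1 = high expected claims cost

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[1]])[0][1])  # predicted high-cost probability, name changed
print(model.predict_proba([[0]])[0][1])  # predicted high-cost probability, name unchanged
```

The model assigns a higher predicted cost to anyone whose recorded name differs from their legal one, without ever being told why those records differ.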

That’s what the algorithms at Orbitz did when they decided Mac users had more money to spend, and served up higher-priced travel offerings based on the visitor’s web browser. It wasn’t a human decision—it was an algorithm converging on a particular set of attributes that suggested someone was likely to spend more.

The ethics of big data are murky, partly because algorithms are opaque and often confusing. You may think that by only using metadata, you’re being respectful of users’ privacy. But metadata is leaky, and relying on it to protect user data is risky at best.

One Comment

  • Don’t You Forget About Me « The Lexwerks wrote:
    October 24, 2014 at 12:39 am

    […] do manage to have “forgotten.” As venture capitalist Alistair Croll observed, “metadata is leaky, and relying on it to protect user data is risky at best.” Or as the Penguin said in Batman Returns, “What you hide, I discover. What you put […]
