Will the Real Psychometric Targeters Please Stand Up?

A skeptic’s take on Trump’s purported big data juggernaut, Cambridge Analytica


For the past week or so, an article titled “The Data That Turned the World Upside Down” has been following me around like a bad head cold.

The article tells a compelling “whodunit” story of data scientists, engineers, and political communicators, all fighting for control of a new, weaponized form of online political propaganda. It leads readers to the conclusion that conservative data vendor Cambridge Analytica (CA) used Big Data and “psychometric targeting” (also called psychographic targeting) to propel Donald Trump to the White House.

I’ve written about Cambridge Analytica before (several times, in fact). I am on record as a loud CA skeptic. I have described them as the Theranos of political data: I think they have a tremendous marketing department, coupled with a team of research scientists who deliver on virtually none of those marketing promises.

And though I tried my best to read the article with an open mind, I am still left with the same fundamental skepticism about this new brand of political data alchemy.*

To be clear, I am not questioning the underlying science of psychometric targeting. Psychometric targeting simply categorizes individuals according to the standard “big five” personality traits, then treats these categories as market segments for the delivery of targeted advertising. Targeted advertising based on psychometrics is conceptually quite simple and practically very complicated. And there is no evidence that Cambridge Analytica has solved the practical challenges of applying psychometrics to voter behavior.

Here is a list of what you would need in order to apply psychometrics to voter behavior:

  1. A comprehensive file of psychographic data on American citizens. Alexander Nix, the CEO of Cambridge Analytica, told the authors that his company has “profiled the personality of every adult in the United States of America—220 million people.” But, in a statement after the original publication of the article, the company also claims that it does not use data from Facebook and hardly used psychographics at all. So it is unclear where these comprehensive files are supposed to have come from, or how robust they are.

  2. A comprehensive national voter file, matched to this psychographic data. As Daniel Kreiss shows in his new book, Prototype Politics, the Republican voter file was still very much a work-in-progress during the 2016 election. Matching this data to CA’s purported psychographic file would be a hairy technical endeavor, involving heavy collaboration from other Republican vendors who instead have downplayed CA’s role and raised questions about its transparency.

  3. A massive creative team to craft targeted messages for each of these audience segments. This is one of the (many) insights from Eitan Hersh’s 2015 book, Hacking the Electorate. The more segments a campaign creates within a voter universe, the more distinct messages that campaign has to develop, test, and refine. Even if Cambridge Analytica correctly assigned every American to one of its 32 psychographic categories AND linked those profiles to a national voter file, the data would only become useful if the Trump communications operation were crafting distinct messages for each of the categories. But we know for a fact that the Trump campaign had a bare-bones communications staff. If CA had been able to hand the Trump communications team a detailed psychographic assessment of every targeted voter, the practical response would have been a bit like Henry Ford’s old comment: “The customer can have any color he wants so long as it’s black.”
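
To make the arithmetic behind that third requirement concrete, here is a rough back-of-the-envelope sketch. Only the 32 segments come from the article under discussion. The number of issues, the number of test variants per message, the staff size, and the per-person output are hypothetical placeholders I picked for illustration, not figures from any campaign.

```python
import math


def creative_workload(segments: int, issues: int, variants_per_message: int) -> int:
    """Distinct ads to write if every segment gets its own message on
    every issue, with several variants of each message for testing."""
    return segments * issues * variants_per_message


def weeks_to_produce(total_ads: int, staff: int, ads_per_person_per_week: int) -> int:
    """Whole weeks a team needs to draft every ad once, rounded up."""
    return math.ceil(total_ads / (staff * ads_per_person_per_week))


SEGMENTS = 32  # psychographic categories cited in the article
ISSUES = 3     # hypothetical: number of campaign themes
VARIANTS = 4   # hypothetical: test variants per segment-issue message

total = creative_workload(SEGMENTS, ISSUES, VARIANTS)
# Hypothetical bare-bones team: 5 people, 10 finished ads each per week.
weeks = weeks_to_produce(total, staff=5, ads_per_person_per_week=10)

print(f"{total} distinct ads, roughly {weeks} weeks for a 5-person team")
```

Under these made-up inputs the workload multiplies quickly: every additional issue or test variant scales the total by the full number of segments, which is the practical bottleneck a small communications staff would hit.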

It’s also worth noting that, in post-campaign debrief sessions, psychographic targeting has completely vanished from Cambridge Analytica’s presentations. At a symposium last month hosted by Civic Hall and the Knight Foundation, Molly Schweickert (CA’s head of digital) instead described their data operation as “going into the field on a weekly basis to collect hard ID responses [and] scoring individuals on candidate preference, issues they cared about, and likelihood to turnout.” Scoring voters based on likelihood of candidate support and likelihood of turnout is nothing new. As Sasha Issenberg documented in The Victory Lab, this was the cutting-edge innovation of the 2008 Obama campaign. Rather than bragging about a new leap forward in voter targeting, Schweickert is effectively boasting that the Republicans have caught up to the Democrats.

The simple explanation here is that Cambridge Analytica has been engaging in the time-honored Silicon Valley tradition of developing a minimum viable product (vaporware, essentially), marketing the hell out of it to drum up customers, and then delivering a much more mundane-but-workable product. The difference here is that CA’s marketing has gotten caught up in our collective search for the secret formula that put Donald Trump in the White House.

But here’s the tough reality: “Moneyball” doesn’t always win. Donald Trump’s campaign didn’t possess a secret data innovation. His unlikely victory was due to a messy confluence of factors. The world has indeed been turned upside down by this election, but data scientists were not the cackling villains hidden just offstage. Trump ran a deeply flawed campaign! Hillary Clinton also ran a flawed campaign! There was also an organized anti-Clinton disinformation campaign, semi-coordinated by Vladimir Putin and WikiLeaks! And the FBI ambushed Clinton two weeks before the election, while denying that they were conducting an investigation into links between the Trump campaign and the Russian propaganda effort!

The stories of Cambridge Analytica’s omniscience are fiction. The 2016 election was stranger than fiction.

*The article was originally published in the Zurich-based Das Magazin. Like so much of the Trump communications empire, it surely sounds better in the original German.

  • Emily Patton

    1. “A comprehensive file of psychographic data on American citizens:”
    https://datafloq.com/read/amazon-leveraging-big-data/517

    2. “A comprehensive national voter file, matched to this psychographic data:”
    http://www.moonshadowmobile.com/products/ground-game-mobile-canvassing/

    3. “A massive creative team to craft targeted messages for each of these audience segments:”
    http://www.socialmediaexaminer.com/21-ways-to-improve-your-facebook-ads-with-ad-targeting/
    https://www.facebook.com/business/a/online-sales/ad-targeting-details

    realistically, 1+2 are all you need to tailor door-knocks while canvassing.
    furthermore, 3 is all a couple of millennial Trump supporters with big wallets would need.

    for bonus points: this messaging system is itself a user-data aggregator for targeted ads: https://help.disqus.com/customer/en/portal/articles/1657951-data-sharing-settings

  • ShanaC (http://shanacarp.com/essays)

    1) Call Epsilon and ShareThis. Or just walk into an IAB meeting and ask who sells data. Or check the relevant Lumascape of Doom.

    2) Actually, they don’t need a comprehensive data set. Just a right-sized one, properly tagged. Per Emily Patton, there is technology to get that on the ground. Plus they managed to get some data in the primaries, by working with Ted Cruz. A well-constructed Neural Network can be retuned to label all incoming data afterwards.

    That’s why Neural Networks are, in part, so great. I think everyone in campaigns (and actually, large swaths of marketing) is missing this as part of the story: you can jump from a smaller set of data from the primaries, and use it in the general election to massively magnify your reach. So that complaint from Ad Age actually worked to their advantage. Failure for a NN company is running out of customers to test on and/or money.

    3) Why in bloody blazes would you want to make your own creative in a campaign? It is way cheaper to spread the reach of PR from fake and real news organizations. It is cheaper to boost your own supporters’ reach than to maximize your own reach. Plus, nearly all research into structural virality says you’ll get more reach by using a strategy where you aren’t broadcasting yourself, but boosting other people. (Duncan Watts, Jure Leskovec, and others have done extensive research in this area. It gets published at the ACM and in Nature regularly. Marketers are silly people and don’t read this stuff. Beyond me why.)

    Beyond the evidence that this is the smartest way to run a campaign, this method has the added benefit of really low creative costs (you are running a repackaging operation), and it is also an investment in getting data. There is no downside. Which is why most media reports on the Trump campaign imply they made very little if any real creative, and instead shopped other people’s creative.

    And we know that’s what the Trump campaign did. They called reporters to shop stories that the Russians did, and even if real reporters didn’t take the stories, fake and conservative outlets did, and then the campaign boosted those stories. They boosted stories on Facebook from other sources. They invested very little in making creative media, because it was a waste of money.

    Cause, you know, Forbes covered what they did.

    Here are quotes from the Kushner interview on the subject:

    Kushner structured the operation with a focus on maximizing the return for every dollar spent. “We played Moneyball, asking ourselves which states will get the best ROI for the electoral vote,” Kushner says. “I asked, How can we get Trump’s message to that consumer for the least amount of cost?” FEC filings through mid-October indicate the Trump campaign spent roughly half as much as the Clinton campaign did.

    Television and online advertising? Small and smaller. Twitter and Facebook would fuel the campaign, as key tools for not only spreading Trump’s message but also targeting potential supporters, scraping massive amounts of constituent data and sensing shifts in sentiment in real time.

    This wasn’t a completely raw startup. Kushner’s crew was able to tap into the Republican National Committee’s data machine, and it hired targeting partners like Cambridge Analytica to map voter universes and identify which parts of the Trump platform mattered most: trade, immigration or change. Tools like Deep Root drove the scaled-back TV ad spending by identifying shows popular with specific voter blocks in specific regions–say, NCIS for anti-ObamaCare voters or The Walking Dead for people worried about immigration. Kushner built a custom geo-location tool that plotted the location density of about 20 voter types over a live Google Maps interface.

    Here is Brad Parscale on NPR on what they specifically were doing:

    So if you had an app on your iPhone or Android, you could go knock doors. And when you knock that door, you push a button on your iPhone – you talked and communicated with this person. It would immediately send back to our database, so now we know we don’t have to communicate with that person in another method, where Democrats still use pieces of paper that have to be scanned in and then databased later. So they weren’t getting real-time information on their door-knocking program. That’s a huge advantage for us into our ground game.

    (In other words, don’t worry about that voter database.)

    PARSCALE: Well, I mean, I think all campaigns run negative and positive ads. So we would target those to those people who we felt were in the middle till we could move them over either to an undecided or back into a Trump column. We found data, and we ran hundreds of thousands of brand-lift surveys and other types of tests to see how that content was affecting those people so we could see where we were moving them.

    (They ran surveys, not traditional creatives!)

  • brittblaser

    “a team of research scientists who provide on virtually none of those marketing promises”.

    “deliver on” was probably the intended verb.