Podcast episode
Episode 137: The Data Quality Crisis in Digital Advertising with Scott McKinley of Truthset
data-brokers identity measurement programmatic publisher-economics
TL;DR
Scott McKinley, founder/CEO of Truthset and a former Nielsen EVP, makes the case that digital advertising's data quality is far worse than buyers assume — audience files that start ~90% accurate degrade to ~22-23% by the time they reach the bid stream, and the industry has no independent regulator or accountability mechanism. His prescription: stop chasing reach/scale, treat effective CPMs honestly by factoring in data error, and authenticate users to rebuild durable identity for the open internet. Worth a listen for anyone in data, identity, or measurement who wants a blunt, contrarian framing of "signal loss" (which he reframes as "there never was signal — it's static").
What was covered
- Truthset's premise: The company doesn't sell data; it measures the accuracy of other providers' data so buyers/sellers can price quality "like gas at a gas pump." McKinley incubated the idea inside Nielsen before spinning out.
- Data degradation through the supply chain: A file ~90% accurate at source loses more than half its addressable IDs and absorbs up to 50% error at each of four-to-five hops (data provider → onboarding → DSP → SSP → bidstream), ending at ~22-23% accuracy before fraud/IVT (invalid traffic) is even counted.
- Variance across providers: In Truthset's collective of 25 data providers, accuracy for the same segment ranges from ~20% correct to 60-70% correct — debunking the assumption that "all data is roughly equal."
- No regulation/accountability: McKinley argues advertising is unique among large industries (~$700B) in having no SEC/FDA-style oversight, no ombudsman, and no recourse — "you can spend a million dollars buying a thing and you don't receive the thing, and nobody goes to jail."
- Nielsen's legacy and decline: He defends Nielsen's historical role as independent TV measurement (earning ~$2B against a $70B TV market, ~3% take rate) but says the industry "threw the baby out with the bathwater" by discrediting Nielsen without replacing independent measurement. He noted Dave Calhoun (then CEO) tasked him with disrupting Nielsen's own products.
- Reach as a broken KPI: A $10 CPM (cost per thousand impressions) against an audience that's only 20% accurate is really a $50 effective CPM. Large CPG/auto advertisers cling to legacy reach-based models that no longer produce linear sales lifts outside live sports.
- The SMB contrast: Three-person Shopify dropshippers buying straight from Meta get near-immediate measurable returns, while big-CPG media directors keep playing "by the old rules."
- Identity and authentication as the fix: Walled gardens win because of durable, authenticated identity. The open internet must authenticate logged-in users — possibly via a federated "clearinghouse" — to lift yields from ~$2 CPM toward $20.
- IP addresses debunked: A study for an unnamed party found IP-to-household accuracy at 13% and IP-to-postal at 16%; replacing cookies with IP as CTV grows is "jumping from a lukewarm frying pan into a scalding hot fire."
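Two pieces of arithmetic in the list above (supply-chain degradation and effective CPM) can be sanity-checked with a short sketch. It assumes a simple multiplicative model in which every hop retains the same fraction of accuracy. The summary does not state how per-hop losses combine, so that model, the function names, and the 22.5% midpoint are illustrative assumptions rather than Truthset's method.

```python
def accuracy_after_hops(start: float, retention: float, hops: int) -> float:
    """Accuracy left after `hops` hand-offs, each keeping `retention` of it."""
    return start * retention ** hops


def implied_retention(start: float, end: float, hops: int) -> float:
    """Constant per-hop retention that takes `start` to `end` in `hops` hops."""
    return (end / start) ** (1 / hops)


def effective_cpm(nominal_cpm: float, accuracy: float) -> float:
    """Cost per thousand impressions that actually reach the intended audience."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return nominal_cpm / accuracy


# 90% at source, ~22.5% in the bid stream (assumed midpoint of 22-23%).
for hops in (4, 5):
    r = implied_retention(0.90, 0.225, hops)
    print(f"{hops} hops: about {1 - r:.0%} of accuracy lost per hop")

# What a literal 50% loss at each of four hops would leave.
print(f"50% loss x 4 hops: {accuracy_after_hops(0.90, 0.5, 4):.1%}")

# The $10 CPM example at 20% accuracy.
print(f"effective CPM: ${effective_cpm(10, 0.20):.2f}")
```

Under this model, a drop from 90% to roughly 22.5% implies about 29% loss per hop over four hops, or about 24% over five. A literal 50% loss at each of four hops would leave under 6%, which suggests the "up to 50%" figure is a ceiling rather than a per-hop average. The $10 CPM at 20% accuracy works out to the $50 quoted.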
Notable claims & predictions
- "We built a $700 billion industry that runs on faith… a lot like religion. In religion, you don't find out if you're right until you die. In advertising, you find out you're right or wrong after the campaign, after the money's spent." — McKinley
- "An audience file that might be 90% accurate at source… you're going to end up with… 22% accurate, 23% accurate, and that's before fraud and all the other things." — McKinley
- "There never was any signal. This is the lie… signal loss [is wrong] — there's a new metaphor for the problem of data: it's called static. The last 20 years was mostly static." — McKinley
- "A dollar pushed into Meta or Google is gonna return a buck 33 or a buck 50… It's a nice vending machine, but that's not healthy either because you're outsourcing intelligence about your future customers to these companies and they're gonna arbitrage it and leverage it against you." — McKinley
- "The way you get a $20 CPM instead of a $2 CPM is you freaking authenticate your users. And if your content's not good enough to earn that value exchange, your content's not that great." — McKinley
- "The average accuracy of IP to household is 13% correct and IP to postal is 16% correct." — McKinley (citing a study)
- Prediction: Federating identity through "some sort of clearinghouse or independent mechanism" is "inevitable" — the remaining ~17-18 cents of the open-web dollar will keep flowing to walled gardens unless the open ecosystem authenticates users.
Full analysis
Step 1 — Frame
A former Nielsen executive who now runs a data-accuracy company argues that the data underpinning digital advertising is mostly junk — audience files that start ~90% accurate end up ~22% accurate by the time a buyer bids on them. His fix: stop chasing reach, price data quality honestly, and authenticate logged-in users to rebuild durable identity on the open web.
What's actually being decided (briefing mode): Should ad-tech operators — publishers, agencies, DSPs/SSPs, identity and measurement vendors — treat data quality and authentication as the next big strategic battleground, or is this a vendor with a measurement product talking his own book?
Reversibility: Type 2 for any single operator (you can pilot accuracy scoring or authentication walls and back out). Type 1 for the industry's structural drift toward walled gardens — that's hard to reverse and getting harder.
Timeline / forcing function: No hard deadline, but the slow bleed of open-web dollars to Meta/Google/Amazon is the forcing function. CTV's growth — and its lean on IP addresses — adds urgency.
This is clear enough to proceed.
Step 2 — The Council
The Market Analyst
Self-interest alert: a company that sells accuracy scoring tells you data is wildly inaccurate. That's Gartner-grade incentive. But the direction is real even if the numbers are dramatized. Walled gardens taking ~80 cents of every digital ad dollar isn't because their ad servers are better; it's because they know who the user is. For investors, the tell is that the open-web independents (Magnite, PubMatic, Index, Criteo) keep growing revenue but can't escape the ceiling on what an unauthenticated impression is worth. Plain version: the companies that know your name make more money per ad than the ones guessing. The pure-play accuracy-measurement category, though, has been a graveyard: buyers say they'll pay for quality, then buy on price.
The Skeptic
The load-bearing assumption is that "90% to 22%" is a meaningful number. Accuracy of what, against what truth set? Truthset's own ground truth is itself a probabilistic model; there's no oracle of who lives in a household. The 50%-error-per-hop math compounds suspiciously cleanly. And the punchline, "authenticate your users," is advice publishers have heard for a decade. They know. They can't, because logging in kills reach and most content isn't worth a registration wall. Plain version: telling publishers to make people log in isn't a strategy, it's the problem restated. The IP stat (13% to household) is probably his strongest, least self-serving claim.
The Operator
Tuesday morning, a publisher VP wants to act on this. What breaks first? Auth walls crater your audience: you trade a $2 CPM across 10 million users for a $20 CPM across 800,000, and your sales team can't fill the larger guarantees they already sold. Second-order at 90 days: your data partners push back when you start scoring them, because half your "high-value segments" are revenue you booked. Plain version: the honest number makes this quarter look worse before it ever looks better. The realistic move isn't a wall but authenticated-light: registration for newsletters, commerce, gated tools, then stitch consented IDs into a first-party graph.
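The Operator's trade-off can be put in numbers. The sketch below assumes, purely for illustration, one monetized impression per user, which the episode does not specify. The function names are hypothetical.

```python
def revenue(cpm: float, impressions: int) -> float:
    """Revenue in dollars for a given CPM and impression count."""
    return cpm * impressions / 1000


def break_even_audience(old_cpm: float, old_audience: int, new_cpm: float) -> float:
    """Audience needed at `new_cpm` to match revenue earned at `old_cpm`."""
    return old_audience * old_cpm / new_cpm


# Assumption: one impression per user, so users stand in for impressions.
open_web = revenue(2, 10_000_000)
authenticated = revenue(20, 800_000)
needed = break_even_audience(2, 10_000_000, 20)

print(f"open web:      ${open_web:,.0f}")
print(f"authenticated: ${authenticated:,.0f}")
print(f"break-even authenticated audience: {needed:,.0f} users")
```

On these assumptions the open audience earns $20,000 and the authenticated one $16,000, so the wall only pays for itself if more than 10% of users (1 million here) register.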
The Customer / End User (the media buyer)
From a CPG or auto media director's chair: McKinley is describing my reality but prescribing a job I can't get approved. My CMO is measured on reach and frequency because that's what the board deck shows. "Effective CPM is really 5x" is true and unsayable in a budget review, because it implies I've wasted money for years. Plain version: buyers know the data is shaky; admitting it out loud indicts their own track record. The SMB-on-Shopify contrast stings because it's accurate: small advertisers buy outcomes because they have no choice, big ones buy reach because they have cover.
Step 3 — The Tensions
- Is the fix supply-side or demand-side? The Operator and Customer disagree on who moves first. Publishers can't justify auth walls until buyers pay for authenticated audiences; buyers won't pay a premium until enough authenticated supply exists. Classic standoff: neither side wants to go first.
- Real problem, self-serving messenger. The Market Analyst and Skeptic both accept the direction (walled gardens win on identity) but distrust the magnitude. The danger is dismissing a true diagnosis because the doctor sells the cure.
- Authentication as savior vs. authentication as the thing that already excluded the open web. Walled gardens authenticate because people want to be logged into Facebook. Most open-web content doesn't earn that. Telling publishers to "earn it" is either bracing honesty or a counsel of despair.
Step 4 — Synthesize
This hinges on two beliefs:
- Is authenticated identity worth a 5–10x CPM premium that survives contact with reality? Partly yes: logged-in, consented audiences already command more. But the gap between $2 and $20 is mostly demand (advertisers trust walled-garden outcomes), not authentication alone. Auth is necessary, not sufficient.
- Will buyers actually pay for measured data quality? History says no. Verification (DV, IAS) succeeded by riding fraud and brand-safety fear, not by selling abstract "accuracy." A quality-scoring layer probably only scales if it attaches to that same fear reflex or gets baked into a curation/clean-room workflow buyers already use.
The council leans: diagnosis largely correct, prescription mostly unworkable as stated, and the messenger's category is structurally hard. The actionable kernel for operators is the boring middle path — first-party data collection through commerce and registration, consented identity graphs, and curated authenticated deals — not dramatic login walls or a heroic industry "clearinghouse," which has been proposed and stalled repeatedly (see Unified ID 2.0's slow grind).
What to de-risk before acting: independently pressure-test the IP-accuracy claim against your own deterministic data — if IP really is ~15% accurate for household, that reshapes every CTV targeting and measurement plan you have, and that's the most checkable, highest-stakes claim in the episode.
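One way to run that pressure test is to compare a vendor's IP-to-household assignments with household IDs you already know deterministically, for example from logged-in sessions. The sketch below is a minimal version under stated assumptions: each IP maps to a single household, both inputs are plain dictionaries, and every name and sample value is hypothetical.

```python
def ip_household_accuracy(
    vendor: dict[str, str], truth: dict[str, str]
) -> tuple[float, int]:
    """Share of overlapping IPs where the vendor's household matches ours.

    Returns (accuracy, number_of_overlapping_ips).
    """
    overlap = vendor.keys() & truth.keys()
    if not overlap:
        raise ValueError("no overlapping IPs to compare")
    correct = sum(vendor[ip] == truth[ip] for ip in overlap)
    return correct / len(overlap), len(overlap)


# Hypothetical sample data using documentation-reserved IP ranges.
vendor_map = {
    "192.0.2.1": "hh_a",
    "192.0.2.2": "hh_b",
    "192.0.2.3": "hh_c",
    "198.51.100.7": "hh_d",  # not in our deterministic data, so ignored
}
truth_map = {
    "192.0.2.1": "hh_a",
    "192.0.2.2": "hh_x",
    "192.0.2.3": "hh_y",
    "203.0.113.9": "hh_z",  # vendor has no claim for this IP, so ignored
}

accuracy, n = ip_household_accuracy(vendor_map, truth_map)
print(f"{accuracy:.0%} correct on {n} overlapping IPs")
```

In practice the overlap needs to be large and restricted to a narrow time window, since IP assignments rotate. A result far from the ~13% cited in the episode would be a reason to dig further, not a verdict on its own.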
Step 5 — The Prediction
Prediction: No industry-wide independent "data quality clearinghouse" or federated authentication standard for the open web will launch with meaningful adoption (multiple top-10 publishers plus a top-5 agency) by 2027-06-11. The momentum will stay with incremental first-party data and curation deals, not a new neutral arbiter.
Revisit by 2027-06-11: We're right if no neutral, independently governed quality-scoring or authentication clearinghouse has secured broad publisher and agency adoption, and accuracy measurement remains a niche vendor pitch. We're wrong if a cross-industry body or standard launches with disclosed adoption from several major publishers and at least one holding company.
The structural pull toward walled gardens is real, but the open web's answer to it has been incremental and self-interested for a decade, and nothing in this episode changes the incentives that keep it that way. Independent measurement is a public good, and public goods don't get built by companies that need to bill for them.