The Say-Do Gap: Why What Consumers Report Doesn’t Match What They Buy

The say-do gap — the divergence between what consumers report and what they actually buy — is the most expensive unsolved problem in consumer research. The industry knows it exists. It has built an entire infrastructure of fraud detection, attention checks, and data cleaning to cope with it. But coping is not solving. The gap persists because the fundamental approach hasn’t changed: we are still asking people to be reliable narrators of their own behavior, and they are not.

The Gap Is Well-Documented

The say-do gap is not a theoretical concern. It has been measured repeatedly, across methodologies, by organizations with no incentive to overstate the problem.

In a meta-analysis of purchase intent studies, 83% of consumers said they would buy a product, but only 42% bid real money when given the opportunity — a gap of nearly half.1 Separately, research from Gartner and CEB found that stated preference predicts actual purchase behavior with roughly 34% accuracy.2 That means two out of three stated-preference signals point in the wrong direction.
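The arithmetic behind those two figures can be made explicit. A minimal sketch, using only the percentages cited above (the numbers are from the studies referenced, not a new dataset):

```python
# Stated vs. revealed purchase intent, using the figures cited above.
stated_intent = 0.83    # said they would buy
actual_purchase = 0.42  # bid real money when given the opportunity

# Share of stated intenders who never followed through.
drop_off = (stated_intent - actual_purchase) / stated_intent
print(f"{drop_off:.0%} of stated intenders never committed money")

# A 34% accuracy rate implies roughly two of three stated-preference
# signals point the wrong way.
stated_accuracy = 0.34
wrong_direction = 1 - stated_accuracy
print(f"{wrong_direction:.0%} of stated-preference signals mislead")
```

The drop-off works out to roughly 49% of stated intenders, which is where "a gap of nearly half" comes from.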

The gap extends beyond purchase intent into how consumers describe their own behavior. A study by 84.51°, Kroger’s analytics division, compared stated shopping behavior to verified transaction records and found that 60% of respondents were assigned to the wrong buyer segments based on their survey answers. More striking: 75% of respondents had never actually purchased in the product categories they claimed to buy.8

These are not edge cases. They are the central tendency of stated-preference data when measured against behavioral ground truth.

The downstream consequences are significant. Nielsen has reported that 85% of new CPG products fail within two years — despite the fact that nearly all of them passed consumer research before launch.6 The research said yes. The market said no. Something in the signal was wrong.

Why It Happens — The Human Blind Spots

The say-do gap is not a survey design problem. It is a human cognition problem. People are not lying to researchers. They are reporting what they believe to be true — and what they believe to be true is often wrong.

Brand blindness

Ask a consumer what brands they buy for phone cables and they’ll say “I don’t really have a brand.” Their Amazon account shows they’ve bought the same brand six times in two years. They just can’t pronounce the name, so it doesn’t register as a choice. These brands — private-label, algorithmic bestsellers, “Amazon’s Choice” picks — take significant market share but are invisible in survey data because consumers don’t encode them as brand decisions.

Identity-driven reporting

A man buys cat food every two weeks for four years. He doesn’t own a cat — he feeds neighborhood strays. Ask him if he buys pet products and the answer is no. The purchase is real and consistent, but it doesn’t match his self-concept, so it doesn’t exist in his reported data. People systematically underreport purchases that feel incidental, embarrassing, or inconsistent with how they see themselves.

Timing reconstruction

A shopper switched from Tide to a store-brand detergent sometime last year. Ask when and she’ll guess. Ask why and she’ll rationalize. The purchase record shows the exact date — and that it happened the week the store-brand ran a BOGO promotion. Consumers know they switched. They cannot reconstruct when, why, or what triggered it.

Context-driven price insensitivity

A consumer says $17 is their ceiling for batteries. Honest answer. Then their kid opens a Hot Wheels track on Christmas morning — no batteries included. They immediately buy the most expensive batteries available with expedited shipping. The stated price threshold was real in one context and irrelevant in another. Surveys capture the number. They don’t capture the moments when the number stops mattering.

People are not lying. They are doing their best with imperfect self-knowledge. The say-do gap is not a problem of respondent quality. It is a structural limitation of asking humans to be reliable narrators of their own behavior.

The Industry Has Learned to Cope, Not Solve

The insights industry is fully aware of the say-do gap. Over decades, it has developed an extensive infrastructure for managing the problem — attention checks, pre-qualification screeners, trap questions, data cleaning protocols, and fraud detection products.

The scale of the problem justifies the investment. Analysis of 4.1 billion survey attempts found that 33% were fraudulent.3 AI-generated survey fraud has increased 43% as large language models make it easier to produce plausible-sounding responses at scale.4 Researchers report discarding 38% of collected survey data on average before analysis begins.5

These are serious, well-engineered countermeasures. They solve the fraud problem — the problem of people who are intentionally providing bad data. But they do not solve the say-do gap itself. Even a perfectly honest, fully attentive, carefully screened respondent still cannot accurately report brands they do not consciously track, purchases that do not match their self-concept, the precise timing of behavioral shifts, or their price sensitivity under conditions they are not currently experiencing.

The industry has trained itself to take stated data with a grain of salt and work around the uncertainty. Researchers apply discount factors to purchase intent scores. They triangulate across multiple stated-preference methods. They use conjoint analysis to back into preferences indirectly. These are intelligent adaptations. But they are all adaptations — ways of coping with the absence of a direct behavioral signal, rather than obtaining one.
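The discount-factor adaptation described above can be sketched as a minimal calculation. The intent scores and the factor below are illustrative assumptions, not published benchmarks:

```python
# Minimal sketch of the discount-factor adaptation: stated purchase
# intent is deflated by a historical stated-to-actual conversion rate.
# All numbers here are hypothetical, for illustration only.

# Hypothetical top-two-box purchase-intent scores from a concept test.
stated_intent = {"concept_a": 0.78, "concept_b": 0.55}

# Hypothetical discount factor: the fraction of stated intenders who
# historically went on to buy in this category.
DISCOUNT_FACTOR = 0.5

# Deflated estimates of likely trial.
adjusted = {name: score * DISCOUNT_FACTOR
            for name, score in stated_intent.items()}
print(adjusted)
```

The point of the sketch is the limitation the text describes: the factor corrects the level of the signal, but it cannot tell you which individual respondents will actually buy.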

The fundamental question remains: what did this person actually buy?

What Changes When You Have the Actual Record

When consumers connect their retail accounts directly — with consent, in exchange for clear value — the resulting data eliminates entire categories of uncertainty that stated-preference methods cannot resolve.

Longitudinal depth without recall dependency

A connected retail account does not provide a snapshot. It provides up to five years of purchase history — every SKU, every price paid, every date, across every order. You can study the same person’s behavior before, during, and after COVID. You can track brand loyalty over seasons, across life events, through price changes. Try asking a consumer to recall their shopping habits from 2020; they cannot reconstruct them. The account can.
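What that longitudinal record makes possible can be sketched with a toy example. The schema and the records below are hypothetical, standing in for the SKU-level data a connected account would provide:

```python
from collections import Counter
from datetime import date

# Hypothetical SKU-level records from a connected retail account.
purchases = [
    {"date": date(2021, 3, 2),  "category": "detergent", "brand": "Tide",       "price": 12.99},
    {"date": date(2021, 9, 14), "category": "detergent", "brand": "Tide",       "price": 13.49},
    {"date": date(2022, 4, 8),  "category": "detergent", "brand": "StoreBrand", "price": 7.99},
    {"date": date(2022, 11, 1), "category": "detergent", "brand": "StoreBrand", "price": 7.49},
]

# Brand purchase counts within a category per year -- the kind of
# loyalty trend a respondent cannot reconstruct from memory.
by_year = {}
for p in purchases:
    if p["category"] != "detergent":
        continue
    by_year.setdefault(p["date"].year, Counter())[p["brand"]] += 1

for year, counts in sorted(by_year.items()):
    total = sum(counts.values())
    shares = {brand: n / total for brand, n in counts.items()}
    print(year, shares)
```

Run over a real multi-year record, the same few lines surface the switch date, the price paid before and after, and the promotion week — none of which the shopper can report from memory.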

Retroactive availability

If you recruit a panelist today for a receipt-scanning program, you get data starting today. There is no way to recover their purchase history from the past three years — those receipts no longer exist. Account connection reverses this constraint. The moment someone connects, their full purchase history becomes available. You are not building a panel that will become useful in six months. You have behavioral data from day one.

Structural fraud elimination

There is no way to fabricate a purchase without actually spending the money. The data is pulled directly from the retailer’s transaction record. No receipt photos to stage. No surveys to game. No open-ended responses to generate with AI. The effective fraud rate is zero — not because of better screening, but because the methodology does not create an opportunity for fraud to occur.

Closing the gaps that receipt scanning leaves open

Receipt scanning was an important step beyond surveys. It captures real purchases with real prices. But it still depends on human compliance. People forget to scan. They miss receipts. They stop scanning after the novelty wears off. In incentivized panels, some participants scan receipts from friends or family members to earn additional rewards. Direct account connection removes the human compliance layer entirely. The data arrives complete, unedited, and continuous.

The complement, not the replacement

The point is not that surveys are obsolete. Surveys capture perception, emotion, motivation, and reasoning — dimensions that purchase data cannot observe. The say-do gap is a problem specifically when surveys are used to predict behavioral outcomes — what people will buy, at what price, how often, and from whom. That is where the actual transaction record provides a direct signal that stated preference cannot match. The strongest research programs use both: stated data for the perceptual layer, behavioral data for the transactional layer.

Frequently Asked Questions

What is the say-do gap?

The say-do gap is the divergence between what consumers report in research settings — their stated preferences, purchase intentions, and self-described behavior — and what they actually do in the real world. It is a well-documented phenomenon driven not by dishonesty but by the structural limitations of human memory, self-perception, and contextual reasoning.

How big is the say-do gap?

The magnitude varies by category and methodology, but the core findings are consistent. Stated purchase intent predicts actual behavior with roughly 34% accuracy.2 In controlled experiments, 83% of consumers say they would buy, but only 42% commit real money.1 When 84.51° compared survey-stated shopping behavior to verified Kroger transaction records, 60% of respondents were placed in the wrong buyer segments and 75% had never purchased in the categories they claimed.8

Why do consumers misreport their behavior?

Four primary drivers: brand blindness (not reporting brands they do not consciously track), identity-driven filtering (omitting purchases that do not match their self-concept), timing reconstruction failure (inability to recall when or why they switched brands), and context-dependent decision-making (stating price thresholds that do not hold under urgency or emotion). These are features of human cognition, not survey design flaws.

How does the say-do gap affect business decisions?

Pricing strategies built on stated willingness-to-pay may not reflect real-world price elasticity. Product launches validated by inflated purchase intent face an 85% failure rate within two years.6 Competitive analysis based on stated brand preferences misses actual switching behavior. Segmentation built on self-reported purchase patterns misclassifies the majority of consumers.8

Can surveys still be useful despite the say-do gap?

Yes. Surveys are the right tool for measuring perception, brand sentiment, unmet needs, and emotional response — questions where the consumer’s subjective experience is the data. The say-do gap is a problem specifically when surveys are used to predict what consumers will buy, how much they will pay, or how often they will purchase. For those behavioral questions, direct transaction data provides a more reliable signal.

What is the alternative to surveys for understanding purchase behavior?

Consent-based account connection allows consumers to share verified transaction records directly from their retail accounts. This provides SKU-level purchase data with full pricing, exact dates, and longitudinal history — typically spanning multiple years. Because the data comes from the retailer’s own records, it carries a zero effective fraud rate and does not depend on respondent recall, compliance, or honesty. It complements surveys by providing the behavioral layer that stated preference cannot deliver.