If you know what someone has bought for the past three years — every toothbrush, every impulse buy, every 2 AM purchase they forgot about — can a digital twin built from that data answer a survey as well as the human?
We tested it. This is an early, directional study — a small panel, a single retailer — but the results were clear enough to share. A larger validation is underway.
The Setup
We took a panel of real consumers who had completed two surveys and connected their Amazon purchase accounts through Ario’s data connector.
The first survey was about car care behavior — 22 questions about what car care activities they personally perform, from washing the exterior to applying tire shine. Factual, behavioral questions with clear right-or-wrong answers.
The second was a product concept test for a new pet cleaning product they had never used — questions about appeal, purchase intent, and how much they agreed with statements like “this product offers valuable features” and “I want this product.” Subjective, opinion-based questions with no objectively correct answer.
For each person, we built a digital twin: a behavioral profile constructed entirely from their purchase history. What they buy. How much they spend. Which categories they favor. How often they repurchase. Whether they explore new brands or stick with what they know.
We didn’t configure the twin for either survey or tell it what was being tested. It answered based solely on purchase history. We scored every prediction against the actual human response.
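As a rough illustration of what such a behavioral profile might contain (this is a sketch, not Ario’s actual pipeline; the record fields and feature names are hypothetical):

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical purchase record: (category, brand, price) tuples.

@dataclass
class TwinProfile:
    top_categories: list        # categories ranked by purchase count
    avg_spend: float            # mean price paid per purchase
    repurchase_rate: float      # share of purchases repeating an earlier brand+category
    brand_exploration: float    # unique brands / total purchases

def build_profile(purchases):
    """Condense a purchase history into the behavioral signals described above."""
    cats = Counter(category for category, _, _ in purchases)
    brands = {brand for _, brand, _ in purchases}
    seen, repeats = set(), 0
    for category, brand, _ in purchases:
        key = (category, brand)
        if key in seen:
            repeats += 1
        seen.add(key)
    return TwinProfile(
        top_categories=[c for c, _ in cats.most_common(3)],
        avg_spend=sum(price for _, _, price in purchases) / len(purchases),
        repurchase_rate=repeats / len(purchases),
        brand_exploration=len(brands) / len(purchases),
    )
```

A twin built this way answers surveys from these aggregate signals alone, never from the survey topic itself.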
Two surveys, two very different kinds of questions, and a single test: how well does the twin do?
When Data Is Present, Twins Are Scary Accurate
The first survey asked 22 specific questions: Do you personally wash your car? Vacuum the interior? Apply tire shine? Clean the leather? For each, consumers answered one of three things: “I do it myself,” “Someone else does it,” or “Haven’t done it.”
These are factual questions. The human either does the activity or they don’t. And when the twin has relevant purchase data to work with, it gets these right — 88.4% accuracy across all 22 activities.
Here is what that looks like for one real consumer.
This consumer’s Amazon history reads like a garage build log: Fullway and Milestar tires, Wagner brake pads, Denso spark plugs, strut assemblies, a radiator fan kit, a professional diagnostic cable. Plus dedicated car care products: Meguiar’s Endurance Tire Gel, Meguiar’s Ultimate Quik Wax, Armor All Headlight Wipes, and touch-up paint pens.
In the survey, he said: “I always wash and clean my car myself.”
Below is the side-by-side: what the human answered versus what the twin predicted, for each of the 22 car care activities.
| Activity | Human | Twin | Match |
|---|---|---|---|
| Wash the exterior | Personally | Personally | ✓ |
| Spot clean the exterior | Personally | Personally | ✓ |
| Remove bugs and/or tar | Personally | Personally | ✓ |
| Polish or remove scratches | Personally | Personally | ✓ |
| Apply wax, sealant, or coatings | Personally | Personally | ✓ |
| Apply trim protectant/restorer | Personally | Personally | ✓ |
| Polish the chrome/metal trim | Personally | Personally | ✓ |
| Clean/restore headlight lenses | Not a priority | Personally | ✗ |
| Clean tough soils under the hood | Personally | Personally | ✓ |
| Vacuum or pick up trash | Personally | Personally | ✓ |
| Clean carpet/upholstery/floor mats | Personally | Personally | ✓ |
| Clean/protect leather surfaces | Personally | Personally | ✓ |
| Wipe down dashboard/console | Personally | Personally | ✓ |
| Apply interior protectant | Personally | Personally | ✓ |
| Use air freshener or deodorizer | Personally | Personally | ✓ |
| Clean tough soils/grime | Personally | Personally | ✓ |
| Clean inside of windows | Personally | Personally | ✓ |
| Clean outside of windows | Personally | Personally | ✓ |
| Clean rims/wheels/hubcaps | Personally | Personally | ✓ |
| Clean the tires | Personally | Personally | ✓ |
| Apply tire shine or dressing | Personally | Personally | ✓ |
| Clean household surfaces | Not a priority | Personally | ✗ |
20 out of 22 correct. From 48 automotive purchases — tires, brake pads, spark plugs, wax, tire gel — the twin inferred a DIY car enthusiast who does everything himself, and that inference matched the human on 20 of 22 activities.
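Scoring a respondent like the one in the table is a straightforward exact-match comparison over categorical answers. A minimal sketch (labels and ordering are illustrative):

```python
def exact_match_accuracy(human, twin):
    """Fraction of activities where the twin's predicted label equals the human's answer."""
    assert len(human) == len(twin), "one prediction per survey question"
    return sum(h == t for h, t in zip(human, twin)) / len(human)

# The respondent above: the twin predicted "Personally" on all 22 activities,
# while the human answered "Not a priority" on two of them.
human = ["Personally"] * 22
twin = ["Personally"] * 22
human[7] = "Not a priority"    # clean/restore headlight lenses
human[21] = "Not a priority"   # clean household surfaces
print(round(exact_match_accuracy(human, twin), 3))  # 0.909
```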
Purchase Data Predicts Opinions, Not Just Behavior
The second survey was a different kind of challenge. Consumers were shown a new pet cleaning product and asked to rate it on a five-point scale — how appealing it is, whether they would buy it, and how much they agreed with statements like “this product offers valuable features” and “I want this product.”
These are opinion questions. There is no objectively correct answer. And unlike the car care survey, the twin can’t simply match products to activities; it has to infer how someone would feel about a product they have never used.
We measured accuracy as “within one step” on the five-point scale. If the human said “very appealing” and the twin predicted “somewhat appealing” or “extremely appealing,” that counts. If it predicted “slightly appealing,” that doesn’t.
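This scoring rule can be sketched as a distance check on the ordered scale; the exact scale labels here are hypothetical:

```python
# Five-point appeal scale, ordered from lowest to highest. Labels are illustrative.
SCALE = [
    "not at all appealing",
    "slightly appealing",
    "somewhat appealing",
    "very appealing",
    "extremely appealing",
]

def within_one_step(human_label, twin_label, scale=SCALE):
    """True if the twin lands on the human's scale point or an adjacent one."""
    return abs(scale.index(human_label) - scale.index(twin_label)) <= 1

within_one_step("very appealing", "somewhat appealing")  # True: one step off
within_one_step("very appealing", "slightly appealing")  # False: two steps off
```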
From purchase history alone, the twin had already built a picture of who this person is — their lifestyle, their category preferences, how price-sensitive they are, whether they gravitate toward new brands or stick with what they know. That behavioral profile turned out to be enough to predict how they would respond.
One panel member had 633 pet product purchases — cat food, treats, flea treatment, feeding bottles — and 74 distinct cleaning products, including a Shark CarpetXpert marketed as “Perfect for Pets,” multiple Dawn Powerwash variants, and ARM & HAMMER OxiClean.
The twin’s prediction: this person will find a pet mess cleaner very appealing, will agree it offers valuable features, and will say they want to buy it.
The human’s actual answers: “Very appealing.” Completely agreed on 6 of 7 value statements. “I would definitely buy this.”
Someone with 633 pet purchases and 74 cleaning products was always going to want a pet mess cleaner. The twin knew that from the purchase data alone.
Purchase data doesn’t just tell you what people buy. It tells you who they are. And who they are predicts how they will respond.
The Honest Limit
We want to be precise about where purchase data stops being useful.
The product concept test also showed consumers three different package designs — each with different imagery, colors, and copy emphasis — and asked which one they would most want to buy, and why.
One panel member had 1,819 Amazon purchases, 123 unique cleaning products, and 74 pet products. We could predict with confidence that they would find the product relevant. And they did.
But when asked to choose between the three package designs, they preferred the one that led with scientific language about the cleaning formula over the one that led with pet imagery. Their reasoning: “I know with this sort of product it will provide me a deep stain and odor removal… more clarity on product having a dirt lift tech.”
Nothing in 1,819 purchases tells you whether this person responds more to scientific claims or friendly branding. Purchase history reveals what someone buys. It does not reveal which shelf design will catch their eye.
This is the honest boundary. Purchase data tells you who to target. It cannot tell you what they want to see on the shelf. Visual design, packaging creative, and copy treatment require showing people the options and asking for their reaction.
But notice what happened. The questions that actually needed a human were about three package designs. Everything else — the targeting, the appeal prediction, the purchase intent — was already answered by the purchase data.
What This Means
We tested digital twins on two very different types of survey questions. On factual, behavioral questions — what car care activities do you perform? — the twin matched the human with 88.4% accuracy when it had relevant purchase data. On subjective opinion questions about a product the consumer had never used, it landed within one step of the human’s answer 86–93% of the time.
Where it stopped working was visual and creative preference — which package design do you prefer, and why? That requires showing people the work.
This is a small, directional study. But the implication is clear: a large portion of what surveys ask could be answered by purchase data before anyone fills out a form. The questions that genuinely need a human in the chair are fewer than most concept tests assume.
Digital twins don’t eliminate the survey. They eliminate the parts of the survey that were never worth asking a human in the first place.
The questions that remain become shorter, more focused, and more respectful of the consumer’s time. Better data in. Better answers out. Faster to market.