Validation Study

How Well Can a Digital Twin Answer a Survey?

If you know what someone has bought for the past three years — every toothbrush, every impulse buy, every 2 AM purchase they forgot about — can a digital twin built from that data answer a survey as well as the human?

We tested it. This is an early, directional study — a small panel, a single retailer — but the results were clear enough to share. A larger validation is underway.

The Setup

We took a panel of real consumers who had completed two surveys and connected their Amazon purchase accounts through Ario’s data connector.

The first survey was about car care behavior — 22 questions about what car care activities they personally perform, from washing the exterior to applying tire shine. Factual, behavioral questions with clear right-or-wrong answers.

The second was a product concept test for a new pet cleaning product they had never used — questions about appeal, purchase intent, and how much they agreed with statements like “this product offers valuable features” and “I want this product.” Subjective, opinion-based questions with no objectively correct answer.

For each person, we built a digital twin: a behavioral profile constructed entirely from their purchase history. What they buy. How much they spend. Which categories they favor. How often they repurchase. Whether they explore new brands or stick with what they know.
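To make the shape of such a profile concrete, here is a minimal sketch of aggregating a purchase history into behavioral features. The field names and logic are illustrative assumptions, not Ario’s actual pipeline:

```python
# Hypothetical sketch: reduce a purchase history to a small behavioral
# profile (counts, spend, category mix, brand loyalty). Illustrative only.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Purchase:
    item: str
    category: str
    brand: str
    price: float

def build_profile(history: list[Purchase]) -> dict:
    categories = Counter(p.category for p in history)
    brands = Counter(p.brand for p in history)
    return {
        "total_purchases": len(history),
        "total_spend": round(sum(p.price for p in history), 2),
        "top_categories": [c for c, _ in categories.most_common(3)],
        # Share of purchases going to the single most-bought brand:
        "brand_loyalty": max(brands.values()) / len(history),
    }
```

A twin answering a car care survey would then reason from features like `top_categories` and `brand_loyalty` rather than from the raw order list.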

We didn’t configure the twin for either survey or tell it what was being tested. It answered based solely on purchase history. We scored every prediction against the actual human response.

Two surveys, two very different types of questions — and a single question: how well does the twin do?

When Data Is Present, Twins Are Scary Accurate

The first survey asked 22 specific questions: Do you personally wash your car? Vacuum the interior? Apply tire shine? Clean the leather? For each, consumers answered one of three things: “I do it myself,” “Someone else does it,” or “Haven’t done it” (shown as “Not a priority” in the results table below).

These are factual questions. The human either does the activity or they don’t. And when the twin has relevant purchase data to work with, it gets these right — 88.4% accuracy across all 22 activities.

Here is what that looks like for one real consumer.

The DIY Gearhead — 257 Amazon purchases, 48 automotive items

This consumer’s Amazon history reads like a garage build log: Fullway and Milestar tires, Wagner brake pads, Denso spark plugs, strut assemblies, a radiator fan kit, a professional diagnostic cable. Plus five dedicated car care products, among them Meguiar’s Endurance Tire Gel, Meguiar’s Ultimate Quik Wax, Armor All Headlight Wipes, and touch-up paint pens.

In the survey, he said: “I always wash and clean my car myself.”

Below is the side-by-side: what the human answered versus what the twin predicted, for each of the 22 car care activities.

Activity | Human | Twin | Match
Wash the exterior | Personally | Personally | Yes
Spot clean the exterior | Personally | Personally | Yes
Remove bugs and/or tar | Personally | Personally | Yes
Polish or remove scratches | Personally | Personally | Yes
Apply wax, sealant, or coatings | Personally | Personally | Yes
Apply trim protectant/restorer | Personally | Personally | Yes
Polish the chrome/metal trim | Personally | Personally | Yes
Clean/restore headlight lenses | Not a priority | Personally | No
Clean tough soils under the hood | Personally | Personally | Yes
Vacuum or pick up trash | Personally | Personally | Yes
Clean carpet/upholstery/floor mats | Personally | Personally | Yes
Clean/protect leather surfaces | Personally | Personally | Yes
Wipe down dashboard/console | Personally | Personally | Yes
Apply interior protectant | Personally | Personally | Yes
Use air freshener or deodorizer | Personally | Personally | Yes
Clean tough soils/grime | Personally | Personally | Yes
Clean inside of windows | Personally | Personally | Yes
Clean outside of windows | Personally | Personally | Yes
Clean rims/wheels/hubcaps | Personally | Personally | Yes
Clean the tires | Personally | Personally | Yes
Apply tire shine or dressing | Personally | Personally | Yes
Clean household surfaces | Not a priority | Personally | No

20 out of 22 correct. The twin read 48 automotive purchases (tires, brake pads, spark plugs, wax, tire gel) and inferred a DIY enthusiast who does everything himself. Its only two misses were headlight lens restoration and household surface cleaning, where the human answered “Not a priority.”
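The scoring here is a plain exact-match comparison per activity. A minimal sketch, using hypothetical data whose labels mirror the table above:

```python
# Score a twin's categorical survey predictions against the human's answers.
# An activity counts as a match only when the labels are identical.

def exact_match_accuracy(human: dict, twin: dict) -> float:
    """Fraction of activities where the twin's prediction equals the human's answer."""
    matches = sum(1 for activity, answer in human.items()
                  if twin.get(activity) == answer)
    return matches / len(human)

# Hypothetical 4-activity slice, including the two misses from the table.
human = {
    "Wash the exterior": "Personally",
    "Apply tire shine or dressing": "Personally",
    "Clean/restore headlight lenses": "Not a priority",
    "Clean household surfaces": "Not a priority",
}
twin = {
    "Wash the exterior": "Personally",
    "Apply tire shine or dressing": "Personally",
    "Clean/restore headlight lenses": "Personally",
    "Clean household surfaces": "Personally",
}

print(exact_match_accuracy(human, twin))  # 0.5 on this 4-item slice
```

On the full 22-activity survey, the same calculation yields 20/22 for this consumer.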

Purchase Data Predicts Opinions, Not Just Behavior

The second survey was a different kind of challenge. Consumers were shown a new pet cleaning product and asked to rate it on a five-point scale — how appealing it is, whether they would buy it, and how much they agreed with statements like “this product offers valuable features” and “I want this product.”

These are opinion questions. There is no objectively correct answer. And unlike the car care survey, the twin can’t just match products to activities — it has to infer how someone would feel about something.

We measured accuracy as “within one step” on the five-point scale. If the human said “very appealing” and the twin predicted “somewhat appealing” or “extremely appealing,” that counts. If it predicted “slightly appealing,” that doesn’t.
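In code, the “within one step” rule is just a distance check on the scale’s index. A minimal sketch; the scale labels here are illustrative assumptions, not the survey’s exact wording:

```python
# "Within one step" scoring on a five-point scale: a prediction counts as
# correct if it lands at most one scale point away from the human's answer.

SCALE = [
    "Not at all appealing",
    "Slightly appealing",
    "Somewhat appealing",
    "Very appealing",
    "Extremely appealing",
]

def within_one_step(human_answer: str, twin_answer: str, scale=SCALE) -> bool:
    return abs(scale.index(human_answer) - scale.index(twin_answer)) <= 1

print(within_one_step("Very appealing", "Extremely appealing"))  # True
print(within_one_step("Very appealing", "Slightly appealing"))   # False
```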

91% Product appeal — within one step of the actual answer
93% Agreement statements — within one step on a five-point scale
86% Purchase intent — within one step of the actual answer

From purchase history alone, the twin had already built a picture of who this person is — their lifestyle, their category preferences, how price-sensitive they are, whether they gravitate toward new brands or stick with what they know. That behavioral profile turned out to be enough to predict how they would respond.

The predictable pet parent

One panel member had 633 pet product purchases — cat food, treats, flea treatment, feeding bottles — and 74 distinct cleaning products, including a Shark CarpetXpert marketed as “Perfect for Pets,” multiple Dawn Powerwash variants, and ARM & HAMMER OxiClean.

The twin’s prediction: this person will find a pet mess cleaner very appealing, will agree it offers valuable features, and will say they want to buy it.

The human’s actual answers: “Very appealing.” Completely agreed on 6 of 7 value statements. “I would definitely buy this.”

Someone with 633 pet purchases and 74 cleaning products was always going to want a pet mess cleaner. The twin knew that from the purchase data alone.

Purchase data doesn’t just tell you what people buy. It tells you who they are. And who they are predicts how they will respond.

The Honest Limit

We want to be precise about where purchase data stops being useful.

The product concept test also showed consumers three different package designs — each with different imagery, colors, and copy emphasis — and asked which one they would most want to buy, and why.

The well-known stranger

One panel member had 1,819 Amazon purchases, 123 unique cleaning products, and 74 pet products. We could predict with confidence that they would find the product relevant. And they did.

But when asked to choose between the three package designs, they preferred the one that led with scientific language about the cleaning formula over the one that led with pet imagery. Their reasoning: “I know with this sort of product it will provide me a deep stain and odor removal… more clarity on product having a dirt lift tech.”

Nothing in 1,819 purchases tells you whether this person responds more to scientific claims or friendly branding. Purchase history reveals what someone buys. It does not reveal which shelf design will catch their eye.

This is the honest boundary. Purchase data tells you who to target. It cannot tell you what they want to see on the shelf. Visual design, packaging creative, and copy treatment require showing people the options and asking for their reaction.

But notice what happened. The questions that actually needed a human were about three package designs. Everything else — the targeting, the appeal prediction, the purchase intent — was already answered by the purchase data.

What This Means

We tested digital twins on two very different types of survey questions. On factual, behavioral questions — what car care activities do you perform? — the twin matched the human with 88.4% accuracy when it had relevant purchase data. On subjective opinion questions about a product the consumer had never used, it landed within one step of the human’s answer 86–93% of the time.

Where it stopped working was visual and creative preference — which package design do you prefer, and why? That requires showing people the work.

This is a small, directional study. But the implication is clear: a large portion of what surveys ask could be answered by purchase data before anyone fills out a form. The questions that genuinely need a human in the chair are fewer than most concept tests assume.

Digital twins don’t eliminate the survey. They eliminate the parts of the survey that were never worth asking a human in the first place.

The questions that remain become shorter, more focused, and more respectful of the consumer’s time. Better data in. Better answers out. Faster to market.