Alexa Cook Along storyboard
Amazon Alexa · Voice & Multimodal · 2018

Alexa
Cook Along

My Role: Lead UX, Voice & Multimodal
Team: Alexa Household Organization
Surfaces: Echo Show · Echo Spot · Mobile
Launched: 2018 (first release)
Focus: Experience Vision · Voice & Multimodal Interaction · Cross-Surface Prototyping · Alexa Design System · Product Direction
Overview

The Kitchen Is the Hardest Room to Design For

Cooking is one of the most cognitively and physically demanding domestic tasks. Hands are wet, floury, or covered in raw protein. A recipe on a phone becomes a liability — touched once, it darkens. Scrolled to the wrong step, you lose your place mid-chop.

When Alexa launched, voice-only instruction solved the hands problem — but created new ones. A single spoken step with no visual anchor is easy to mishear. And "Alexa, repeat step three" became the accidental UX benchmark nobody wanted.

Alexa Cook Along was the answer: a voice-first, visually-anchored cooking experience designed from the ground up as a multimodal product — before the word "multimodal" was in every product brief.

20M+ Alexa households reached · 3 surfaces designed simultaneously · 0 existing precedents at Alexa
The Scenario

Meet Kai

To ground the design, we built a core scenario: Kai, someone who wants to bake cookies with hands already deep in dough — a perfect stress test for a hands-busy, touch-free cooking experience.

CX Storyboard · Alexa guides Kai through baking chocolate cookies

User research session: observing how people interact with Echo Show in a real kitchen alongside meal kit preparation.

In-House User Testing

Testing Where the Mess Happens

We didn't test Cook Along in a sterile lab. The cocktail-making sessions at Amazon Santa Clara were a deliberate stress test: high ingredient complexity, precise measurements, time pressure.

Session Setup: Real ingredients, real complexity — no sterile lab conditions.
Prototype Test: Utterance variants PVT 1.0, OBA-2, and OBA-3 tested side by side.
Recipe Selection: Testing whether users default to touch or voice.

Live session: Echo Show displaying step-by-step cocktail instructions. Real device, real recipe, real user.

"When users had wet or occupied hands, they never reached for the screen — even when the voice instruction was unclear. The voice experience had to be fully self-sufficient."

Research & Insights

Six Principles That Shaped Every Design Decision

01. Frequent ingredient references. Users constantly glance back at the ingredient list — persistent access to quantities at every step was essential.
02. Continuous visual patterns. Consistent layout keeps users immersed. Surprising them with UI changes at step 5 breaks flow.
03. Education on navigation. Users didn't know how to move through a recipe by voice. Discoverable utterances at the right moments were essential.
04. Contextual conversations. Voice responses had to match what was on screen. Coherence across channels is non-negotiable.
05. Memorable content. Voice output needed to be structured and easy to hold in short-term memory while hands are busy.
06. Highlighted numbers & measurements. Quantities are the most critical and most misheard information. Visual emphasis reduced errors significantly.
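As a concrete illustration of principle 06, the kind of quantity emphasis we wanted can be sketched in a few lines of Python. This is a minimal, hypothetical example: the regex, the unit list, and the `[[...]]` marker tokens are all illustrative, not the shipped implementation.

```python
import re

# Hypothetical sketch: find quantities and units in a recipe step so the
# display layer can render them with visual emphasis (principle 06).
QUANTITY = re.compile(
    r"\b\d+(?:[./]\d+)?\s?(?:g|grams?|ml|cups?|tbsp|tsp|°[CF]|minutes?)\b"
)

def emphasize(step_text: str) -> str:
    # Wrap each matched quantity in marker tokens a renderer could style
    # as bold or enlarged on screen.
    return QUANTITY.sub(lambda m: f"[[{m.group(0)}]]", step_text)

print(emphasize("Add 200 g flour and bake for 12 minutes at 180°C."))
# → Add [[200 g]] flour and bake for [[12 minutes]] at [[180°C]].
```

The point of the sketch is the separation of concerns: the voice layer speaks the full sentence, while the visual layer gets machine-readable markers telling it exactly which tokens deserve emphasis.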
Interaction Design

Voice + Visual: Step by Step

Each step has a corresponding visual state on Echo Spot and a spoken Alexa response. Core principle: the screen shows the current fact, the voice explains the current action.

Echo Spot · step-by-step cooking flow, with a spoken Alexa response at each step. Step 1 shows the oven temperature while Alexa awaits "Next". Step 2 reads and displays the ingredients. Step 3 surfaces the action with voice direction.
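The core principle — screen shows the current fact, voice explains the current action — can be sketched as a simple state model. This is hypothetical Python for illustration only; the step content, class names, and navigation logic are assumptions, not Alexa's actual architecture.

```python
from dataclasses import dataclass

# Hypothetical sketch: each step pairs a glanceable visual "fact"
# (what the screen shows) with a spoken "action" (what Alexa explains).
@dataclass
class Step:
    screen_fact: str   # persistent, glanceable information
    voice_action: str  # the instruction Alexa speaks aloud

steps = [
    Step("Oven: 180°C", "Preheat the oven, then say Next when you're ready."),
    Step("200 g flour · 100 g sugar · 2 eggs", "Combine the flour, sugar, and eggs in a bowl."),
    Step("Bake: 12 minutes", "Put the tray in the oven. I'll keep track of where you are."),
]

class CookAlongSession:
    def __init__(self, steps):
        self.steps, self.index = steps, 0

    def current(self) -> Step:
        return self.steps[self.index]

    def handle(self, utterance: str) -> Step:
        # Voice-led navigation: "next" and "go back" move through the recipe
        # but never fall off either end, so the user never loses their place.
        if utterance == "next" and self.index < len(self.steps) - 1:
            self.index += 1
        elif utterance == "go back" and self.index > 0:
            self.index -= 1
        # "repeat" (or anything unrecognized) re-anchors on the current step.
        return self.current()
```

Note the design choice the sketch encodes: there is no failure path that loses position. Every utterance, recognized or not, resolves to a concrete step with both a visual and a spoken half.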
Product Demo

See It in Action

A working demo of Cook Along on Echo Spot — voice-guided, step-by-step cooking in a real kitchen context.

Alexa Cook Along demo — hands never touch the screen.

Voice Design

The Golden Utterances

Designing for voice means designing what people will actually say — not what the system expects.

Golden Utterances — what users actually say
"Alexa, what should I cook tonight?"
"Alexa, next"
"Alexa, step one"
"Alexa, how many grams?"
"Alexa, how long in the oven?"
"Alexa, repeat"
"Alexa, pause"
"Alexa, go back"
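A voice interface ultimately has to map utterances like these onto intents. The routing sketch below is hypothetical: the intent names and regex patterns are illustrative, and the real Alexa interaction model uses trained natural-language understanding, not pattern matching.

```python
import re

# Hypothetical sketch: route "golden utterances" to intents.
# Intent names and patterns are illustrative, not the production model.
INTENT_PATTERNS = [
    ("Navigate.Next",   re.compile(r"^next$")),
    ("Navigate.Back",   re.compile(r"^go back$")),
    ("Navigate.Repeat", re.compile(r"^repeat$")),
    ("Navigate.GoTo",   re.compile(r"^step (\w+)$")),
    ("Query.Quantity",  re.compile(r"^how (?:many|much) (.+)$")),
    ("Query.Duration",  re.compile(r"^how long (.+)$")),
    ("Session.Pause",   re.compile(r"^pause$")),
]

def resolve_intent(utterance: str):
    # Normalize: lowercase, strip the wake word and trailing punctuation.
    text = utterance.lower().removeprefix("alexa, ").rstrip("?")
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.match(text)
        if match:
            return intent, match.groups()
    # Unrecognized speech falls through to a safe re-prompt.
    return "Fallback", ()
```

For example, `resolve_intent("Alexa, step one")` resolves to `("Navigate.GoTo", ("one",))`. The fallback branch matters as much as the matches: an unrecognized utterance should re-prompt rather than silently change state.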

"Cook Along wasn't a recipe reader. It was a cooking partner — something that responds when spoken to, stays quiet when not needed, and always knows where you are."

Outcome & Impact

What Shipped — and What It Established

Impact

Cook Along shipped to 20M+ Alexa households across Echo Show and Echo Spot at first release in 2018.

Cook Along established the multimodal interaction model for sequential tasks on Alexa. The patterns — voice-led navigation, step-anchored visuals, cross-surface consistency — became the foundation subsequent Alexa experiences were built on.

It was an early proof of concept for ambient AI assistance: technology that doesn't demand your attention, but responds reliably when you need it.

The most meaningful outcome: the shift in how the team thought about Alexa. Before Cook Along, Alexa was a question-and-answer machine. After it, Alexa was a presence in the home.

Reflection

What I'd Do Differently

I would push for longitudinal research earlier. Our in-kitchen sessions were invaluable but point-in-time. Tracking households over six weeks would have surfaced insights about habit formation after novelty wore off.

I'd also design failure states as first-class artifacts from day one — the moments Alexa misheard a quantity, or a user lost their place after a 20-minute pause, deserved dedicated attention earlier in the process.
