I Spent Time with Google's Diagnostic AI in a Real Clinic. Here's What I Found.

I’ve been following Google’s AMIE project since they first showed it off in simulated settings. Back then, it felt like a cool demo—an AI that could talk to patient actors and help doctors with diagnostic puzzles. But simulations are neat and tidy. Real clinics are messy. Patients forget things, get nervous, or just don’t explain their symptoms well.

Now, Google Research and Beth Israel Deaconess Medical Center (BIDMC) have published the first prospective clinical feasibility study of AMIE. It’s not a full rollout, but it’s a real step: patients with new, non-emergency complaints used AMIE to do a pre-visit history-taking before seeing their primary care doctor. And an actual physician was watching the whole thing over a live video feed, ready to step in if the AI went off the rails.

Let me be clear: this is not “AI replaces doctor.” This is “AI does the paperwork before you see the doctor.” And that’s a much more realistic and useful starting point.

How the Study Worked

The setup was straightforward but smart. Patients booked for new, non-emergency appointments—either in-person or telehealth—were invited to chat with AMIE via a secure web link before their actual visit. The AI conducted a text-based conversation to gather their medical history and current symptoms. A physician (the “AI supervisor”) watched via a live screen-share and could intervene based on predefined safety criteria.

This oversight layer is critical. In clinical practice, trainees often take histories under supervision. AMIE was treated the same way. The AI generated a transcript and summary, which (with patient consent) was handed to the actual clinician before the face-to-face appointment. The idea is that the doctor walks in already knowing the patient’s story, saving time and reducing cognitive load.

What Worked

I’ll give credit where it’s due: the conversational flow felt natural in the examples I saw. AMIE didn’t just fire off a checklist of questions. It asked follow-ups, clarified vague answers, and seemed to understand context. That’s harder than it sounds. Most medical chatbots I’ve tested either sound like a robotic intake form or get confused when you deviate from their script.

Patients reported feeling heard, and the summaries were structured well enough that clinicians could quickly grasp the key points. For a primary care setting where doctors are constantly pressed for time, that’s a real win.

Where I’m Skeptical

But let’s not get carried away. This was a single-center, single-arm feasibility study. That means no control group, no randomization, and a relatively small sample. The paper is transparent about these limitations, but the hype machine will likely ignore them.

More importantly, the AI was supervised the entire time. That’s not a criticism—it’s the right thing to do for safety. But it means we don’t know how AMIE would perform in a less controlled environment. What happens when the supervising physician is distracted, or when the AI encounters a patient with complex comorbidities that don’t fit its training data?

Also, the study only covered new, non-emergency episodic complaints. That’s a narrow slice of primary care. Chronic conditions, mental health issues, or anything requiring physical examination are completely out of scope for now.

My Take

This is a solid proof of concept. The fact that Google and BIDMC went through IRB approval, pre-registered the study, and built in real-time physician oversight tells me they’re taking safety seriously—unlike some AI health startups that seem to skip straight to deployment.

But we’re years away from this being standard practice. The next steps need to include larger trials, comparison with standard care, and—most importantly—studies that test the AI with less direct supervision. I’d also like to see data on diagnostic accuracy compared to human-only history-taking, not just feasibility metrics.

For now, I’m cautiously optimistic. AMIE shows that conversational AI can work in a real clinical workflow without causing harm. That’s more than most “AI doctor” demos can claim. But the gap between “feasible” and “better than current practice” is still wide. Let’s see if Google can bridge it.

I Spent Time with Google’s Diagnostic AI in a Real Clinic. Here’s What I Found.

How the Study Worked

What Worked

Where I’m Skeptical

My Take

Comments (0)