
AI Scribes For Doctors Are Everywhere. Here's What The Research Says

  • Writer: Jesse Pines
  • Apr 7
  • 5 min read

Updated: Apr 8


Jesse M. Pines, MD, MBA


If you haven't noticed, there may be a new guest in your doctor's exam room. It's their artificial intelligence scribe.

AI scribes are tools that listen to clinician-patient conversations and automatically generate draft clinical notes. The promise to physicians is enormous: tackling two problems at once, clerical burnout and administrative overload. For patients, the upside is simple: more attentive doctors who are less tethered to their keyboards.


Many AI scribe products are in use in hospitals, with names such as DAX (Microsoft), Ambience and Abridge. Some companies focus on specialty-specific scribing, like Cleo, which is used primarily in acute care settings like emergency departments.


As we hurtle toward a future where more clinical conversations are recorded and summarized by software, it becomes critical to understand how these tools perform, who they help and where they fall short.


Recent studies published in the literature on AI scribes paint a picture of consistent benefits. Yet they also reveal challenges with accuracy and some significant risks. Here's the current state of the research.


AI Scribes Can Increase Efficiency

The core value proposition of AI scribes is giving clinicians back their time. A large, propensity score-matched study published in JAMA Network Open found that clinicians using AI scribes spent 9% less time in the electronic health record overall (about 2.4 minutes per appointment) and 16% less time writing notes.


Those minutes can add up across a full clinic day and translate into meaningful reductions in cognitive load.


Yet the technology has not yet fully erased after-hours work, at least in that particular study. The same paper found no significant change in after-hours documentation time or how quickly encounters were closed.


This suggests that while AI scribes may streamline note creation, they may not yet solve the broader problem of inbox management and other tasks that keep clinicians tethered to their computers long after patients have left the building.


A Nuanced Burnout Story With AI Scribes

If AI scribes save time, they should also reduce burnout. Much of the evidence supports that, but with important caveats. A multi-center quality improvement study showed a drop in burnout among 263 clinicians from 52% to 39% after just 30 days of using an AI scribe.


But the benefits have not been uniform. One study found that while physicians reported reduced burnout and lower intent to leave their jobs, advanced practice providers did not show the same improvements.


That gap warrants further investigation. It could reflect differences in workflow, the types of visits, the complexity of patients, or how different roles interact with the technology.


The takeaway from these early studies is emerging: implementing a tool is not enough. Health systems also need to understand how it fits into the diverse roles within a care team.


Additionally, studies have shown varied rates of adoption by physicians, with some finding that only about 1 in 10 clinicians offered the program ultimately adopt it. Other studies have found adoption rates to be significantly higher, exceeding 50%.


AI Scribes Are Accurate, But Sometimes Omit Key Patient Details

When it comes to accuracy, the story is complicated. In controlled simulations that compare AI documentation to human documentation, the best AI scribes sometimes outperform clinicians, including in studies of general-practice-style consultations.


But when researchers examine the error profiles, a significant safety concern emerges: omissions. Across platforms and studies, omission errors account for roughly 71% to 83% of all errors. Many omissions are clinically minor, but a concerning number are not.


One study found that 45% of omissions carried moderate clinical significance. Another analysis of hospital discharge summaries found that AI generated narratives contained significantly more omissions than physician generated ones, with a mean of 1.75 versus 0.86 omissions.


This creates a paradox. An AI scribe can produce a note that is more thorough and better organized than a physician's hurried version. Yet it may quietly drop a critical detail about disease severity, a comorbid condition or a nuance in the differential diagnosis or management plan.


Variability also extends to the vendors themselves. A comparative analysis of four commercial platforms found large differences. One vendor had a low overall error rate around 12%, while another had an omission rate of 25%. Some produced overly long notes with hallucinated details, while others were too brief.


A key issue is that not all AI scribes are created equal.


AI Scribes Present Challenges in Specific Patient Populations

A pressing concern raised by the research is the potential for AI scribes to widen healthcare disparities. The technology tends to work best for routine or protocol-driven visits. But what about complex patients, non-English speakers, or those with cognitive impairments?


The evidence here is troubling. Many AI scribes have limited functionality for non-English speaking patients, which is a major barrier to adoption. There are also concerns about accuracy with accented speech.


While AI has the potential to bridge language gaps, current implementations may inadvertently disadvantage patients with limited English proficiency.


Bias in AI models trained on non-diverse datasets can also perpetuate and amplify existing disparities. Women, people from underrepresented racial and ethnic groups, and those from lower socioeconomic backgrounds are often underrepresented in training data. That can lead to misclassification, missed context or a failure to capture crucial social determinants of health.


There is also a risk of automation bias, where clinicians become less vigilant because the note feels complete. One study found that while AI scribes increased documentation of neuropsychiatric symptoms, their use was associated with fewer psychiatric interventions.


Finally, trust and transparency matter. Studies have found that while many patients were comfortable with AI scribes when clinicians reviewed the output, a large percentage were reluctant to have an AI scribe used in a future visit. Key concerns included privacy and a lack of awareness that the technology was being used at all.


What Safe and Equitable AI Scribe Rollout Might Look Like

AI scribes are a transformative technology and one of the early success stories for healthcare AI. They have clear potential to reduce documentation burden and improve clinician well-being. But they are not yet a plug-and-play solution.


The evidence points to several imperatives for a safe and equitable rollout. First, having a clinician-in-the-loop is vital for every note. AI generated notes require vigilant human oversight, especially to catch omission errors.


Second, vendor selection is a major determinant of note quality. Health systems should evaluate scribes based not only on overall quality scores, but also on the nature and clinical significance of their errors, with attention to omissions.


Third, equity has to be designed in, not bolted on later. Programs should validate performance across languages, accents and diverse patient populations. They should test performance in complex visits.


Finally, patient transparency is essential. Patients should be informed when AI is listening, understand how recordings are handled and be able to raise privacy concerns.


Ultimately, AI scribes are here to stay. The next phase of adoption should be judged less by how much time is saved than by whether notes are accurate, equitable and trustworthy.



© 2026 Pines Associates.
