
AI-powered transcription tool used in hospitals is inventing things no one said

Tech giant OpenAI claimed that its AI-powered transcription tool Whisper has “near human-level robustness and accuracy.”

But Whisper has one major flaw: It’s prone to making up snippets of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Some of the invented text, known in the industry as hallucinations, can include racist commentary, violent rhetoric and even imaginary medical treatments, these experts said.

Experts said such fabrications are problematic because Whisper is used in numerous industries around the world to translate and transcribe interviews, generate text in popular consumer technologies and create captions for videos.

More worrying, they said, was medical centers’ rush to use Whisper-based tools to transcribe patients’ conversations with doctors despite OpenAI’s warnings that the tool should not be used in “high-risk areas.”

The full extent of the problem is difficult to gauge, but researchers and engineers said they frequently encountered Whisper’s hallucinations in their work. For example, a University of Michigan researcher who conducted a study of public meetings said he found hallucinations in eight out of 10 audio transcriptions he examined, before he started trying to improve the model.

One machine learning engineer said he initially discovered hallucinations in about half of the more than 100 hours of Whisper transcripts he analyzed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper.

Problems persist even with well-recorded short audio samples. A recent study by computer scientists uncovered 187 hallucinations in more than 13,000 clear audio tracks they examined.


That trend would lead to tens of thousands of faulty transcriptions across millions of recordings, the researchers said.

Alondra Nelson, who led the White House Office of Science and Technology Policy for the Biden administration until last year, said such mistakes “can have really serious consequences,” especially in hospital settings.

“Nobody wants a misdiagnosis,” said Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. “There has to be a higher bar.”

Whisper is also used to create closed captioning for the Deaf and hard of hearing, a group at particular risk from inaccurate transcription. That’s because deaf and hard-of-hearing people have no way of detecting fabrications “that are hidden in all these texts,” said Christian Vogler, who is deaf and directs Gallaudet University’s Technology Access Program.

OpenAI urged to solve problem

The prevalence of such hallucinations has led experts, advocates and former OpenAI employees to call on the federal government to consider AI regulations. At the very least, they said, OpenAI should fix this flaw.

“This seems solvable if the company wants to prioritize this,” said William Saunders, a San Francisco-based research engineer who left OpenAI in February over concerns about the company’s direction. “If you put that out there and people become overconfident about what it can do and integrate it into other systems, it becomes problematic.”

An OpenAI spokesperson said the company is constantly researching how to reduce hallucinations and appreciates the researchers’ findings, adding that OpenAI incorporates feedback into model updates.

While most developers assume transcription tools misspell words or make other errors, engineers and researchers said they’ve never seen another AI-powered transcription tool hallucinate as much as Whisper.

Whisper hallucinations

The tool is integrated into some versions of OpenAI’s flagship chatbot, ChatGPT, and is a built-in offering on cloud computing platforms from Oracle and Microsoft that serve thousands of companies worldwide. It is also used to transcribe and translate audio into multiple languages.

Last month alone, a new version of Whisper was downloaded more than 4.2 million times from open-source AI platform HuggingFace. Sanchit Gandhi, a machine learning engineer there, said Whisper is the most popular open-source speech recognition model and is integrated into everything from call centers to voice assistants.

Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short snippets obtained from TalkBank, a research repository hosted at Carnegie Mellon University. They determined that approximately 40% of hallucinations were harmful or alarming because the speaker could be misinterpreted or misrepresented.

In one example they uncovered, a speaker said: “That kid, I’m not exactly sure, was going to take the umbrella.”

But the transcription software added: “He took a big piece of a cross, a teeny tiny piece… I’m sure he didn’t have a terrorist knife, so he killed a lot of people.”

A speaker in another recording described “two girls and one more lady.” Whisper invented extra commentary about race, adding: “two more girls and a lady were, um, black.”

In a third transcript, Whisper invented a non-existent drug called “hyperactive antibiotics.”

Researchers don’t know why Whisper and similar tools hallucinate, but software developers say the hallucinations often occur during pauses, background noise, or while music is playing.

In its online disclosures, OpenAI recommended that Whisper not be used in “decision-making contexts where flaws in accuracy could lead to significant flaws in results.”

Transcribing doctor appointments

That warning hasn’t stopped hospitals or medical centers from using speech-to-text models, including Whisper, to transcribe what is said during doctor visits so medical providers can spend less time taking notes or writing reports.

More than 30,000 clinicians and 40 health systems, including the Mankato Clinic in Minnesota and Children’s Hospital Los Angeles, have begun using a Whisper-based tool developed by Nabla, which has offices in France and the United States.

Martin Raison, Nabla’s chief technology officer, said the tool is finely tuned to medical language to transcribe and summarize patients’ interactions.

Company officials said they were aware that Whisper could hallucinate and were trying to find a solution to the problem.

Raison said it was impossible to compare Nabla’s AI-generated transcription to the original recording because Nabla’s tool deleted the original audio “for data security reasons.”


The tool has been used to transcribe an estimated 7 million medical visits, Nabla said.

Former OpenAI engineer Saunders said deleting original audio could be concerning if transcripts aren’t double-checked or if clinicians can’t access the recording to verify its accuracy.

“If you take away the ground truth, you can’t catch errors,” he said.

No model is perfect, Nabla said, and theirs currently requires medical providers to quickly edit and approve transcribed notes, but that could change.

Privacy concerns

Because patients’ conversations with their doctors are confidential, it is difficult to know how AI-generated transcripts affect them.

California state lawmaker Rebecca Bauer-Kahan said she took one of her children to the doctor earlier this year and refused to sign the health network’s form seeking her permission to share the consultation audio with vendors including Microsoft Azure, the cloud computing system run by OpenAI’s largest investor. Bauer-Kahan said she did not want such intimate medical conversations shared with technology companies.

“The release was very specific that for-profit companies would have the right to have this,” said Bauer-Kahan, a Democrat who represents part of the San Francisco suburbs in the State Assembly. “I said, ‘Absolutely not.’ ”

John Muir Health spokesman Ben Drew said the health system complies with state and federal privacy laws.

This story was produced in partnership with the Pulitzer Center’s AI Accountability Network, which also supported the academic Whisper study in part. The AP also receives financial assistance from the Omidyar Network to support coverage of artificial intelligence and its impact on society.