“Lying” in Computer-Generated Texts: Hallucinations and Omissions

by Oxford University Press
Sep 01, 2023 | Filed in Academic Libraries

There is huge excitement about ChatGPT and other large generative language models that produce fluent and human-like texts in English and other human languages. But these models have one big drawback, which is that their texts can be factually incorrect (hallucination) and also leave out key information (omission).

Featured image by Google DeepMind Via Unsplash (public domain)

By Kees van Deemter and Ehud Reiter

In our chapter for The Oxford Handbook of Lying, we look at hallucinations, omissions, and other aspects of “lying” in computer-generated texts. We conclude that these problems are probably inevitable.

Omissions are inevitable because a computer system cannot cram all possibly relevant information into a text that is short enough to be actually read. In the context of summarising medical information for doctors, for example, the computer system has access to a huge amount of patient data, but it does not know (and arguably cannot know) what will be most relevant to doctors.

Hallucinations are inevitable because of flaws in computer systems, regardless of the type of system. Systems which are explicitly programmed will suffer from software bugs (like all software systems). Systems which are trained on data, such as ChatGPT and other systems in the Deep Learning tradition, “hallucinate” even more. This happens for a variety of reasons. Perhaps most obviously, these systems suffer from flawed data (e.g., any system which learns from the Internet will be exposed to a lot of false information about vaccines, conspiracy theories, etc.). And even if a data-oriented system could be trained solely on bona fide texts that contain no falsehoods, its reliance on probabilistic methods will mean that word combinations that are very common on the Internet may also be produced in situations where they result in false information.

Suppose, for example, that on the Internet the word “coughing” is often followed by “… and sneezing.” A data-oriented system may then falsely describe a patient as “coughing and sneezing” when they cough without sneezing. Problems of this kind are an important focus for researchers working on generative language models. Where this research will lead us is still uncertain; the best one can say is that we can try to reduce the impact of these issues, but we have no idea how to completely eliminate them.
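The “coughing and sneezing” effect can be reproduced with a deliberately tiny model. The sketch below is an illustration only, not how ChatGPT or any real system works: the four-sentence corpus, the bigram counting, the greedy continuation, and the names `CORPUS`, `train_bigrams`, `continue_greedily`, and `patient_record` are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for "the Internet".
CORPUS = [
    "the patient is coughing and sneezing",
    "she was coughing and sneezing all night",
    "he kept coughing and sneezing",
    "the patient is coughing badly",
]
END = "</s>"  # end-of-sentence marker


def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def continue_greedily(counts, prompt, max_tokens=5):
    """Extend the prompt by always picking the most frequent next word."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break
        nxt = followers.most_common(1)[0][0]
        if nxt == END:
            break
        tokens.append(nxt)
    return " ".join(tokens)


# Hypothetical ground truth: this patient coughs but does not sneeze.
patient_record = {"coughing": True, "sneezing": False}

model = train_bigrams(CORPUS)
generated = continue_greedily(model, "the patient is coughing")
print(generated)  # -> the patient is coughing and sneezing

# The text mentions a symptom that the record says is absent.
hallucinated = [s for s, present in patient_record.items()
                if not present and s in generated.split()]
print(hallucinated)  # -> ['sneezing']
```

Every sentence in the toy corpus could be true of the patient it describes, yet the model still produces a false statement about this patient, because it follows the most common continuation and never consults the record.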


The above focuses on unintentional-but-unavoidable problems. There are also cases where a computer system arguably should hallucinate or omit information. An obvious example is generating marketing material, where omitting negative information about a product is expected. A more subtle example, which we have seen in our own work, is when information is potentially harmful and it is in users’ best interests to hide or distort it. For example, if a computer system is summarising information about sick babies for friends and family members, it probably should not tell an elderly grandmother with a heart condition that the baby may die, since this could trigger a heart attack.

Now that the factual accuracy of computer-generated text draws so much attention from society as a whole, the research community is starting to realize more clearly than before that we only have a limited understanding of what it means to speak the truth. In particular, we do not know how to measure the extent of (un)truthfulness in a given text.

To see what we mean, suppose two different language models answer a user’s question in two different ways, by generating two different answer texts. To compare these systems’ performance, we would need a “score card” that allowed us to objectively score the two texts as regards their factual correctness, using a variety of rubrics. Such a score card would allow us to record how often each type of error occurs in a given text, and aggregate the result into an overall truthfulness score for that text. Of particular importance would be the weighting of errors: large errors (e.g., a temperature reading that is very far from the actual temperature) should weigh more heavily than small ones, key facts should weigh more heavily than side issues, and errors that are genuinely misleading should weigh more heavily than typos that readers can correct by themselves. Essentially, the score card would work like a fair school teacher who marks pupils’ papers.
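To make the idea of weighted aggregation concrete, here is a minimal sketch of what such a score card might compute. Everything in it is a placeholder: the three 0-to-1 ratings (`magnitude`, `centrality`, `misleading`), the choice to multiply them, the normalisation by statement count, and the example values are assumptions for illustration. How to assign such weights in a principled way remains an open problem.

```python
from dataclasses import dataclass


@dataclass
class FactualError:
    description: str
    magnitude: float   # 0..1: how far the claim is from the truth
    centrality: float  # 0..1: key fact (1) versus side issue (0)
    misleading: float  # 0..1: chance a reader is misled (a typo is near 0)


def error_weight(error):
    """Hypothetical weighting: the product of the three ratings."""
    return error.magnitude * error.centrality * error.misleading


def truthfulness_score(errors, num_statements):
    """Return a score in [0, 1]; 1.0 means no weighted errors were found."""
    if num_statements <= 0:
        raise ValueError("num_statements must be positive")
    penalty = sum(error_weight(e) for e in errors)
    return max(0.0, 1.0 - penalty / num_statements)


# Two hypothetical answer texts, each containing ten factual statements.
errors_a = [FactualError("temperature off by 15 degrees", 0.9, 1.0, 1.0)]
errors_b = [
    FactualError("misspelt place name", 0.1, 0.5, 0.1),
    FactualError("misspelt drug name, still recognisable", 0.1, 0.5, 0.1),
]

score_a = truthfulness_score(errors_a, num_statements=10)
score_b = truthfulness_score(errors_b, num_statements=10)
print(f"text A: {score_a:.3f}, text B: {score_b:.3f}")
```

In this sketch text B contains more errors than text A but scores higher, because its errors are small, peripheral, and easy for a reader to correct. A product of ratings means that an error rated zero on any one dimension carries no penalty at all; a weighted sum would be an equally defensible choice.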

We have developed protocols for human evaluators to find factual errors in generated texts, as have other researchers, but we cannot yet create a score card as described above because we cannot assess the impact of individual errors.

What is needed, we believe, is a new strand of linguistically informed research to tease out all the different parameters of “lying” in a manner that can inform the above-mentioned score cards, and that may one day be implemented as a reliable fact-checking protocol or algorithm. Until that time, those of us who are trying to assess the truthfulness of ChatGPT will be groping in the dark.

Kees van Deemter is a professor in the Department of Information and Computing Sciences at Utrecht University. He works in Computational Linguistics, a research area that belongs to both Artificial Intelligence and Cognitive Science. His main area of expertise is Natural Language Generation (NLG).

Ehud Reiter is Professor and Chair in Computing Science at the University of Aberdeen, and is Chief Scientist of ARRIA NLG, a global leader in the field of artificial intelligence (AI) known as natural language generation (NLG).


