As LLMs grow bigger, they're more likely to give wrong answers than admit ignorance

Performance of a selection of GPT and LLaMA models with increasing difficulty. Credit: Nature (2024). DOI: 10.1038/s41586-024-07930-y

A team of AI researchers at the Universitat Politècnica de València in Spain has found that as popular large language models (LLMs) grow larger and more sophisticated, they become less likely to admit to a user that they do not know an answer.

In their study, published in the journal Nature, the group tested the latest versions of three of the most popular LLM families, examining their responses, their accuracy, and how good users are at spotting wrong answers.

As LLMs have become mainstream, users have grown accustomed to using them to write papers, poems, or songs, solve math problems, and handle other tasks, and accuracy has become a growing concern. In this new study, the researchers asked whether the most popular LLMs are getting more accurate with each new update and how they behave when they are wrong.

To test the accuracy of three of the most popular LLMs, BLOOM, LLaMA, and GPT, the group prompted them with thousands of questions and compared the answers with those that earlier versions of the same models had given to the same questions.

They also varied the topics, covering math, science, anagrams, and geography, as well as the models' ability to generate text or perform actions such as ordering a list. Each question was first assigned a degree of difficulty, and each answer was scored against the expected result, as sketched below.
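A minimal sketch of what such a version-over-version evaluation loop could look like follows. The question set, the difficulty labels, the model-version names, and the simulated `query_model` helper are all illustrative assumptions, not the study's actual benchmark or harness; a real run would call each model's API instead of simulating replies.

```python
import random

random.seed(0)  # reproducible simulation

# Hypothetical question set: (prompt, expected answer, difficulty in [0.0, 1.0]).
QUESTIONS = [
    ("What is 7 * 8?", "56", 0.1),
    ("Unscramble 'tca' into an English word.", "cat", 0.3),
    ("What is the capital of Australia?", "Canberra", 0.5),
]

# Hypothetical model versions (older -> newer) with a made-up skill level.
MODEL_VERSIONS = {"model-v1": 0.60, "model-v2": 0.75, "model-v3": 0.85}

def query_model(skill: float, expected: str, difficulty: float) -> str:
    """Stand-in for a real API call: the simulated model answers
    correctly with probability skill * (1 - difficulty)."""
    if random.random() < skill * (1.0 - difficulty):
        return expected
    return "a confident but wrong answer"

def accuracy_by_version() -> dict[str, float]:
    """Score every model version on the same fixed question set."""
    results = {}
    for model, skill in MODEL_VERSIONS.items():
        correct = sum(
            query_model(skill, expected, difficulty) == expected
            for _prompt, expected, difficulty in QUESTIONS
        )
        results[model] = correct / len(QUESTIONS)
    return results

if __name__ == "__main__":
    print(accuracy_by_version())
```

Running the same fixed question set against every version is what makes the comparison meaningful: any change in accuracy reflects the model, not the questions.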

They found that, in general, accuracy improved with each new iteration of a chatbot, and that accuracy decreased as the questions grew more difficult, as expected. But they also found that as the LLMs grew larger and more sophisticated, they tended to be less open about whether they could answer a question correctly.

In earlier versions, most of the LLMs would respond by telling users they could not find the answer or needed more information. In the newer versions, the models were more likely to guess, producing more answers overall, both correct and incorrect. The researchers also found that all the LLMs occasionally gave incorrect responses even to easy questions, suggesting that they are still not fully reliable.
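One way to picture this shift is to bucket each reply as correct, incorrect, or avoidant ("I don't know", "I need more information") and track how the avoidant share shrinks across versions while the incorrect share grows. The sketch below is illustrative only; the marker phrases and the example replies are assumptions, not the study's actual coding scheme.

```python
# Classify replies as correct, incorrect, or avoidant, then report the
# share of each bucket. Marker phrases are illustrative assumptions.
AVOIDANT_MARKERS = (
    "i don't know",
    "i do not know",
    "i need more information",
    "i cannot find",
)

def classify(reply: str, expected: str) -> str:
    text = reply.lower()
    if any(marker in text for marker in AVOIDANT_MARKERS):
        return "avoidant"
    return "correct" if expected.lower() in text else "incorrect"

def response_shares(replies: list[tuple[str, str]]) -> dict[str, float]:
    """replies: list of (model reply, expected answer) pairs."""
    counts = {"correct": 0, "incorrect": 0, "avoidant": 0}
    for reply, expected in replies:
        counts[classify(reply, expected)] += 1
    total = len(replies) or 1
    return {label: n / total for label, n in counts.items()}

# Made-up example: an older model avoids, a newer one guesses wrong.
old = [("I don't know.", "Canberra"), ("56", "56")]
new = [("Sydney", "Canberra"), ("56", "56")]
print(response_shares(old))  # avoidant share 0.5, incorrect share 0.0
print(response_shares(new))  # avoidant share 0.0, incorrect share 0.5
```

Under this framing, the paper's finding is that newer versions trade answers in the "avoidant" bucket for answers in the "correct" and "incorrect" buckets.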

The research team then asked volunteers to rate the answers from the first part of the study as either correct or incorrect and found that most had difficulty spotting incorrect answers.

Source: https://techxplore.com/news/2024-09-llms-bigger-theyre-wrong.html
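To quantify how often readers miss wrong answers, one could compare the volunteers' labels against the ground truth and report the fraction of incorrect replies that raters accepted as correct. The sketch below and its numbers are purely illustrative, not the study's data.

```python
# Illustrative miss-rate calculation over made-up rating data.
def miss_rate(ratings: list[tuple[bool, bool]]) -> float:
    """ratings: (answer_actually_correct, rater_said_correct) pairs.
    Returns the fraction of incorrect answers the raters accepted."""
    wrong = [judged for actual, judged in ratings if not actual]
    if not wrong:
        return 0.0
    return sum(wrong) / len(wrong)

ratings = [(False, True), (False, True), (False, False), (True, True)]
print(miss_rate(ratings))  # 2 of 3 wrong answers accepted -> ~0.67
```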