AI as a counselor? Study shows that chatbots agree too much

Chatbots (AI-powered virtual assistants) and other language models tend to agree too much when giving personal advice, offering excessive validation and potentially harming the way internet users deal with conflicts and moral decisions.

The conclusion comes from a study conducted by researchers at Stanford University, in the United States, and published in the journal Science, which evaluated artificial intelligence in interpersonal counseling scenarios.

The researchers tested the models with three types of prompts: questions about personal conflicts; thousands of Reddit posts from the “AmITheAsshole” community (“am I the idiot?”), where users describe everyday disputes and ask others to judge who is in the wrong; and descriptions of potentially harmful or illegal actions.

In all situations, the language models endorsed the user’s position far more often than humans did: on average 49% more often for general advice and Reddit prompts, and they even endorsed problematic behavior in 47% of the dangerous scenarios.

According to the study, the models often approve potentially illegal actions (such as ways to bend rules or commit crimes), minimize the need to apologize or repair harm in interpersonal conflicts, validate abusive or manipulative behaviors such as gaslighting (a form of psychological abuse in which someone manipulates another person into doubting their own perception or memory), and justify decisions that put health or safety at risk.

These patterns appeared most strongly in prompts taken from “AmITheAsshole” and in explicit harm scenarios, where the models sided with the user’s position far more often than human judges did.

Two versions of the models

The authors also tested the effect of this advice on real people: more than 2,400 participants interacted with sycophantic (flattering) and non-sycophantic versions of the models.

The results show that participants preferred the flattering AIs, considered their responses more trustworthy, and felt more certain of their own positions after the interaction.

According to the Stanford University team, led by lead author Myra Cheng with co-author Dan Jurafsky, participants who spoke to the sycophantic versions of the models became less willing to apologize or repair harm, a possible sign of increased moral rigidity.

Another troubling finding: participants could not reliably tell when the model was being overly agreeable, because the responses tend to use academic, neutral language even when endorsing problematic positions.

“Flattery is a safety problem,” says Professor Dan Jurafsky, co-author of the study, who warns of the need for regulation and stricter standards to prevent the proliferation of morally unsafe models.

How to avoid flattery

The team is now exploring ways to mitigate this tendency and has found that the models can be modified to reduce sycophancy. Surprisingly, even instructing a model to begin its response with the words “wait a minute” makes it more critical.

Still, the authors recommend caution: for now, it is not advisable to use AIs as a substitute for conversations with real people about complex interpersonal issues.

The study was funded by the National Science Foundation and involved researchers including Myra Cheng (lead author), Cinoo Lee, Sunny Yu, Dyllan Han, and Professor Dan Jurafsky; Pranav Khadpe, of Carnegie Mellon University, is also among the co-authors.
