
The collective behavior of birds in flocks could help make artificial intelligence more reliable when it produces long summaries.
The suggestion comes from a new study in Frontiers in Artificial Intelligence, which describes an algorithmic framework inspired by the way birds organize themselves to filter information before it is processed by large language models.
The researchers’ objective was to find a way to reduce AI hallucinations, which occur mainly when models analyze documents that are long, repetitive or noisy. In addition to spreading falsehoods, these errors require time-consuming revisions, which often defeat the purpose of using a chatbot.
Anasse Bari, professor of computer science at New York University, developed with his team a method that he considers simple: instead of handing an entire document to the system, the algorithm first identifies and organizes the most relevant sentences, reducing redundancy and highlighting the essential points.
The idea is that each sentence is treated as if it were a “virtual bird”. First, the system cleans the text, preserving mainly nouns, verbs and adjectives, and transforms each sentence into a numeric vector based on lexical, semantic and thematic characteristics. It then assigns each sentence a score according to its importance in the document, the weight of the section in which it appears and its proximity to the summary or central idea, giving even greater relevance to parts such as the introduction, results and conclusion.
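The cleaning and scoring steps described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the stopword list stands in for part-of-speech filtering, the word-count vectors stand in for the lexical, semantic and thematic features, the document-wide word counts stand in for the "central idea", and the section weights and the way the three signals are combined are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, a crude stand-in for keeping only
# nouns, verbs and adjectives.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "was", "that", "this", "it", "on", "for", "with", "by"}

# Hypothetical section weights; the article only says that some
# sections (introduction, results, conclusion) count for more.
SECTION_WEIGHTS = {"introduction": 1.5, "results": 1.5, "conclusion": 1.5}


def tokenize(sentence):
    """Lowercase the sentence and drop stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_sentences(sentences):
    """Score (section_name, text) pairs; returns a list of (score, text)."""
    vectors = [Counter(tokenize(text)) for _, text in sentences]

    # Document-wide word counts act as the assumed "central idea".
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)
    total = sum(centroid.values()) or 1

    scored = []
    for (section, text), vec in zip(sentences, vectors):
        n = sum(vec.values())
        # Importance: average document frequency of the sentence's words.
        importance = (
            sum(centroid[t] * c for t, c in vec.items()) / (n * total)
            if n else 0.0
        )
        closeness = cosine(vec, centroid)
        weight = SECTION_WEIGHTS.get(section, 1.0)
        scored.append((weight * (importance + closeness), text))
    return scored
```

With this sketch, the same sentence scores higher when it appears in a section listed in `SECTION_WEIGHTS` than when it appears elsewhere, which mirrors the extra relevance given to the introduction, results and conclusion.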
Only in the third phase does the flocking behavior come into play.
Selecting only the sentences with the highest scores could lead to repetition on the same topic, so the system groups semantically close sentences into “bunches”. Within each group, the most representative sentences are chosen.
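The grouping step can be sketched in a few lines of Python. This example deliberately swaps in a simple greedy grouping based on word overlap (Jaccard similarity) rather than the flock-inspired procedure from the study, whose details the article does not give. The function names, the `threshold` value and the choice to keep a single representative per group are assumptions for illustration.

```python
import re


def word_set(sentence):
    """Set of lowercase words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_and_select(scored_sentences, threshold=0.5):
    """Group similar sentences and keep the top-scoring one per group.

    scored_sentences: list of (score, text) pairs.
    threshold: assumed similarity cut-off for joining an existing group.
    """
    groups = []  # each group is a list of (score, text); first item is its seed
    ranked = sorted(scored_sentences, key=lambda pair: pair[0], reverse=True)
    for score, text in ranked:
        words = word_set(text)
        for group in groups:
            seed_text = group[0][1]
            if jaccard(words, word_set(seed_text)) >= threshold:
                group.append((score, text))
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([(score, text)])

    # Sentences are processed in descending score order, so each group's
    # seed is also its highest-scoring member.
    return [group[0][1] for group in groups]
```

Because near-duplicate sentences fall into the same group and only one of them survives, the selected set covers more distinct topics than simply taking the top-scoring sentences would.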
According to the authors of the research, the process allows the construction of a more balanced set of sentences. The condensed material is then delivered to the language model, which produces a final summary that is clearer and more faithful to the source.
The method was tested on more than 9,000 documents. According to the researchers, it generated summaries with greater factual accuracy than those produced by language models alone.