The final version will be launched in 2026. Linguistic variant, cultural representation and data protection are the pillars of a 19-month project.
The Portuguese artificial intelligence (AI) large language model (LLM) will be called Amália and its final version will be released in 2026, the chief executive officer (CEO) of the Center for Responsible AI told Lusa in an interview.
On November 11, on the opening night of the Web Summit, the Prime Minister announced the launch, in the first quarter of next year, of an LLM (‘Large Language Model’) in Portuguese.
The project involves the Center for Responsible AI, of which Paulo Dimas is CEO, and the research centers Nova FCT and Instituto Superior Técnico.
The first version “won’t be a perfect version” but rather a “beta, initial, to start getting feedback and, over time, it will be improved”, says Paulo Dimas, adding that it is “a project for 19 months”.
The final version “will only be released in 2026”, he adds.
The three fundamental points of this project are the linguistic variant (Portuguese from Portugal), cultural representation and data protection, he points out.
Paulo Dimas highlights that, as Luís Montenegro stated, it will be ready “in the first quarter” of 2025.
“We will be building on work already developed by these research centers: there are several years of work in this area, both on data for the Portuguese language, done by the research center of the Nova Faculdade de Ciências e Tecnologia (FCT), and work done at Técnico” and “there is also work that will be transferred from Unbabel, with all the experience” that the technology company “has in creating multilingual models, models that are currently being trained on supercomputers”, he says.
In short, “the team that will be working on the creation of this LLM is a team that already has many years of experience in this area”, highlights Paulo Dimas.
Building on this work, “it is possible to deliver this LLM in the first quarter” and “to this is added a very close collaboration with the Foundation for Science and Technology, which created the conditions at the computing level” that are essential for this type of large-scale model.
“And the Foundation for Science and Technology has been investing in computational capacity that will be used here”, since “in practice we are going to use (…) a computer that is in Barcelona, but part of it is Portuguese”, he continues.
In other words, “we have a Portuguese computer that is physically in Barcelona, but a percentage of it belongs to the Portuguese State”, he summarizes.
Now, if “we were training this, for example, in a ‘cloud’ from Microsoft, Google or Amazon, this would have a very high cost, but as we will be using this national resource, it will be done in a much more efficient way from a financial point of view”, he explains.
Asked what the Portuguese LLM represents for him, Paulo Dimas describes it as a “key player in the national artificial intelligence ecosystem”.
This is because “on top of this LLM it will be possible to create new artificial intelligence applications where the Portuguese language is preserved, where we have control over the Portuguese language”, he highlights.
Paulo Dimas, who is also vice-president of innovation at Unbabel, gives the example of a product that he considers one of the “most emotional” he has ever developed in his professional life, the Halo.
Developed by the Unbabel team, this project makes it possible “to recover the communication capacity of patients suffering from Amyotrophic Lateral Sclerosis [ALS]”, who lose the ability to write and speak because they have a generalized muscular disability.
“The only way to communicate again with the people they love most, with their family, with their caregivers, is through alternative and augmentative communication technology. We, with artificial intelligence, are able to clone patients’ voices” and “we are already working with ALS patients who have started to speak again”, he reports.
However, “this speech results from text that is often produced in the variant spoken in Brazil”, which “is not natural at all”.
But, from the moment that “we have Amália, which will be the name that will be given to the LLM, a name inspired by a very important figure in our history, we will be able to control what is said in these conversations”.
In this way, patients will be able to speak in the Portuguese spoken in Portugal and this “is a fundamental piece”, but more than that, “it is a transversal piece for Public Administration”, he says.
Because “we can, for example, work on this model in the area of education and have our children learn in schools with a personalized tutor who knows the national educational curriculum”. In short, the use of the Amália LLM “is completely transversal”.
On the other hand, “it gives technological autonomy, it allows us to improve the model over time, particularly in terms of introducing multimodality, which means also adding images and, later on, eventually adding speech as well”, he adds.
It is “a national technological resource that is transversal to all areas of our society, research and startups”, he emphasizes.
And also “it will be an important piece for ‘startups’. At first, it will not speak”, but “we have an Amália that writes correct Portuguese, the Portuguese spoken in Portugal, and a basis for that cultural representation” and, “definitely, knowing more about Portuguese culture”.
Also in Public Administration, the Amália LLM will be a “very important piece”, from education to innovation and for the “development of artificial intelligence in Portugal”.
A “very important” partner in this initiative “will be the Agency for Administrative Modernization, AMA”, because it will be the way to “transpose this LLM, this technology, to Public Administration”.
Basically, “it is an example of a partnership that brings together research centers and brings together Public Administration” and that “also includes the ‘know-how’ developed in national ‘startups’ such as Unbabel”, with the Center for Responsible AI as the driving force of these collaborations, he concludes.
Cost
No value related to the investment for this innovation was announced, but Arlindo Oliveira has an estimate.
The president of the Board of Directors and the Executive Committee of INESC TEC – Institute of Systems and Computer Engineering, Technology and Science, foresees a cost of between 10 and 20 million euros: “It seems to me to be a not very unreasonable figure”.
In a radio interview, the former president of Instituto Superior Técnico also says he believes that an LLM in Portuguese will be useful in various areas: “Customer service, new company products, support for students, applications in the medical or legal fields, among others”.
But he warns that implementing this innovation won’t be easy, especially because there are few texts in Portuguese.