Latvia’s Tilde releases open source LLM for European languages – TildeOpen LLM  

Viesturs Abelis

Tilde has released TildeOpen LLM, an open-source large language model (LLM) that specialises in generating text in European languages. Developed by Tilde on behalf of the European Commission, the model is freely accessible to anyone interested. Developers can use TildeOpen as a base for specialised models, customised for specific tasks, that work well in the languages of Europe’s smaller countries.

New large language model – more accurate and more secure  

The LLM, developed by leading Latvian artificial intelligence specialists, both handles the grammar of smaller languages more accurately and is more secure. Developers can host the model on a local server, ensuring that all information submitted to the LLM stays on the premises or in secure cloud storage. Popular commercial language models are usually hosted in data centres in the USA or Asia and do not always comply with EU data protection and privacy standards.

“Popular commercial language models, such as ChatGPT, are mostly trained on English-language data, which means that results generated in English will be of better quality than those generated in other, less common languages. This can lead to awkward sentence structure and word order, grammatical errors, or inaccurately used and translated terms. These mistakes become very obvious when an LLM is used for more complex and specific tasks. For exactly this reason, TildeOpen was tailored to European languages, especially the languages of the Baltic countries, as well as Ukrainian and Turkish, which are frequently underrepresented in current LLM solutions. Tilde is one of the few companies in Europe that, using supercomputer resources and unique expertise in the field of AI, has been able to develop such a foundation LLM completely on its own,” explains Tilde CEO Artūrs Vasiļevskis.

He emphasises that TildeOpen handles the Latvian language skilfully, so it should be considered by state administration and local governments, as well as by local companies and educational institutions. Local hosting also reflects a requirement of the European Commission: EU developers must build AI products for use in the internal market and keep them on secure infrastructure in Europe that complies with EU data protection directives and standards.

Supercomputers of Europe – for training AI in small languages  

The EU has 24 official languages and more than 60 regional languages, yet the developers of popular LLMs focus on the largest languages, leaving the smaller ones behind. This approach doesn’t suit Europe, as more than 200 million Europeans, or nearly half of Europe’s population, speak the so-called small languages. To promote Europe’s global competitiveness in AI, the EU announced the Large AI Grand Challenge, where Tilde, one of Europe’s leaders in AI-driven language technologies, was declared the winner in June 2024.

Winning the challenge gave Tilde access to two million GPU (graphics processing unit) hours on LUMI, one of the fastest supercomputers in Europe. These processing hours were granted specifically for the development of TildeOpen. This year Tilde was also among the first companies allowed to work with the recently launched JUPITER, the latest and currently the fastest supercomputer in Europe. Thanks to the capacity of these computers, the first version of TildeOpen was developed in about a year.

The LLM, which contains more than 30 billion parameters, was trained on an enormous amount of general information from various sources to create the base model. Users can customise that base for specific tasks; for instance, they can develop an AI assistant fluent in a particular European language.
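For readers considering local hosting, a rough sizing calculation may be useful. The sketch below estimates the memory needed just to hold the weights of a model of roughly this size in several common numeric formats. It is an illustration under stated assumptions, not a figure published by Tilde: the parameter count is rounded to exactly 30 billion, the function name `weight_memory_gb` is hypothetical, and memory for activations, caches, and framework overhead is ignored.

```python
# Rough estimate of the memory needed to hold model weights only.
# Assumptions (not from the article): exactly 30e9 parameters and the
# bytes-per-parameter values below. Activations, caches, and framework
# overhead are ignored, so real requirements will be higher.

BYTES_PER_PARAM = {
    "float32": 4.0,   # full precision
    "bfloat16": 2.0,  # common half-precision format
    "int8": 1.0,      # 8-bit quantisation
    "int4": 0.5,      # 4-bit quantisation
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: ~{weight_memory_gb(30e9, dtype):.0f} GB")
```

Under these assumptions, halving the bytes per parameter halves the weight footprint, which is why lower-precision formats are often considered when a model of this size has to fit on a single local server.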

TildeOpen is an open-source solution. National authorities, companies, scientists, students, medical institutions, and the financial and insurance sectors can all use the model freely, in line with the needs of their respective fields.

TildeOpen can be securely hosted either on a local server or in the cloud, and it is tailored to those European languages that are often underrepresented in the most popular solutions. TildeOpen Version 1 has been released on the Hugging Face platform.
