Artificial intelligence revitalising Sámi
Researchers at Aalto University are developing speech recognition and transcription tools to facilitate the preservation and use of Sámi languages in everyday life.

Sámi languages are endangered, but AI can help revive them. Professor Mikko Kurimo from Aalto University, who specialises in automatic speech recognition and spoken language modelling, is working with his team to develop speech recognition and transcription tools for sound archives in Northern Sámi.
“Over the last few years, we have often been asked whether Sámi interview recordings can be converted into text format using artificial intelligence. There is a lot of material that has not been transcribed,” Kurimo explains.
“At the same time, if transcription and other AI tools cannot be made to work in Sámi, the language is in danger of becoming more and more obsolete, as Sámi speakers manage their daily affairs more effectively in English, Finnish, Norwegian and Swedish.”
The first challenge the team is tackling is Northern Sámi, which is by far the most widely spoken Sámi language in Finland. Even so, it is only spoken by around 20,000 people.
In cooperation with the Finnish National Audiovisual Institute (Kavi), Kurimo’s team has used radio and television programmes to train large speech models for accurate speech recognition of Finnish, Finland-Swedish (the Swedish spoken in Finland) and Sámi.
The researchers selected 30,000 hours of programmes from the last 15 years for training the Sámi model. For training the corresponding Finnish and Finland-Swedish speech models, 200,000 hours of programmes were selected for each. According to Kurimo, monolingual data sets of this size have previously been used to build speech models only for English.
“This model is able to learn structures and recurring patterns in speech all on its own. Part of the speech is hidden, and the model predicts the missing part. This way, through trial and error, it absorbs the words and structure of the language,” Kurimo explains.
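The hide-and-predict idea Kurimo describes can be illustrated with a toy sketch. It is not the team's model: the single-number "frames", the neighbour_average predictor and the squared-error loss are all assumptions chosen only to keep the example small. Real speech models work on high-dimensional audio features and learn their predictor from data.

```python
def neighbour_average(visible):
    """Toy stand-in for a model: fill each hidden frame (None) with the
    mean of the nearest visible frame on either side."""
    filled = list(visible)
    for i, value in enumerate(visible):
        if value is not None:
            continue
        left = next((visible[j] for j in range(i - 1, -1, -1)
                     if visible[j] is not None), None)
        right = next((visible[j] for j in range(i + 1, len(visible))
                      if visible[j] is not None), None)
        known = [v for v in (left, right) if v is not None]
        filled[i] = sum(known) / len(known) if known else 0.0
    return filled


def masked_prediction_loss(frames, masked, predict):
    """Hide the frames at the given indices, let `predict` guess them from
    the rest, and return the mean squared error over the hidden frames only."""
    hidden = set(masked)
    if not hidden:
        return 0.0
    visible = [None if i in hidden else x for i, x in enumerate(frames)]
    guesses = predict(visible)
    return sum((guesses[i] - frames[i]) ** 2 for i in hidden) / len(hidden)


# Hypothetical one-dimensional "speech frames"; positions 2 and 4 are hidden.
frames = [1.0, 2.0, 3.0, 4.0, 5.0, 3.0]
loss = masked_prediction_loss(frames, masked=[2, 4], predict=neighbour_average)
print(loss)
```

In actual training, the model's parameters would be adjusted repeatedly to lower this kind of loss over many hours of audio, which is the "trial and error" in the quote. The toy predictor here has no parameters to adjust.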
However, a large speech model alone is not enough for accurate speech recognition. It also needs to be taught how to turn the speech into text, and this is done using data sets of transcribed speech. This results in a tool that can convert speech into text files.
Kurimo’s team has used transcripts of the Sámi Parliament’s meetings as its data set.
“This material is unlikely to represent everyday speech and the dialects of Northern Sámi very well. Much more transcribed speech is needed to train and test an accurate transcription tool.”
At first, every sentence began with ‘naa’
As anyone who has tried ChatGPT or other chatbots knows, machine intelligence can produce ingenious but also absurd outputs. This is also the case when developing transcription tools.
“At first, the speech recogniser started every sentence with ‘naa’. When we asked speakers of Sámi about this, they told us that it is indeed common to say ‘naa’, ‘nii’ or ‘noo’ when starting to speak. Since the first word is usually hard to guess when someone starts to speak and since ‘naa’, ‘nii’ and ‘noo’ are phonetically similar, the AI had interpreted them as the same word,” Kurimo says.
The Aalto University researchers involved do not speak Sámi themselves but have linguists from the University of Lapland to help them with any language problems they encounter.

In fact, the project started when linguists from the University of Lapland contacted Aalto University to ask for help with transcribing speech recordings using speech recognition. Aalto University researchers set out to create a model, and once it proved good enough for further development, researchers from Aalto University and the University of Lapland jointly submitted a grant application to the Finnish Cultural Foundation.
In February, they were awarded €200,000 for research that involves recording spoken Sámi and developing AI speech recognition with the aim of revitalising the Sámi language.
During the project, the team will train the speech model to recognise Northern Sámi more accurately, including its nuances, by feeding it more transcribed material. This material is produced by researchers at the University of Lapland, led by Professor Pigga Keskitalo. They also monitor whether the speech model’s recognition of Northern Sámi is actually improving.
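The article does not say how improvement is measured. One widely used metric for speech recognition is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the recogniser's output into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal implementation under that assumption. The function name and the placeholder tokens w2, w3 and w4 are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by the
    number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: the recogniser writes 'naa' where the speaker said 'nii'.
print(word_error_rate("nii w2 w3 w4", "naa w2 w3 w4"))  # 0.25
```

Tracking a score like this on the same held-out recordings after each round of training gives a simple indication of whether recognition is getting better. A lower value means fewer errors.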
Putting Sámi on an equal footing with major languages
The researchers are exploring the possibility of expanding speech recognition to cover the less common Sámi languages spoken in Finland: Inari Sámi and Skolt Sámi. In total, fewer than a thousand people speak these languages as their mother tongue.
The data sets for these are very small, of course, but the model that has been taught to recognise Northern Sámi is going to be a big help.
“Because there are structural and lexical similarities between the Sámi languages, the model created for Northern Sámi will get us started. In addition, the current data sets may already include some material in the rarer Sámi languages, just as English and Swedish are sometimes spoken in Finnish television programmes,” Kurimo explains.
The project aims to develop learning tools that will make it easy for people who have forgotten Sámi, or for Sámi people who never learned it as children, to practise it on a computer.
The Sámi transcription tool may also be further developed, for example, to help transcribers of meeting minutes or subtitlers of television programmes.
“We want Sámi to be on an equal footing with the major languages when it comes to information technology.”
As computer scientists, the members of Kurimo’s team are passionate about many areas of information technology and related research.
“There are many things we want to learn about speech recognition technology that are not tied to a particular language. This means that our research will also benefit speakers of other rare languages to whom commercially funded AI tools are not available.”