Stefan BlosApr 21, 2021

Language identification on iOS

On a previous post we took a look at the Natural Language framework in general. We discussed the different use-cases it has and examples for when and how to use it. If you need a refresher on that I also recommend watching the video I made on the topic.

In this post we want to take a closer look at how we can do language identification with the Natural Language framework. We'll go through all the steps necessary and evaluate the options we have, so let's get started.

Create a Language Recognizer

The first thing we need to identify (a) language(s) is an instance of the NLLanguageRecognizer class. It takes no arguments since all the configuration is done in later stages.

let recognizer = NLLanguageRecognizer()

In the rest of the examples we will always work with this recognizer instance.

Note: We also need to import the NaturalLanguage framework in order for this code to work inside our apps.

Next, we need a String that we can process. This will also be the same for all following steps so let's do that once.

First, we create a String where we like to determine the language (feel free to use any String you like). Then, we call the processString(_:String) method on our recognizer.

let toDetect = "The force is strong in this one"
recognizer.processString(toDetect)

Note: when we want to work with multiple Strings we have to call .reset() on the recognizer before using processString again.

Predicting the Language

After having processed a String we can use the dominantLanguage property of the recognizer instance to receive an optional NLLanguage? entity. As can be seen in the documentation it has a rawValue (of type String) that gives us the desired prediction of the language in short form (e.g. en for English).

It is an optional value so if our recognizer fails to predict a language it will be nil. To access the value we can do this:

if let predictedLanguage = recognizer.dominantLanguage {
	print("Predicted language is: \(predictedLanguage.rawValue)")
else {
	print("Language could not be detected")
}

As this is probably the most common use case Apple has also built a quicker way to do this:

let toDetect = "I am your father"
// We will again receive an optional NLLanguage?
let predictedLanguage = NLLanguageRecognizer.dominantLanguage(for: toDetect)

While this requires us to unwrap the optional it is way more direct and short than creating a NLLanguageRecognizer object first.

Getting multiple languages as hypotheses

We also have the option to get hypotheses on the most likely languages that the framework thinks it detects in the given String.

let hypotheses = recognizer.languageHypotheses(withMaximum: 3)

The result will be a dictionary of NLLanguage objects as keys and probabilities (of type Double) as values. This means that no order is given but we have to manually find the one entry with the highest probability.

We can do this using the max(by:) method of Dictionary:

let mostLikelyLang = hypothesis.max { a, b in a.value < b.value }

Note: It is not guaranteed that this array will be exactly the amount of entries we specified as a maximum.

The advantage of using the hypothesis is that we also get a probability. If that is low we can act accordingly and not just blindly trust the prediction of the framework for the most likely language.

Limiting/assisting the predictions

While the aforementioned examples should cover most use-cases there is two more things we can do.

First, we are able to configure our recognizer to only look for a certain set of languages by using the languageConstraints property. It can be set to an array of NLLanguage objects:

recognizer.languageConstraints = [.english, .german, .french, .spanish]

This limits the amount of languages that can be detected.

If we want to go even further we can use the languageHints configuration which takes a Dictionary of [NLLanguage: Double] pairs and allows us to pre-configure our recognizer with some probabilities for languages we know:

recognizer.languageHints = [
    .english: 0.6,
    .german: 0.4,
    .french: 0.7,
    .spanish: 0.2
]

However, this is only useful in very, very specific circumstances and if we have a really feasible reason to do it.

Summary

We can use the NLLanguageRecognizer for two purposes:

detecting the most likely language of a String
getting the probabilities of the most likely languages of a String

This can be done in a rather straightforward fashion. While it also offers some more customization and limitations, those should only very rarely be necessary.

With that, Apple gives us an easy way to use smart functionalities in our apps.

Thanks you for following this article, I hope you enjoyed it. If you have any feedback or things to add I'm happy to hear from you on Twitter or LinkedIn. If you're interested in more content, check out my other posts or my Youtube videos.