Validate your iOS and Android translations with Locheck

September 15, 2021

When mobile apps need to ship in multiple languages, the developer often hires a contractor or external service to translate all the strings. The people doing these translations are usually unfamiliar with the technical details of localization, which makes it easy for them to introduce bugs when a string contains a variable. Even if the app only ships in one language, it’s still possible to write bugs by making subtle mistakes. In order to ship a bug-free app, the developer needs some way of ensuring the format strings in every translation are correct even if they don’t speak the language.

At Asana, where we ship in 13 languages, we developed Locheck to automatically verify that every string in our .strings, .stringsdict, and strings.xml files uses consistent arguments and types, and to report errors to our CI pipelines. In this post, I’ll cover some challenges with localization and show how Locheck makes sure we don’t ship with bugs.

Locheck open source (Image 1)

How Locheck catches bugs

Locheck compares the language you develop in with the languages you translate into, and makes sure all their types match. It can catch things like when:

  • A string appears in one localization but not another

  • An argument is used in a localization but does not appear in the base localization

  • An argument has different types in different localizations or different plural variants

  • The translation has misspelled a named variable

For .strings and strings.xml files, this is relatively simple given a fancy enough regular expression and knowledge of the syntax. Locheck parses a string like "added %d tasks to %3$s" into a list of Swift structs:

Locheck open source (Image 2)

(We are very fortunate that iOS and Android use a close enough format string syntax.)
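To make the parsing step concrete, here is a rough sketch in Python rather than Swift. It is not Locheck's actual code: the `FormatArgument` record, the `parse_arguments` function, and the simplified regular expression are all made up for illustration, and the sketch assumes that implicit specifiers are numbered 1, 2, 3... independently of any explicit positions.

```python
import re
from dataclasses import dataclass
from typing import List

# Simplified pattern: "%", an optional "N$" position, then one conversion
# character. Real format strings also allow flags, widths, and length
# modifiers, which this sketch ignores.
SPECIFIER = re.compile(r"%(?:(\d+)\$)?([@a-zA-Z])")


@dataclass(frozen=True)
class FormatArgument:  # hypothetical record, not Locheck's struct
    position: int      # 1-based argument index
    specifier: str     # conversion character, e.g. "d" or "s"
    is_explicit: bool  # True when the string spelled out "N$"


def parse_arguments(text: str) -> List[FormatArgument]:
    """List the format arguments found in text, in order of appearance."""
    arguments = []
    next_implicit = 1  # assumption: implicit specifiers count up from 1
    # Drop escaped percent signs so "%%" is not mistaken for a specifier.
    for match in SPECIFIER.finditer(text.replace("%%", "")):
        explicit, letter = match.groups()
        if explicit is not None:
            position = int(explicit)
        else:
            position = next_implicit
            next_implicit += 1
        arguments.append(FormatArgument(position, letter, explicit is not None))
    return arguments


for argument in parse_arguments("added %d tasks to %3$s"):
    print(argument)
```

For the example string, this yields a `d` argument at implicit position 1 followed by an `s` argument at explicit position 3.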

Locheck generates such a list for each string, then compares the lists for the same string key across translations, logging a warning or error if they differ. Some mismatches can cause crashes, for example if your German translation uses %s where the base string uses %d.
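The comparison can be sketched in a few lines of Python. Again, this is an illustration under assumptions rather than Locheck's implementation: `argument_types`, `compare_translation`, the string keys, and the German text are all invented for the example, and the regular expression handles only the simplest specifiers.

```python
import re
from typing import Dict, List

# Simplified: "%", optional "N$" position, one conversion character.
SPECIFIER = re.compile(r"%(?:(\d+)\$)?([@a-zA-Z])")


def argument_types(text: str) -> Dict[int, str]:
    """Map each 1-based argument position to its conversion character."""
    types = {}
    next_implicit = 1
    for explicit, letter in SPECIFIER.findall(text.replace("%%", "")):
        if explicit:
            position = int(explicit)
        else:
            position = next_implicit
            next_implicit += 1
        types[position] = letter
    return types


def compare_translation(base: Dict[str, str], translation: Dict[str, str]) -> List[str]:
    """Report keys missing from the translation and arguments that disagree."""
    problems = []
    for key, base_text in base.items():
        if key not in translation:
            problems.append(f"{key}: missing from translation")
            continue
        expected = argument_types(base_text)
        actual = argument_types(translation[key])
        for position, letter in sorted(actual.items()):
            if position not in expected:
                problems.append(f"{key}: argument {position} does not appear in base")
            elif expected[position] != letter:
                problems.append(
                    f"{key}: argument {position} is %{letter}, base has %{expected[position]}"
                )
    return problems


# Made-up strings for illustration only.
base = {"tasks_added": "added %d tasks to %s", "greeting": "Hello"}
german = {"tasks_added": "%s Aufgaben zu %s hinzugefügt"}
for problem in compare_translation(base, german):
    print(problem)
```

With these inputs the sketch reports that argument 1 of `tasks_added` is `%s` where the base has `%d`, and that `greeting` is missing from the translation.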

The challenges of .stringsdict

.stringsdict files are much more complicated. Here’s a shorthand version of a plural rule:

Locheck open source (Image 3)

The %#@tasks@ substring means “recurse into tasks.” You can even nest these rules:

Locheck open source (Image 4)

(There is a simpler way to define this rule, but sometimes nesting is really necessary.)

These rules form a grammar, defining a set of possible strings. The rules are traversed before the format string is applied. That means in order to really be sure the arguments are correct, every permutation needs to be checked. Here are all the permutations of the .stringsdict entry example above:

Locheck open source (Image 5)
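Expanding a rule set into its permutations is a small recursive substitution. The Python sketch below is only an illustration: the `expand` function and the `rules` dictionary are hypothetical, the entry is a made-up one rather than the one pictured above, and details of real .stringsdict files such as `NSStringFormatValueTypeKey` are ignored.

```python
import re
from typing import Dict, List

# Matches a variable reference such as %#@tasks@ and captures its name.
VARIABLE = re.compile(r"%#@(\w+)@")


def expand(template: str, rules: Dict[str, Dict[str, str]]) -> List[str]:
    """Return every string the template can become once each %#@name@
    reference is replaced by one of that rule's plural variants.
    Assumes the rules do not reference each other in a cycle."""
    match = VARIABLE.search(template)
    if match is None:
        return [template]  # nothing left to substitute
    results = []
    for variant in rules[match.group(1)].values():
        substituted = template[:match.start()] + variant + template[match.end():]
        # The variant may itself contain references, so keep expanding.
        results.extend(expand(substituted, rules))
    return results


# Hypothetical nested rules: the "tasks" variants recurse into "milestones".
rules = {
    "tasks": {
        "one": "a task and %#@milestones@",
        "other": "%d tasks and %#@milestones@",
    },
    "milestones": {
        "one": "a milestone",
        "other": "%d milestones",
    },
}

for permutation in expand("added %#@tasks@ to %s", rules):
    print(permutation)
```

Two rules with two variants each produce four permutations here, and the number of %d specifiers in front of the final %s ranges from zero to two, which is exactly why each permutation has to be checked on its own.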

Given the permutations above, look at how the arguments differ in each permutation. Without explicit positions, the second permutation might mistakenly use the value for tasks in front of milestones, and try to use a number for the string argument at the end. If we add explicit positions, these problems disappear:

Locheck open source (Image 6)

Locheck knows how to expand these rules and can log intelligent errors to help you find problems.

Locheck open source (Image 7)

Deep-dive into a common problem

Imagine we’re making a task list app with an activity feed.

On Android, there is built-in support for a plurals element in strings.xml for this:

Locheck open source (Image 8)

On iOS, we would add an entry to our Localizable.stringsdict file:

Locheck open source (Image 9)

Then in our code, we’d access the string:

Locheck open source (Image 10)

And we’d get back whichever variant matched the value of numTasks we put in.

Or would we? No, we would not!

If we pass a value of 1 for numTasks, the app will actually crash, because after the system substitutes our string value, we’re really doing this:

Locheck open source (Image 11)

This kind of mistake is extremely easy to make if you’re not used to thinking about these details, for example if your job is to translate text between different languages rather than write code all day, or if you’re translating to a language like Japanese where the order is often different. Even if you have a developer review every string, it can be very tricky to spot these issues. And as code and teams scale together, tricky-to-spot bugs become guaranteed-to-ship-to-production bugs.

How to fix it

The right thing to do is to add explicit positions to non-consecutive arguments. Instead of writing %s for our third argument, we should write %3$s, which makes it always use the third argument.

Locheck open source (Image 12)
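To see why explicit positions help, it is enough to work out which type each variant expects at each argument position. The following Python sketch does that for a simplified, hypothetical two-argument string (not the exact strings pictured in this post). `argument_types` and `conflicting_positions` are illustrative names, and the sketch assumes implicit specifiers are numbered in order of appearance.

```python
import re
from typing import Dict, List, Set

# Simplified: "%", optional "N$" position, one conversion character.
SPECIFIER = re.compile(r"%(?:(\d+)\$)?([@a-zA-Z])")


def argument_types(text: str) -> Dict[int, str]:
    """Map each 1-based argument position to its conversion character."""
    types = {}
    next_implicit = 1
    for explicit, letter in SPECIFIER.findall(text.replace("%%", "")):
        if explicit:
            position = int(explicit)
        else:
            position = next_implicit
            next_implicit += 1
        types[position] = letter
    return types


def conflicting_positions(variants: List[str]) -> Dict[int, Set[str]]:
    """Return the positions where the variants expect different types."""
    seen: Dict[int, Set[str]] = {}
    for variant in variants:
        for position, letter in argument_types(variant).items():
            seen.setdefault(position, set()).add(letter)
    return {p: letters for p, letters in seen.items() if len(letters) > 1}


# Hypothetical plural variants, called with (numTasks, projectName).
implicit = ["added a task to %s", "added %d tasks to %s"]
explicit = ["added a task to %2$s", "added %d tasks to %2$s"]

print(conflicting_positions(implicit))  # position 1 is both %s and %d
print(conflicting_positions(explicit))  # no conflicts
```

In the implicit version, the singular variant drops %d, so its %s slides into position 1 and ends up paired with the number. Pinning the string argument to position 2 keeps both variants in agreement.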

Best practice would be to use explicit positions 100% of the time, but it can be prohibitively time-consuming to retroactively add explicit positions if your source of truth is an online service like Transifex, which is true for us at Asana. And you might still get errors if the people doing the translations aren’t perfect at understanding format strings.

Locheck will catch this type of problem automatically, so it’s safe to use implicit positions. There might still be translation errors where two strings are incorrectly swapped and their format specifiers still match, but at least the app won’t crash.

How to use Locheck

You can install Locheck using Mint or Make:

Locheck open source (Image 13)

Locheck emits Xcode-style errors to stderr, as well as a human-readable summary to stdout after all files are examined. It works well as an Xcode Run Script build phase, continuous integration step, or precommit script. Here’s some example output from our demo files:

Help us out

While we’ve run Locheck on our own code and a few open source apps, it’s still early. If you do decide to try it out, please leave feedback as a GitHub issue. Enjoy your new localization-bug-free life!