
Five Tips For Localising Software Strings Into Japanese

A polished and professional website is good to go and ready to impress. The design dazzles and there are no bugs in the code. But does the site work in Japanese? There’s much to be done to ensure that it does. Software localisation ensures that a website or application works correctly in Japanese, but it is a challenging undertaking that requires specific skills. It is the only discipline in the localisation process in which translation memory matches are subordinate to ID-based matches, and string-based segmentation is preferred over traditional sentence-based segmentation. Software localisation also breaks with the principle of having a unique translation for each source string, because software strings may be part of a larger whole and therefore demand a consistent workflow. It usually precedes the translation of any technical documentation that accompanies the software, and the approved strings should then be reused as terminology for that documentation. Here are five top tips for localising software strings into Japanese.

Avoid copy-paste

Software strings are usually exported in structured data formats such as JSON and XML, or in plain text formats such as Apple .strings and Java Properties files. Many software developers think they are helping their translation partner by copy-pasting strings into cumbersome Excel sheets. But they are not! The copy-paste approach is error-prone and time-consuming. Worse still, it doesn’t help to detect or eliminate issues such as corrupt characters. A complex Excel document can require more preparation time than a well-formed JSON or XML file. The native file format is always the richest and safest one to use for localisation.
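As an illustration, a minimal, well-formed JSON strings file (the keys and strings below are hypothetical) can be imported into a translation management system directly, with no copy-paste step at all:

{
  "login.title": "Sign in to your account",
  "login.button": "Sign in",
  "login.error.empty": "Please enter your email address"
}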

Take full advantage of keys


Every translatable string has a corresponding key or ID that indicates where the string appears in the user interface of the software. In JSON, keys are the name parts in name/value pairs, as shown in the example below. Keys play a crucial role in every successful software localisation project, and for two reasons. Firstly, a translation unit stored with its key will always overrule a traditional translation memory match. Secondly, keys enable approved strings to be isolated and locked, and separated from new strings. This is incredibly useful when coping with frequent sprints.
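A minimal sketch (hypothetical keys and strings) of an English source file and its Japanese counterpart, linked by the same keys:

en.json
{
  "checkout.confirm": "Confirm your order",
  "checkout.cancel": "Cancel"
}

ja.json
{
  "checkout.confirm": "ご注文を確定する",
  "checkout.cancel": "キャンセル"
}

Because the key travels with each string, the TMS can match, lock or update a translation by its ID rather than by its source text alone.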

Optimise pre-processing


A proper import of software files into a translation management system (TMS) is usually preceded by preparatory engineering. There are many variables to deal with, including character limitations, embedded HTML tags and characters that should or shouldn’t be escaped. Preparatory engineering is crucial and may require the software localisation engineer to think outside the box in order to get the most out of the TMS. When dealing with Asian languages like Japanese, it is particularly important to parse and display every relevant particularity, so that translators have accurate strings, sufficient context and clear guidelines. Japanese presents many potential pitfalls, including the fact that the language doesn’t use spaces to separate words, so the regular line-breaking algorithms don’t always apply. In other ways, however, Japanese can be more straightforward to tackle than some languages. Pluralisation is certainly less of an issue than it is with Eastern European languages: Japanese has a single form for both singular and plural, whereas the translation of “%n items” in some Slavic languages changes with the value of “%n” (in Russian, for example, 1, 2 and 5 each take a different form), as illustrated below.
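A rough sketch in ICU MessageFormat syntax (the wording is hypothetical and the forms are simplified) shows the difference:

English:  {count, plural, one {# item} other {# items}}
Russian:  {count, plural, one {# товар} few {# товара} many {# товаров} other {# товара}}
Japanese: {count, plural, other {#件}}

The Japanese string needs only a single form, while the Russian string needs a form for each CLDR plural category.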

Consider trading in pseudo translation for machine translation

Pseudo translation is the process of replacing source strings with random characters. This enables the software engineer to test whether international characters are displayed properly in the user interface, and to check whether all source strings have been extracted from the software and/or imported into the TMS. There’s no doubt about the importance of pseudo translation; it is an aspect of a complex discipline that is often undervalued by translation buyers and LSPs. When dealing with Japanese, however, pseudo translation can be exchanged for a dummy machine translation. A Japanese MT sample will reveal potential problems such as corrupt characters and formatting issues, which can then be tackled before the actual localisation process begins. A good MT pre-translation can also be the start of a post-editing project if time is at a premium or the budget is restricted.
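As a hypothetical illustration, a pseudo-translated string merely proves that non-ASCII characters survive the round trip, whereas a rough Japanese MT sample also exposes encoding and formatting problems with real Japanese text:

Source:             "welcome.message": "Welcome back, %s!"
Pseudo translation: "welcome.message": "[Ŵéļçöɱé ƀáçķ, %s! 日本語]"
Japanese MT sample: "welcome.message": "おかえりなさい、%sさん！"

If the MT sample displays correctly in the user interface and the %s placeholder still works, the pipeline is ready for the real translation.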

Deliver working software strings back


There may be occasions when the machine translation generates corrupt Japanese characters. Sometimes the developer may also be unable to import the MT sample back into the user interface, even though the Japanese characters look fine. In both cases there is probably a character set or character encoding issue that needs to be addressed. Characters that are needed for a specific purpose in a computer environment are grouped in character sets such as ASCII or Unicode, and each character in a character set is associated with a number known as a “code point”. An encoding then provides the key to unlock the code: a set of mappings between the bytes in the computer and the characters in the character set. Unicode UTF-8 is the most common and safest encoding for JSON, XML and plain text files, because it is a “super encoding” that can represent every character in every living language. Nevertheless, developers often provide software strings in an encoding that works perfectly for English and certain target languages like French and Spanish, but not for Asian languages. This may cause Japanese characters to be corrupted when they are exported from the TMS. If the developer isn’t able to import the localised software strings back into the UI even though there are no corrupt characters, this indicates that the encoding requirements have not been met. In the case of Java Properties files, the safe target encoding is ASCII (the format traditionally uses ISO 8859-1). This encoding doesn’t support Japanese characters and therefore requires them to be Unicode-escaped, which looks a bit odd but is perfectly fine:
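A minimal sketch (the key name is hypothetical) of a Japanese string before and after Unicode-escaping for an ASCII-encoded Java Properties file:

# What the translator delivers
login.greeting=ようこそ

# What is stored in the ASCII-encoded .properties file
login.greeting=\u3088\u3046\u3053\u305d

Tools such as the JDK’s native2ascii utility, or the export filter of the TMS, typically handle this conversion, and the Java runtime converts the escapes back into Japanese characters when the file is loaded.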

Conclusion

Software localisation involves very specific challenges when compared to other disciplines in the localisation process. These challenges apply to every possible target language but might reach a higher level of complexity with Japanese. From a technical point of view, there are many idiosyncrasies of Japanese that must be interpreted, prepared for and processed properly. In order to deliver working software strings back to the developer, it is crucial to get the best out of the available TMS, to provide spot-on guidelines and to meet all the requirements imposed by the native file format. That impressive website or app can dazzle in Japanese.

#DITA #XML