How does the system prepare the text for machine translation?

Question

Sometimes my files contain many tags, and the translations generated by my machine translation (MT) provider are not great. What can I do to improve the results?

Answer

Handling and placing markup is a complex topic. Results are always best when segments contain little markup and that markup is generic, HTML-based markup.

In Wordbee Translator, once a document is added to a project and marked for translation, its text is extracted using the rules defined in the file format configuration. These rules rely on regular expressions (RegEx) and other segmentation mechanisms. The extraction parses the file, collects all text that requires translation, and splits it into translation units called segments.
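To make the extraction step more concrete, here is a minimal Python sketch. The patterns are hypothetical stand-ins for a file format configuration, not Wordbee Translator's actual rules: one regular expression splits the text into segments at sentence boundaries, and another recognizes inline markup such as `[b]...[/b]` tags and `{file_name}` placeholders.

```python
import re
from dataclasses import dataclass, field

# Hypothetical extraction rules, for illustration only; a real file format
# configuration defines its own segmentation and markup patterns.
SEGMENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")          # split after . ! ?
INLINE_MARKUP = re.compile(r"\[/?[a-z]+\]|\{[a-z_]+\}")  # [b], [/b], {file_name}


@dataclass
class Segment:
    text: str                                          # source text, markup kept in place
    markup: list[str] = field(default_factory=list)    # markup found in this segment


def extract_segments(document: str) -> list[Segment]:
    """Split a document into segments and record the inline markup of each one."""
    segments = []
    for chunk in SEGMENT_BOUNDARY.split(document.strip()):
        if chunk:
            segments.append(Segment(chunk, INLINE_MARKUP.findall(chunk)))
    return segments


if __name__ == "__main__":
    doc = "Click [b]Save[/b] to keep {file_name}. Then close the window."
    for seg in extract_segments(doc):
        print(seg)
```

For the sample text, the sketch yields two segments: the first carries three pieces of markup (`[b]`, `[/b]` and `{file_name}`) and the second carries none.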

As a result:

Each segment is a machine-readable string that any engine can process. It contains the text in the source language together with its markup. The markup defined by the extraction rules can be of different types (custom or HTML-compatible), so each string carries markup that is specific to the rules that produced it.
Because markup can be defined freely in the text extraction rules, the system must prepare these strings further to make them compatible with machine-related processes that run outside Wordbee Translator. How you extract the text and generate the markup therefore has a direct impact on these processes.

When segments are prepared for machine translation, the following processing is applied:

The source-language text of the segment and its markup are prepared to maximize the chances that the MT provider translates the content completely and keeps it intact.
The markup in the string is converted into generic HTML markup, which MT engines can handle.
The converted string is sent to the MT provider selected in the MT profile configuration.
Once the MT provider returns its output, Wordbee Translator checks whether the markup in the output is valid with respect to the original MT request. It checks whether the translation generated by the MT provider:
1. contains all of the markup
2. has the markup correctly placed
  Wordbee Translator has several mechanisms that "roughly" fix major markup issues. They aim to keep translations accurate and to prevent problems when the file is rebuilt with all translations, starting with making sure the rebuilt file can be opened at all.
Finally, once the machine translation output is available and validated, the system converts the HTML-based markup back to the original markup style produced during extraction.
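The round trip described in these steps can be sketched in Python as follows. This is an illustration under stated assumptions, not Wordbee Translator's implementation: the custom markup (`[b]...[/b]` tags and `{file_name}` placeholders), the generic HTML used for the request (`<span id="N">` and `<br id="N"/>`), and the function names `to_html`, `validate` and `from_html` are all hypothetical. The validation checks that all markup came back and uses correct nesting as a simple proxy for correct placement.

```python
import re

# Hypothetical custom markup produced by extraction rules: paired tags such as
# [b]...[/b] and standalone placeholders such as {file_name}.
CUSTOM = re.compile(r"\[(/?)([a-z]+)\]|\{[a-z_]+\}")
# Hypothetical generic HTML markup sent to the MT provider.
HTML = re.compile(r'<span id="(\d+)">|</span>|<br id="(\d+)"/>')

# Maps a tag id to its original (opening, closing) markup; placeholders have no closing part.
TagMap = dict[str, tuple[str, str]]


def to_html(segment: str) -> tuple[str, TagMap]:
    """Replace custom markup with numbered generic HTML tags (assumes well-nested source markup)."""
    mapping: TagMap = {}
    open_ids: list[str] = []

    def replace(m):
        token = m.group(0)
        if m.group(2) is None:                  # standalone placeholder
            tag_id = str(len(mapping) + 1)
            mapping[tag_id] = (token, "")
            return f'<br id="{tag_id}"/>'
        if m.group(1) == "":                    # opening tag
            tag_id = str(len(mapping) + 1)
            mapping[tag_id] = (token, "")
            open_ids.append(tag_id)
            return f'<span id="{tag_id}">'
        tag_id = open_ids.pop()                 # closing tag pairs with the last opening tag
        mapping[tag_id] = (mapping[tag_id][0], token)
        return "</span>"

    return CUSTOM.sub(replace, segment), mapping


def validate(mt_output: str, mapping: TagMap) -> list[str]:
    """List markup problems in the MT output: broken nesting, missing or unexpected tags."""
    problems: list[str] = []
    seen: set[str] = set()
    open_ids: list[str] = []
    for m in HTML.finditer(mt_output):
        if m.group(0) == "</span>":
            if open_ids:
                open_ids.pop()
            else:
                problems.append("closing tag without opening tag")
            continue
        tag_id = m.group(1) or m.group(2)
        seen.add(tag_id)
        if m.group(1):
            open_ids.append(tag_id)
    if open_ids:
        problems.append(f"unclosed tags: {open_ids}")
    missing = sorted(set(mapping) - seen, key=int)
    if missing:
        problems.append(f"missing tags: {missing}")
    unknown = sorted(seen - set(mapping), key=int)
    if unknown:
        problems.append(f"unexpected tags: {unknown}")
    return problems


def from_html(mt_output: str, mapping: TagMap) -> str:
    """Convert generic HTML tags back to the original custom markup (assumes validate() passed)."""
    open_ids: list[str] = []

    def replace(m):
        if m.group(0) == "</span>":
            return mapping[open_ids.pop()][1]
        tag_id = m.group(1) or m.group(2)
        if m.group(1):
            open_ids.append(tag_id)
        return mapping[tag_id][0]

    return HTML.sub(replace, mt_output)


if __name__ == "__main__":
    request, mapping = to_html("Click [b]Save[/b] to keep {file_name}.")
    print(request)  # Click <span id="1">Save</span> to keep <br id="2"/>.
    # Pretend MT output; a real MT provider would receive `request` here.
    mt_output = 'Cliquez sur <span id="1">Enregistrer</span> pour conserver <br id="2"/>.'
    print(validate(mt_output, mapping))   # []
    print(from_html(mt_output, mapping))  # Cliquez sur [b]Enregistrer[/b] pour conserver {file_name}.
```

Numbering the tags lets the sketch match each returned tag to its original markup even if the MT engine reorders the words around it, which is one reason to send generic, numbered markup rather than the custom markup itself.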

Example

If the selected MT profile sends the text to 'Microsoft MT', then once Microsoft returns the translations, the system must convert the markup handed back by Microsoft to its original form, as it was when the file was marked for online translation. In other words, the translation provided by Microsoft is processed further to convert the markup and place it correctly. If all goes well, there is little to no difference between the original and the returned markup.

In a nutshell: your regex-defined markup goes to Microsoft and comes back to us. We sometimes have to "fix" the markup, and fixing is never perfect, so you may occasionally see incorrectly placed markup.
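To see why a rough fix can leave markup in the wrong place, consider this deliberately naive repair strategy, sketched in Python. It is hypothetical and much simpler than whatever Wordbee Translator actually does: the generic HTML tags (`<span id="N">`, `<br id="N"/>`) and the function `rough_fix` are assumptions for illustration, and the strategy simply appends any tag the MT engine dropped to the end of the translation. The segment stays well-formed, but the restored tag no longer surrounds the text it belonged to.

```python
import re

# Hypothetical generic HTML tags used in the MT request (opening or standalone tags only).
TAG = re.compile(r'<span id="(\d+)">|<br id="(\d+)"/>')


def rough_fix(request: str, mt_output: str) -> str:
    """Naively restore tags that the MT engine dropped by appending them at the end."""
    returned = {m.group(1) or m.group(2) for m in TAG.finditer(mt_output)}
    fixed = mt_output
    for m in TAG.finditer(request):
        tag_id = m.group(1) or m.group(2)
        if tag_id in returned:
            continue                                # tag survived translation
        if m.group(1):
            fixed += f'<span id="{tag_id}"></span>'  # paired tag: restored as an empty pair
        else:
            fixed += m.group(0)                     # standalone tag: restored as-is
    return fixed


if __name__ == "__main__":
    request = 'Click <span id="1">Save</span> to keep <br id="2"/>.'
    mt_output = 'Cliquez sur Enregistrer pour conserver <br id="2"/>.'  # MT dropped tag 1
    print(rough_fix(request, mt_output))
    # Cliquez sur Enregistrer pour conserver <br id="2"/>.<span id="1"></span>
```

The repaired segment can be converted back and the file can be rebuilt, but "Enregistrer" has lost its formatting and the bold markup now sits empty at the end of the sentence: exactly the kind of misplaced markup the fix cannot avoid.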