When and how does the Beebox align?
When the Beebox receives a source file it proceeds as follows:
- The source file is segmented and pre-translated from the Beebox memories.
When you also include translated files and the instructions file, the Beebox adds these steps:
- The translated files are aligned. To make the alignment algorithm more precise, a “training” file is created on the fly. The file includes all the pre-translations found in the first step (segmentation and pre-translation) plus any alignment dictionaries that were optionally configured in the Beebox project.
- The pre-translations from the first step are now replaced by the aligned translations. If alignment is not possible for some pieces of the content, the pre-translations from the first step are kept.
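The replace-or-keep behaviour of the last step can be sketched as follows. This is only an illustration in Python, not Beebox code: the function name `merge_translations` and the representation of segments as parallel lists (with `None` meaning "no translation available") are assumptions made for the example.

```python
from typing import List, Optional


def merge_translations(
    pretranslated: List[Optional[str]],
    aligned: List[Optional[str]],
) -> List[Optional[str]]:
    """Hypothetical helper: pick one translation per segment.

    An aligned translation replaces the pre-translation. Where alignment
    produced nothing (None), the pre-translation is kept.
    """
    if len(pretranslated) != len(aligned):
        raise ValueError("both lists must describe the same segments")
    return [a if a is not None else p for p, a in zip(pretranslated, aligned)]


# Segment 1: aligned translation wins.
# Segment 2: only the aligned file has a translation.
# Segment 3: alignment failed, so the pre-translation is kept.
merged = merge_translations(
    ["Hallo", None, "Welt"],
    ["Guten Tag", "Danke", None],
)
```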
The next steps are the same whether you align or not:
- The Beebox selects all unapproved translations (segments). These are sent to MT and/or a TMS. Approved translations are considered “final” and are not sent to MT or a TMS.
How does the Beebox decide if translations are approved or not? There are two rules:
- By default, aligned translations are flagged as “unapproved”. This is a design decision: alignment may not be perfect and may require human approval in the TMS.
- If an aligned translation is identical to the pre-translation found in the memories and that pre-translation is itself approved, then the aligned translation is considered “approved” too.
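The two rules can be expressed as a small decision function. The sketch below is a Python illustration under stated assumptions, not the Beebox implementation: the `Segment` class, its field names, and the functions `is_approved` and `select_unapproved` are all hypothetical. It also assumes that a segment without an aligned translation simply keeps the approval status of its pre-translation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical model of one segment; not a Beebox data structure."""
    source: str
    pretranslation: Optional[str]   # from the memories, None if no match
    pretranslation_approved: bool   # status of that memory match
    aligned: Optional[str]          # from the translated file, None if alignment failed


def is_approved(seg: Segment) -> bool:
    if seg.aligned is None:
        # Assumption: without an aligned translation the pre-translation
        # is kept together with its own approval status.
        return seg.pretranslation is not None and seg.pretranslation_approved
    # Rule 1: aligned translations are unapproved by default.
    # Rule 2: unless identical to an approved pre-translation.
    return seg.pretranslation_approved and seg.aligned == seg.pretranslation


def select_unapproved(segments: List[Segment]) -> List[Segment]:
    """Segments that would still go to MT and/or the TMS."""
    return [s for s in segments if not is_approved(s)]


segments = [
    Segment("Hello", "Hallo", True, "Hallo"),      # unchanged, approved
    Segment("World", "Welt", True, "Erde"),        # translation edited
    Segment("Thanks", None, False, "Danke"),       # no memory match
    Segment("Bye", "Tschüss", False, "Tschüss"),   # memory match not approved
]
to_review = [s.source for s in select_unapproved(segments)]
```

In this example only the first segment counts as approved; the other three would be sent on for review.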
If these rules sound too technical, the following two examples may add some clarity:
- If you send the file and there is no memory yet in the Beebox, the system yields translated segments coming exclusively from the translated file (the memories provide no pre-translations).
All segments are “unapproved” and a human will have to approve or fix the results.
- You translated a source file in the past. You now receive a new version of the source file together with its translations. Since the last round, the source file, the translated file, or both may have been edited.
The Beebox will align the files and identify all changes, both in the source and in the translations (with respect to the previous translations). All changed content is flagged as “unapproved”. All content that did not change is flagged as “approved”.
The final result: Only the changed segments will be sent to the TMS.
Legacy translation memories
Use of legacy translation memories can improve alignment quality.
The classic approach is using legacy TMs to recover translations: Submit source content and recover translated content by leveraging the TMs. This rarely works well even when the TMs were used to translate all your content. The reason is potential segmentation and markup differences between legacy translation tools and the Beebox.
The Beebox approach is different: Extract translations directly from the translated files and align these with the source content. The TMs are solely used to more reliably extract the translations directly from the files.
To upload your TMs, select your Beebox project and click “Resources” in the left navigation menu.
Upload a TM:
- Click the Add a resource link at the bottom and type a name for your TM.
- Click the Upload link and follow the instructions.
- The TM is now shown in this screen. Note that the Pretranslation use option must be ticked.
Dictionaries
If you do not have legacy memories, you should consider uploading terminology databases instead. This can substantially enhance the alignment algorithm itself, and it makes sense even when you do have legacy memories. Third-party tools exist to create terminology databases: search the web for “Bilingual term extraction”.
To upload a terminology database, follow the same upload instructions as for legacy translation memories above. Then tick the Alignment dictionary option.
To view and manage your alignment dictionaries, click the Alignment dictionaries tab on the page.
You can substantially improve alignment results by uploading relevant words and terms together with their translations.
Navigate to the Resources page of your project and upload a memory or dictionary. Then tick the Alignment dictionary option.
Read more: Alignment Dictionaries