A file format configuration for Microsoft Word offers many options that control how content is extracted for translation. This page explains the most common options for Word files.
The following file extensions are supported when setting up file format configurations for Microsoft Word: .doc, .docx, .dot, .dotx, .docm, .dotm.
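
As a quick illustration of the extension list above, the following minimal sketch checks whether a file name would fall under a Word configuration. The function name `is_supported_word_file` is hypothetical and is not part of Wordbee; only the extension list comes from this page.

```python
from pathlib import Path

# Extensions listed on this page as supported for Word configurations.
SUPPORTED_WORD_EXTENSIONS = {".doc", ".docx", ".dot", ".dotx", ".docm", ".dotm"}


def is_supported_word_file(filename: str) -> bool:
    """Return True if the file name ends with one of the listed Word extensions.

    Hypothetical helper for illustration; the comparison is case-insensitive.
    """
    return Path(filename).suffix.lower() in SUPPORTED_WORD_EXTENSIONS


print(is_supported_word_file("Report.DOCX"))
print(is_supported_word_file("notes.rtf"))
```

A check like this can be useful before uploading a batch of files, so that unsupported formats are routed to a different configuration.
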
Please click on a section to see specific information regarding a configuration option.
General Tab
Configuration Option | Description |
---|---|
Content Section | Extraction rules for document properties, headers, footers, calculated fields text, table of contents, and user comments. |
Whitespaces and Symbols | Options to hide leading and trailing whitespace, convert sequences of multiple whitespace characters into markup, hide leading or trailing characters that are not letters or digits, and convert words containing no letters or digits into markup. |
Text Segmentation | Enable SRX rules for text segmentation and elect to always split text at line breaks. |
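
To make the whitespace and line-break options above more concrete, here is a rough sketch of what such processing could look like. It is an assumption-based illustration only: the `{ws}` placeholder, the function names, and the exact rules are hypothetical and do not reflect Wordbee's actual implementation, and SRX segmentation rules are not modeled.

```python
import re

# Hypothetical placeholder standing in for editor markup.
WS_MARKUP = "{ws}"


def split_at_line_breaks(text: str) -> list[str]:
    """Split text at line breaks and drop pieces that contain only whitespace."""
    return [part for part in text.splitlines() if part.strip()]


def hide_outer_whitespace(segment: str) -> str:
    """Remove leading and trailing whitespace from a segment."""
    return segment.strip()


def whitespace_runs_to_markup(segment: str) -> str:
    """Replace each run of two or more whitespace characters with the placeholder."""
    return re.sub(r"\s{2,}", WS_MARKUP, segment)


def prepare_segments(text: str) -> list[str]:
    """Apply the three steps in order: split, trim, then convert whitespace runs."""
    return [
        whitespace_runs_to_markup(hide_outer_whitespace(segment))
        for segment in split_at_line_breaks(text)
    ]


sample = "  Total:    42 units \nSecond line\n"
print(prepare_segments(sample))  # ['Total:{ws}42 units', 'Second line']
```

The point of the sketch is that the translator sees clean text, while the unusual spacing is held in markup rather than lost.
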
Do Not Translate Tab
Configuration Option | Description |
---|---|
Styles | Configure specific paragraph styles to not be extracted during the translation. |
Colors | Select one or more text colors to be excluded when text is extracted for translation. |
Segments | Mark specific text segments as translatable or not translatable when found by the system. Depending on the segment and chosen translation option, the segment will either be extracted for translation or ignored. |
Words or Terms | Configure single words, terms, or segments to be excluded from the translation. Any text captured by regular expressions is converted into markup and not modified. A description may be added to avoid confusion. When no description is added, the original text will appear upon hovering over the markup. |
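
The Words or Terms behavior, where text captured by regular expressions becomes markup and is left unmodified, can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the numbered `<1/>` placeholder format, the function names, and the sample patterns are hypothetical and are not taken from Wordbee.

```python
import re


def protect_terms(text: str, patterns: list[str]) -> tuple[str, dict[str, str]]:
    """Replace every regex match with a numbered placeholder.

    Returns the masked text and a mapping from placeholder to original text.
    The placeholder format is hypothetical.
    """
    protected: dict[str, str] = {}
    if not patterns:
        return text, protected

    # Combine all patterns into one alternation so matches are numbered in text order.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))

    def _swap(match: re.Match) -> str:
        tag = f"<{len(protected) + 1}/>"
        protected[tag] = match.group(0)
        return tag

    return combined.sub(_swap, text), protected


def restore_terms(text: str, protected: dict[str, str]) -> str:
    """Put the original, unmodified text back in place of each placeholder."""
    for tag, original in protected.items():
        text = text.replace(tag, original)
    return text


source = "Install Wordbee 4.2 using code ABC-1234."
masked, mapping = protect_terms(source, [r"Wordbee", r"[A-Z]{3}-\d{4}"])
print(masked)   # Install <1/> 4.2 using code <2/>.
print(mapping)  # {'<1/>': 'Wordbee', '<2/>': 'ABC-1234'}
print(restore_terms(masked, mapping) == source)  # True
```

Keeping the mapping alongside the masked text mirrors the idea that the original text remains available, for example when hovering over the markup.
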
Fonts Tab
Configuration Option | Description |
---|---|
Replace font when translating into Japanese, Chinese, Korean and similar. | Applies only when the target language is an Asian language, because the original document often contains incompatible fonts that render the translated text unreadable. |
Replace font when translating into Arabic, Hebrew, Farsi and other "complex script" languages. | Applies when the target language is a "complex script" language. This prevents incompatible fonts in the original document from making the translated text unreadable. |
Replace all fonts in the translated document. | An option for translating between languages that use different scripts and fonts. In some cases the fonts used are not compatible with the target language, which makes the text unreadable. This option tells the system to replace all fonts so that the translated document is easy to read. |
Reduce Markup Tab
Configuration Option | Description |
---|---|
Remove Irrelevant font or style changes | These options are designed to eliminate any font or style changes that are irrelevant and cannot be distinguished visually. This includes ignoring:
|
Visually reduce markup in the translation editor | Preserves the original styles and fonts to reduce markup in the translation editor. |
Embedded Files Tab
Configuration Option | Description |
---|---|
Extract embedded Excel Files | Instructs Wordbee to extract any embedded Excel files for translation. |
Extract embedded PowerPoint Files | Instructs Wordbee to extract any embedded PowerPoint files for translation. |
Extract embedded Word Files | Instructs Wordbee to extract any embedded Word files for translation. |
When using the options on the Embedded Files Tab, Wordbee will extract all content from the file, not just the visible content.