
When setting up a file format configuration for Microsoft Word, you can choose from many options to control how content is extracted. This page explains the most common options for Word files.

The following file extensions are supported when setting up file format configurations for Microsoft Word: .doc, .docx, .dot, .dotx, .docm, .dotm.



 

General Tab

The General Tab provides options for configuring what content will be extracted as well as options for handling whitespaces, symbols, and text segmentation.

  • Content Section: Contains extraction rules for document properties, headers, footers, calculated field text, table of contents, and user comments.

  • Whitespaces and Symbols: Contains options for showing or hiding whitespaces and symbols and for converting them into markup.

  • Text Segmentation: Enable SRX rules for text segmentation and choose whether to always split text at line breaks.
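To illustrate what the Text Segmentation options control, here is a minimal Python sketch of SRX-style segmentation. It is not Wordbee's implementation: the `RULES` list, the abbreviation list, and the `segment` function are all hypothetical. Each rule pairs a "before break" pattern with an "after break" pattern, and the sketch assumes that the first matching rule at a position decides whether the text is split there.

```python
import re

# Hypothetical SRX-style rules: (is_break, before-break regex, after-break regex).
# Rules are tried in order at each position and the first match decides,
# so the "no break" exception is listed before the general break rule.
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr|e\.g|i\.e)\.", r"\s"),  # no break after abbreviations
    (True, r"[.?!]", r"\s+"),                         # break after sentence punctuation
]


def segment(text, split_at_line_breaks=True):
    """Split text into segments using RULES, optionally splitting at line breaks first."""
    chunks = text.split("\n") if split_at_line_breaks else [text]
    segments = []
    for chunk in chunks:
        start = 0
        for pos in range(1, len(chunk)):
            for is_break, before, after in RULES:
                before_ok = re.search("(?:" + before + r")\Z", chunk[:pos])
                after_ok = re.match(after, chunk[pos:])
                if before_ok and after_ok:
                    if is_break:
                        segments.append(chunk[start:pos].strip())
                        start = pos
                    break  # first matching rule wins at this position
        segments.append(chunk[start:].strip())
    return [s for s in segments if s]
```

With these assumed rules, "Dr. Smith arrived. He was late." is split after "arrived." but not after "Dr.", and a line break always ends a segment when `split_at_line_breaks` is enabled.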

 

Do Not Translate Tab

The Do Not Translate Tab may be used to specify what text styles, colors, segments, and words within the Word file should not be translated.

Configuration Option: Description
Styles: Configure specific paragraph styles that should not be extracted for translation.
Colors: Select one or more text colors to be excluded when text is extracted for translation.
Segments

Mark specific text segments as translatable or not translatable when found by the system. Depending on the segment and chosen translation option, the segment will either be extracted for translation or ignored.

Regular expressions may be entered to protect entire segments or just individual terms within the file. These segments or terms will neither be extracted for translation nor be taken into account during the word count step. In the translation editor, they appear as tags, so they can protect parts of the text that should not be translated but should still appear in the translated document.

A good example is entering terms or regular expressions to protect brand names or confidential content such as software codes.

Words or Terms

Configure single words, terms, or segments to be excluded from the translation. Any text captured by regular expressions is converted into markup and not modified. A description may be added to avoid confusion. When no description is added, the original text will appear upon hovering over the markup.

This feature is useful for protecting terms that the translator must not modify, for instance the company name or technical references.
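The effect of protecting terms with regular expressions can be sketched in Python. This is only an illustration under stated assumptions, not how Wordbee stores or renders markup: the brand name "Acme Cloud", the product code pattern, the `{1}` placeholder syntax, and the `protect` function are all hypothetical.

```python
import re

# Hypothetical do-not-translate patterns with optional descriptions.
# Neither the patterns nor the placeholder syntax come from Wordbee.
PROTECTED_PATTERNS = [
    (re.compile(r"\bAcme Cloud\b"), "brand name"),
    (re.compile(r"\b[A-Z]{2}-\d{4}\b"), "product code"),
]


def protect(segment):
    """Replace protected text with numbered placeholders.

    Returns the tagged segment and a list of (original text, description)
    pairs, numbered in the order in which the text was protected.
    """
    protected = []

    def to_tag(match, description):
        protected.append((match.group(0), description))
        return "{%d}" % len(protected)

    for pattern, description in PROTECTED_PATTERNS:
        segment = pattern.sub(lambda m, d=description: to_tag(m, d), segment)
    return segment, protected


tagged, protected = protect("Install Acme Cloud with code AB-1234 today.")
print(tagged)     # Install {1} with code {2} today.
print(protected)  # [('Acme Cloud', 'brand name'), ('AB-1234', 'product code')]
```

The translator only sees and moves the placeholders, while the stored original text is restored unchanged in the translated document.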

 

Fonts Tab

The Fonts Tab may be used to replace fonts in certain language-specific translation scenarios.

Configuration Option: Description
Replace font when translating into Japanese, Chinese, Korean and similar.

Some fonts cannot display Asian characters correctly. If such a font is used in a document translated into Japanese, the final document will be unreadable. This option forces Word to use a user-defined font for Asian text when no compatible font is defined.

Replace font when translating into Arabic, Hebrew, Farsi and other "complex script" languages.

Some fonts cannot display complex script characters correctly. If such a font is used in a document translated into Arabic (for example), the final document will be unreadable. This option forces Word to use a user-defined font for complex script text when no compatible font is defined.

Replace all fonts in the translated document.

An option for translating between languages that use different scripts and fonts. In some cases the fonts in use are not compatible with the target language, which makes the text unreadable. This option tells the system to replace all fonts so that the translated document is easy to read.

For example, an OCR conversion can create a Word file with extensive font changes. As a result, the document in Wordbee will contain a large amount of markup. This option may be used to override the document's fonts with a single user-defined font, which drastically reduces the markup.

 

Reduce Markup Tab

The Reduce Markup Tab may be used to substantially reduce markup, which improves translation memory hits and means less work for your translators. Please note that removing markup can result in minor differences between fonts and text styles.

Configuration Option: Description
Remove irrelevant font or style changes

These options eliminate font or style changes that are irrelevant and cannot be distinguished visually.

Formatting changes that have been applied to whitespaces.

A formatting change in the Word document is represented in Wordbee by markup (also known as tags). An excessive amount of markup can complicate the translation work. This option ignores formatting changes applied to spaces.

For instance, if italic formatting is applied only to spaces, no difference will be seen in Word; however, in Wordbee this difference will be represented by tags. This option may be used to avoid excessive markup caused by the formatting of whitespaces.

Formatting changes to text that does not contain letters or digits: Just as above, a formatting change applied to text that contains no letters or digits will be represented by markup (tags). This option prevents excessive markup for these types of formatting changes.

Ignore Asian font changes if the source language is not Asian. Do not tick if the original text contains Asian characters.

<Need Description>
Asian or Arabic/Hebrew/Farsi font changes when the source language is not the same: <Need Description>
OCR (Optical Character Recognition) noise reduction: A document created with an OCR tool may contain a vast number of formatting changes, for example when the OCR tool adjusts the character spacing between each character. This option ignores the formatting changes typically applied by OCR tools that can be dropped without losing the general appearance of the document.
Visually reduce markup in the translation editor: Preserves the original styles and fonts to reduce markup in the translation editor.
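The idea behind ignoring formatting changes on whitespace and on text without letters or digits can be sketched as follows. This is a simplified model and not Wordbee's actual algorithm: a document is assumed to be a list of hypothetical `(text, style)` runs, and every run that differs from the base style is assumed to cost one opening and one closing tag.

```python
from itertools import groupby


def has_letters_or_digits(text):
    """Return True if the text contains at least one letter or digit."""
    return any(ch.isalnum() for ch in text)


def reduce_markup(runs):
    """Merge formatting runs whose differences are not visible.

    `runs` is a list of (text, style) pairs. A run without letters or
    digits (which includes whitespace-only runs) takes over the style of
    the run before it; neighbouring runs with equal styles are then joined.
    """
    normalised = []
    for text, style in runs:
        if normalised and not has_letters_or_digits(text):
            style = normalised[-1][1]
        normalised.append((text, style))
    return [
        ("".join(part for part, _ in group), style)
        for style, group in groupby(normalised, key=lambda run: run[1])
    ]


def count_tags(runs, base_style="plain"):
    """Count one opening and one closing tag per run not in the base style."""
    return 2 * sum(1 for _, style in runs if style != base_style)


runs = [("Hello", "plain"), (" ", "italic"), ("world", "plain"), ("!", "bold")]
print(count_tags(runs))                 # 4
print(reduce_markup(runs))              # [('Hello world!', 'plain')]
print(count_tags(reduce_markup(runs)))  # 0
```

In this model the italic space and the bold exclamation mark produce four tags that a translator would have to place, and none once the runs are merged. The bold "!" becoming plain is the kind of minor style difference that markup reduction can introduce.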

 

Embedded Files Tab

The Embedded Files Tab may be used to extract embedded Excel, PowerPoint, or Word files for translation.

Configuration Option: Description
Extract embedded Excel Files: Instructs Wordbee to extract any embedded Excel files for translation.
Extract embedded PowerPoint Files: Instructs Wordbee to extract any embedded PowerPoint files for translation.
Extract embedded Word Files: Instructs Wordbee to extract any embedded Word files for translation.

When using the options on the Embedded Files Tab, Wordbee will extract all content from the file, not just the visible content.

View our Microsoft Word file format Questions and Answers section to learn how to perform common file format customisations. These examples cover the questions most frequently answered by our support team.
