DITA Files
| File format | DITA |
|---|---|
| Supported file extensions | .dita, .ditamap |
DITA (Darwin Information Typing Architecture) is an XML-based standard for structuring and publishing technical content. When you process DITA files in Wordbee Translator, the platform extracts translatable text while preserving the document structure, so you can generate accurately reconstructed target-language files.
How DITA Files Are Processed
Wordbee Translator includes a dedicated DITA parser that understands the structure of DITA documents. When you mark a DITA file for online translation, the parser:
- Identifies translatable elements based on the DITA standard.
- Extracts text content into segments for translation in the Editor.
- Preserves non-translatable structure so the target file can be reconstructed after translation.
You can find the DITA parser configuration under Settings > Customization > Translation Settings > Document Formats.
Excluding Content with translate="no"
The DITA standard defines a translate attribute that content authors use to mark whether an element should be translated. When an element carries translate="no", it signals that the content is not intended for translation: for example, code samples, product identifiers, or legal boilerplate that must remain in the source language.
Wordbee Translator respects this attribute automatically. Content marked with translate="no" is excluded from extraction, and no segments are created for it in the Editor.
Key behavior
| Scenario | Result |
|---|---|
| Element with translate="no" | The element's text is not extracted for translation |
| Nested content inside a translate="no" element | All child elements are also excluded, regardless of their own attributes |
| Elements without a translate attribute | Extracted normally, following standard DITA parsing rules |
| Elements with translate="yes" | Extracted normally (this is the default behavior) |
This behavior is always active when using the DITA parser. No additional configuration is required.
Note
The translate="no" exclusion applies to the entire subtree of the marked element. If a parent element such as <section translate="no"> contains paragraphs, lists, or other nested elements, none of that content will be extracted for translation.
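The subtree rule can be illustrated with a minimal Python sketch. This is not Wordbee Translator's parser: the function name `extract_translatable` and the sample markup are hypothetical, and the sketch only mirrors the behavior described in this section, where a translate="no" element excludes all of its descendants regardless of their own attributes.

```python
import xml.etree.ElementTree as ET


def extract_translatable(element, segments=None):
    """Collect text from elements, skipping any subtree marked translate="no"."""
    if segments is None:
        segments = []
    # A translate="no" element excludes itself and every descendant,
    # whatever translate values the descendants carry.
    if element.get("translate") == "no":
        return segments
    if element.text and element.text.strip():
        segments.append(element.text.strip())
    for child in element:
        extract_translatable(child, segments)
        # Text after a child's closing tag belongs to the current element.
        if child.tail and child.tail.strip():
            segments.append(child.tail.strip())
    return segments


# Hypothetical sample: the nested translate="yes" does not re-enable extraction.
SAMPLE = (
    "<conbody>"
    "<p>Translate this paragraph.</p>"
    '<section translate="no">'
    '<p translate="yes">SKU: WBT-2040-EN</p>'
    "</section>"
    "<p>Translate this one too.</p>"
    "</conbody>"
)

segments = extract_translatable(ET.fromstring(SAMPLE))
print(segments)
```

In this sketch the two paragraphs outside the section are collected, while the paragraph inside the excluded section is skipped even though it carries translate="yes".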
Example
In the following DITA source, only the content outside the marked section (the topic title and the first paragraph) is extracted for translation. The section marked with translate="no" and all its contents are skipped:
<concept>
  <title>Product Overview</title>
  <conbody>
    <p>This product helps you manage translations efficiently.</p>
    <section translate="no">
      <title>Internal Reference</title>
      <p>SKU: WBT-2040-EN</p>
    </section>
  </conbody>
</concept>
Whitespace Compression
DITA source files often contain extra whitespace: line breaks, indentation, and consecutive spaces used for readability in the XML source. By default, this whitespace is preserved as-is in extracted segments, which can lead to inconsistent segmentation or unnecessary spaces in the Editor.
You can enable whitespace compression to normalize consecutive whitespace characters into a single space during extraction.
Enabling whitespace compression
To turn on whitespace compression for DITA files:
1. Go to Settings > Customization > Translation Settings > Document Formats.
2. Open the DITA parser configuration.
3. In the Content section, check Compress sequences of whitespaces into a single whitespace (recommended).
4. Click Save.
Preserved whitespace in code elements
When whitespace compression is enabled, certain DITA elements where whitespace is semantically significant are automatically excluded from compression. The following elements always preserve their original whitespace:
| Element | Purpose |
|---|---|
| codeblock | Code listings |
| pre | Preformatted text |
| codeph | Inline code phrases |
| screen | Screen output |
| msgblock | Message blocks |
| lines | Lines of text where line breaks are significant |
Content inside these elements retains its original spacing and line breaks, even when whitespace compression is active for the rest of the document.
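The combined effect of compression and the preserved elements can be sketched in Python. This is an illustration under assumptions, not the platform's implementation: the helper names `compress` and `compress_whitespace` are hypothetical, and "whitespace" is taken to mean anything matched by the regular expression `\s`.

```python
import re
import xml.etree.ElementTree as ET

# Elements from the table above; whitespace inside them is left untouched.
PRESERVE = {"codeblock", "pre", "codeph", "screen", "msgblock", "lines"}


def compress(text):
    """Collapse each run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text) if text else text


def compress_whitespace(element):
    """Normalize whitespace in place, skipping whitespace-sensitive subtrees."""
    if element.tag in PRESERVE:
        return  # keep the whole subtree exactly as authored
    element.text = compress(element.text)
    for child in element:
        compress_whitespace(child)
        # The tail is text of the current element, so it is compressed
        # even when the child itself is preserved.
        child.tail = compress(child.tail)


# Hypothetical sample with extra whitespace in both a paragraph and a code block.
doc = ET.fromstring(
    "<body><p>Install   the\n    package.</p>"
    "<codeblock>for i in range(3):\n    print(i)</codeblock></body>"
)
compress_whitespace(doc)
print(repr(doc.find("p").text))
print(repr(doc.find("codeblock").text))
```

Here the paragraph text is reduced to single spaces, while the indentation and line break inside the code block are kept.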
Testing Your Configuration
To verify that the extraction works as expected:
1. Open the DITA parser configuration and click Test configuration.
2. Upload a sample DITA file.
3. Review the extracted segments to confirm that content marked with translate="no" is not present.
4. Mark a file for online translation in a project and verify the segments in the Editor.
5. Generate the target file and confirm that the document structure is preserved, including both translated and non-translatable content.
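For the final structure check, a small script can complement a visual review. The following Python sketch compares the sequence of element names in a source and a target document. The function name `structure` and the inlined sample contents are hypothetical, and attributes are deliberately ignored because some of them may legitimately differ between languages.

```python
import xml.etree.ElementTree as ET


def structure(xml_text):
    """List element tags in document order, ignoring text and attributes."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


# Hypothetical source and target contents, inlined to keep the sketch self-contained.
source = (
    "<concept><title>Product Overview</title><conbody>"
    "<p>This product helps you manage translations efficiently.</p>"
    '<section translate="no"><p>SKU: WBT-2040-EN</p></section>'
    "</conbody></concept>"
)
target = (
    "<concept><title>Produktübersicht</title><conbody>"
    "<p>Dieses Produkt hilft Ihnen, Übersetzungen effizient zu verwalten.</p>"
    '<section translate="no"><p>SKU: WBT-2040-EN</p></section>'
    "</conbody></concept>"
)

structure_preserved = structure(source) == structure(target)
print(structure_preserved)
```

For files on disk, the same comparison could be run on the output of `ET.parse(path).getroot().iter()` instead of inlined strings.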