Customize Rules with XML
Text extraction rules are stored in XML format.
Example of a Microsoft Word rule:
<?xml version="1.0" encoding="utf-8"?>
<!-- Exchange format for text extraction rules -->
<ParserConfigurations xmlns="http://www.wordbee.com/config">
  <!-- Rule -->
  <ParserConfiguration xmlns="http://www.wordbee.com/config">
    <Name>Microsoft Word</Name>
    <Description>Extracts all contents including header, footer, document properties and user comments.</Description>
    <ParserDomain>MSWORD</ParserDomain>
    <EParser>4</EParser>
    <SegmentationRulesEnabled>true</SegmentationRulesEnabled>
    <SegmentationSplitAtNewlines>true</SegmentationSplitAtNewlines>
    <SegmentationSplitAtInlineTags>true</SegmentationSplitAtInlineTags>
    <VersionPretranslation>CompareTexts</VersionPretranslation>
    <UserTextPatterns xmlns="" />
    <CompactingOption xmlns="">0</CompactingOption>
    <ModulesVersion />
    <MSOfficeConfiguration xmlns="http://www.wordbee.com/config/msoffice">
      <TrimWhitespaces>true</TrimWhitespaces>
      <TrimNoLetterDigit>false</TrimNoLetterDigit>
      <RemoveFormatWhitespaces>true</RemoveFormatWhitespaces>
      <RemoveFormatNoLetterDigit>false</RemoveFormatNoLetterDigit>
      ...
You can edit the XML directly, provided you know what the individual options mean. For example:
- Name: The display name of the configuration.
- Description: Optional description of the configuration.
- SegmentationRulesEnabled: Switches segmentation of text on or off.
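The options above can also be read or changed with a script. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, not a Wordbee tool. The namespace and element names come from the sample rule; the `SAMPLE` document is a shortened, hypothetical stand-in for an exported rule file, and the function names `read_rule_names` and `set_segmentation` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the sample rule.
NS = {"wb": "http://www.wordbee.com/config"}

# Hypothetical, shortened stand-in for an exported rule file.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<ParserConfigurations xmlns="http://www.wordbee.com/config">
  <ParserConfiguration>
    <Name>Microsoft Word</Name>
    <Description>Extracts all contents.</Description>
    <SegmentationRulesEnabled>true</SegmentationRulesEnabled>
  </ParserConfiguration>
</ParserConfigurations>"""


def read_rule_names(xml_bytes: bytes) -> list[str]:
    """Return the Name of every rule in the document."""
    root = ET.fromstring(xml_bytes)
    return [
        rule.findtext("wb:Name", default="", namespaces=NS)
        for rule in root.findall("wb:ParserConfiguration", NS)
    ]


def set_segmentation(xml_bytes: bytes, enabled: bool) -> bytes:
    """Set SegmentationRulesEnabled on every rule and return the new XML."""
    root = ET.fromstring(xml_bytes)
    for rule in root.findall("wb:ParserConfiguration", NS):
        flag = rule.find("wb:SegmentationRulesEnabled", NS)
        if flag is not None:
            # The sample stores booleans as lowercase "true" / "false".
            flag.text = "true" if enabled else "false"
    # ElementTree serializes with generated prefixes (ns0:, ...) instead of
    # the original default namespace; the result is equivalent XML.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    data = SAMPLE.encode("utf-8")
    print(read_rule_names(data))
    print(set_segmentation(data, False).decode("utf-8"))
```

Whether a rule import accepts prefixed element names is not stated here, so treat the serialized output as something to verify before importing it, and keep a copy of the original file.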
Rules may themselves embed further rule definitions. For example, an XML rule may include an HTML rule for processing nodes that contain HTML content. Similarly, a Word rule may contain an Excel or PowerPoint rule to handle those formats when they are embedded in a Word document.
In general, we recommend customizing rules interactively in Wordbee Translator.
Copyright Wordbee - Buzzin' Outside the Box since 2008