When setting up a file format configuration for Web Pages, there are many options to choose from to ensure extraction is successful. This page will explain each section of the configuration options for Web Pages.

The following file extensions are supported when setting up file format configurations for Web Pages: .htm, .html, .xhtml, .htmls, .php, .php2, .php3, .php4, .php5, .php6, .phtml, .csm, .jsp, .ahtm, .ahtml.

These sections have been provided to help you become familiar with available Web Page configuration options in Wordbee Translator: 

To learn more about working with file format configurations, please see the following pages: 

 

General Tab

The General Tab contains options for configuring the type of encoding, HTML code, HTML attributes, content exclusion, and text segmentation. The options are described below, section by section.

Configuration Options     Description
Encoding

The default encoding selection for web pages is UTF-8; however, the encoding option may be used to select a different type of encoding such as Windows, Macintosh, ASCII, etc. An additional option is provided for converting characters that are not compatible with the target encoding into entity references.
HTML Code

These options tell Wordbee Translator how to handle the HTML code itself during translation. Within this configuration section, you can choose to:

  • Show leading and trailing whitespace to the translator, or hide it.

  • Compress sequences of whitespace into a single whitespace.

  • Replace '&nbsp;' with blanks.

  • Always show HTML tags preceding and trailing text.

  • Change how entity references are displayed to the translator: keep original HTML code, change all to characters (default), or convert numeric entities only.
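
The effect of these options on a piece of extracted text can be illustrated with a small Python sketch. This is not Wordbee's implementation; the function name `normalize_segment` and its parameters are hypothetical and only mirror the options listed above.

```python
import html
import re

# Matches numeric entity references only, e.g. &#233; or &#xE9;
NUMERIC_ENTITY = re.compile(r"&#(?:[xX][0-9a-fA-F]+|[0-9]+);")


def normalize_segment(text, compress_whitespace=True, nbsp_to_blank=True, entities="all"):
    """Hypothetical helper: apply HTML Code style options to one text segment.

    entities: "keep" (leave HTML code as is), "all" (decode every entity),
    or "numeric" (decode numeric references only).
    """
    if entities == "all":
        text = html.unescape(text)
    elif entities == "numeric":
        text = NUMERIC_ENTITY.sub(lambda m: html.unescape(m.group(0)), text)
    if nbsp_to_blank:
        # Cover both the entity form and the decoded non-breaking space.
        text = text.replace("&nbsp;", " ").replace("\u00a0", " ")
    if compress_whitespace:
        text = re.sub(r"\s+", " ", text)
    return text


print(normalize_segment("Caf&eacute;&nbsp;&amp;   bar"))
print(normalize_segment("Caf&#233; &amp; bar", entities="numeric"))
```

With the default settings the translator would see plain characters and single spaces; with `entities="keep"` the original HTML code stays visible.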

 

HTML Attributes

These options tell Wordbee Translator how to handle specific HTML attributes during translation and include:

  • Choosing to show whitespaces at beginning and end to the translator.

  • Choosing to compress sequences of whitespaces into a single whitespace.

  • Changing how entity references are displayed to the translator: keep original HTML code, convert all to characters (default), or convert numeric entities only.


Exclude Content

This option may be used to configure specific content to be excluded from the extracted text for translation. Within this configuration section you can enter text segments or regular expressions.

If a text or pattern matches, the segment can be marked as not translatable, translatable, or potentially not translatable. Segments in the latter two categories are shown to the translator. The system checks the patterns one after the other and stops at the first match. Text that matches none of the patterns is considered translatable.
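
The first-match behavior described above can be sketched in Python as follows. The `Rule` class, the status names, and the choice to compare literal texts against the whole segment are assumptions made for illustration, not Wordbee's actual data model.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str   # literal text or regular expression
    is_regex: bool
    status: str    # "not_translatable", "translatable" or "maybe_not_translatable"


def classify(segment, rules):
    """Return the status of the first matching rule, else "translatable"."""
    for rule in rules:
        if rule.is_regex:
            matched = re.search(rule.pattern, segment) is not None
        else:
            # Assumption: a literal text must equal the whole segment.
            matched = segment == rule.pattern
        if matched:
            return rule.status
    return "translatable"


rules = [
    Rule(r"^\d+$", True, "not_translatable"),
    Rule(r"^[A-Z0-9_]+$", True, "maybe_not_translatable"),
]
print(classify("12345", rules))
print(classify("SKU_42", rules))
print(classify("Hello world", rules))
```

Because evaluation stops at the first match, the order of the rules matters: "12345" also matches the second pattern but is decided by the first.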


Text Segmentation

This configuration section may be used to enable options for text segmentation during translation. Here you can:

  • Enable/Disable SRX Rules for Text Segmentation
  • Elect to Always Split Text at Line Breaks
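
As a rough illustration of how the two options interact, the sketch below applies SRX-style rules (an ordered list of break and no-break rules, each with a "before" and an "after" pattern) and optionally splits at line breaks first. The rules shown and the function name `segment_text` are hypothetical; the actual SRX rule set used by Wordbee Translator is not reproduced here.

```python
import re

# (is_break, before_pattern, after_pattern); the first rule that matches
# at a position decides, which is how SRX rules are evaluated.
RULES = [
    (False, r"\b(?:Mr|Dr|etc)\.", r"\s"),  # no break after common abbreviations
    (True, r"[.?!]", r"\s"),               # break after sentence-ending punctuation
]


def segment_text(text, rules=RULES, split_at_line_breaks=True):
    parts = text.split("\n") if split_at_line_breaks else [text]
    segments = []
    for part in parts:
        start = 0
        for pos in range(1, len(part)):
            for is_break, before, after in rules:
                if re.search(before + r"$", part[:pos]) and re.match(after, part[pos:]):
                    if is_break:
                        segments.append(part[start:pos].strip())
                        start = pos
                    break  # first matching rule wins
        tail = part[start:].strip()
        if tail:
            segments.append(tail)
    return segments


print(segment_text("Dr. Smith arrived. He left.\nNext line"))
```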

 

 

Server and Client Side Code Tab

The Server and Client Side Code Tab contains options for configuring the extraction and exclusion of quoted strings and additional content within the web page to be translated. 

Configuration Options     Description
Extract Quoted Strings

Web pages may contain JavaScript or server-side code such as PHP. You can decide whether the system will automatically extract quoted strings in code sections during translation. These configuration options are provided:

  • Show whitespaces at beginning and end to the translator.
  • Compress sequences of whitespaces into a single whitespace.
  • Enable the use of //notrans, //beginnotrans ... //endnotrans to delimit not translatable code.
  • Extract quoted strings even when no pattern entered in the "Exclude Quoted Strings" configuration section matches.
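
A minimal Python sketch of quoted-string extraction with the notrans markers is shown below. It assumes that //notrans excludes the line it appears on and that //beginnotrans ... //endnotrans excludes every line in between; the page does not spell out these details, so treat both as assumptions. The function name is hypothetical.

```python
import re

# Double- or single-quoted string, allowing backslash escapes inside.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"|\'((?:[^\'\\]|\\.)*)\'')


def extract_quoted_strings(code):
    strings = []
    in_block = False
    for line in code.splitlines():
        if "//beginnotrans" in line:
            in_block = True
            continue
        if "//endnotrans" in line:
            in_block = False
            continue
        # Assumption: //notrans marks the whole line as not translatable.
        if in_block or "//notrans" in line:
            continue
        for match in QUOTED.finditer(line):
            value = match.group(1) if match.group(1) is not None else match.group(2)
            strings.append(value)
    return strings


sample = "\n".join([
    'var a = "Hello";',
    'var b = "id-1"; //notrans',
    "//beginnotrans",
    'var c = "secret";',
    "//endnotrans",
    "var d = 'Bye';",
])
print(extract_quoted_strings(sample))
```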

 

Exclude Quoted Strings

This configuration section may be used to enter specific text segments or regular expressions to exclude from the translation. For each piece of text (segment), the system looks for the texts or regular expression patterns entered in this configuration section.

If a text or pattern matches, the segment can be marked as either translatable or not translatable. The system checks the patterns one after the other and stops at the first match. Text that matches none of the entered patterns is considered translatable.


Include or Exclude Additional Content

This configuration section may be used to specify regular expressions that extract text inside code (JavaScript, etc.).

The regular expressions are not limited to quoted strings but can capture anything. The regex MUST contain capture groups named "pattern1", "pattern2", etc. Example: @(?<pattern1>.*?)@ will extract any text delimited by "@".
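
To try out such a pattern before entering it, a quick Python check can help. Note one syntax difference: Python's `re` module writes named groups as `(?P<pattern1>...)`, whereas the example above uses the `(?<pattern1>...)` form. The helper name `extract_named` is hypothetical.

```python
import re


def extract_named(pattern, code):
    """Collect the text captured by groups named pattern1, pattern2, ..."""
    regex = re.compile(pattern)
    results = []
    for match in regex.finditer(code):
        for name, value in match.groupdict().items():
            if name.startswith("pattern") and value is not None:
                results.append(value)
    return results


# Python spelling of the example @(?<pattern1>.*?)@
code = "label = @Save changes@; hint = @Cancel@"
print(extract_named(r"@(?P<pattern1>.*?)@", code))
```

Only the text inside the named group is extracted; the "@" delimiters stay in the code.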


 

HTML Tags and Attributes Tab

The HTML Tags and Attributes Tab contains options for managing translatable attributes within the web page, non-breaking tags, and whitespace preserving tags for the translation.

Configuration Options     Description
Translatable Attributes

By default, several attributes are configured to be extracted for translation (alt, title, placeholder, content, values, etc.). Conditions can be defined on the containing parent tag and on other attributes that must have specific values.

Within this section, you can change a pre-defined attribute's name, parent tag, or advanced condition. Additionally, this section may be used to:

  • Mark an attribute as translatable.
  • Mark an attribute as non-translatable.
  • Elect to use Regex for a chosen attribute.
  • Remove an attribute from the configuration.
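
The idea of an attribute that is translatable only under a given parent tag can be sketched with Python's standard `html.parser`. The rule table below is hypothetical and far simpler than the real configuration (it has no advanced conditions and no regex support).

```python
from html.parser import HTMLParser

# Hypothetical rule table: attribute name -> parent tags it applies to
# (None means the attribute is translatable on any tag).
RULES = {
    "alt": {"img"},
    "title": None,
    "placeholder": {"input", "textarea"},
}


class AttributeExtractor(HTMLParser):
    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        for name, value in attrs:
            if name not in self.rules or not value:
                continue
            allowed_tags = self.rules[name]
            if allowed_tags is None or tag in allowed_tags:
                self.found.append((tag, name, value))


def extract_attributes(markup, rules=RULES):
    parser = AttributeExtractor(rules)
    parser.feed(markup)
    parser.close()
    return parser.found


markup = '<img src="a.png" alt="A cat"><input placeholder="Your name" title="Name"><div alt="x">hi</div>'
print(extract_attributes(markup))
```

Here the `alt` on the `div` is ignored because the hypothetical rule restricts `alt` to `img` tags.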

 

Non-Breaking Tags

By default, several tags are pre-defined in the Web Page configuration to be non-breaking or inline tags. These are typically links, images or text formatting elements. Tags are case insensitive and include the following pre-defined items:

a  acronym  b  big  blink  br  cite  code  dfn  em  font  i  iframe  img  kbd  s  small  span  strike  strong  sub  sup  tt  u  var  ruby  rt  rc  rp  rbc  rtc  asp:label

Additional non-breaking or inline tags may be entered in this section if needed for the Web Page translation.
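
The difference between non-breaking (inline) tags and all other tags can be illustrated with a short Python sketch: inline tags stay inside the segment as markup, while any other tag ends the current segment. The tag set below is a small subset of the list above, and the class is an illustration only, not how Wordbee Translator parses pages.

```python
from html.parser import HTMLParser

INLINE_TAGS = {"a", "b", "br", "em", "i", "img", "span", "strong"}  # subset


class MarkupSegmenter(HTMLParser):
    def __init__(self, inline_tags):
        super().__init__(convert_charrefs=True)
        self.inline_tags = inline_tags
        self.segments = []
        self._buffer = []
        self._has_text = False

    def flush(self):
        # Emit a segment only if it contains real text, not just tags.
        if self._has_text:
            self.segments.append("".join(self._buffer).strip())
        self._buffer = []
        self._has_text = False

    def handle_starttag(self, tag, attrs):
        if tag in self.inline_tags:
            self._buffer.append(self.get_starttag_text())
        else:
            self.flush()

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in self.inline_tags:
            self._buffer.append(f"</{tag}>")
        else:
            self.flush()

    def handle_data(self, data):
        if data.strip():
            self._has_text = True
        self._buffer.append(data)


def segment_markup(markup, inline_tags=INLINE_TAGS):
    parser = MarkupSegmenter(inline_tags)
    parser.feed(markup)
    parser.close()
    parser.flush()
    return parser.segments


print(segment_markup("<p>Click <b>here</b> now.</p><p>Second</p>"))
```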


Whitespace Preserving Tags

Whitespace is generally disregarded within HTML code. However, the tags listed in this section are an exception: any whitespace inside them is preserved during translation. Tags are case insensitive and include the following:

pre     script     style

This configuration section is for information purposes only and no additional tags may be added for whitespace preservation.


 

CMS Specific Settings Tab

The CMS Specific Settings Tab contains options for handling custom markup. WordPress, Drupal and other CMSs include so-called "shortcodes" in the HTML. Shortcodes are markup and do not need to be translated. They are written in square brackets, as in: [image title="This is a text"].

Configuration Options     Description

Content between Double Brackets is Considered Markup

Enable this option to ensure that "shortcodes" included by the CMS as part of the HTML file are not extracted for translation.

 

Note that attributes inside shortcodes may still need translation. If so, add those attributes in the HTML Tags and Attributes Tab.
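
To make the distinction concrete, the following Python sketch treats a shortcode as markup but still pulls out the attributes chosen for translation. The regular expressions and the function name are assumptions for illustration; real CMS shortcode syntax has more variants (unquoted values, nested codes) than this handles.

```python
import re

# [name attr="value" ...] or a closing [/name]
SHORTCODE = re.compile(r"\[(/?)([A-Za-z][\w-]*)([^\]]*)\]")
ATTRIBUTE = re.compile(r'([\w-]+)="([^"]*)"')


def translatable_shortcode_attributes(text, translatable=("title",)):
    """Return (shortcode, attribute, value) for attributes marked translatable."""
    found = []
    for match in SHORTCODE.finditer(text):
        for name, value in ATTRIBUTE.findall(match.group(3)):
            if name in translatable:
                found.append((match.group(2), name, value))
    return found


text = 'Intro [image title="This is a text" id="7"] outro'
print(translatable_shortcode_attributes(text))
```

In this example only the `title` value would be offered for translation; the shortcode name and the `id` attribute remain untouched markup.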
