...

locale

The language of the text, such as “en”, “de” etc.

string, Mandatory

text

The text to be tested. Up to 1000 characters.

string, Mandatory

independentRulesId

The language-independent SRX rules.

If set to null, no language-independent rules are loaded (not recommended). Use settings/srx/find to find configurations.

int, Optional

languageRulesId

The language-specific SRX rules. They must match the locale of the text. Use settings/srx/find to find configurations.

If set to null, no language-specific rules are loaded (not recommended).

int, Optional

Example payload:

Code Block
{
    "locale": "de",
    "independentRulesId": 234,
    "languageRulesId": 213,
    "text": "Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?"
}
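The parameter constraints above can be checked on the client before the request is sent. The following Python sketch is only an illustration: the function name build_payload is hypothetical, and it assumes that an optional ID left unset is sent as null, which (as noted above) disables the corresponding rules.

```python
import json

MAX_TEXT_LENGTH = 1000  # "Up to 1000 characters" per the parameter list


def build_payload(locale, text, independent_rules_id=None, language_rules_id=None):
    """Assemble the request body (hypothetical helper, not part of the API)."""
    if not locale:
        raise ValueError("locale is mandatory")
    if not text:
        raise ValueError("text is mandatory")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    return {
        "locale": locale,
        # None is serialized as null: no rules of that kind are loaded.
        "independentRulesId": independent_rules_id,
        "languageRulesId": language_rules_id,
        "text": text,
    }


payload = build_payload(
    "de", "Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?", 234, 213
)
print(json.dumps(payload, ensure_ascii=False, indent=4))
```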

RESULTS

The JSON result shows the segmentation results. A result for the sample above might be:

Code Block
{
    "count": 2,
    "original": "Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?",
    "segments": [
        {
            "position": 0,
            "text": "Hallo wie geht es am 20.4. um 3 Uhr."
        },
        {
            "position": 36,
            "text": " Geht es spaeter?"
        }
    ],
    "rules": [
        {
            "position": 26,
            "retained": false,
            "tooShort": false,
            "breaking": {
                "no": 10021,
                "before": "[\\.\\?\\!\\;\\:]+[\\“\\\"\\'”\\)]?",
                "after": "\\s"
            },
            "exception": {
                "no": 10019,
                "before": "\\.+",
                "after": "[\\“\\\"\\'”\\)]?\\s\\p{Ll}"
            }
        },
        {
            "position": 36,
            "retained": true,
            "tooShort": false,
            "breaking": {
                "no": 10021,
                "before": "[\\.\\?\\!\\;\\:]+[\\“\\\"\\'”\\)]?",
                "after": "\\s"
            },
            "exception": {
                "no": null,
                "before": null,
                "after": null
            }
        }
    ],
    "parameters": {
        "locale": "de",
        "independentRuleId": 5503,
        "languageRulesId": 5502,
        "minimumSegmentLength": 5
    }
}

The properties are:

count

Total segments into which the text was split

int

original

The original text.

string?

segments

The list of segments, each with its start character position and its text.

object[]

rules

An array of breaking and exception rules that were activated for all the positions in the text. See below for details.

object[]

parameters

Includes information from the original payload.

object
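A client can sanity-check these properties against each other. The Python sketch below assumes that position is a 0-based character offset into original (consistent with the sample result, but not stated explicitly); check_segments is a hypothetical helper name.

```python
def check_segments(result):
    """Return a list of inconsistencies found in a segmentation result."""
    original = result["original"]
    segments = result["segments"]
    problems = []
    if result["count"] != len(segments):
        problems.append("count does not match the number of segments")
    for seg in segments:
        start = seg["position"]
        # Assumption: position is a 0-based character offset into original.
        if original[start:start + len(seg["text"])] != seg["text"]:
            problems.append(f"segment at {start} does not match original")
    return problems


sample = {
    "count": 2,
    "original": "Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?",
    "segments": [
        {"position": 0, "text": "Hallo wie geht es am 20.4. um 3 Uhr."},
        {"position": 36, "text": " Geht es spaeter?"},
    ],
}
print(check_segments(sample))  # prints []
```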

The rules array lists the positions in the text at which a breaking rule matched and describes, for each position, whether the text was split there or whether the split was undone by a specific exception rule. The properties are:

position

The text position that the system attempts to split

int

retained

true: The text was split in that position

false: The text was not split due to an exception rule

bool

tooShort

If the split segment is shorter than an allowed minimum, the split will be canceled. This property is then set to true.

bool

breaking

The breaking rule that was applied.

object

exception

The exception rule, if any, that canceled the breaking rule. If there is no exception then the properties will all be null.

object
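To make a rules entry easier to read, the properties can be condensed into one line per position. This Python sketch uses abbreviated entries from the sample result; describe_rule is a hypothetical helper, and the wording of its messages is not part of the API.

```python
def describe_rule(entry):
    """Summarize one entry of the rules array in a single line."""
    pos = entry["position"]
    if entry["retained"]:
        return f"position {pos}: split by breaking rule {entry['breaking']['no']}"
    if entry["tooShort"]:
        return f"position {pos}: split canceled, segment too short"
    exception_no = entry["exception"]["no"]
    if exception_no is not None:
        return f"position {pos}: split canceled by exception rule {exception_no}"
    return f"position {pos}: split canceled"


# Abbreviated entries from the sample result (regex patterns omitted).
rules = [
    {"position": 26, "retained": False, "tooShort": False,
     "breaking": {"no": 10021}, "exception": {"no": 10019}},
    {"position": 36, "retained": True, "tooShort": False,
     "breaking": {"no": 10021}, "exception": {"no": None}},
]
for entry in rules:
    print(describe_rule(entry))
# position 26: split canceled by exception rule 10019
# position 36: split by breaking rule 10021
```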

Text segmentation works as follows:

  • IDENTIFY SPLIT POINTS: The process identifies all potential break points in the text from right to left. Once all break points are identified it looks to see if some of them need to be removed / canceled:

  • CHECK MINIMUM LENGTH: Looking at each segment from right to left, it identifies any segment that is shorter than the minimum length given in parameters.minimumSegmentLength. For each such segment, the split point to its left is removed, and the tooShort property is set to true for the canceled split point.

  • APPLY EXCEPTION RULES: If an exception rule is found in the SRX configuration that matches a remaining split point, then the split is also canceled. The details for the exception rule are listed in the exception property.
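The three steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not the actual implementation: it uses a single breaking rule and a single exception rule modeled on rules 10021 and 10019 from the sample result, replaces \p{Ll} with [a-z] because Python's re module does not support Unicode property classes, and measures segment length in raw characters. The names segment and matches_at are hypothetical.

```python
import re

# Simplified stand-ins for rules 10021 (breaking) and 10019 (exception).
# Assumption: [a-z] approximates \p{Ll}, which Python's re does not support.
BREAKING = (r"[.?!;:]+[\"')]?", r"\s")
EXCEPTION = (r"\.+", r"[\"')]?\s[a-z]")


def matches_at(rule, text, pos):
    """True if the rule's 'before' pattern ends at pos and 'after' starts there."""
    before, after = rule
    return (re.search(f"(?:{before})\\Z", text[:pos]) is not None
            and re.match(after, text[pos:]) is not None)


def segment(text, min_length=5):
    # 1. Identify all potential split points (scan order does not matter here).
    points = [p for p in range(1, len(text)) if matches_at(BREAKING, text, p)]
    report = {p: {"position": p, "retained": True, "tooShort": False}
              for p in points}

    # 2. Right to left: cancel the split point left of any too-short segment.
    end = len(text)
    for p in reversed(points):
        if end - p < min_length:
            report[p].update(retained=False, tooShort=True)
        else:
            end = p

    # 3. Cancel remaining split points that match the exception rule.
    for p in points:
        if report[p]["retained"] and matches_at(EXCEPTION, text, p):
            report[p]["retained"] = False

    kept = [p for p in points if report[p]["retained"]]
    bounds = [0] + kept + [len(text)]
    segments = [text[a:b] for a, b in zip(bounds, bounds[1:])]
    return segments, list(report.values())


segments, report = segment("Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?")
print(segments)
print(report)
```

For the sample text, the sketch finds candidate split points at positions 26 and 36, cancels the one at 26 through the exception rule, and returns the same two segments as the sample result.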

If you believe that this process is complicated, then the Wordbee team heartily agrees with you.