settings/srx/tools/split

This tool lets you submit a short text and see how it is split into segments. The API returns detailed information showing which breaking rules and exception rules were invoked at each candidate position, which helps when debugging or building SRX configurations.

URL

(POST) /api/settings/srx/tools/split

PARAMETERS

The BODY must include a JSON object with these properties:

locale	The language of the text, such as “en” or “de”.	string, Mandatory
text	The text to be tested. Up to 1000 characters.	string, Mandatory
independentRulesId	The ID of the language-independent SRX rules. If set to null, no language-independent rules are loaded (not recommended). Use settings/srx/find to find configurations.	int, Optional
languageRulesId	The ID of the language-specific SRX rules. It must match the locale of the text. If set to null, no language-specific rules are loaded (not recommended). Use settings/srx/find to find configurations.	int, Optional

Example payload:

{
    "locale": "de",
    "independentRulesId": 5503,
    "languageRulesId": 5502,
    "text": "Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?"
}
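As a rough illustration, the request could be built from Python with the standard library as sketched below. The host in `BASE_URL` is a placeholder, the helper name `build_split_request` is hypothetical, and any authentication headers your installation requires are omitted because this page does not describe them.

```python
import json
import urllib.request

# Placeholder host: replace with the address of your own installation.
BASE_URL = "https://example.invalid"


def build_split_request(locale, text, independent_rules_id=None, language_rules_id=None):
    """Build (but do not send) the POST request for settings/srx/tools/split."""
    if len(text) > 1000:
        raise ValueError("text is limited to 1000 characters")
    payload = {"locale": locale, "text": text}
    # The optional IDs are only added when given. How the server treats an
    # omitted ID (as opposed to an explicit null) is not specified on this page.
    if independent_rules_id is not None:
        payload["independentRulesId"] = independent_rules_id
    if language_rules_id is not None:
        payload["languageRulesId"] = language_rules_id
    return urllib.request.Request(
        BASE_URL + "/api/settings/srx/tools/split",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` and reading the body with `json.load` would then yield the result described under RESULTS.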

RESULTS

The JSON result describes how the text was segmented. A result for the sample above might be:

{
    "count": 2,
    "original": "Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?",
    "segments": [
        {
            "position": 0,
            "text": "Hallo wie geht es am 20.4. um 3 Uhr."
        },
        {
            "position": 36,
            "text": " Geht es spaeter?"
        }
    ],
    "rules": [
        {
            "position": 26,
            "retained": false,
            "tooShort": false,
            "breaking": {
                "no": 10021,
                "before": "[\\.\\?\\!\\;\\:]+[\\“\\\"\\'”\\)]?",
                "after": "\\s"
            },
            "exception": {
                "no": 10019,
                "before": "\\.+",
                "after": "[\\“\\\"\\'”\\)]?\\s\\p{Ll}"
            }
        },
        {
            "position": 36,
            "retained": true,
            "tooShort": false,
            "breaking": {
                "no": 10021,
                "before": "[\\.\\?\\!\\;\\:]+[\\“\\\"\\'”\\)]?",
                "after": "\\s"
            },
            "exception": {
                "no": null,
                "before": null,
                "after": null
            }
        }
    ],
    "parameters": {
        "locale": "de",
        "independentRuleId": 5503,
        "languageRulesId": 5502,
        "minimumSegmentLength": 5
    }
}

The properties are:

count	The total number of segments into which the text was split.	int
original	The original text.	string
segments	The list of segments, each with its start character position and its text.	object[]
rules	An array of the breaking and exception rules that were activated at each candidate position in the text. See below for details.	object[]
parameters	The settings used for the split, including information from the original payload.	object
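A small consistency check can make the relationship between these properties concrete. The sketch below assumes, based only on the sample result, that `position` is a zero-based character offset and that the segments cover the original text without gaps; the function name `check_segments` is hypothetical. Python counts code points, so offsets may differ for text outside the Basic Multilingual Plane if the server counts differently.

```python
def check_segments(result):
    """Return True if count, positions and texts in a split result agree."""
    segments = result["segments"]
    if result["count"] != len(segments):
        return False
    offset = 0
    for segment in segments:
        # Each segment is expected to start where the previous one ended.
        if segment["position"] != offset:
            return False
        offset += len(segment["text"])
    # Joined back together, the segments should reproduce the original text.
    return "".join(s["text"] for s in segments) == result["original"]
```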

The rules array lists each position in the text where a breaking rule matched and describes whether the split was kept or was canceled by an exception rule. The properties are:

position	The position in the text at which the system attempted to split.	int
retained	`true`: the text was split at this position. `false`: the text was not split due to an exception rule.	bool
tooShort	Set to true if the split was canceled because the resulting segment would be shorter than the allowed minimum.	bool
breaking	The breaking rule that was applied.	object
exception	The exception rule, if any, that canceled the breaking rule. If there is no exception, all of its properties are null.	object
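When debugging a configuration it can help to reduce the rules array to one outcome per position. The following is a minimal sketch, not part of the API; the function name `describe_rules` and the wording of the outcome strings are made up for illustration.

```python
def describe_rules(result):
    """Return (position, outcome) pairs for every entry in result["rules"]."""
    outcomes = []
    for rule in result["rules"]:
        if rule["retained"]:
            outcome = "split by breaking rule %s" % rule["breaking"]["no"]
        elif rule["tooShort"]:
            outcome = "canceled: segment too short"
        else:
            outcome = "canceled by exception rule %s" % rule["exception"]["no"]
        outcomes.append((rule["position"], outcome))
    return outcomes
```

For the sample result shown earlier, this reports that position 26 was canceled by exception rule 10019 and that position 36 was split by breaking rule 10021.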

Text segmentation works as follows:

1. Find a breaking rule. This is where the segmenter would “like” to split the text.
2. If the resulting segment is shorter than the minimum length in parameters.minimumSegmentLength, then the split is canceled (and the tooShort property is set).
3. If an exception rule in the SRX configuration matches the split point, then the split is also canceled. The details of the exception rule are listed in the exception property.
4. Start over to find more breaking rules.
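The procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's implementation: rules are given as hypothetical (before, after) regular-expression pairs, the segment length is measured from the previous split to the candidate position, and `\p{Ll}` from the sample is replaced by an explicit lowercase character class because Python's `re` module does not support `\p{...}`.

```python
import re


def matches_at(text, pos, before, after):
    """True if `before` matches text ending at pos and `after` matches text starting at pos."""
    before_ok = re.search("(?:" + before + r")\Z", text[:pos]) is not None
    after_ok = re.match(after, text[pos:]) is not None
    return before_ok and after_ok


def split_text(text, breaking_rules, exception_rules, minimum_segment_length):
    """Split text into segments; each rule is a (before, after) pair of regexes."""
    segments = []
    start = 0
    for pos in range(1, len(text)):
        # Step 1: find a position where some breaking rule matches.
        if not any(matches_at(text, pos, b, a) for b, a in breaking_rules):
            continue
        # Step 2: cancel the split if the segment would be too short.
        if pos - start < minimum_segment_length:
            continue
        # Step 3: cancel the split if an exception rule matches here.
        if any(matches_at(text, pos, b, a) for b, a in exception_rules):
            continue
        segments.append(text[start:pos])
        start = pos  # Step 4: continue from the new segment start.
    segments.append(text[start:])
    return segments


# Simplified versions of rules 10021 (breaking) and 10019 (exception) from the sample.
BREAKING = [(r"[.?!;:]+[\"')]?", r"\s")]
EXCEPTIONS = [(r"\.+", r"[\"')]?\s[a-zäöüß]")]

parts = split_text(
    "Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?", BREAKING, EXCEPTIONS, 5
)
```

With these simplified rules, the candidate split after “20.4.” is canceled because a lowercase letter follows, while the split after “Uhr.” is kept, which yields the same two segments as the sample result.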