...
locale | The language of the text, such as “en”, “de” etc. | string, Mandatory |
text | The text to be tested. Up to 1000 characters. | string, Mandatory |
independentRulesId | The ID of the language-independent SRX rules. If set to null, no language-independent rules are loaded (not recommended). Use settings/srx/find to find configurations. | int, Optional |
languageRulesId | The ID of the language-specific SRX rules. It must match the locale of the text. If set to null, no language-specific rules are loaded (not recommended). Use settings/srx/find to find configurations. | int, Optional |
Example payload:
Code Block |
---|
{
"locale": "de",
"independentRulesId": 234,
"languageRulesId" : 213,
"text": "Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?"
} |
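The constraints from the parameter table can be checked before a request is sent. The following is a minimal sketch in Python, not part of the API itself: the helper name `build_payload` and its `rule_ids` argument are hypothetical, and the sketch assumes that leaving an optional key out of the payload is different from sending it as null (only the null case is described above).

```python
import json

MAX_TEXT_LENGTH = 1000  # limit stated in the parameter table
OPTIONAL_KEYS = ("independentRulesId", "languageRulesId")


def build_payload(locale, text, rule_ids=None):
    """Build the request body as a dict, enforcing the documented constraints.

    rule_ids is a hypothetical convenience argument: a dict that may contain
    the optional keys. A value of None is serialized as JSON null.
    """
    if not locale:
        raise ValueError("locale is mandatory")
    if not text:
        raise ValueError("text is mandatory")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text is longer than {MAX_TEXT_LENGTH} characters")
    payload = {"locale": locale, "text": text}
    for key in OPTIONAL_KEYS:
        # Only keys the caller supplied are added to the payload.
        if rule_ids is not None and key in rule_ids:
            payload[key] = rule_ids[key]
    return payload


body = json.dumps(
    build_payload(
        "de",
        "Geht es spaeter?",
        {"independentRulesId": 234, "languageRulesId": 213},
    ),
    ensure_ascii=False,
)
```

The resulting `body` string can be sent with any HTTP client; the endpoint URL and authentication are outside the scope of this sketch.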
RESULTS
The JSON result shows the segmentation results. A result for the sample above might be:
Code Block |
---|
{
  "count": 2,
  "original": "Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?",
  "segments": [
    { "position": 0, "text": "Hallo wie geht es am 20.4. um 3 Uhr." },
    { "position": 36, "text": " Geht es spaeter?" }
  ],
  "rules": [
    {
      "position": 26,
      "retained": false,
      "breaking": { "no": 10021, "before": "[\\.\\?\\!\\;\\:]+[\\“\\\"\\'”\\)]?", "after": "\\s" },
      "exception": { "no": 10019, "before": "\\.+", "after": "[\\“\\\"\\'”\\)]?\\s\\p{Ll}" }
    },
    {
      "position": 36,
      "retained": true,
      "breaking": { "no": 10021, "before": "[\\.\\?\\!\\;\\:]+[\\“\\\"\\'”\\)]?", "after": "\\s" },
      "exception": { "no": null, "before": null, "after": null }
    }
  ],
  "parameters": { "locale": "de", "independentRuleId": 5503, "languageRulesId": 5502 }
} |
The properties are:
count | Total segments into which the text was split | int |
original | The original text. | string? |
segments | The list of segments with start character position and the text | object[] |
rules | An array of the breaking and exception rules that were activated at each candidate position in the text. See below for details. | object[] |
parameters | Includes information from the original payload. | object |
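A client can sanity-check a result against the properties above. The sketch below assumes, based only on the sample result, that positions are zero-based character offsets and that the segment texts concatenate back to the original text. The function name `check_result` is hypothetical.

```python
def check_result(result):
    """Return a list of inconsistencies found in a result (empty if none)."""
    problems = []
    segments = result["segments"]
    if result["count"] != len(segments):
        problems.append("count does not match the number of segments")
    offset = 0
    for segment in segments:
        # Assumption: each segment starts where the previous one ended.
        if segment["position"] != offset:
            problems.append(
                f"segment at {segment['position']} was expected at {offset}"
            )
        offset += len(segment["text"])
    original = result.get("original")
    if original is not None:
        if "".join(s["text"] for s in segments) != original:
            problems.append("segments do not concatenate to the original text")
    return problems


sample = {
    "count": 2,
    "original": "Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?",
    "segments": [
        {"position": 0, "text": "Hallo wie geht es am 20.4. um 3 Uhr."},
        {"position": 36, "text": " Geht es spaeter?"},
    ],
}
print(check_result(sample))  # prints []
```

Python's `len` counts Unicode code points; if the service counts positions differently (for example in UTF-16 units), the offsets may diverge for text outside the Basic Multilingual Plane.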
The rules array lists the positions in the text at which the system attempted a split and describes whether each split was applied (breaking rule) or canceled by a specific exception rule. The properties are:
position | The text position that the system attempts to split | int |
retained | Whether the split at this position was kept (true) or canceled by an exception rule (false). | bool |
breaking | The breaking rule that was applied. | object |
exception | The exception rule, if any, that canceled the breaking rule. If there is no exception then the properties will all be null. | object |
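To illustrate how the rule entries relate to the segments, the following sketch separates retained positions from canceled ones and cuts the text at the retained positions. It assumes that segments are cut exactly at the retained positions, which holds for the sample result but is not stated by the API. The function names are hypothetical.

```python
def summarize_rules(rules):
    """Split rule entries into retained positions and canceled ones."""
    retained, canceled = [], []
    for rule in rules:
        if rule["retained"]:
            retained.append(rule["position"])
        else:
            # exception.no identifies the exception rule that canceled the split
            canceled.append((rule["position"], rule["exception"]["no"]))
    return retained, canceled


def split_at(text, positions):
    """Cut text at the given character offsets, dropping empty pieces."""
    bounds = [0, *sorted(positions), len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a != b]


rules = [
    {"position": 26, "retained": False, "exception": {"no": 10019}},
    {"position": 36, "retained": True, "exception": {"no": None}},
]
original = "Hallo wie geht es am 20.4. um 3 Uhr. Geht es spaeter?"

retained, canceled = summarize_rules(rules)
segments = split_at(original, retained)
```

For the sample, position 26 (after "20.4.") is canceled by exception rule 10019 because a lowercase letter follows the whitespace, while position 36 is retained and yields the two segments shown in the result.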