Text segmentation works as follows:

  • IDENTIFY SPLIT POINTS: The process scans the text from right to left and identifies all potential break points, that is, every place where a breaking rule says the segmenter would “like” to split the text. Once all break points are identified, it checks whether any of them need to be removed (canceled):

  • CHECK MINIMUM LENGTH: Looking at each segment from right to left, the process identifies any segment that is shorter than the minimum length set in parameters.minimumSegmentLength. If a segment is too short, the split point to its left is canceled, and the tooShort property is flagged for the canceled split point.

  • APPLY EXCEPTION RULES: If an exception rule in the SRX configuration matches a remaining split point, then that split is also canceled. The details of the exception rule are listed in the exception property.
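
The three steps above can be sketched in Python as follows. This is only an illustration of the described order of operations, not the actual Wordbee implementation. The `Rule` and `SplitPoint` classes, the `segment` function, and the before/after regex model of a rule are hypothetical names introduced here. Only `tooShort`, `exception`, and the minimum segment length parameter come from the description above. The sketch also assumes that length is counted in raw characters and that the leftmost segment is left alone because it has no split point to its left.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    # Hypothetical, simplified stand-in for an SRX rule: a regex that must
    # match just before the split position and one that must match just after.
    name: str
    before: str
    after: str = ""


@dataclass
class SplitPoint:
    position: int                    # character index where the text would be cut
    canceled: bool = False
    tooShort: bool = False           # mirrors the tooShort property
    exception: Optional[str] = None  # mirrors the exception property


def rule_positions(text: str, rule: Rule) -> set:
    """Return every index at which the rule's before/after patterns meet."""
    pattern = re.compile(f"(?:{rule.before})(?=(?:{rule.after}))")
    return {m.end() for m in pattern.finditer(text)}


def segment(text: str, break_rules: List[Rule], exception_rules: List[Rule],
            minimum_segment_length: int) -> Tuple[List[str], List[SplitPoint]]:
    # 1. IDENTIFY SPLIT POINTS: collect all candidate break positions.
    positions = set()
    for rule in break_rules:
        positions |= rule_positions(text, rule)
    positions -= {0, len(text)}  # a cut at either end would split nothing
    points = [SplitPoint(p) for p in sorted(positions)]

    # 2. CHECK MINIMUM LENGTH: walk right to left; if the segment to the
    #    right of a split point is too short, cancel that split point.
    right_edge = len(text)
    for point in reversed(points):
        if right_edge - point.position < minimum_segment_length:
            point.canceled = True
            point.tooShort = True
        else:
            right_edge = point.position

    # 3. APPLY EXCEPTION RULES: cancel remaining split points that match.
    for rule in exception_rules:
        hits = rule_positions(text, rule)
        for point in points:
            if not point.canceled and point.position in hits:
                point.canceled = True
                point.exception = rule.name

    # Cut the text at the split points that survived both checks.
    cuts = [0] + [p.position for p in points if not p.canceled] + [len(text)]
    segments = [text[a:b] for a, b in zip(cuts, cuts[1:])]
    return segments, points
```

For example, calling `segment("Hello Mr. Smith. Ok. Bye now.", [Rule("sentence-end", r"[.?!]\s")], [Rule("title", r"\bMr\.\s")], 5)` finds three candidate split points. The one before "Ok. " is canceled with `tooShort` set, the one after "Mr. " is canceled by the exception rule, and only the split before "Bye now." remains.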

If you believe that this process is complicated, then the Wordbee team heartily agrees with you.