Tokenizer that splits text into words at the word boundaries defined by the Unicode Text Segmentation algorithm (Unicode Standard Annex #29).
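
As a rough illustration of what word-level segmentation produces, the sketch below approximates the behaviour with a regular expression. It is not an implementation of the Unicode Text Segmentation rules and it is not this tokenizer's code: the function name `tokenize` and the pattern are assumptions made for the example, and the pattern only imitates a few common cases (splitting on whitespace and hyphens, keeping an apostrophe or period that sits between word characters).

```python
import re

# Simplified stand-in for Unicode word-boundary rules (an assumption, not the
# real algorithm): a run of word characters, optionally continued by a single
# apostrophe or period that is immediately followed by more word characters.
_WORD = re.compile(r"\w+(?:['’.]\w+)*")


def tokenize(text):
    """Return the word-like tokens in text, dropping spaces and punctuation."""
    return _WORD.findall(text)


print(tokenize("The quick-brown fox can't jump 3.14 times."))
# ['The', 'quick', 'brown', 'fox', "can't", 'jump', '3.14', 'times']
```

The hyphen splits `quick-brown` into two tokens, while `can't` and `3.14` stay whole and the trailing period is discarded. A full implementation of the algorithm covers many more cases (scripts without spaces, emoji sequences, combining marks) that this pattern does not attempt.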