Comparing tokenizers
The following table briefly compares the tokenizers of the NLP APIs. The tokens each one
generates are listed under the tokenizer's name. All are based on the same text, "Let's
pause, \nand then reflect.". Keep in mind that the output reflects a simple use of the
classes; options not included in the examples may influence how the tokens are generated.
The intent is simply to show the type of output that can be expected from the sample code
and data.
SimpleTokenizer  WhitespaceTokenizer  TokenizerME  PTBTokenizer  DocumentPreprocessor  IndoEuropeanTokenizerFactory
Let              Let's                Let          Let           Let                   Let
'                pause,               's           's            's                    '
s                and                  pause        pause         pause                 s
pause            then                 ,            ,             ,                     pause
,                reflect.             and          and           and                   ,
and                                   then         then          then                  and
then                                  reflect      reflect       reflect               then
reflect                               .            .             .                     reflect
.                                                                                      .
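
As a rough illustration of why the WhitespaceTokenizer and SimpleTokenizer outputs differ, the following sketch imitates two splitting strategies on the sample text using only the Java standard library. The class TokenizerComparisonSketch and its methods are hypothetical and are not the libraries' implementations; the sketch only assumes that one strategy splits on whitespace and the other starts a new token whenever the character class changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of any NLP library.
class TokenizerComparisonSketch {
    private static final int WHITESPACE = 0;
    private static final int LETTER = 1;
    private static final int DIGIT = 2;
    private static final int OTHER = 3;

    // Splits on runs of whitespace only, so punctuation stays attached to words.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Starts a new token whenever the character kind changes.
    // Whitespace only separates tokens, and each punctuation
    // character becomes a token of its own (a simplification).
    static List<String> characterClassTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int previousKind = WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int kind = kindOf(c);
            // Close the running token on any change of kind; punctuation never merges.
            if (current.length() > 0 && (kind != previousKind || kind == OTHER)) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (kind != WHITESPACE) {
                current.append(c);
            }
            previousKind = kind;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    private static int kindOf(char c) {
        if (Character.isWhitespace(c)) return WHITESPACE;
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    public static void main(String[] args) {
        String text = "Let's pause, \nand then reflect.";
        System.out.println(whitespaceTokens(text));
        System.out.println(characterClassTokens(text));
    }
}
```

For the sample text, the whitespace strategy yields five tokens and the character-class strategy yields nine, which matches the token counts listed for those two tokenizers.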