
python-ucto

Public

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
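To illustrate why tokenisation is less trivial than splitting on whitespace, here is a minimal sketch of a regular-expression-based tokeniser using only Python's standard `re` module. It does not use python-ucto or reproduce Ucto's rules: the rule names and patterns in `RULES` and the `tokenize` helper are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, ordered rules (label, pattern). Earlier rules win when several
# could match at the same position. These are NOT Ucto's actual rules.
RULES = [
    ("URL", r"https?://[^\s)]+"),
    ("ABBREVIATION", r"(?:[A-Za-z]\.){2,}"),
    ("NUMBER", r"\d+(?:[.,]\d+)*"),
    ("WORD", r"\w+(?:['-]\w+)*"),
    ("PUNCTUATION", r"[^\w\s]"),
]

# Combine the rules into one pattern with a named group per rule.
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def tokenize(text):
    """Return a list of (token, rule_label) pairs; whitespace is skipped."""
    return [(match.group(), match.lastgroup) for match in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    sentence = "Pay 3.50, e.g. now."
    print(sentence.split())    # naive whitespace split glues punctuation to words
    print(tokenize(sentence))  # rule-based split separates it
```

A plain `str.split()` leaves the comma attached to `3.50` and the full stop attached to `now`, whereas the ordered rules keep the decimal number and the abbreviation intact while splitting off sentence punctuation. A production tokeniser such as Ucto handles many more cases through language-specific configuration.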

Created: 2014-05-22T01:28:45
Updated: 2024-12-17T19:55:38
Stars: 29
Stars Increase: 0