This is a customer's request.
The customer can run morphological analysis with tokenize_ja, but the current behavior does not suit their needs (e.g. 「二番目」 is split into 「二」, 「番」, and 「目」). It would be better if a custom dictionary could optionally be specified.
Custom dictionary support has been implemented. See:
http://hivemall.incubator.apache.org/userguide/misc/tokenizer.html
https://docs.treasuredata.com/articles/releasenote-20171001#machine-learning-hivemall-v042-rc3-new-hive-udfs