OTMI Feature Set
From OpenTextMining
This page lists current and proposed features.
Current Features
- Regex
- Unicode support (in generator script)
- Better sentence matching (sentence terminators, avoiding premature terminators)
- Add in sentence terminator punctuation
- Determine which punctuation characters are stripped, and why
- Stopwords
- Determine what stopwords are applied to (e.g. vectors, snippets)
- Determine whether to apply stopwords
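As a rough illustration of the sentence matching and stopword items above, here is a minimal Python sketch. It is not the OTMI generator script: the abbreviation list, the stopword list, and the function names `split_sentences` and `term_vector` are all assumptions made for this example. It splits on sentence terminators, rejoins pieces that ended on a known abbreviation (a premature terminator), and makes stopword filtering optional when building a word vector.

```python
import re
from collections import Counter

# Hypothetical lists, for illustration only; not taken from OTMI.
ABBREVIATIONS = {"dr.", "e.g.", "i.e.", "fig."}
STOPWORDS = {"the", "a", "of", "and", "in", "is"}

# Split after a terminator (. ! ?) that is followed by whitespace.
# A decimal such as 3.14 is not split, since no whitespace follows the dot.
TERMINATOR = re.compile(r"(?<=[.!?])\s+")

# \w matches Unicode word characters by default for str patterns.
WORD = re.compile(r"\w+")


def split_sentences(text):
    """Split text into sentences, keeping the terminator punctuation."""
    sentences = []
    buffer = ""
    for piece in TERMINATOR.split(text.strip()):
        if not piece:
            continue
        buffer = f"{buffer} {piece}" if buffer else piece
        last_token = buffer.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # premature terminator: keep accumulating
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


def term_vector(sentence, use_stopwords=True):
    """Count lowercased words, optionally dropping stopwords."""
    tokens = [t.lower() for t in WORD.findall(sentence)]
    if use_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(tokens)
```

In this sketch the `use_stopwords` flag is what lets a caller filter stopwords for vectors but leave them in place for snippets, which would normally be shown verbatim.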
Proposed Features
- Language (XmlLang)
