OTMI Feature Set

From OpenTextMining

This page lists current and proposed features.

Current Features

  • Regex
    • Unicode support (in generator script)
    • Better sentence matching (sentence terminators, avoiding premature terminators)
    • Add in sentence terminator punctuation
    • Determine which punctuation characters are stripped, and why
  • Stopwords
    • Determine what stopwords are applied to (e.g. vectors, snippets)
    • Determine whether to apply stopwords
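
The Regex and Stopwords items above could be illustrated roughly as follows. This is a minimal sketch and not the OTMI generator script itself: the names `ABBREVIATIONS`, `STOPWORDS`, `split_sentences` and `term_vector` are hypothetical, as are the abbreviation and stopword lists. It assumes that a sentence keeps its terminator punctuation, that a period after a known abbreviation is a premature terminator, and that stopword removal is optional and applied to vectors only.

```python
import re
from collections import Counter

# Hypothetical lists, for illustration only (not taken from OTMI).
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "fig"}
STOPWORDS = {"the", "a", "an", "of", "in", "and", "it", "was"}

# One or more terminators (including the ideographic full stop),
# followed by whitespace or end of text. "3.14" does not match.
TERMINATOR = re.compile(r"[.!?\u3002]+(?=\s|$)")

# In Python 3, \w on str patterns is Unicode-aware by default.
WORD = re.compile(r"\w+")


def split_sentences(text):
    """Split text into sentences, keeping the terminator punctuation."""
    sentences = []
    start = 0
    for match in TERMINATOR.finditer(text):
        words = text[start:match.start()].split()
        last = words[-1].lower() if words else ""
        if last in ABBREVIATIONS:
            # Premature terminator such as "Dr."; keep scanning.
            continue
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        # Text without a final terminator is still kept as a sentence.
        sentences.append(tail)
    return sentences


def term_vector(sentence, apply_stopwords=True):
    """Count lowercased word tokens, optionally dropping stopwords."""
    tokens = [w.lower() for w in WORD.findall(sentence)]
    if apply_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(tokens)


text = "Dr. Müller read the paper. Was it good? Yes!"
for sentence in split_sentences(text):
    print(sentence, dict(term_vector(sentence)))
```

Keeping the stopword step behind a flag leaves the snippet text untouched while the vector is filtered, which matches the open question of where stopwords should apply.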

Proposed Features

  • Language (XmlLang)