General Aims of OTMI

The general aim of OTMI is to enable text-mining analysis without distributing human-readable text. This lets publishers support text-mining-based research within their existing business models. In practice, this is achieved in two principal ways: word vectors (i.e. word occurrences with frequency counts) and 'snippets' (sentences and phrases from the text, presented out of order).
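The two representations can be illustrated with a minimal Python sketch. It is not the OTMI reference implementation: the function names `word_vector` and `snippets`, the regular-expression tokenizer, and the naive sentence splitter are all assumptions made for illustration.

```python
import random
import re
from collections import Counter


def word_vector(text):
    """Return word occurrences with frequency counts (case-folded).

    Assumption: a word is a run of letters, digits, or apostrophes.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


def snippets(text, seed=None):
    """Return the sentences of the text in shuffled order.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [p.strip() for p in parts if p.strip()]
    rng = random.Random(seed)  # seed only makes the shuffle repeatable
    rng.shuffle(sentences)
    return sentences
```

Both outputs keep what a miner needs (term frequencies and local context) while making it hard to read the article from start to finish. A real system would likely need a more careful tokenizer and sentence splitter than the ones assumed here.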

Other considerations:

  1. Remove markup.
  2. Remove entities and macros.
  3. Make each OTMI document self-contained, with metadata and links back to the original content, so that it makes sense outside its original context.
  4. Retain the section-level structure of the document.
  5. Separate figure legends from the main text.
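
The considerations above can be sketched in Python as follows. This is a hedged illustration only: the record layout, the helper names `plain_text` and `build_record`, and the assumption that the input is HTML-like fragments are all hypothetical and do not reflect the actual OTMI file format.

```python
import html
import re

# Assumption: markup is HTML/XML-style tags; macros are not handled here.
TAG_RE = re.compile(r"<[^>]+>")


def plain_text(fragment):
    """Strip tags, decode character entities, and collapse whitespace."""
    no_tags = TAG_RE.sub(" ", fragment)
    decoded = html.unescape(no_tags)
    return " ".join(decoded.split())


def build_record(metadata, source_url, sections, figure_legends):
    """Assemble a self-contained record (hypothetical layout).

    sections: list of (title, body) pairs, kept in document order.
    figure_legends: list of legend fragments, stored apart from the body text.
    """
    return {
        "metadata": dict(metadata),
        "source": source_url,  # link back to the original content
        "sections": [
            {"title": plain_text(title), "text": plain_text(body)}
            for title, body in sections
        ],
        "figure_legends": [plain_text(legend) for legend in figure_legends],
    }
```

Keeping the metadata and source link inside the record is what makes it usable outside its original context, while the separate `sections` and `figure_legends` lists preserve section-level structure and keep legends out of the main text.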