
I want to create a concept map from unstructured text. For example:

Desired input: find "/" -name "*.txt"
Desired output: concepts-graph.dot

In other words, I want to mine my text files and create some kind of structured representation of keywords/concepts: loosely, a poor man's Google text analyser.

Is there an open source tool/API that can find relationships between terms in a plaintext file?
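As a rough illustration of the pipeline described above, here is a minimal Python sketch. It assumes that "related" simply means "appears in the same sentence": it walks a directory for *.txt files (like the find command), counts how often each pair of non-stopword terms co-occurs, and writes pairs seen at least `min_count` times to concepts-graph.dot. The stopword list, the sentence-splitting regex and all function names (including `build_concept_graph`) are illustrative assumptions, not an existing tool.

```python
import os
import re
from collections import Counter
from itertools import combinations

# Illustrative stopword list (an assumption); a real one would be far longer.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "it",
             "of", "on", "that", "the", "this", "to", "with"}


def sentence_terms(text):
    """Yield the set of non-stopword terms in each sentence of text."""
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z][a-z'-]+", sentence.lower())
        terms = {w for w in words if w not in STOPWORDS}
        if terms:
            yield terms


def cooccurrence(texts, min_count=2):
    """Count term pairs sharing a sentence; keep pairs seen min_count+ times."""
    pairs = Counter()
    for text in texts:
        for terms in sentence_terms(text):
            for a, b in combinations(sorted(terms), 2):
                pairs[(a, b)] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}


def to_dot(edges):
    """Render weighted term pairs as an undirected Graphviz DOT graph."""
    lines = ["graph concepts {"]
    for (a, b), n in sorted(edges.items()):
        lines.append(f'  "{a}" -- "{b}" [weight={n}, label="{n}"];')
    lines.append("}")
    return "\n".join(lines)


def read_txt_files(root):
    """Yield the text of every *.txt file under root, like find root -name '*.txt'."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".txt"):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        text = f.read()
                except OSError:
                    continue  # skip unreadable files, common when scanning "/"
                yield text


def build_concept_graph(root, out_path="concepts-graph.dot", min_count=2):
    """Hypothetical end-to-end helper: scan root and write the DOT file."""
    edges = cooccurrence(read_txt_files(root), min_count)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(to_dot(edges) + "\n")
```

Calling `build_concept_graph("/")` would write concepts-graph.dot, which Graphviz can render with, for example, `dot -Tpng concepts-graph.dot -o concepts.png`. Sentence-level co-occurrence is crude but cheap; swapping in stemming or part-of-speech filtering changes only `sentence_terms`.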

Hennes

1 Answer


There are many tools you could build this with.

As far as keywords go, there are basic tools, such as Porter stemmers, available in most programming languages, and many more options for specific languages.
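To show where a stemmer fits into keyword extraction, here is a minimal stdlib-only Python sketch that counts non-stopword tokens and accepts a `normalize` function for collapsing word variants. The stopword list and the name `top_keywords` are assumptions for illustration. By default it only lowercases; you would pass a real stemmer, such as the `stem` method of NLTK's `PorterStemmer`, to merge forms like "connected" and "connection".

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); substitute a proper one.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "it",
             "of", "on", "the", "to", "with"}


def top_keywords(text, n=10, normalize=str.lower):
    """Return the n most frequent non-stopword terms as (term, count) pairs.

    normalize maps each lowercased word to the form it is counted under;
    passing a stemmer here makes inflected variants share one count.
    """
    counts = Counter()
    for word in re.findall(r"[A-Za-z]+", text):
        lowered = word.lower()
        if lowered in STOPWORDS:
            continue
        counts[normalize(lowered)] += 1
    return counts.most_common(n)
```

With NLTK installed, `top_keywords(text, normalize=PorterStemmer().stem)` (after `from nltk.stem import PorterStemmer`) would count stems instead of surface forms.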

For example, there's NLTK (the Natural Language Toolkit), a Python library for natural language processing, which you can use for tasks such as part-of-speech tagging (http://nltk.org/).
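As an illustration of why part-of-speech tags help with concept mining, here is a small Python sketch that turns tagged tokens into candidate concept phrases by grouping runs of adjectives and nouns that end in a noun. It assumes input in the shape `nltk.pos_tag` returns, a list of (token, tag) pairs using Penn Treebank tags; the function name and the chunking rule are illustrative assumptions, and NLTK itself provides more flexible chunkers such as `nltk.RegexpParser`.

```python
def noun_phrases(tagged):
    """Return lowercased adjective/noun runs that end in a noun.

    tagged is a sequence of (token, tag) pairs with Penn Treebank tags,
    e.g. [("quick", "JJ"), ("fox", "NN")].
    """
    phrases, current = [], []
    # A sentinel non-noun tag at the end flushes the final run.
    for token, tag in list(tagged) + [("", ".")]:
        if tag.startswith(("NN", "JJ")):
            current.append((token, tag))
            continue
        # Drop trailing adjectives so every phrase ends in a noun.
        while current and not current[-1][1].startswith("NN"):
            current.pop()
        if current:
            phrases.append(" ".join(tok.lower() for tok, _ in current))
        current = []
    return phrases
```

With NLTK installed and its tokenizer and tagger models downloaded via `nltk.download()`, you could call `noun_phrases(nltk.pos_tag(nltk.word_tokenize(sentence)))` and use the resulting phrases as graph nodes.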

There are also various text mining packages you can use within R, for example tm (http://tm.r-forge.r-project.org/); see also these slides: http://www.zinkov.com/posts/2010-10-21-slides_from_larug/tm_slides.pdf.
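A central data structure in tm is the document-term matrix (its `DocumentTermMatrix` function builds one). A plain-Python analogue of that structure, offered as a sketch rather than a port of tm, might look like this: one row per document, one column per term, each cell a raw count. The function name and the lowercase-letters-only tokenizer are assumptions. Terms whose columns are non-zero in the same rows occur in the same documents, which is one possible basis for the edges of a concept graph.

```python
import re
from collections import Counter


def document_term_matrix(docs):
    """Return (terms, rows): the sorted vocabulary and one count row per doc."""
    counts = [Counter(re.findall(r"[a-z]+", doc.lower())) for doc in docs]
    terms = sorted(set().union(*counts))
    # rows[i][j] is how many times terms[j] occurs in docs[i]
    rows = [[c[t] for t in terms] for c in counts]
    return terms, rows
```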

If you can give a clearer idea of the sort of text analysis you have in mind, it would be easier to suggest specific packages that might be relevant.

Soz