My company is looking to create a PivotViewer visualization of a client's WordPress 2 blog posts from the last 11 years. To do so, however, we first need to clean up the somewhat haphazard, incomplete, and generally poor tags so they can be used as sortable categories. I'm looking for a tool that will analyze their blog entries and perform word counting, to give us a sense of what we're dealing with.
Ideally, it would have all of these features:
- Word blacklisting (ignoring specified words)
- Word stemming
- Custom synonym merging
- Counting all uses of each word
- Counting the number of posts each word appears in
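To make the requirements concrete, here is a rough Python sketch of the processing I have in mind. It is only an illustration under several assumptions: the posts are already available as plain-text strings, the `BLACKLIST` and `SYNONYMS` contents are made-up examples, and `naive_stem` is a crude stand-in for a real stemmer (such as a Porter stemmer), not something I'd rely on.

```python
import re
from collections import Counter

# Hypothetical example data; a real run would need lists built for the blog.
BLACKLIST = {"the", "a", "and", "of", "to", "in", "is"}
SYNONYMS = {"photo": "picture", "pic": "picture"}  # stemmed word -> canonical term


def naive_stem(word):
    """Crude suffix stripping; a placeholder for a proper stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(word):
    """Stem a word, then merge it into its canonical synonym if one is defined."""
    stem = naive_stem(word)
    return SYNONYMS.get(stem, stem)


def analyze(posts):
    """Return (total uses per term, number of posts containing each term)."""
    total_uses = Counter()
    post_counts = Counter()
    for text in posts:
        words = re.findall(r"[a-z]+", text.lower())
        terms = [normalize(w) for w in words if w not in BLACKLIST]
        total_uses.update(terms)        # every occurrence counts
        post_counts.update(set(terms))  # each term counts once per post
    return total_uses, post_counts
```

Calling `analyze` on a list of post bodies and then `total_uses.most_common(50)` would give the kind of overview I'm after. What I'd like is an existing tool that does this properly across a whole blog.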
I would have thought that this sort of textual analysis would be extremely common, but I haven't been able to find any software that does this sort of thing on entire blogs. Is there software available to do this?