I have extracted a series of tables from the scientific literature which consist of columns each of which is a distinct type. Here is an example
I'd like to be able to automatically generate regular expressions for each column. Obviously there are trivial solutions such as .* so I would add the constraints that they use only:
- [A-Z] [a-z] [0-9]
- explicit punctuation (e.g. ',',''')
- "simple" quantifiers (e.g {3,4}
A "best" answer for the table above would be:
 [A-Z]{3}
 [A-Za-z\s\.]+
 \d{4}\sm
 \d{2}\u00b0\d{2}'\d{2}"N,\d{2}\u00b0\d{2}'\d{2}"E
 (speciosissima|intermediate|troglodytes)
 (hf|sr)
 \d{4}
Of course the 4th regex would break if we move outside the geographical area but the software doesn't know that. The aim would be to collect many regexes for , say "Coordinates" and generalize them, probably partially manual. The enums would only be created if there were a small number of distinct strings.
I'd be grateful for examples of (especially F/OSS) software that can do this, especially in Java. (It's similar to Google's Refine). I am aware of this question 4 years ago but that didn't really answer the question and the text2re site which appears to be interactive.
NOTE: I note a vote to close as "too localised". This is a very general problem (the table given is only an example) as shown by Google/Freebase developing Refine to tackle the problem. It potentially refers to a very wide variety of tables (e.g. financial, journalism, etc.). Here's one with floating point values:

It would be useful to determine automatically that some authorities report ages in real numbers (e.g. not months, days) and use 2 digits of precision.
 
     
     
     
    