This is the problem:
- I have two columns in my metadata database: "field name" and "field description".
 - I need to check if the "field description" is actually a description and not some sort of transformation of the "field name"
 - [Edit] I need to avoid preprocessing the text to remove separators, as I would have to consider a long list of cases (e.g. _-;$%/^| etc.)
 
Examples:
| row | field_name | field_description | 
|---|---|---|
| 1 | my_first_field | my first field | 
| 2 | my_second_field | my------second------field | 
| 3 | my_third_field | this is a description about the field, the description can contain the name of the field itself | 

Rows 1 and 2 are just transformations of the field name (thus wrong), while row 3 is a correct description.
I have tried implementations based on Levenshtein distance, difflib, cosine similarity and spaCy, but none of them was robust on my examples (they gave only around a 50% similarity score for the 1st example).
Some of the implementations I tried to use:
- https://towardsdatascience.com/surprisingly-effective-way-to-name-matching-in-python-1a67328e670e
 - https://spacy.io/usage/linguistic-features#vectors-similarity
 - https://docs.python.org/3/library/difflib.html
 - is there a way to check similarity between two full sentences in python?
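
For reference, a character-level comparison of the kind listed above can be sketched with `difflib.SequenceMatcher` from the standard library. This is only a minimal sketch, not the exact code I ran: the helper name `similarity` is illustrative, and the rows are the three examples from the table above, compared without any separator preprocessing.

```python
from difflib import SequenceMatcher


def similarity(name: str, description: str) -> float:
    """Return difflib's character-level similarity ratio, between 0 and 1."""
    # No separator stripping here: the raw strings are compared as-is
    # (only lowercased), so separator characters count as mismatches.
    return SequenceMatcher(None, name.lower(), description.lower()).ratio()


# The three example rows from the table above.
rows = [
    ("my_first_field", "my first field"),
    ("my_second_field", "my------second------field"),
    (
        "my_third_field",
        "this is a description about the field, the description can "
        "contain the name of the field itself",
    ),
]

scores = [similarity(name, description) for name, description in rows]

for (name, description), score in zip(rows, scores):
    print(f"{name!r} vs {description[:30]!r}: {score:.3f}")
```

Because the separators are compared character by character, longer runs of separators (row 2) pull the ratio down compared with a single replaced separator (row 1), which is part of why a plain character-level score is hard to threshold.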
 
[Edit]
I have just tried the Hugging Face semantic-textual-similarity implementation, with nice results:

| field_name | field_description | Score | 
|---|---|---|
| my_field_name | my_field_name | 1.0000 | 
| second_field_name | second field name | 0.8483 | 
| third_field_name | third-field-name | 0.8717 | 
| fourth_field_name | this is a correct description field | 0.4591 | 
| fifth_field_name | fifth_-------field_//////////////name | 0.8454 |
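
The scoring step behind a table like this can be sketched as a cosine similarity between the embeddings of each name/description pair. This is a hedged sketch, not the exact code that produced the scores above: `score_pairs` and `cosine` are hypothetical helper names, and the `sentence-transformers` model named in the usage comment is an assumption, since the model used is not stated here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def score_pairs(
    encode: Callable[[List[str]], Sequence[Vector]],
    pairs: Sequence[Tuple[str, str]],
) -> List[float]:
    """Embed each field name and description, then score each pair.

    `encode` is any function mapping a list of strings to one vector
    per string (for example, the `encode` method of an embedding model).
    """
    name_vectors = encode([name for name, _ in pairs])
    description_vectors = encode([description for _, description in pairs])
    return [cosine(u, v) for u, v in zip(name_vectors, description_vectors)]


# Usage with sentence-transformers (assumed model; needs the package
# installed and a model download, so it is left commented out):
#
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   pairs = [("second_field_name", "second field name")]
#   print(score_pairs(model.encode, pairs))
```

Passing the encoder in as a function keeps the scoring logic independent of the model, so different embedding models can be compared on the same rows.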