How can I identify all the variations of a word in one column (column_one) and then fill a value in another column (column_two) whenever a variation of that word is found?
E.g. fill the `value` column with P whenever a variation of "PHILADELPHIA" is found, and with I whenever a variation of "ILLINOIS" is found.
| place | value |
|---|---|
| PHIADELPHIA | |
| PHIALDELPHIA | |
| PHIDELPHIA | |
| illinois | |
| PHIELADELPHIA | |
| PHIILADELPHIA | |
| illinoi | |
| PHILA | |
| PHILA. | |
| PHILAD | |
| PHILADALPHIA | |
| PHILADELPHIA | |
| PHILADELAPHIA | |
| PHILADELHIA | |
| PHILADELHPIA | |
| PHILADELLPHIA | |
| PHILADELPHIA | |
| PHILADELPH | |
| PHILADELPHA | |
| PHILADELPHAI | |
| PHILADELPHI | |
| PHILADELPHIA | |
Possible approaches: fuzzy matching, Levenshtein distance, etc.
Input data:

    import pandas as pd
    import numpy as np

    place = ['PHIADELPHIA','PHIALDELPHIA','PHIDELPHIA','illinois','PHIELADELPHIA','PHIILADELPHIA','illinoi','PHILA','PHILA.','PHILAD','PHILADALPHIA','PHILADELPHIA','PHILADELAPHIA','PHILADELHIA','PHILADELHPIA','PHILADELLPHIA','PHILADELPHIA','PHILADELPH','PHILADELPHA','PHILADELPHAI','PHILADELPHI','PHILADELPHIA']
    value = [np.nan] * len(place)
    df = pd.DataFrame(zip(place, value), columns=["place", "value"])
    df
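
To make the fuzzy-matching idea concrete, here is a minimal sketch of one way it might look, using only the standard library. It swaps in `difflib.SequenceMatcher`'s similarity ratio rather than a true Levenshtein distance. The names `CANONICAL`, `label_for`, `threshold` and `min_prefix` are hypothetical, and the values 0.8 and 4 are assumptions tuned to the sample data above, not recommended defaults.

```python
from difflib import SequenceMatcher

# Canonical spellings and the label to fill in for each (taken from the question).
CANONICAL = {"PHILADELPHIA": "P", "ILLINOIS": "I"}


def label_for(name, threshold=0.8, min_prefix=4):
    """Return the label of the closest canonical word, or None if nothing is close enough.

    threshold and min_prefix are illustrative assumptions, not tuned defaults.
    """
    # Normalise case and drop surrounding spaces and dots ("PHILA." -> "PHILA").
    cleaned = name.upper().strip(" .")
    if not cleaned:
        return None

    best_label, best_score = None, 0.0
    for word, label in CANONICAL.items():
        if len(cleaned) >= min_prefix and word.startswith(cleaned):
            # Treat abbreviations such as "PHILA" or "ILLINOI" as exact hits,
            # because a plain similarity ratio penalises the missing tail.
            score = 1.0
        else:
            # Similarity in [0, 1]; 1.0 means the strings are identical.
            score = SequenceMatcher(None, cleaned, word).ratio()
        if score > best_score:
            best_label, best_score = label, score

    return best_label if best_score >= threshold else None
```

With the DataFrame from the question, the column could then be filled with `df["value"] = df["place"].map(label_for)`. Rows that match neither word are left empty, so a lower `threshold` catches more misspellings at the cost of more false positives.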