I have real estate properties and their details (17 columns) in a CSV file with nearly half a million entries. One of the columns provides a location, but it is too detailed. I want to categorize my entries, so I want to simplify the location into more generic areas. The areas I want to categorize the entries into would be in a list such as:
keywords = ['Downtown','Park View','Industrial District', ... ]
Ideally, I would like to take an entry that has, for example, Sky Tower Downtown Los Angeles and classify it as Downtown.
So the task is to first detect the keyword in the location column and then write it to a new column (right beside it if possible). If no keyword is found in the entry, I would like to classify it as Other.
The data would go from something like this:
| Date | Record_Type | Location | Proterty_Type | ... | Price |
|---|---|---|---|---|---|
| 19-Mar-21 | Active Listing | Sky Tower Downtown Los Angeles | Apartment | ... | 15000 |
| 19-Mar-21 | Active Listing | Central Park Residential Tower, 5th Avenue | Apartment | ... | 17000 |
| 20-Mar-21 | Active Listing | Meadow Gardens, Park View | Villa | ... | 125000 |
To something like:
| Date | Record_Type | Location | Area | Proterty_Type | ... | Price |
|---|---|---|---|---|---|---|
| 19-Mar-21 | Active Listing | Sky Tower Downtown Los Angeles | Downtown | Apartment | ... | 15000 |
| 19-Mar-21 | Active Listing | Central Park Residential Tower, 5th Avenue | Other | Apartment | ... | 17000 |
| 20-Mar-21 | Active Listing | Meadow Gardens, Park View | Park View | Villa | ... | 125000 |
Finally, the result should be saved to a new CSV file. Ideally, I would also like to use pandas to read and write the CSV.
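A minimal sketch of the steps described above, offered as one possible approach rather than a definitive solution. It assumes case-sensitive substring matching, that the column is named `Location` as in the tables, and that only the three keywords shown are in the list. The function names `add_area_column` and `process_csv` are hypothetical names chosen for illustration.

```python
import re

import pandas as pd

# Only the three keywords shown in the question; the real list is longer.
keywords = ['Downtown', 'Park View', 'Industrial District']


def add_area_column(df, keywords, source_col='Location', new_col='Area', default='Other'):
    """Return a copy of df with new_col placed right after source_col."""
    # Try longer keywords first so a more specific name wins if two overlap.
    ordered = sorted(keywords, key=len, reverse=True)
    # One capture group containing all keywords, escaped so they match literally.
    pattern = '(' + '|'.join(re.escape(k) for k in ordered) + ')'
    # str.extract yields the first match per row, or NaN when nothing matches.
    found = df[source_col].str.extract(pattern, expand=False)
    out = df.copy()
    # Insert the new column immediately to the right of the source column.
    out.insert(out.columns.get_loc(source_col) + 1, new_col, found.fillna(default))
    return out


def process_csv(in_path, out_path, keywords):
    """Read a CSV, add the area column, and write the result to a new CSV."""
    df = pd.read_csv(in_path)
    add_area_column(df, keywords).to_csv(out_path, index=False)


# Small in-memory sample built from the rows in the question.
sample = pd.DataFrame({
    'Date': ['19-Mar-21', '19-Mar-21', '20-Mar-21'],
    'Location': [
        'Sky Tower Downtown Los Angeles',
        'Central Park Residential Tower, 5th Avenue',
        'Meadow Gardens, Park View',
    ],
    'Price': [15000, 17000, 125000],
})
result = add_area_column(sample, keywords)
print(result)
```

For the full file, a call such as `process_csv('input.csv', 'output.csv', keywords)` would apply the same logic, where both file names are placeholders. Because the matching is vectorized through a single regular expression, it avoids a Python-level loop over the rows.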
Thanks in advance!
Edit: I have tried methods such as those in the following threads, but I get errors and I don't know what's wrong, so I'm open to fresh ideas.