I have the following dataframes:
df1:
| ZIP code | Other columns |
|---|---|
| 1011AA | ... |
| 1011AA | ... |
| 2316XH | ... |
| 5815NE | ... |
df2:
| starting value ZIP code range | last value ZIP code range | Province |
|---|---|---|
| 1000 | 1200 | North-Holland |
| 1201 | 1500 | South-Holland |
| 1501 | 1570 | North-Holland |
| 1571 | 1600 | Den Haag |
I want to:
- Get the first four digits of
df1["ZIP code"] - Check if these four digits are present in any range in
df2["starting value ZIP code range"]anddf["last value ZIP code range"] - If there is a match, get
df2["Province"]and add this value to a column indf1.
The difficulty is that I need to compare this to a range of values and I can only use the first 4 digits of the string. Most examples I found on stackoverflow compare based on a single value. The desired result is:
| ZIP code | New column |
|---|---|
| 1011AA | North-Holland |
| 1011AA | North-Holland |
| 2316XH | Haarlem |
| 5815NE | Utrecht |
Bonus points if you can do it using map. For example, df1["New column"] = df1["ZIP code"].str[:4].map(... ? ...). However, if the map method is a bad idea please suggest a better method.