Going on two months in python and I am focusing hard on Pandas right now. In my current position I use VBA on data frames, so learning this to slowly replace it and further my career. As of now I believe my true problem is the lack of understanding a key concept(s). Any help would be greatly appreciated.
That said here is my problem:
Where could I go to learn more on how to do stuff like this for more precise filtering. I'm very close but there is one key aspect I need.
Goal(s)
Main goal I need to skip certain values in my ID column. The below code takes out the Dashes "-" and only reads up to 9 digits. Yet, I need to skip certain IDs because they are unique.
After that I'll start to work on comparing multiple sheets.
- Main data frame IDs is formatted as 000-000-000-000
- The other data frames that I will compare it to have it with no dashes "-" as 000000000 and three less 000's totaling nine digits.
The unique IDs that I need skipped are the same in both data frames, but are formatted completely different ranging from 000-000-000_#12, 000-000-000_35, or 000-000-000_z.
My code that I will use on each ID except the unique ones:
 dfSS["ID"] = dfSS["ID"].str.replace("-", "").str[:9]
but I want to use an if statement like (This does not work)
lst = ["000-000-000_#69B", "000-000-000_a", "etc.. random IDs", ]
if ~dfSS["ID"].isin(lst ).any()
    dfSS["ID"] = dfSS["ID"].str.replace("-", "").str[:9]
else:
    pass
For more clarification my input DataFrame is this:
            ID               Street #   Street Name 
0   004-330-002-000         2272        Narnia  
1   021-521-410-000_128     2311        Narnia  
2   001-243-313-000         2235        Narnia  
3   002-730-032-000         2149        Narnia
4   000-000-000_a           1234        Narnia
And I am looking to do this as the output:
            ID               Street #   Street Name 
0   004330002               2272        Narnia  
1   021-521-410-000_128     2311        Narnia  
2   001243313000            2235        Narnia  
3   002730032000            2149        Narnia
4   000-000-000_a           1234        Narnia
Notes:
- dfSS is my Dataframe variable name aka the excel I am using. "ID" is my column heading. Will make this an index after the fact
- My Data frame on this job is small with # of (rows, columns) as (2500, 125)
- I do not get an error message so I am guessing maybe I need a loop of some kind. Starting to test for loops with this as well. no luck there... yet.
Here is where I have been to research this:
- Comparison of a Dataframe column values with a list
- How to filter Pandas dataframe using 'in' and 'not in' like in SQL
- if statement with ~isin() in pandas
- recordlinkage module-I didn't think this was going to work
- Regular expression operations - Having a hard time fully understanding this at the moment
 
    