unitowns = get_list_of_university_towns()
bottom = get_recession_bottom()
start = get_recession_start()
hdata = convert_housing_data_to_quarters()
bstart = hdata.columns[hdata.columns.get_loc(start) - 1]
hdata= hdata[[bstart,bottom]]
hdata['Ratio']=hdata[bstart]/ hdata[bottom]
hdata=hdata.reset_index()
combined = pd.merge (hdata, unitowns, how='inner', on=['State','RegionName'])
 
 
The following cleaning needs to be done:
- For "State", removing characters from "[" to the end.
- For "RegionName", when applicable, removing every character from " (" to the end.
- Depending on how you read the data, you may need to remove newline character '\n'. '''
code for getting unitown values
   df = pd.read_csv('university_towns.txt', delimiter = '\t', header= 
   None).rename(columns={0:'Data'}) 
   boolian_df = df['Data'].str.contains('[edit]', regex= False) 
   state_university= []
   for index, value in boolian_df.items():
      if value:
         state = df.loc[index].values[0]
      else:
         region = df.loc[index].values[0]
         state_university.append([state,region])
   final_dataframe = pd.DataFrame(state_university, columns=['State', 
   'RegionName'])
   final_dataframe['State'] = final_dataframe['State'].str.replace('\ 
   [edit.*','') 
   final_dataframe['RegionName'] = 
   final_dataframe['RegionName'].str.replace('\ 
   (.*', '')
   final_dataframe['RegionName'] = 
   final_dataframe['RegionName'].str.replace('University.*,', '')
   return final_dataframe  
output unitowns.head()
      State      RegionName  Type
      Alabama        Auburn   Uni
      Alabama      Florence   Uni
      Alabama  Jacksonville   Uni
      Alabama    Livingston   Uni
      Alabama    Montevallo   Uni
output hdata.head()
          State      RegionName         2008q1         2009q2     Ratio
          New York      New York     508500.000000  465833.333333  1.091592
          California   Los Angeles   535300.000000  413900.000000  1.293308
          Illinois       Chicago     243733.333333  219700.000000  1.109392
          Pennsylvania  Philadelphia 119566.666667  116166.666667  1.029268
          Arizona       Phoenix  218633.333333  168233.333333  1.299584
Both dataframes have same column names.
It gives Empty DataFrame Columns: [State, RegionName, 2008q1, 2009q2, Ratio] Index: []
 
     
    