I am using Python 3.
I have a master dataframe " df " with the columns as shown (with 3 rows of sample data):
UNITID  CIPCODE AWLEVEL CTOTALT
100654  1.0999  5        9
100654  1.1001  5        10
100654  1.1001  7        6
I have a dataframe called " uni_names " as shown (with 3 rows of sample data):
UNITID    institution_name
100654    Alabama A & M University
100663    University of Alabama at Birmingham
100690    Amridge University
I have a dataframe called " cipcodes " as shown (with 3 rows of sample data):
cipcode_value   program_name
01.0000         Agriculture, General
01.0101         Agricultural Business and Management, General
01.0102         Agribusiness/Agricultural Business Operations
I have a dataframe called " awlevel " as shown (with 3 rows of sample data):
code    type    
3       Associate's degree
5       Bachelor's degree
7       Master's degree
What I want is an output dataframe with column names as such
institution_name    program_name    type    CTOTALT
My code below is giving duplicates and weird additional values:
import pandas as pd
# Read the master dataframe from a CSV file
df = pd.read_csv('master_data.csv')
# Read the uni_names dataframe from a CSV file
uni_names = pd.read_csv('uni_names.csv')
# Read the cipcodes dataframe from a CSV file
cipcodes = pd.read_csv('cipcodes.csv')
# Read the awlevel dataframe from a CSV file
awlevel = pd.read_csv('awlevel.csv')
# Merge df with uni_names based on UNITID
merged_df = df.merge(uni_names, on='UNITID')
# Merge merged_df with cipcodes based on CIPCODE
merged_df = merged_df.merge(cipcodes, left_on='CIPCODE', right_on='cipcode_value')
# Merge merged_df with awlevel based on AWLEVEL
merged_df = merged_df.merge(awlevel, left_on='AWLEVEL', right_on='code')
# Select the desired columns and assign new column names
output_df = merged_df[['institution_name', 'program_name', 'type', 'CTOTALT']]
output_df.columns = ['institution_name', 'program_name', 'type', 'CTOTALT']
# Print the output dataframe
print(output_df)
 
    