I have a CSV file with three columns: Age_Groups, Trip_in_min and Start_Station_Name. (It actually comes from a bigger dataset with 17 columns and 16845 rows.)
Now I need to get the average trip time per age group.
Here is the link to the CSV file on Dropbox, as I did not know how to paste it properly here.
Any help, please?
import pandas as pd
file = pd.read_csv(r"file.csv")
# Counting the number of trips per age group (value_counts counts rows, not minutes)
trips_summary = file.Age_Groups.value_counts()
print("Number of trips per age group")
print(trips_summary)
print()
# Finding the 20 most popular start stations (value_counts sorts in descending order)
popular_stations = file.Start_Station_Name.value_counts()
print("The 20 most popular stations")
print(popular_stations[:20])
print()
UPDATE
OK, it worked after I added this line:
df.groupby('Age_Groups', as_index=False)['Trip_in_min'].mean()
Thanks @jjj. However, as I mentioned, my data has more than 16K rows. Once I added the rows back, it started to fail: only the age groups are printed, without the averages. I get the averages only if I have 1890 rows or fewer. Other operations work fine with the full dataset, just not this one. Here is the message I get for a larger number of rows (it might not be a real error):
D:\Test 1.py:18: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  avg = df.groupby('Age_Groups', as_index=False)['Trip_in_min'].mean()
  Age_Groups
0      18-24
1      25-34
2      35-44
3      45-54
4      55-64
5      65-74
6        75+
UPDATE 2
Not all columns are numeric. However, when I use the code below:
df.apply(pd.to_numeric, errors='ignore').info()
I get the output below (my target is column number 12, Trip_in_min):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1897 entries, 1 to 1897
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Riverview Park      11 non-null     object 
 1   Riverview Park.1    11 non-null     object 
 2   Riverview Park.2    11 non-null     object 
 3   Start_Station_Name  1897 non-null   object 
 4   3251                98 non-null     float64
 5   Jersey & 3rd        98 non-null     object 
 6   24443               98 non-null     float64
 7   Subscriber          98 non-null     object 
 8   1928                98 non-null     float64
 9   Unnamed: 9          79 non-null     float64
 10  Age_Groups          1897 non-null   object 
 11  136                 98 non-null     float64
 12  Trip_in_min         1897 non-null   object 
dtypes: float64(5), object(8)
memory usage: 192.8+ KB
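
The output above shows that Trip_in_min still has dtype object even after `pd.to_numeric` with `errors='ignore'`, which suggests at least one value in that column could not be parsed, so the column was left untouched. Here is a minimal sketch of what I plan to try next, assuming that is the cause. The inline DataFrame is made-up data standing in for my file, and `bad_rows` is just a name I picked for illustration:

```python
import pandas as pd

# Hypothetical stand-in for my CSV; the real data would come from pd.read_csv.
df = pd.DataFrame({
    "Age_Groups": ["18-24", "18-24", "25-34", "25-34", "25-34"],
    "Trip_in_min": ["10", "20", "5", "not a number", "15"],
})

# errors="coerce" turns unparseable values into NaN instead of
# leaving the whole column as object.
df["Trip_in_min"] = pd.to_numeric(df["Trip_in_min"], errors="coerce")

# Count how many values could not be converted.
bad_rows = int(df["Trip_in_min"].isna().sum())
print("Unparseable values:", bad_rows)

# mean() skips NaN by default, so the bad rows are ignored in the average.
avg = df.groupby("Age_Groups", as_index=False)["Trip_in_min"].mean()
print(avg)
```

If `bad_rows` turns out to be large on the real file, the column probably contains text or misaligned values, and the file itself may need cleaning before the averages mean anything.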
 
    