How to replace NULL values with Mean value of a category in SQL?

Question

I have a dataset with null values in the column 'revenues_from_appointment'

Dataset

appointment_date	patient_id	practitioner_id	appointment_duration_min	revenues_from_appointment
2021-06-28	42734	748	30	90.0
2021-06-29	42737	747	60	150.0
2021-07-01	42737	747	60	NaN
2021-07-03	42736	748	30	60.0
2021-07-03	42735	747	15	42.62
2021-07-04	42734	748	30	NaN
2021-07-05	42734	748	30	100.0
2021-07-10	42738	747	15	50.72
2021-08-12	42739	748	30	73.43

I wish to replace NULL values by the mean value of rows where "patient_id, practitioner_id, appointment_duration_min" is the same.

I did it using pandas dataframe,

df['revenues_from_appointment'].fillna(df.groupby(['patient_id','practitioner_id','appointment_duration_min'])['revenues_from_appointment'].transform('mean'), inplace = True)

How can we obtain the same result by using SQL?

Final Output

appointment_date	patient_id	practitioner_id	appointment_duration_min	revenues_from_appointment
2021-06-28	42734	748	30	90.0
2021-06-29	42737	747	60	150.0
2021-07-01	42737	747	60	150.0
2021-07-03	42736	748	30	60.0
2021-07-03	42735	747	15	42.62
2021-07-04	42734	748	30	95.0
2021-07-05	42734	748	30	100.0
2021-07-10	42738	747	15	50.72
2021-08-12	42739	748	30	73.43

mysql and sql server are two different database products with different sql syntax. Which one do you use? — Shadow, Jun 01 '22 at 10:05
I had a similar issue, used python to replace the NULL values as shown in this link. https://stackoverflow.com/questions/18689823/pandas-dataframe-replace-nan-values-with-average-of-columns — JonTout, Jun 01 '22 at 10:37

score 0 · Accepted Answer · answered Jun 01 '22 at 10:23

You can use the AVG window function, that will partition on the three column of interest and replace null values using the COALESCE function:

SELECT appointment_date,
       patient_id,
       practitioner_id,
       appointment_duration_min,
       COALESCE(revenues_from_appointment, 
                AVG(revenues_from_appointment) OVER(PARTITION BY patient_id, 
                                                                 practitioner_id, 
                                                                 appointment_duration_min))
FROM tab

Try it here.

How to replace NULL values with Mean value of a category in SQL?

Final Output

1 Answers1