Notes:
If one wants to do it with Pandas dataframes, there are various ways to achieve that.
One approach would creating a dataframe to store the dataframes that have the highest number of rows per sample_id, and then apply a custom made function. Let's invest a bit more time in that approach.
We will first create a dataframe to store the number of rows that each dataframe has per sample_id as follows
df_count = pd.DataFrame({'sample_id': df_a['sample_id'].unique()})
df_count['df_a'] = df_count['sample_id'].map(df_a.groupby('sample_id').size())
df_count['df_b'] = df_count['sample_id'].map(df_b.groupby('sample_id').size())
As it will be helpful, let us create a column df_max that will store the dataframe that has more rows per sample_id
df_count['df_max'] = df_count[['df_a', 'df_b']].idxmax(axis=1)
[Out]:
   sample_id  df_a  df_b df_max
0          1     2     1   df_a
1          2     4     2   df_a
2          3     1     3   df_b
3          4     1     1   df_a
4          5     6     3   df_a
A one-liner to create the desired df_count would look like the following
df_count = pd.DataFrame({'sample_id': df_a['sample_id'].unique()}).assign(df_a=lambda x: x['sample_id'].map(df_a.groupby('sample_id').size()), df_b=lambda x: x['sample_id'].map(df_b.groupby('sample_id').size()), df_max=lambda x: x[['df_a', 'df_b']].idxmax(axis=1))
Now, given df_a, df_b, and df_count, one will want a function to merge the dataframes based on a specific condition:
- If - df_maxis- df_a, then take the rows from- df_a.
 
- If - df_maxis- df_b, then take the rows from- df_b.
 
One can create a function merge_df that takes df_a, df_b, and df_count and returns the merged dataframe
def merge_df(df_a, df_b, df_count):
    # Create a list to store the dataframes
    df_list = []
    # Iterate over the rows in df_count
    for index, row in df_count.iterrows():
        # If df_max is df_a, then take the rows from df_a
        if row['df_max'] == 'df_a':
            df_list.append(df_a[df_a['sample_id'] == row['sample_id']])
        # If df_max is df_b, then take the rows from df_b
        elif row['df_max'] == 'df_b':
            df_list.append(df_b[df_b['sample_id'] == row['sample_id']])
        # If df_max is neither df_a nor df_b, then use the first dataframe
        else:
            df_list.append(df_a[df_a['sample_id'] == row['sample_id']])
    # Concatenate the dataframes in df_list and return the result. Also, reset the index.
    return pd.concat(df_list).reset_index(drop=True)
Then one can apply the function
df_merged = merge_df(df_a, df_b, df_count)
[Out]:
    sample_id
0           1
1           1
2           2
3           2
4           2
5           2
6           3
7           3
8           3
9           4
10          5
11          5
12          5
13          5
14          5
15          5