I can't seem to find the right phrasing for Googling this question because I get closely similar but not correct answers.
I'm busy with the Titanic dataset and want to sum the number of surviving members in a family. So the dataset looks like this:
+-------------+----------+-----------+-------------+
| PassengerId | Survived |  Surname  | NumSurvived |
+-------------+----------+-----------+-------------+
|           1 |        0 | Braund    |             |
|           2 |        1 | Cumings   |             |
|           3 |        1 | Heikkinen |             |
|           4 |        1 | Futrelle  |             |
|           5 |        0 | Braund    |             |
|           6 |        0 | Moran     |             |
|           7 |        0 | Futrelle  |             |
|           8 |        0 | Braund    |             |
|           9 |        1 | Cumings   |             |
+-------------+----------+-----------+-------------+
I need to sum the Survived value for each surname in the NumSurvived column like so:
+-------------+----------+-----------+-------------+
| PassengerId | Survived |  Surname  | NumSurvived |
+-------------+----------+-----------+-------------+
|           1 |        0 | Braund    |     0       |
|           2 |        1 | Cumings   |     2       |
|           3 |        1 | Heikkinen |     1       |
|           4 |        1 | Futrelle  |     1       |
|           5 |        0 | Braund    |     0       |
|           6 |        0 | Moran     |     0       |
|           7 |        0 | Futrelle  |     1       |
|           8 |        0 | Braund    |     0       |
|           9 |        1 | Cumings   |     2       |
+-------------+----------+-----------+-------------+
 
    