I have a data frame and want to create a new column prob using dplyr's mutate() function. For each row, prob should hold the probability P(column value > row value), i.e. the fraction of rows in the data frame whose value is greater than that row's value. Here is what I tried:
data = data.frame(value = c(1,2,3,3,4,4,4,5,5,6,7,8,8,8,8,8,9))
library(dplyr)
data %>% mutate(prob = sum(value < data$value) / nrow(data))
This gives the following results:
   value prob
1      1    0
2      2    0
3      3    0
4      3    0
...    ...  ...
Here prob is 0 in every row. If I replace value with the constant 2 in the expression sum(value < data$value):
data %>% mutate(prob = sum(2 < data$value) / nrow(data))
I get the following results:
   value      prob
1      1 0.8823529
2      2 0.8823529
3      3 0.8823529
4      3 0.8823529
...    ...  ...
0.8823529 (15 of 17) is the fraction of rows with a value greater than 2, which is the result I expect for the row where value is 2, but not for the other rows. The problem seems to be that mutate() doesn't use the current row's value when the value column appears inside sum().
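To make the intended result concrete, here is a minimal sketch of the per-row calculation written out explicitly. It relies on the fact that inside mutate() the name value refers to the whole column rather than a single row, so each element is compared against the full column with vapply(). The name result is only illustrative, and this is one possible way to express it, not necessarily the most idiomatic dplyr approach.

```r
library(dplyr)

data <- data.frame(value = c(1,2,3,3,4,4,4,5,5,6,7,8,8,8,8,8,9))

# Inside mutate(), `value` is the full column vector. For each element v,
# mean(value > v) is the fraction of rows whose value is greater than v.
result <- data %>%
  mutate(prob = vapply(value, function(v) mean(value > v), numeric(1)))

head(result, 4)
```

With this version the row where value is 2 gets 15/17 (about 0.8823529), matching the hard-coded test above, while the other rows get their own fractions.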