I have data like this:
object category country
495647 1        RUS  
477462 2        GER  
431567 3        USA  
449136 1        RUS  
367260 1        USA  
495649 1        RUS  
477461 3        GER  
431562 3        USA  
449133 2        RUS  
367264 2        USA  
...
where an object can appear in several (category, country) pairs, and all countries share a single list of categories.
I'd like to add another column: a category weight per country, i.e. the number of objects appearing in a category for a given country, normalized so that the weights sum to 1 within each country (summing only over unique (category, country) pairs).
I could do something like:
aggregate(df$object, list(df$category, df$country), length)
and then calculate the weights from there, but is there a more efficient and elegant way of doing that directly on the original data?
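For reference, the two-step route could be sketched roughly as follows. This is only a minimal illustration in base R, not a claim about the best approach. The names `counts`, `n` and `result` are placeholders of mine, and the sample rows assume that object 477461 belongs to category 3, as in the desired output.

```r
# Sample rows from the question. Assumption: object 477461 is in
# category 3, as shown in the desired output.
df <- data.frame(
  object   = c(495647, 477462, 431567, 449136, 367260,
               495649, 477461, 431562, 449133, 367264),
  category = c(1, 2, 3, 1, 1, 1, 3, 3, 2, 2),
  country  = c("RUS", "GER", "USA", "RUS", "USA",
               "RUS", "GER", "USA", "RUS", "USA"),
  stringsAsFactors = FALSE
)

# Step 1: count objects per (category, country) pair.
counts <- aggregate(object ~ category + country, data = df, FUN = length)
names(counts)[names(counts) == "object"] <- "n"

# Step 2: divide each count by its country total, so the weights of the
# unique (category, country) pairs sum to 1 within a country.
counts$weight <- counts$n / ave(counts$n, counts$country, FUN = sum)

# Step 3: attach the weight to every original row.
# Note: merge() does not preserve the original row order.
result <- merge(df, counts[, c("category", "country", "weight")],
                by = c("category", "country"))
```

This works, but it needs an intermediate table and a join, which is what I'd like to avoid.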
Desired example output:
object category country weight
495647 1        RUS     .75
477462 2        GER     .5 
431567 3        USA     .5 
449136 1        RUS     .75
367260 1        USA     .25
495649 1        RUS     .75
477461 3        GER     .5
431562 3        USA     .5
449133 2        RUS     .25
367264 2        USA     .25
...
The weights above sum to 1 within each country when each unique (category, country) pair is counted once.