Consider the following dataframe in R:
  TYPE    VARIAVEL                           VALOR               
  A       OPER_RELAC_VARIAVEL1                100
  A       OPER_RELAC_VARIAVEL2                200
  A       OPER_RELAC_VARIAVEL3                300
  B       OPER_RELAC_VARIAVEL1                100
  B       OPER_RELAC_VARIAVEL2                200
  B       OPER_RELAC_VARIAVEL3                300
  A       CLI_RELAC_VARIAVEL1                 450
  A       CLI_RELAC_VARIAVEL2                 320
  A       CLI_RELAC_VARIAVEL3                 110
I want to take the relevance of each VALOR based on the root of VARIAVEL and TYPE. I don´t have a column with the root of the VARIAVEL, but it would be everyting before the second _ (OPER_RELAC and CLI_RELAC in this sample).
The expected result is:
  TYPE    VARIAVEL                           VALOR           RELEVANCE    
  A       OPER_RELAC_VARIAVEL1                100            0.167
  A       OPER_RELAC_VARIAVEL2                200            0.333
  A       OPER_RELAC_VARIAVEL3                300            0.500
  B       OPER_RELAC_VARIAVEL1                100            0.167
  B       OPER_RELAC_VARIAVEL2                200            0.333
  B       OPER_RELAC_VARIAVEL3                300            0.500
  A       CLI_RELAC_VARIAVEL1                 450            0.511
  A       CLI_RELAC_VARIAVEL2                 320            0.364
  A       CLI_RELAC_VARIAVEL3                 110            0.125
Since, for example, 450 represents 51.1% of the total for type A and root variable CLI_RELAC.
I have done with this sequence of commands:
1) Generate a column with the root variable using the library stringr
dados$VARIAVEL_MAE <- str_match(dados$VARIAVEL, "^([^_]+[_][^_]+)")[,2]
Thanks to R:how to get grep to return the match, rather than the whole string
2) Summarize in a new dataframe the totals aggregating by this fresh new column
TOTAIS <- aggregate(VALOR ~ Type + VARIAVEL_MAE, data = dados, sum)
names(TOTAIS) <- c('Type', 'VARIAVEL_MAE', 'TOTAL')
3) Merge this two dataframes using what is suggested here
dados <- merge(TOTAIS, dados, by = c('Type', 'VARIAVEL_MAE'))
dados$RELEVANCIA <- dados$VALOR / dados$TOTAL;
Is there a smarter way to do it or I do need all these steps?
My question is because in R, everything I do can always be replaced by something faster and smaller.
 
    