I have a data frame that I want to run some statistical tests on, but I want to group the data by one of its columns first.
Here's an example data frame:
CATEGORY   ITEM     SHOP1 STOCK   SHOP2 STOCK
 Fruit    Orange         5             9
 Fruit    Apple         12            32
 Fruit     Pear         17             6
  Veg    Carrots        59            72
  Veg    Potatoes        6            57
  Veg   Courgette       43            22
  Veg    Parsnips        5             9
  ...      ...         ...           ...
For this example, I want to run a chi-squared test across categories rather than across individual items, so I want to reduce the data to a table like this:
          SHOP1 SHOP2
   FRUIT    34    47
     VEG   113   160
The table shows the total stock for each category in each shop and no longer lists individual items, only the category. This is a very simplified version: my real data has 37 categories spread over a few hundred rows.
I thought I could use group_by(CATEGORY) and then run the chi-squared test on the grouped data, but that doesn't seem to work. I think I need to sum the two numeric columns within each category, but I don't know how to do that in conjunction with the chi-squared test. I've been faffing with this for some time now with no luck, so I'd really appreciate your help!
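
For reference, here is a minimal sketch of the reduction described above, under some assumptions: the dplyr package is loaded, the real column names are SHOP1_STOCK and SHOP2_STOCK (the example header has spaces, so the actual names may differ), and the names stock, totals and counts are made up for illustration. It sums each shop's stock per category with summarise() and passes the result to chisq.test() as a matrix. It is one possible approach, not necessarily the best one.

```r
library(dplyr)

# Toy data mirroring the example table; column names are assumed
stock <- data.frame(
  CATEGORY    = c("Fruit", "Fruit", "Fruit", "Veg", "Veg", "Veg", "Veg"),
  ITEM        = c("Orange", "Apple", "Pear", "Carrots", "Potatoes",
                  "Courgette", "Parsnips"),
  SHOP1_STOCK = c(5, 12, 17, 59, 6, 43, 5),
  SHOP2_STOCK = c(9, 32, 6, 72, 57, 22, 9)
)

# Collapse to one row per category, summing each shop's stock
totals <- stock %>%
  group_by(CATEGORY) %>%
  summarise(SHOP1 = sum(SHOP1_STOCK), SHOP2 = sum(SHOP2_STOCK))

# chisq.test() expects a numeric matrix or table, not a grouped data frame
counts <- as.matrix(totals[, c("SHOP1", "SHOP2")])
rownames(counts) <- totals$CATEGORY

# Chi-squared test of independence between category and shop
result <- chisq.test(counts)
print(counts)
print(result)
```

With only two categories the table is 2x2, so chisq.test() applies Yates' continuity correction by default; with 37 categories that correction would not apply.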