For each row of my data table, I am trying to compute the ECDF over all items similar to that row's item number (similar items are looked up across the whole data table), evaluate it at the row's estimate, and add the result as a new column at the end of the data table (EstimatePrediction).
This works for an individual item, so items can be checked one by one:
    library(dplyr)
    # Set the current item number
    currentItemNumber = "XXXXX"
    # Set the estimate in days
    currentEstimate = 5
    # Get the index of the item number from the matches table
    itemNoIndex = (matches %>% subset(Item_No == currentItemNumber))$ItemIndex[1]
    # Get all rows that share that index, drop the self-match, and attach ACTUAL_DAYS
    # (the key is named Item_No in matches and ITEM_NO in data.filter)
    matchingItems = matches %>%
      filter(ItemIndex == itemNoIndex) %>%
      filter(MatchItemIndex != ItemIndex) %>%
      merge(data.filter %>% select(ITEM_NO, ACTUAL_DAYS),
            by.x = 'Item_No', by.y = 'ITEM_NO')
    # Evaluate the ECDF of the matching items' actual days at the estimate
    ecdf(matchingItems$ACTUAL_DAYS)(currentEstimate)
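For reference, `ecdf(x)(q)` returns the fraction of values in `x` that are less than or equal to `q`. The sketch below illustrates this with made-up numbers (the `actualDays` values are hypothetical, not taken from the real data) and compares it with a direct calculation.

```r
# Hypothetical ACTUAL_DAYS values for the items matched to one item
actualDays <- c(6, 12, 18, 16, 15)
currentEstimate <- 13

# ecdf() returns a function; calling it gives the share of values <= the argument
viaEcdf <- ecdf(actualDays)(currentEstimate)

# The same quantity computed directly (assumes there are no NA values)
viaMean <- mean(actualDays <= currentEstimate)

# Two of the five values (6 and 12) are <= 13, so both results are 2/5
print(c(viaEcdf, viaMean))
```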
I am trying to modify the R code above so that it works for every row of data.filter. The problem is that it only gives the correct result for the first row: the values for all later rows are computed from the first row's data, not their own.
    EstimatePrediction = data.filter %>%
      mutate(PROBABILITY_PREDICTION = ecdf(
        (matches %>%
           subset(ItemIndex == (matches %>% subset(Item_No == ITEM_NO))$ItemIndex[1]) %>%
           subset(MatchItemIndex != ItemIndex) %>%
           merge(data.filter, by.x = 'Item_No', by.y = 'ITEM_NO'))$ACTUAL_DAYS
      )(ESTIMATE_DAYS))
I am very new to R so I am open to any suggestions. I can get the correct output by iterating through the data.filter, but it is extremely slow.
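A probable cause of the first-row behaviour is that `mutate()` evaluates its expression once over whole columns rather than once per row, so `ITEM_NO` inside the nested `subset()` is the entire column and `[1]` then always picks the first index. One possible direction is to wrap the single-item logic in a function and call it once per row with `mapply()`. The sketch below uses base R only and hypothetical toy data (items `A`, `B`, `C`); the helper name `ecdfForItem` is made up, and it assumes that `Item_No` in `matches` names the matched item and that each item's self-match row comes first, as in the sample.

```r
# Hypothetical toy data following the column layout of the sample tables
matches <- data.frame(
  MatchItemIndex = c(1, 2, 3, 2, 3, 1, 3, 1, 2),
  ItemIndex      = c(1, 2, 3, 1, 1, 2, 2, 3, 3),
  Item_No        = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  stringsAsFactors = FALSE
)
data.filter <- data.frame(
  ITEM_NO       = c("A", "B", "C"),
  ESTIMATE_DAYS = c(15, 20, 5),
  ACTUAL_DAYS   = c(6, 12, 18),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the single-item logic for one item number and one estimate
ecdfForItem <- function(itemNumber, estimate) {
  # Index of the first row in matches that mentions this item number
  itemNoIndex <- matches$ItemIndex[matches$Item_No == itemNumber][1]
  # Rows sharing that index, excluding the self-match
  neighbours <- subset(matches, ItemIndex == itemNoIndex & MatchItemIndex != ItemIndex)
  # Attach the ACTUAL_DAYS of each matched item
  days <- merge(neighbours, data.filter,
                by.x = "Item_No", by.y = "ITEM_NO")$ACTUAL_DAYS
  # ecdf() fails on empty input, so return NA when nothing matched
  if (length(days) == 0) return(NA_real_)
  ecdf(days)(estimate)
}

# Call the helper once per row, pairing each item number with its own estimate
data.filter$PROBABILITY_PREDICTION <- mapply(
  ecdfForItem, data.filter$ITEM_NO, data.filter$ESTIMATE_DAYS,
  USE.NAMES = FALSE
)
print(data.filter)
```

This still runs one `subset()` and one `merge()` per row, so it addresses correctness rather than speed.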
Sample data
matches:
      MatchItemIndex ItemIndex MatchItemOrder Item_No Count Cumulative
               <int>     <int>          <int>   <chr> <int>      <int>
    1              1         1              1 CBL233J    14         14
    2              2         2              1 CGW112N     4          4
    3              3         3              1 CAT418D     5          5
    4              4         4              1 BRH131T    29         29
    5              5         5              1 CQD390A    17         17
    6              6         6              1 CEE533J    11         11
data.filter:
       ITEM_NO ESTIMATE_DAYS ACTUAL_DAYS
    1: CBL233J            10           6
    2: CGW112N            22          12
    3: CAT418D            22          18
    4: BRH131T            33          16
    5: CQD390A            21          15
    6: CEE533J             7           2
EDIT: I am now able to get the output I need, but it is really slow:
    data.filter = data.filter %>% mutate(Index = 1:n())
    loopData = data.filter %>% select(ITEM_NO, ACTUAL_DAYS, ESTIMATE_DAYS, Index)
    # Preallocate a numeric vector for the results
    outputTest = numeric(nrow(loopData))
    ptm <- proc.time()
    for (i in seq_len(nrow(loopData))) {
      # Get the index for this row's item number
      itemNoIndex = (matches %>% subset(Item_No == loopData$ITEM_NO[i]))$ItemIndex[1]
      # Find all the matches that share that index, excluding the self-match
      allNNItemData = matches %>%
        subset(ItemIndex == itemNoIndex) %>%
        subset(MatchItemIndex != ItemIndex) %>%
        merge(data.filter, by.x = 'Item_No', by.y = 'ITEM_NO')
      # Evaluate the ECDF of the matches' actual days at this row's estimate
      outputTest[i] = ecdf(allNNItemData$ACTUAL_DAYS)(loopData$ESTIMATE_DAYS[i])
    }
    # Elapsed time for the loop
    proc.time() - ptm
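Much of the loop's cost probably comes from repeating `subset()` and `merge()` on every iteration. A possible way to reduce it, sketched below under assumptions, is to do the merge once, split the matched `ACTUAL_DAYS` by `ItemIndex`, and then use the fact that the ECDF at a point equals `mean(days <= estimate)` when there are no `NA` values. The toy data (items `A`, `B`, `C`) is hypothetical, and the sketch assumes `Item_No` in `matches` names the matched item; `match()` returns the first matching position, which mirrors the `[1]` in the loop.

```r
# Hypothetical toy data following the column layout of the sample tables
matches <- data.frame(
  MatchItemIndex = c(1, 2, 3, 2, 3, 1, 3, 1, 2),
  ItemIndex      = c(1, 2, 3, 1, 1, 2, 2, 3, 3),
  Item_No        = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  stringsAsFactors = FALSE
)
data.filter <- data.frame(
  ITEM_NO       = c("A", "B", "C"),
  ESTIMATE_DAYS = c(15, 20, 5),
  ACTUAL_DAYS   = c(6, 12, 18),
  stringsAsFactors = FALSE
)

# 1. Drop self-matches and attach each matched item's ACTUAL_DAYS, once
neighbours <- subset(matches, MatchItemIndex != ItemIndex)
neighbourDays <- merge(neighbours, data.filter[, c("ITEM_NO", "ACTUAL_DAYS")],
                       by.x = "Item_No", by.y = "ITEM_NO")

# 2. Group the matched days by ItemIndex (list names are the index values)
daysByIndex <- split(neighbourDays$ACTUAL_DAYS, neighbourDays$ItemIndex)

# 3. Look up the ItemIndex of every row's item in one vectorised call
itemIndex <- matches$ItemIndex[match(data.filter$ITEM_NO, matches$Item_No)]

# 4. Share of matched days <= the row's own estimate; NA when nothing matched
data.filter$PROBABILITY_PREDICTION <- mapply(
  function(idx, estimate) {
    days <- daysByIndex[[as.character(idx)]]
    if (is.null(days)) NA_real_ else mean(days <= estimate)
  },
  itemIndex, data.filter$ESTIMATE_DAYS
)
print(data.filter)
```

Whether this is fast enough on the real tables is untested here; it only removes the per-row merge.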
 
    