We need to fill in a classification data table. I tend to write for loops a little too much, so I'm trying to figure out how to do this with apply(). I scan the last column for a non-missing value, then fill in each column to its left with the value above it, moving up along a diagonal. So with three columns to the left, this pass fills in the rows that have a value in the last column. I'd then repeat it for each higher taxonomic level, i.e. the next column to the left:
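# Fills in the higher taxonomy for Family-level rows
for (i in seq_len(nrow(DataFrame))) {  # nrow(), not nrows(); seq_len() visits every row
  if (is.na(DataFrame[[4]][i])) next   # skip rows with no Family
  # Copy up the diagonal; only right when each parent sits exactly 1, 2, 3 rows above
  DataFrame[[3]][i] <- DataFrame[[3]][i - 1]
  DataFrame[[2]][i] <- DataFrame[[2]][i - 2]
  DataFrame[[1]][i] <- DataFrame[[1]][i - 3]
}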
# Repeat to fill in Order's higher taxonomy (Phylum and Class)
for (i in seq_len(nrow(DataFrame))) {  # fills in for Order
  if (is.na(DataFrame[[3]][i])) next   # skip rows with no Order
  # One column further left, so the diagonal offsets are 1 and 2 rows
  DataFrame[[2]][i] <- DataFrame[[2]][i - 1]
  DataFrame[[1]][i] <- DataFrame[[1]][i - 2]
}
# And again for each column to the left.
The data may look like:
Phylum     Class       Order        Family  
Annelida   
           Polychaeta  
                       Eunicida
                                    Oenoidae
                                    Onuphidae     
                       Oweniida
                                    Oweniidae
This pattern repeats for each unique Family in an Order, each unique Order in a Class, and each unique Class in a Phylum. Essentially, we need to fill in the cells to the left of each non-missing value, each taken from the nearest non-missing value above it in that column. So the end result would be:
Phylum     Class       Order     Family
Annelida
Annelida   Polychaeta
Annelida   Polychaeta  Eunicida
Annelida   Polychaeta  Eunicida  Oenoidae
Annelida   Polychaeta  Eunicida  Onuphidae
Annelida   Polychaeta  Oweniida
Annelida   Polychaeta  Oweniida  Oweniidae
We can't just copy values straight down the columns: when we reach a new Phylum, the Class column has to stay missing for one row, the Order column may have to stay missing for two rows, and so on.
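For what it's worth, here is a minimal sketch of the fill described above that loops over columns rather than rows. The names `taxa` and `fill_left` are made up for illustration, and blanks are assumed to be stored as NA. It also assumes that every row has at least one value and that the hierarchy is well formed (a new Phylum is always followed by its Class before any Order appears, and so on). It is not tested on the real 13-column table.

```r
# Sample data from the question; NA marks a blank cell (an assumption).
taxa <- data.frame(
  Phylum = c("Annelida", NA, NA, NA, NA, NA, NA),
  Class  = c(NA, "Polychaeta", NA, NA, NA, NA, NA),
  Order  = c(NA, NA, "Eunicida", NA, NA, "Oweniida", NA),
  Family = c(NA, NA, NA, "Oenoidae", "Onuphidae", NA, "Oweniidae"),
  stringsAsFactors = FALSE
)

# fill_left() is a hypothetical helper, not an existing R function.
fill_left <- function(df) {
  n <- nrow(df)
  # Deepest (right-most) filled column in each row;
  # assumes every row has at least one non-missing value.
  level <- apply(!is.na(df), 1, function(r) max(which(r)))
  for (j in seq_len(ncol(df) - 1)) {
    # Row number of the most recent non-missing value in column j (0 if none yet)
    last_seen <- cummax(ifelse(is.na(df[[j]]), 0L, seq_len(n)))
    # Only rows whose own value sits to the right of column j get filled,
    # so a new Phylum row keeps its Class, Order and Family blank.
    rows <- which(level > j & last_seen > 0)
    df[[j]][rows] <- df[[j]][last_seen[rows]]
  }
  df
}

filled <- fill_left(taxa)
```

Restricting the copy to rows where `level > j` is what keeps this from being a plain copy-down: a row that introduces a new higher-level value leaves everything to its right untouched.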
I guess the challenge is that I need the value of DataFrame[[j]][i - n] in whatever function I pass to apply(). When apply() passes 'x' into the function, does it pass an object with attributes (like an index or row name), or simply the values?
Or is this a wasted line of thought, and I should just do it with for loops and use Rcpp if I really need speed? This is done annually, and the data frame has ~8,000 rows and 13 columns we'd operate over. I don't think performance would be an issue... but we haven't tried yet. Not sure why.
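On the question of what apply() hands to the function, a small check may help. As far as I can tell, apply() first converts the data frame with as.matrix(), and with MARGIN = 1 each `x` is a plain vector named by the column names, with no row index attached. The toy data frame `df` and the variable names here are made up for illustration. Iterating over row indices instead keeps `i` available for lookups like `df[[j]][i - n]`.

```r
df <- data.frame(a = c("x", NA, "z"), b = c(NA, "y", NA), stringsAsFactors = FALSE)

# With MARGIN = 1, each x is a character vector named by column; no row number
is_named_vector <- apply(df, 1, function(x) {
  is.character(x) && identical(names(x), c("a", "b"))
})

# Iterating over row indices keeps i in scope, so earlier rows can be read
prev_a <- sapply(seq_len(nrow(df)), function(i) {
  if (i > 1) df[["a"]][i - 1] else NA_character_
})
```

So a function passed to apply() sees one row's values at a time and cannot reach the rows above it, which is why an index-based loop or a column-wise approach fits this problem better.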
 
     
     
    