I am sorry in advance for the long post which mixes several questions. If not appropriate, please edit or advise what I should do. I am practicing data.table join, here is an imaginary scenario:
"Two robots, each has 4 positions in MovementA and 4 positions in MovementB. Problem to Solve: for each robot, from MoveA to MoveB, there are 4x4 possible Position-pairs Find the 4 pairs with the shortest distance"
Data Setup
library(data.table)
set.seed(20141220)
dtMoveA = data.table(RobotID = rep(1:2, each=4), Position=sample(1:20, 8))
dtMoveB = data.table(RobotID = rep(1:2, each=4), Position=sample(1:20, 8))
# Review Data
rbind(cbind(Movement="Move-A", dtMoveA), cbind(Movement="Move-B", dtMoveB))
    Movement RobotID Position
 1:   Move-A       1       18
 2:   Move-A       1       20
 3:   Move-A       1       15
 4:   Move-A       1        8
 5:   Move-A       2       13
 6:   Move-A       2        2
 7:   Move-A       2        9
 8:   Move-A       2       12
 9:   Move-B       1       18
10:   Move-B       1       14
11:   Move-B       1       13
12:   Move-B       1       17
13:   Move-B       2        5
14:   Move-B       2       16
15:   Move-B       2       20
16:   Move-B       2        3
Solution 1 (using dplyr)
library(dplyr)
dtMoveA %>%
    inner_join(dtMoveB, by="RobotID") %>%
    mutate(AbsDistance = abs(Position.x - Position.y)) %>%
    group_by(RobotID, Position.x) %>%
    filter(AbsDistance == min(AbsDistance)) %>%
    arrange(RobotID, Position.x)
  RobotID Position.x Position.y AbsDistance
1       1          8         13           5
2       1         15         14           1
3       1         18         18           0
4       1         20         18           2
5       2          2          3           1
6       2          9          5           4
7       2         12         16           4
8       2         13         16           3
(attempt) Solution 2 (using data.table)
setkey(dtMoveA, RobotID)
setkey(dtMoveB, RobotID)
dtMoveA[dtMoveB, .(RobotID, Position, i.Position,
                         AbsDistance = abs(Position - i.Position)), allow.cartesian=TRUE
    ] [, MinDistance := min(AbsDistance), by=list(RobotID, Position)
    ] [ AbsDistance == MinDistance, .(Position, i.Position, AbsDistance), by=RobotID
    ] [ order(RobotID, Position)]
   RobotID Position i.Position AbsDistance
1:       1        8         13           5
2:       1       15         14           1
3:       1       18         18           0
4:       1       20         18           2
5:       2        2          3           1
6:       2        9          5           4
7:       2       12         16           4
8:       2       13         16           3
Question 1 Can you please correct my Solution2 with the good practices from data.table art?
Question 2 without the parameter allow.cartesian=TRUE data.table warns
"Join results in 32 rows; more than 8 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including j and dropping by (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE"
Is it really a cartesian product? Here the join is made only on the common key values, it is just a coincidence in the data which yields a big join results.
Question 3 dtMoveA and dtMoveB have same column names. datatable join makes the distinction by changing the name to i.Position. Is the "i" prefix something hardcoded? And I suppose i.ColumnName always applies to Y member in the X[Y] join expression.
Thanks in advance for any help.
 
     
    