For the sake of completeness, this can be achieved as well using base R by
init[, .(id1 = rep(id, rev(seq_along(id) - 1L)),
         id2 = unlist(lapply(-seq_along(id), tail, x = id)))]
Benchmark
As the OP has mentioned
My real data is huge, so I'm looking for an efficient solution.
here is a benchmark comparing
- cj: CJ() with subsequent filtering
 
- sj: langtang's non-equi self-join
 
- rp: above base R approach using 
rep() and tail() 
for problem sizes (length of input vector) of 10, 100, 1000, and 10k rows. Note that for the 10k case the result contains nearly 50 million combinations. So available memory is a limiting factor.
Note that bench::mark() verifies that all approaches return the same result.
library(bench)
library(data.table)
bm <- press(
  n_id = 10^(4:1),
  {
    init0 <- data.table(id = sprintf("%09i", seq(n_id)))
    cat(sprintf("Number of combinations in result: %g\n", (n_id-1)*n_id/2))
    mark(
      rp = {
        init <- copy(init0)
        init[, .(id1 = rep(id, rev(seq_along(id) - 1L)),
                 id2 = unlist(lapply(-seq_along(id), tail, x = id)))]
      },
      sj = {
        init <- copy(init0)
        init[, id2 := as.factor(id)][
          init, on = .(id2 > id2), nomatch = 0, .(id1 = as.character(id2), id2 = id)]
      },
      cj = {
        init <- copy(init0)
        init[, CJ(id1 = id, id2 = id)][id1 < id2]
      },
      check = function(x, y) {
        tmp <- all.equal(x, y, check.attributes = FALSE)
          if (!isTRUE(tmp)) {
            cat(tmp, "\n")
            str(x)
            str(y)
          }
        tmp
      },
      min_iterations = 1L
    )
  }
)
bm
   expression  n_id      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory  time 
   <bch:expr> <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>  <lis>
 1 rp         10000     1.8s     1.8s    0.556     2.05GB   0.556      1     1       1.8s <data.t… <Rprof… <ben…
 2 sj         10000    2.87s    2.87s    0.348     2.24GB   0.696      1     2      2.87s <data.t… <Rprof… <ben…
 3 cj         10000   49.29s   49.29s    0.0203    3.17GB   0.0203     1     1     49.29s <data.t… <Rprof… <ben…
 4 rp          1000  22.11ms  22.53ms   44.2      21.15MB   2.10      21     1   475.58ms <data.t… <Rprof… <ben…
 5 sj          1000  18.28ms  18.88ms   52.4      23.19MB   2.02      26     1   496.27ms <data.t… <Rprof… <ben…
 6 cj          1000 542.97ms 542.97ms    1.84     32.53MB   0          1     0   542.97ms <data.t… <Rprof… <ben…
 7 rp           100   1.46ms   1.52ms  651.      285.94KB   6.56     298     3   457.53ms <data.t… <Rprof… <ben…
 8 sj           100   2.14ms   2.21ms  450.      406.48KB   4.24     212     2   471.37ms <data.t… <Rprof… <ben…
 9 cj           100   6.92ms   6.97ms  143.      431.23KB   0         72     0   504.44ms <data.t… <Rprof… <ben…
10 rp            10  679.8µs 738.38µs 1343.       66.34KB   6.60     610     3   454.25ms <data.t… <Rprof… <ben…
11 sj            10   1.71ms   1.81ms  552.      156.17KB   4.22     262     2   474.47ms <data.t… <Rprof… <ben…
12 cj            10 915.48µs 967.99µs 1029.      100.83KB   4.21     489     2   475.13ms <data.t… <Rprof… <ben…
library(ggplot2)
autoplot(bm)

Conclusion
cj is magnitudes slower than the other 2 approaches for larger use cases while rp is on a par with sj but allocates slightly less memory.