There are two subtle considerations that affect the answer to this question, both raised by Josh O'Brien and Valentin in the comments. The first is that subsetting via .SD is very inefficient; it is much faster to sample the row indices in .I directly (see the benchmark below).
The second consideration is that, if we do sample from .I, calling sample(.I, size = 1) leads to unexpected behavior whenever a group consists of a single row whose index is greater than 1, i.e. when length(.I) == 1 and .I > 1. In that case sample() behaves as if we had called sample(1:.I, size = 1), which is surely not what we want. As Valentin notes, the safe construct is .I[sample(.N, size = 1)].
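A minimal base-R illustration of this quirk, using the made-up index value 7 to stand in for a one-row group's .I:

```r
# sample()'s documented "convenience feature": when x is a single
# number >= 1, sample(x, ...) draws from 1:x, not from x itself.
set.seed(1L)
sample(7, size = 1)               # draws from 1:7, not necessarily 7

# The safe idiom indexes the vector of row indices by a sample of its
# length (length(I) here plays the role of .N inside data.table):
I <- 7L                           # .I for a hypothetical one-row group
I[sample(length(I), size = 1)]    # always returns 7
```

This is why .I[sample(.N, size = 1)] is robust: sample(.N, 1) always picks a position between 1 and the group size, and .I[...] then maps that position back to an actual row number.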
As a benchmark, we build a simple 1,000 x 1 data.table and sample one row at random per group. Even on such a small data.table, the .I method is roughly 20x faster.
library(microbenchmark)
library(data.table)
set.seed(1L)
DT <- data.table(id = sample(1e3, 1e3, replace = TRUE))
microbenchmark(
  `.I` = DT[DT[, .I[sample(.N, 1)], by = id][[2]]],
  `.SD` = DT[, .SD[sample(.N, 1)], by = id]
)
#> Unit: milliseconds
#>  expr       min        lq     mean    median        uq       max neval
#>    .I  2.396166  2.588275  3.22504  2.794152  3.118135  19.73236   100
#>   .SD 55.798177 59.152000 63.72131 61.213650 64.205399 102.26781   100
Created on 2020-12-02 by the reprex package (v0.3.0)