After reading questions such as this SO question on documenting a data set with Roxygen I have managed to document a dataset (which I will refer to as cells) and it now appears in the list generated by data(package="mypackage") and is loaded if I run the command data(cells). After this, cells will appear when ls() is run.
However, in many packages the data is immediately available without requiring a data() call. Also, the data names do not appear when ls() is run. An example is the baseball data set that comes with plyr. I have looked at the source for plyr and I cannot see how this is done.
 
     
    