We created a MOOC in which a logging system recorded everything the students did (clicks, mouse movements, video viewing, etc.). Between 100 and 150 students signed up for the course.
The result of this research is a log file (JSON). I used R to load it into this data frame:
log_data <- ndjson::stream_in("log-export-20160721_1030.json")
dplyr::glimpse(log_data)
Observations: 1,443,817
 Variables: 22
 $ _id.$oid          <chr> "5707a89dcbbb4d92129ee44c", "5707a89...
 $ data              <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ page              <chr> "http://elearning.szte.hu/mod/szte/f...
 $ pid               <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2,...
 $ time              <chr> "2016.04.08. 14:48:24.691", "2016.04...
 $ type              <chr> "load", "mousemove", "mousemove", "m...
 $ user              <chr> "3", "3", "3", "3", "3", "3", "3", "...
 $ data.realDistance <dbl> NA, 0.00000, 366.87055, 241.45600, N...
 $ data.x            <dbl> NA, 139, 176, 261, NA, 245, 1905, 21...
 $ data.xDistance    <dbl> NA, 0, 37, 85, NA, 16, NA, 111, NA, ...
 $ data.y            <dbl> NA, 29, 394, 620, NA, 761, 553, 451,...
 $ data.yDistance    <dbl> NA, 0, 365, 226, NA, 141, NA, 310, N...
 $ data.text         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.top          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.target       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.filename     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.length       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.actualTime   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.src          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.totalTime    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.videoId      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
 $ data.seekTime     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
My questions are:
How can I count the number of log entries per user?
- Example: user 352 produced 1,000 log entries, but user 152 produced just 2.
How can I group, split, or separate the data table by user?
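
To make the two questions concrete, here is a minimal sketch of the kind of result I am after. It runs on a small made-up data frame that only mimics the `user` and `type` columns of `log_data` (the values are invented for illustration, not taken from the real log), and it uses base R's `table()` and `split()` as one possible approach; I am not sure this is the best way to do it on 1.4 million rows.

```r
# Toy stand-in for log_data: only the `user` and `type` columns are mimicked,
# and the values are invented for illustration.
log_data <- data.frame(
  user = c("3", "3", "3", "352", "352", "152"),
  type = c("load", "mousemove", "mousemove", "load", "click", "load"),
  stringsAsFactors = FALSE
)

# 1) Number of log entries per user, most active user first.
logs_per_user <- sort(table(log_data$user), decreasing = TRUE)
print(logs_per_user)

# A dplyr alternative, assuming dplyr is installed:
# dplyr::count(log_data, user, sort = TRUE)

# 2) Split the table into a named list with one data frame per user.
by_user <- split(log_data, log_data$user)
print(names(by_user))

# Rows belonging to a single user can then be pulled out by name.
print(by_user[["352"]])
```

With the real data, the same calls would be applied to `log_data` directly, with no toy data frame needed.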
