I have a CSV file consisting of around 200,000 rows of transactions. Here is the import and a little preprocessing of the data:
data <- read.csv("bitfinex_data/trades.csv", header = TRUE)
data$date <- as.character(data$date)   # treat the timestamp as text
data$date <- substr(data$date, 1, 10)  # keep only the first 10 digits of the epoch timestamp (seconds)
data$date <- as.numeric(data$date)
data$date <- as.POSIXct(data$date, origin = "1970-01-01", tz = "GMT")  # convert to date-time
head(data)
id          exchange  symbol                date price     amount  sell
1 24892563       bf   btcusd 2018-01-02 00:00:00 13375 0.05743154 False
2 24892564       bf   btcusd 2018-01-02 00:00:01 13374 0.12226129 False
3 24892565       bf   btcusd 2018-01-02 00:00:02 13373 0.00489140 False
4 24892566       bf   btcusd 2018-01-02 00:00:02 13373 0.07510860 False
5 24892567       bf   btcusd 2018-01-02 00:00:02 13373 0.11606086 False
6 24892568       bf   btcusd 2018-01-02 00:00:03 13373 0.47000000 False
My goal is to obtain hourly sums of the amount of tokens traded. For this I need to split my data by hour, which I did in the following way:
tmp <- split(data, cut(data$date,"hour"))
However, this takes far too long (up to 1 hour), and I wonder whether this is normal behaviour for functions such as split() and cut(). Is there any alternative to using those two functions?
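For reference, one common faster alternative is a grouped summary with dplyr. This is only a minimal sketch of the general idea (assuming dplyr is installed), not necessarily the suggestion referred to in the update below; the names date_hour and amount.sum simply mirror the output shown there:

library(dplyr)

hourly <- data %>%
  mutate(date_hour = format(date, "%Y-%m-%d %H")) %>%  # truncate each timestamp to its hour
  group_by(date_hour) %>%                              # one group per hour
  summarise(amount.sum = sum(amount))                  # total amount traded in that hour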
UPDATE:
After using the great suggestion by @Maurits Evers, my output table looks like this:
# A tibble: 25 x 2
   date_hour     amount.sum
   <chr>              <dbl>
 1 1970-01-01 00       48.2
 2 2018-01-02 00     2746. 
 3 2018-01-02 01     1552. 
 4 2018-01-02 02     2010. 
 5 2018-01-02 03     2171. 
 6 2018-01-02 04     3640. 
 7 2018-01-02 05     1399. 
 8 2018-01-02 06      836. 
 9 2018-01-02 07      856. 
10 2018-01-02 08      819. 
# ... with 15 more rows
This is exactly what I wanted, except for the first row, where the date is from the year 1970. Any suggestion on what might be causing this? I tried changing the origin parameter of the as.POSIXct() function, but that did not solve the problem.
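In case it helps to narrow things down, here is a minimal sketch of how the offending rows could be inspected (the 2017-01-01 cutoff is just an arbitrary date well before the data starts):

bad_rows <- data[which(data$date < as.POSIXct("2017-01-01", tz = "GMT")), ]  # rows whose parsed date fell back near 1970
nrow(bad_rows)  # how many rows are affected
head(bad_rows)  # inspect their values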