My first guess was that read.xlsx() is trying to guess the date-looking columns in .xlsx while it reads the file, and weirdly converts the time from %I:%M %p format into fractions of 24 hours (because e.g. 0.3333333 * 24 = 7.999999 which is exactly 8.0). But latter I noticed that if I change the parameter detectDates into FALSE nothing really changes - it outputs the same data frame. So it guess nothing, it just reads the TIME as it is.
If you try to edit 10:00 PM within Excel workbook, you'll see that it is really stored as 22:00:00. So why at the end it is represented as a fraction of 24?! I don't know, and I hope someone can explain that.
@Randall approach is really good alternative comparing to openxlsx::read.xlsx(). Note that read_xlsx() recognizes TIME as %H:%M:%S, and converts it into the dummy POSIXct/POSIXt object, i.e. 1899-12-31 08:00:00 and 1899-12-31 22:00:00.
Surprisingly, read_xlsx() doesn't recognize that DATE has %d-%m-%Y format, and interpret it as a character. Meaning that we need to convert both variables into appropriate format in order to obtain desired output.
I don't think we need to use gsub to get the 12-hour clock time from POSIXct object, it is much easire to use format for this purpose. And conversion of DATE from %d-%m-%Y into %Y-%m-%d format is even an easier task:
library(dplyr)
library(readxl)
read_xlsx("myfile.xlsx") |>
mutate(
DATE = as.Date(DATE, "%d/%m/%Y"),
TIME = format(TIME, "%I:%M %p") # “That’s what I do: I drink and I know things.”
)
Which produces:
# A tibble: 2 x 2
DATE TIME
<date> <chr>
1 2015-02-15 08:00 AM
2 2014-01-22 10:00 PM