Minor note: this is one of the reasons I really dislike month-first date representations. If you can stomach having year/month, year-month, or something similarly ordered, this would not be necessary ... but I digress.
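To make that digression concrete, here is a minimal sketch with a few made-up dates (they are not from the question). It shows that year-first strings already sort chronologically as plain text, while month-first strings do not:

```r
# Hypothetical dates, chosen only for illustration
dates <- as.Date(c("2018-11-05", "2020-04-17", "2019-11-02", "2018-07-30"))

year_first  <- format(dates, "%Y-%m")   # e.g. "2018-11"
month_first <- format(dates, "%m/%Y")   # e.g. "11/2018"

# text sort of year-first strings is also chronological
sort(year_first)
# [1] "2018-07" "2018-11" "2019-11" "2020-04"

# text sort of month-first strings groups by month, not by time
sort(month_first)
# [1] "04/2020" "07/2018" "11/2018" "11/2019"
```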
The way to solve it has nothing to do with ggplot2, though your plot will benefit from the fix. Since you're already using factor, it's even easier: the order of a factor's levels is the order used for sorting and plotting, so when you define the levels, you define the order.
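A small sketch of that idea, using three hypothetical month/year strings: without explicit levels, factor() sorts the levels as text; with explicit levels, the order is whatever you supply.

```r
# Hypothetical values, for illustration only
x <- c("10/2018", "7/2018", "9/2018")

# default: levels are the sorted unique strings, so "10/2018" comes first
levels(factor(x))
# [1] "10/2018" "7/2018"  "9/2018"

# explicit levels: sort() follows the order given in `levels`
f <- factor(x, levels = c("7/2018", "9/2018", "10/2018"))
as.character(sort(f))
# [1] "7/2018"  "9/2018"  "10/2018"
```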
Two methods:
First, using only the data provided, with no extra levels:
set.seed(2)
# dates are offsets from today, so the exact values below depend on when this is run
random_dates <- as.Date(Sys.Date() + sample(1000, size=20))
month_of_date <- lubridate::month(random_dates)
year_of_date <- lubridate::year(random_dates)
month_year_of_date <- paste(month_of_date, year_of_date, sep = "/")
month_year_of_date
# [1] "11/2018" "4/2020" "11/2019" "10/2018" "11/2020" "11/2020" "9/2018"
# [8] "8/2020" "8/2019" "10/2019" "10/2019" "12/2018" "5/2020" "10/2018"
# [15] "6/2019" "8/2020" "12/2020" "12/2018" "7/2019" "7/2018"
These are out of order, so we use order() on the year and month variables:
ordered_month_year_of_date <- unique(month_year_of_date[ order(year_of_date, month_of_date) ])
ordered_month_year_of_date
# [1] "7/2018" "9/2018" "10/2018" "11/2018" "12/2018" "6/2019" "7/2019"
# [8] "8/2019" "10/2019" "11/2019" "4/2020" "5/2020" "8/2020" "11/2020"
# [15] "12/2020"
Now define the factor:
month_year_of_date <- factor(month_year_of_date, levels = ordered_month_year_of_date)
Second, define a full-length set of possible months. This will be bigger, but if you expect to expand the dataset at some point, then all months in between will already be covered.
set.seed(2)
random_dates <- as.Date(Sys.Date() + sample(1000, size=20))
month_of_date <- lubridate::month(random_dates)
year_of_date <- lubridate::year(random_dates)
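# floor both endpoints to the first of the month; stepping by month from a
# mid-month (or 29th-31st) start can otherwise skip a month or miss the last one
month_range <- lubridate::floor_date(range(random_dates), "month")
ordered_date_range <- format(seq(month_range[1], month_range[2], by = "month"),
                             format = "%m/%Y")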
head(ordered_date_range)
# [1] "07/2018" "08/2018" "09/2018" "10/2018" "11/2018" "12/2018"
The leading zero will flummox factor() (a level such as "07/2018" does not match a value such as "7/2018"), so we'll remove it:
ordered_date_range <- gsub("^0", "", ordered_date_range)
head(ordered_date_range)
# [1] "7/2018" "8/2018" "9/2018" "10/2018" "11/2018" "12/2018"
month_year_of_date <- factor(paste(month_of_date, year_of_date, sep = "/"),
levels = ordered_date_range)
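To see why the leading zero matters, here is a minimal sketch with two hypothetical values (the variable names are made up for this example). A value that matches no level silently becomes NA:

```r
# values built with paste() have no leading zero
values <- c("7/2018", "10/2018")

# levels produced by format(..., "%m/%Y") keep the zero, so "7/2018" has no match
padded_levels <- c("07/2018", "10/2018")
is.na(factor(values, levels = padded_levels))
# [1]  TRUE FALSE

# after stripping the zero, every value matches a level
clean_levels <- gsub("^0", "", padded_levels)
is.na(factor(values, levels = clean_levels))
# [1] FALSE FALSE
```

A quick anyNA() on the resulting factor is a cheap check that the values and levels agree.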
From here, sorting "just works":
month_year_of_date
# [1] 11/2018 4/2020 11/2019 10/2018 11/2020 11/2020 9/2018 8/2020 8/2019
# [10] 10/2019 10/2019 12/2018 5/2020 10/2018 6/2019 8/2020 12/2020 12/2018
# [19] 7/2019 7/2018
# 30 Levels: 7/2018 8/2018 9/2018 10/2018 11/2018 12/2018 1/2019 ... 12/2020
sort(month_year_of_date)
# [1] 7/2018 9/2018 10/2018 10/2018 11/2018 12/2018 12/2018 6/2019 7/2019
# [10] 8/2019 10/2019 10/2019 11/2019 4/2020 5/2020 8/2020 8/2020 11/2020
# [19] 11/2020 12/2020
# 30 Levels: 7/2018 8/2018 9/2018 10/2018 11/2018 12/2018 1/2019 ... 12/2020
which will make your (completely untested) plotting code something like:
# group = 1 lets geom_line() connect points across a discrete (factor) x-axis
ggplot(housing_data, aes(x = month_year_of_date, y = price, group = 1)) +
  geom_line() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
(i.e., no factor() call inside aes(), since that's already been done).
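One thing to be aware of with the second method: the in-between months remain as levels even when no row uses them. A minimal base-R sketch with hypothetical values; droplevels() removes the empty levels if you do not want them:

```r
# "8/2018" is a level but never appears in the data (hypothetical values)
f <- factor(c("7/2018", "9/2018"),
            levels = c("7/2018", "8/2018", "9/2018"))

table(f)                 # "8/2018" is listed with a count of 0
nlevels(f)
# [1] 3
nlevels(droplevels(f))   # empty level removed
# [1] 2
```

As far as I know, ggplot2's discrete scales drop unused levels from the axis by default; if you want the empty months shown, adding scale_x_discrete(drop = FALSE) should keep them.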