I understand that `localCheckpoint` removes the history (lineage) needed to rebuild the RDD, and that `cache` saves the current state of the RDD so it does not need to be rebuilt.
However, I am confused about a few things. If I call `localCheckpoint` and then need the RDD later in my code, I often get an exception saying that a partition can no longer be found.
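Here is a minimal sketch of the pattern I mean (the path and the transformations are placeholders for my actual job):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("checkpoint-demo").getOrCreate()
val sc = spark.sparkContext

// Build an RDD through a chain of transformations, then truncate its lineage.
val base = sc.textFile("hdfs:///data/events")  // placeholder path
val enriched = base.map(_.toUpperCase)         // stands in for many transformation steps
enriched.localCheckpoint()                     // lineage is cut here
enriched.count()                               // first action materializes the checkpoint

// ... a long stretch of unrelated work ...

// Much later: this sometimes fails with an exception like
// "Checkpoint block rdd_X_Y not found! Either the executor that originally
// checkpointed this partition is no longer alive, or the original RDD is
// unpersisted."
enriched.count()
```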
I looked at the Storage tab in the Spark UI and it says that only a small fraction of the RDD was saved, e.g. 17%.
So I read more and realized that Spark will discard old RDDs. Is there a way to make Spark keep one forever?
Also, if I used `cache` instead of `localCheckpoint`, would the problem be solved? Would it just take more time, since Spark would have to recompute any evicted partitions?
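In other words, would the plain `persist` variant below survive eviction (by recomputing from lineage) where the `localCheckpoint` version cannot? Same placeholder names as in the sketch above:

```scala
import org.apache.spark.storage.StorageLevel

// Same placeholder job as above, but without localCheckpoint: keep the
// lineage and ask Spark to hold the data in memory, spilling to disk.
val cached = base.map(_.toUpperCase).persist(StorageLevel.MEMORY_AND_DISK)
cached.count()  // materialize once

// If partitions get evicted later, my understanding is that Spark can
// recompute them from the lineage -- slower, but no lost-partition error.
cached.count()
```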
Overall, I just want to keep an RDD in memory for a large part of my job so that I can merge it back in at the very end, but by the time I get there, Spark has removed it. How do I solve that?
Does doing `localCheckpoint().cache()` or `cache().localCheckpoint()` do anything? Or is one or the other enough on its own?
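Concretely, these are the two orderings I am asking about (continuing the sketch above); I am not sure whether either buys anything over a bare `localCheckpoint`:

```scala
// Ordering A: persist first, then truncate the lineage.
val a = base.map(_.toUpperCase).cache().localCheckpoint()
a.count()

// Ordering B: truncate the lineage first, then cache. I am not even sure
// this is allowed, since localCheckpoint may already assign a storage level.
// val b = base.map(_.toUpperCase).localCheckpoint().cache()
```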