I'm interested in finding or implementing a Rust data structure that provides a zero-cost way to memoize a single computation with an arbitrary output type T. Specifically I would like a generic type Cache<T> whose internal data takes up no more space than an Option<T>, with the following basic API:
impl<T> Cache<T> {
/// Return a new Cache with no value stored in it yet.
pub fn new() -> Self {
// ...
}
/// If the cache has a value stored in it, return a reference to the
/// stored value. Otherwise, compute `f()`, store its output
/// in the cache, and then return a reference to the stored value.
pub fn call<F: FnOnce() -> T>(&self, f: F) -> &T {
// ...
}
}
The goal here is to be able to share multiple immutable references to a Cache within a single thread, with any holder of such a reference being able to access the value (triggering the computation if it is the first time). Since we're only requiring to be able to share a Cache within a single thread, it is not necessary for it to be Sync.
Here is a way to implement the API safely (or at least I think it's safe) using unsafe under the hood:
use std::cell::UnsafeCell;
pub struct Cache<T> {
value: UnsafeCell<Option<T>>
}
impl<T> Cache<T> {
pub fn new() -> Self {
Cache { value: UnsafeCell::new(None) }
}
pub fn call<F: FnOnce() -> T>(&self, f: F) -> &T {
let ptr = self.value.get();
unsafe {
if (*ptr).is_none() {
let t = f();
// Since `f` potentially could have invoked `call` on this
// same cache, to be safe we must check again that *ptr
// is still None, before setting the value.
if (*ptr).is_none() {
*ptr = Some(t);
}
}
(*ptr).as_ref().unwrap()
}
}
}
Is it possible to implement such a type in safe Rust (i.e., not writing our own unsafe code, only indirectly relying on unsafe code in the standard library)?
Clearly, the call method requires mutating self, which means that Cache must use some form of interior mutability. However, it does not seem possible to do with a Cell, because Cell provides no way to retrieve a reference to the enclosed value, as is required by the desired API of call above. And this is for a good reason, as it would be unsound for Cell to provide such a reference, because it would have no way to ensure that the referenced value is not mutated over the lifetime of the reference. On the other hand, for the Cache type, after call is called once, the API above does not provide any way for the stored value to be mutated again, so it is safe for it to hand out a reference with a lifetime that can be a long as that of the Cache itself.
If Cell can't work, I'm curious if the Rust standard library might provide some other safe building blocks for interior mutability which could be used to implement this Cache.
Neither RefCell nor Mutex fulfill the goals here:
- they are not zero-cost: they involve storing more data than that of an
Option<T>and add unnecessary runtime checks. - they do not seem to provide any way to return a real reference with the lifetime that we want -- instead we could only return a
ReforMutexGuardwhich is not the same thing.
Using only an Option would not provide the same functionality: if we share immutable references to the Cache, any holder of such a reference can invoke call and obtain the desired value (and mutating the Cache in the process so that future calls will retrieve the same value); whereas, sharing immutable references to an Option, it would not be possible to mutate the Option, so it could not work.