I've been looking into implementations of atomic reference counting.
Most of the operations are very consistent between libraries, but I've found a surprising variety in the "decrease refcount" operation. (Note that, generally, the only difference between shared and weak decref is which on_zero() is called. Exceptions are noted below.)
If there are other implementations written in terms of the C11/C++11 memory model (what does MSVC do?), other than the "we use seq_cst because we don't know any better" kind, feel free to edit them in.
Most of the examples were originally C++, but here I've rewritten them in C, inlined, and normalized to the >= 1 convention (the stored value is the actual number of references, so the last owner sees 1 just before its decrement):
#include <stdatomic.h>
#include <stddef.h>
typedef struct RefPtr RefPtr;
struct RefPtr {
    _Atomic(size_t) refcount;
};
// calls the destructor and/or calls free
// on a shared_ptr, this also calls decref on the implicit weak_ptr
void on_zero(RefPtr *);
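For context, the operations that are "consistent between libraries" can be sketched as follows. This is only an illustration under the >= 1 convention: the helper names refptr_init, incref and use_count are hypothetical, and the relaxed increment is an assumption based on the common pattern, not something taken from any one library's source.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct RefPtr {
    _Atomic(size_t) refcount;
} RefPtr;

// Hypothetical helper: a freshly created object has exactly one owner.
static void refptr_init(RefPtr *p) {
    atomic_init(&p->refcount, 1);
}

// Hypothetical helper: the caller already holds a reference, so the
// increment itself does not need to order anything (assumed relaxed).
static void incref(RefPtr *p) {
    atomic_fetch_add_explicit(&p->refcount, 1, memory_order_relaxed);
}

// Hypothetical helper: a racy snapshot of the count, for diagnostics only.
static size_t use_count(RefPtr *p) {
    return atomic_load_explicit(&p->refcount, memory_order_relaxed);
}
```

The interesting ordering questions only arise on the decrement side, which is what the rest of this question is about.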
From the Boost intrusive_ptr examples and OpenSSL:
void decref_boost_intrusive_docs(RefPtr *p) {
    if (atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_release) == 1) {
        atomic_thread_fence(memory_order_acquire);
        on_zero(p);
    }
}
The Boost example is accompanied by this comment: "It would be possible to use memory_order_acq_rel for the fetch_sub operation, but this results in unneeded 'acquire' operations when the reference counter does not yet reach zero and may impose a performance penalty."
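To make concrete what the release decrement and the acquire fence are protecting, here is a minimal sketch. The Node type, its value field, and the helpers node_new, node_decref, node_destroy and demo are all hypothetical names introduced for illustration. The walkthrough in demo is sequential; in real use the two decrements would come from different threads, which is the only case where the ordering matters.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical payload type: a refcount plus plain (non-atomic) data.
typedef struct Node {
    _Atomic(size_t) refcount;
    int value;
} Node;

// Records what the destroying owner observed, for the demo only.
static int last_seen_value;

static void node_destroy(Node *n) {
    // The destroying owner must observe every other owner's earlier writes.
    last_seen_value = n->value;
    free(n);
}

static void node_decref(Node *n) {
    // release: publishes this owner's earlier writes to *n
    if (atomic_fetch_sub_explicit(&n->refcount, 1, memory_order_release) == 1) {
        // acquire fence, paid only on the final decrement: pairs with the
        // release decrements performed by the other owners
        atomic_thread_fence(memory_order_acquire);
        node_destroy(n);
    }
}

static Node *node_new(size_t owners) {
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    atomic_init(&n->refcount, owners);
    n->value = 0;
    return n;
}

// Sequential walkthrough with two owners, A and B.
static int demo(void) {
    Node *n = node_new(2);
    if (n == NULL)
        return -1;
    n->value = 42;     // owner A writes to the object
    node_decref(n);    // owner A drops: 2 -> 1, no fence executed
    node_decref(n);    // owner B drops: 1 -> 0, fence, then destroy
    return last_seen_value;
}
```

In this shape, a non-final decrement never executes the acquire fence, which is the saving the comment refers to.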
But most others (Boost, libstdc++, libc++ shared) do something else:
void decref_common(RefPtr *p) {
    if (atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_acq_rel) == 1)
        on_zero(p);
}
But libc++ does something different for the weak count. Curiously, this is in an external source file:
void decref_libcxx_weak(RefPtr *p) {
    if (atomic_load_explicit(&p->refcount, memory_order_acquire) == 1)
        on_zero(p);
    else
        decref_common(p);
}
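As a sanity check that the three variants agree on when on_zero runs, here is a small single-threaded harness. It is a sketch only: on_zero_calls and drop_all are hypothetical instrumentation, and a single-threaded run says nothing about memory ordering. It does expose one functional difference: the load-first variant skips the final decrement, so the stored count stays at 1 when on_zero is called.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct RefPtr {
    _Atomic(size_t) refcount;
} RefPtr;

// Hypothetical instrumentation: count calls instead of freeing anything.
static int on_zero_calls;
static void on_zero(RefPtr *p) { (void)p; on_zero_calls++; }

static void decref_boost_intrusive_docs(RefPtr *p) {
    if (atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_release) == 1) {
        atomic_thread_fence(memory_order_acquire);
        on_zero(p);
    }
}

static void decref_common(RefPtr *p) {
    if (atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_acq_rel) == 1)
        on_zero(p);
}

static void decref_libcxx_weak(RefPtr *p) {
    // Seeing 1 means the caller is the only remaining owner,
    // so the read-modify-write is skipped entirely.
    if (atomic_load_explicit(&p->refcount, memory_order_acquire) == 1)
        on_zero(p);
    else
        decref_common(p);
}

typedef void (*decref_fn)(RefPtr *);

// Drops `owners` references using `fn`. Returns how many times on_zero
// ran and stores the count left behind in *final_count.
static int drop_all(decref_fn fn, size_t owners, size_t *final_count) {
    RefPtr p;
    atomic_init(&p.refcount, owners);
    on_zero_calls = 0;
    for (size_t i = 0; i < owners; i++)
        fn(&p);
    *final_count = atomic_load_explicit(&p.refcount, memory_order_relaxed);
    return on_zero_calls;
}
```

Calling drop_all with each variant and three owners should report exactly one on_zero call every time, with a leftover count of 0 for the first two variants and 1 for decref_libcxx_weak.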
The question, then, is: what are the actual differences?
Sub-questions: Are the comments wrong? What do specific platforms do (on aarch64, would ldar be cheaper than dmb ishld? What about ia64?) Under what conditions can weaker versions be used (e.g. if the dtor is a no-op, if the deleter is just free, ...)?
See also "Atomic Reference Counting" and "Why is an acquire barrier needed before deleting the data in an atomically reference counted smart pointer?"