Suppose I want to write a custom optimizer class that conforms to the tf.keras API (using TensorFlow version >= 2.0). I am confused about the documented way to do this versus what is done in implementations.
The documentation for tf.keras.optimizers.Optimizer states,
  ### Write a customized optimizer.
  If you intend to create your own optimization algorithm, simply inherit from
  this class and override the following methods:
    - resource_apply_dense (update variable given gradient tensor is dense)
    - resource_apply_sparse (update variable given gradient tensor is sparse)
    - create_slots (if your optimizer algorithm requires additional variables)
However, the current tf.keras.optimizers.Optimizer implementation does not define a resource_apply_dense method, but it does define a private-looking _resource_apply_dense method stub. Similarly, there are no resource_apply_sparse or create_slots methods, but there are a _resource_apply_sparse method stub and a _create_slots method call.
In official tf.keras.optimizers.Optimizer subclasses (using tf.keras.optimizers.Adam as an example), there are _resource_apply_dense, _resource_apply_sparse, and _create_slots methods, and there are no such methods without the leading underscore.
There are similar leading-underscore methods in slightly-less-official tf.keras.optimizers.Optimizer subclasses (e.g., tfa.optimizers.MovingAverage from TensorFlow Addons: _resource_apply_dense, _resource_apply_sparse, _create_slots).
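To make that concrete, here is a minimal momentum-style optimizer sketch that overrides only the leading-underscore hooks, assuming those are the intended extension points. It targets the TF 2.x optimizer_v2 API (in TF >= 2.11 that base class moved to tf.keras.optimizers.legacy.Optimizer, hence the fallback); the class name MyMomentum and its hyperparameter names are my own, not from any official source.

```python
import tensorflow as tf

# The question targets the TF 2.x optimizer_v2 API; in TF >= 2.11 that base
# class moved to tf.keras.optimizers.legacy.Optimizer, so fall back accordingly.
OptimizerBase = getattr(tf.keras.optimizers, "legacy", tf.keras.optimizers).Optimizer

class MyMomentum(OptimizerBase):
    """Hypothetical momentum optimizer overriding only the underscore hooks."""

    def __init__(self, learning_rate=0.01, momentum=0.9, name="MyMomentum", **kwargs):
        super().__init__(name, **kwargs)
        self._set_hyper("learning_rate", learning_rate)
        self._set_hyper("momentum", momentum)

    def _create_slots(self, var_list):
        # One accumulator ("slot") per trainable variable, initialized to zeros.
        for var in var_list:
            self.add_slot(var, "momentum")

    def _resource_apply_dense(self, grad, var, apply_state=None):
        lr = self._get_hyper("learning_rate", var.dtype)
        mom = self._get_hyper("momentum", var.dtype)
        m = self.get_slot(var, "momentum")
        m_t = m.assign(mom * m - lr * grad)
        return var.assign_add(m_t)

    def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
        # A real implementation would scatter-update; omitted in this sketch.
        raise NotImplementedError

    def get_config(self):
        config = super().get_config()
        config.update({
            "learning_rate": self._serialize_hyperparameter("learning_rate"),
            "momentum": self._serialize_hyperparameter("momentum"),
        })
        return config
```

Calling apply_gradients on this class exercises the base-class machinery: it invokes _create_slots and then _resource_apply_dense for each (gradient, variable) pair.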
Another confounding point for me is that some of the TensorFlow Addons optimizers also override the apply_gradients method (e.g., tfa.optimizers.MovingAverage), whereas the tf.keras.optimizers optimizers do not.
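My current guess at when overriding apply_gradients is warranted: per-variable update math lives in the _resource_apply_[dense|sparse] hooks, but logic that must run once per optimization step across all variables (such as the shadow-variable averaging MovingAverage does, or the simple bookkeeping below) has no per-variable hook, so the whole-step method is wrapped instead. This is a sketch under that assumption; LoggingSGD and num_apply_calls are illustrative names of my own.

```python
import tensorflow as tf

# TF >= 2.11 moved the optimizer_v2 classes under tf.keras.optimizers.legacy.
SGDBase = getattr(tf.keras.optimizers, "legacy", tf.keras.optimizers).SGD

class LoggingSGD(SGDBase):
    """SGD that counts optimization steps -- a whole-step concern that does
    not fit into the per-variable _resource_apply_* hooks."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.num_apply_calls = 0  # hypothetical bookkeeping attribute

    def apply_gradients(self, grads_and_vars, name=None, **kwargs):
        self.num_apply_calls += 1
        # Delegate the actual updates to the base class, which in turn calls
        # _create_slots and _resource_apply_dense/_resource_apply_sparse.
        return super().apply_gradients(grads_and_vars, name=name, **kwargs)
```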
Moreover, I noticed that the apply_gradients method of tf.keras.optimizers.Optimizer calls _create_slots, but the base tf.keras.optimizers.Optimizer class does not define a _create_slots method.
So, it seems that a _create_slots method must be defined in an optimizer subclass if that subclass does not override apply_gradients.
Questions
What is the correct way to subclass a tf.keras.optimizers.Optimizer? Specifically,
- Does the tf.keras.optimizers.Optimizer documentation quoted at the top simply mean to override the leading-underscore versions of the methods it mentions (e.g., _resource_apply_dense instead of resource_apply_dense)? If so, are there any API guarantees about these private-looking methods not changing their behavior in future versions of TensorFlow? What are the signatures of these methods?
- When would one override apply_gradients in addition to the _resource_apply_[dense|sparse] methods?
Edit. Opened issue on GitHub: #36449