I am trying to adapt a TextVectorization layer to a UTF-8 encoded vocabulary:
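import tensorflow as tf

VOCAB_SIZE = 5000

encoder = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE,
    standardize="lower",
)
# `train` yields (doc, label) pairs; keep only the text for adapt().
encoder.adapt(train.map(
    lambda doc, label: doc
))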
The following error occurs:
UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in position 49: ordinal not in range(128)
The input is encoded as UTF-8. As far as I understand, TensorFlow should be able to handle that.
Environment:
TensorFlow version: 2.10.0
Python version: 3.10.8
Platform: Linux
Full traceback:
InvalidArgumentError                      Traceback (most recent call last)
Cell In [10], line 8
      1 VOCAB_SIZE = 5000
      3 encoder = tf.keras.layers.TextVectorization(
      4     max_tokens=VOCAB_SIZE,
      5     standardize="lower",
      6 )
----> 8 encoder.adapt(train.map(
      9     lambda doc, label : doc
     10 ))
File ~/.local/lib/python3.10/site-packages/keras/layers/preprocessing/text_vectorization.py:467, in TextVectorization.adapt(self, data, batch_size, steps)
    417 def adapt(self, data, batch_size=None, steps=None):
    418     """Computes a vocabulary of string terms from tokens in a dataset.
    419 
    420     Calling `adapt()` on a `TextVectorization` layer is an alternative to
   (...)
    465           argument is not supported with array inputs.
    466     """
--> 467     super().adapt(data, batch_size=batch_size, steps=steps)
File ~/.local/lib/python3.10/site-packages/keras/engine/base_preprocessing_layer.py:258, in PreprocessingLayer.adapt(self, data, batch_size, steps)
    256 with data_handler.catch_stop_iteration():
    257     for _ in data_handler.steps():
--> 258         self._adapt_function(iterator)
    259         if data_handler.should_sync:
    260             context.async_wait()
File ~/.local/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    151 except Exception as e:
    152   filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153   raise e.with_traceback(filtered_tb) from None
    154 finally:
    155   del filtered_tb
File ~/.local/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     52 try:
     53   ctx.ensure_initialized()
---> 54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     55                                       inputs, attrs, num_outputs)
     56 except core._NotOkStatusException as e:
     57   if name is not None:
InvalidArgumentError: Graph execution error:
2 root error(s) found.
  (0) INVALID_ARGUMENT:  UnicodeEncodeError: 'ascii' codec can't encode character '\xae' in position 401: ordinal not in range(128)
Traceback (most recent call last):
  File "/home/moss/.local/lib/python3.10/site-packages/tensorflow/python/ops/script_ops.py", line 279, in __call__
    return [self._convert(x) for x in ret]
  File "/home/moss/.local/lib/python3.10/site-packages/tensorflow/python/ops/script_ops.py", line 279, in <listcomp>
    return [self._convert(x) for x in ret]
  File "/home/moss/.local/lib/python3.10/site-packages/tensorflow/python/ops/script_ops.py", line 237, in _convert
    return result.astype(np.bytes_)
UnicodeEncodeError: 'ascii' codec can't encode character '\xae' in position 401: ordinal not in range(128)
     [[{{node PyFunc}}]]
     [[IteratorGetNext]]
     [[UniqueWithCounts/_6]]
  (1) INVALID_ARGUMENT:  UnicodeEncodeError: 'ascii' codec can't encode character '\xae' in position 401: ordinal not in range(128)
Traceback (most recent call last):
  File "/home/moss/.local/lib/python3.10/site-packages/tensorflow/python/ops/script_ops.py", line 279, in __call__
    return [self._convert(x) for x in ret]
  File "/home/moss/.local/lib/python3.10/site-packages/tensorflow/python/ops/script_ops.py", line 279, in <listcomp>
    return [self._convert(x) for x in ret]
  File "/home/moss/.local/lib/python3.10/site-packages/tensorflow/python/ops/script_ops.py", line 237, in _convert
    return result.astype(np.bytes_)
UnicodeEncodeError: 'ascii' codec can't encode character '\xae' in position 401: ordinal not in range(128)
     [[{{node PyFunc}}]]
     [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored. [Op:__inference_adapt_step_195]
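The innermost frame in the traceback is result.astype(np.bytes_) in script_ops.py, under a PyFunc node, which suggests that a Python function in the input pipeline is handing str values back to TensorFlow. The following is a minimal sketch of that one step using NumPy alone, assuming that is what happens here. The sample string and the helper names to_bytes_ascii and to_bytes_utf8 are hypothetical and are not taken from my pipeline.

```python
import numpy as np

# Hypothetical sample containing a non-ASCII character (U+2122, trade mark sign).
sample = np.array(["Acme\u2122 widget"])


def to_bytes_ascii(arr):
    # Same conversion as the failing frame in script_ops.py:
    # NumPy encodes str to bytes with the ASCII codec here.
    return arr.astype(np.bytes_)


def to_bytes_utf8(arr):
    # Explicit UTF-8 encoding of every element, producing a bytes array.
    return np.char.encode(arr, "utf-8")


try:
    to_bytes_ascii(sample)
    ascii_failed = False
except UnicodeEncodeError as err:
    ascii_failed = True
    print("astype(np.bytes_) failed:", err)

encoded = to_bytes_utf8(sample)
print(encoded[0].decode("utf-8"))
```

If this is indeed the cause, the analogous change in the pipeline would probably be to return bytes (for example doc.encode("utf-8")) rather than str from the Python function, but I have not confirmed that against my dataset.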
