With FFmpeg version 7.1, using [0:v][uploaded_logo]overlay_cuda produces the error "Can't overlay yuva420p on nv12" (but no white block).
We can fix it by converting the video from NV12 to yuv420p with [0:v]scale_cuda=format=yuv420p:
ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda -c:v h264_cuvid -i ./input.avi -i ./watermark.png -filter_complex "[0:v]scale_cuda=format=yuv420p[yuv420p_video];[1]format=rgba,colorchannelmixer=aa=0.3,scale=300:300,hwupload_cuda[uploaded_logo];[yuv420p_video][uploaded_logo]overlay_cuda=W-w-10:H-h+10" -c:v h264_nvenc ./output.mp4
-c:v h264_cuvid - Select the decoder explicitly, following Gyan's comment on the following post (the solution also works without it).
[1]format=rgba,colorchannelmixer=aa=0.3,scale=300:300,hwupload_cuda[uploaded_logo] - Prepare the logo on the CPU and upload it to the GPU. The uploaded logo is automatically converted from rgba to yuva420p.
[0:v]scale_cuda=format=yuv420p[yuv420p_video] - Convert the decoded video on the GPU from the default NV12 format to yuv420p. This is required because the overlay_cuda filter can't overlay yuva420p on NV12.
[yuv420p_video][uploaded_logo]overlay_cuda=W-w-10:H-h+10 - Use the overlay_cuda filter to overlay the logo on the video on the GPU.
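Since the filter graph is just three chains joined by semicolons, it may be easier to read and tweak when assembled from shell variables. The following is only a sketch: the variable names (alpha, logo_size, position, and the *_chain names) are made up for illustration, and the values are the ones from the command above.

```shell
#!/bin/sh
# Hypothetical variable names; the values are taken from the command above.
alpha=0.3                  # logo opacity, passed to colorchannelmixer as aa
logo_size=300:300          # logo size, passed to scale
position='W-w-10:H-h+10'   # x:y expressions for overlay_cuda

# Video branch: convert the decoded NV12 frames to yuv420p on the GPU.
video_chain='[0:v]scale_cuda=format=yuv420p[yuv420p_video]'
# Logo branch: prepare the logo on the CPU, then upload it to the GPU.
logo_chain="[1]format=rgba,colorchannelmixer=aa=${alpha},scale=${logo_size},hwupload_cuda[uploaded_logo]"
# Overlay the uploaded logo on the converted video.
overlay_chain="[yuv420p_video][uploaded_logo]overlay_cuda=${position}"

# Chains of a filter graph are separated by semicolons.
filter_graph="${video_chain};${logo_chain};${overlay_chain}"
printf '%s\n' "$filter_graph"

# The graph can then be passed to ffmpeg (requires an NVIDIA GPU and a CUDA-enabled build):
# ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda -c:v h264_cuvid -i ./input.avi -i ./watermark.png -filter_complex "$filter_graph" -c:v h264_nvenc ./output.mp4
```

With the values shown, the printed string is identical to the -filter_complex argument of the command above, so changing alpha (for example to 0.9 for testing) only requires editing one variable.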
Testing:
Build a sample input file:
ffmpeg -y -f lavfi -i testsrc=size=640x480:rate=10:duration=10 -c:v libx264 -pix_fmt yuv420p input.avi
Use the following watermark.
Executing the above command (using aa=0.9) gives the desired output:

Note:
Overlaying yuva420p over yuv420p is not as accurate as overlaying rgba over rgb24, because the 4:2:0 formats subsample the chroma planes.
It looks like GPU implementations are limited to "sub-sampled" pixel formats (but I didn't try all the options).
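To give a rough idea of what the 4:2:0 subsampling mentioned in the note means in numbers, here is a small arithmetic sketch for the 300x300 logo used above. The variable names are made up for illustration, and even dimensions are assumed (odd sizes would need rounding up).

```shell
#!/bin/sh
# Sample counts for a 300x300 logo (the size used in the command above).
# Assumes even width and height.
w=300
h=300

luma=$((w * h))                    # Y plane: one sample per pixel
chroma=$(( (w / 2) * (h / 2) ))    # U (or V) plane in 4:2:0: one sample per 2x2 pixel block
pixels_per_chroma=$((luma / chroma))

echo "Y samples: $luma"                               # prints "Y samples: 90000"
echo "U (or V) samples: $chroma"                      # prints "U (or V) samples: 22500"
echo "Pixels per chroma sample: $pixels_per_chroma"   # prints "Pixels per chroma sample: 4"
```

Each chroma sample is shared by four pixels, whereas in rgba and rgb24 every pixel carries its own full color value, which is why colored logo edges can look slightly softer after the GPU overlay.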