The video overlays in question are not drawn by blending into a framebuffer in m...

The video overlays in question are not drawn by blending into a framebuffer in memory. They're two separate display planes that are read in parallel by the display path, scaled, and blended together at scan-out time. There are only reads, no writes. Modern GPUs support alpha-blended display planes using the alpha channel that is otherwise often required to exist anyway as padding.

As OP noted, using hardware display planes can have efficiency advantages for cases like floating controls over a video or smoothly animating a plane over a static background, since it avoids an extra read+write for the composited image. However, it also has some quirks -- like hardware bandwidth limits on how small a display plane can be scaled.