"second-stage bootloader" … "loaded drivers" … "GPU" …
You're thinking that booting DOS+Windows was a (comparatively) simple affair, akin to how operating systems like Windows NT, FreeBSD, and Linux distributions boot. It was far from simple.
The animation is an old and simple personal computing trick: palette rotation. There's no executable running. The logo is a static bitmap that is loaded into video RAM, and an interrupt hook simply cycles part of the palette to make the bitmap "animate". There's no GPU, either. This is 320×200 VGA graphics with 256 colours.
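For those who haven't met palette rotation before, here is a minimal sketch of the idea in Python. It is a simulation, not the actual Windows code: the function names, the four-entry palette, and the range of entries being cycled are all made up for illustration. On real VGA hardware the palette lives in the DAC and would be rewritten through its I/O ports rather than held in a list. The point to notice is that the pixel data never changes; only the colour that each pixel value maps to does.

```python
# Simulation of palette rotation. All names and values here are
# hypothetical; nothing below is taken from the real logo code.

def rotate_palette(palette, first, last):
    """Return a copy of palette with entries first..last (inclusive)
    rotated by one position; entries outside that range are untouched."""
    rotated = list(palette)
    window = rotated[first:last + 1]
    # The last entry in the window wraps around to the front.
    rotated[first:last + 1] = window[-1:] + window[:-1]
    return rotated

def render(bitmap, palette):
    """Map each pixel's palette index to its current colour."""
    return [[palette[index] for index in row] for row in bitmap]

# A toy 4-entry palette (real mode has 256) and a 1x4 "bitmap" of indices.
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
bitmap = [[0, 1, 2, 3]]

frame0 = render(bitmap, palette)

# One "tick" of the animation: cycle entries 1..3, leave entry 0 alone.
palette = rotate_palette(palette, 1, 3)
frame1 = render(bitmap, palette)

print(frame0)
print(frame1)
```

Entry 0 stays black in both frames because it is outside the rotated range, which is how part of an image can stay still whilst another part appears to move.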
As for what the system is doing and whether the kernel is loaded, the answers are "a heck of a lot of different stuff" and "that depends on which of the two kernels you are talking about".
Basically, the logo was loaded after the DOS kernel (BDOS and BIOS, incorporating its built-in device drivers, all in a single file io.sys) was loaded. Whilst the animation was on-screen, all of the rest of the boot process was going on, including amongst other things the loading of the Windows kernel (and its device drivers, and a Virtual Machine Manager, and various DOS housekeeping utilities …). And there was a complicated mechanism under the covers to ensure both that the operation of the command interpreter and DOS housekeeping utilities didn't splat text all over the logo and that text mode was reinstated if it was actually needed.
Those who see here a resemblance to Plymouth, the splash screen system for several Linux distributions, and wonder at the "comparatively simple" that I wrote above should note that whilst the goals are the same, the mechanisms are different. Plymouth runs as a fairly ordinary application-mode program on a multitasking operating system, whereas the DOS+Windows 9x/ME splash screen involved hooks into firmware keyboard and video APIs, direct manipulation of the VGA register file, the joy of VGA's banked video modes, and the nastiness required for doing "background stuff" on single-tasking MS-DOS.
Further reading