I remember that in a later tutorial he writes part of it in asm, seemingly at random, because it's "faster", but it's the same thing a compiler would output, or worse.
Of course, now everyone who does SIMD writes it with compiler intrinsics or some code wrapper that generates terrible asm which really is slower, but they think it's "cleaner". (Which on Intel it isn't, because their intrinsics are even less readable than their asm.)
https://nehe.gamedev.net/tutorial/playing_avi_files_in_openg...