When encoding with H.264, the audio shifts by about 12 ms.
The shift is about 12 ms regardless of resolution or frame rate.
The same misalignment of about 12 ms appears whether you export from After Effects or run a direct mp4 conversion.
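For a sense of scale, here is a rough sketch of what a 12 ms offset amounts to in audio samples and video frames. The sample rates and frame rates below are assumptions for illustration only (the report does not state them), and the helper names are hypothetical.

```python
# Convert a fixed audio offset (in milliseconds) into samples and frames.
# All rates used below are illustrative assumptions, not values from the report.

def offset_in_samples(offset_ms: float, sample_rate_hz: int) -> float:
    """Number of audio samples covered by the offset."""
    return offset_ms * sample_rate_hz / 1000.0

def offset_in_frames(offset_ms: float, fps: float) -> float:
    """Fraction of a video frame covered by the offset."""
    return offset_ms * fps / 1000.0

OFFSET_MS = 12.0  # the approximate shift described above

for rate in (44100, 48000):  # assumed common sample rates
    print(f"{rate} Hz: {offset_in_samples(OFFSET_MS, rate):.1f} samples")

for fps in (24, 30, 60):  # assumed common frame rates
    print(f"{fps} fps: {offset_in_frames(OFFSET_MS, fps):.2f} frames")
```

At an assumed 48 kHz this works out to 576 samples, and at 30 fps to roughly a third of a frame, so the offset stays under one frame at these frame rates even though the duration in milliseconds is the same.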
Antoine (Autokroma.com) commented
This might be caused by the audio decoding. As a workaround, you can encode to .mov with WAVE audio using AfterCodecs: https://autokroma.com/AfterCodecs/