Using the latest version of Math::Int128, and your modified script, I find an improvement (on Windows Vista) of around 40%. Given that we're using different operating systems and probably different processors, I think we can agree that "I find the same as you".
I should add that even 40% is better than I could get with my own approach to modifying Math::Int128. I might learn something if I ever find the time and energy to discover why that was so. (The best I could get was to have the int128 arithmetic about 5-10% faster than Math::GMPz.)