As you say, you are generating a system call (plus a memory copy from user space to kernel space) on every iteration. Each one is a small penalty, but you incur it 130,000 times. Doing the write just once saves a bit of work, hence the (small) performance increase.
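To make the system call count concrete, here is a minimal sketch in Python (not the language of your script, just an illustration). It assumes unbuffered `os.write` calls to the null device; the names `write_each` and `write_buffered` are made up for this example.

```python
import os

def write_each(fd, lines):
    """Issue one os.write (one system call) per line; return the call count."""
    calls = 0
    for line in lines:
        os.write(fd, line)
        calls += 1
    return calls

def write_buffered(fd, lines):
    """Join everything in memory first, then write it out in as few calls as possible."""
    view = memoryview(b"".join(lines))
    calls = 0
    while len(view) > 0:
        # os.write may accept fewer bytes than offered, so loop until done.
        written = os.write(fd, view)
        view = view[written:]
        calls += 1
    return calls

# 130,000 iterations, matching the count discussed above.
lines = [b"line %d\n" % i for i in range(130_000)]

fd = os.open(os.devnull, os.O_WRONLY)
try:
    per_line_calls = write_each(fd, lines)
    buffered_calls = write_buffered(fd, lines)
finally:
    os.close(fd)

print(per_line_calls, buffered_calls)
```

The first function crosses into the kernel once per line, the second only as often as needed to drain one large buffer. Wrapping each call in a timer would show how much of the difference is measurable on your system.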
OTOH, if your strings get extremely big you will probably lose this benefit, since building up the whole buffer in your sub is likely to cost a lot in memory allocation (and reallocation as the string grows).
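One common middle ground is to flush whenever the pending data passes a size limit, so the buffer never grows without bound. The sketch below is again in Python and only illustrative: `BoundedWriter` and the 64 KiB limit are assumptions for this example, not something taken from your code.

```python
import os

class BoundedWriter:
    """Collect chunks and flush whenever the pending bytes reach `limit`."""

    def __init__(self, fd, limit=64 * 1024):
        self.fd = fd
        self.limit = limit      # hypothetical threshold; tune for your data
        self.pending = []
        self.size = 0
        self.flushes = 0

    def write(self, chunk):
        self.pending.append(chunk)
        self.size += len(chunk)
        if self.size >= self.limit:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        data = memoryview(b"".join(self.pending))
        while len(data) > 0:
            # os.write returns the number of bytes actually written.
            data = data[os.write(self.fd, data):]
        self.pending.clear()
        self.size = 0
        self.flushes += 1

fd = os.open(os.devnull, os.O_WRONLY)
try:
    writer = BoundedWriter(fd)
    for i in range(130_000):
        writer.write(b"line %d\n" % i)
    writer.flush()              # write out whatever is left
    flush_count = writer.flushes
finally:
    os.close(fd)

print(flush_count)
```

This keeps the number of system calls far below one per iteration while capping memory use at roughly the chosen limit.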