XS has a surprising amount of per-call overhead from copying parameters in and out. When I was writing Lingua::Stem, I was able to reach 80% of the speed of Lingua::Stem::Snowball (which uses XS) for stemming in place, and actually beat Lingua::Stem::Snowball by 30% for large straight list processing.