It may also be worth looking at the algorithm used in tye's Algorithm::Loops, which is jolly clever. I haven't compared it directly with this one, though if I had to guess, I would guess this one is probably still faster.
PS. I still haven't figured out how you can get away with doing nothing when $copy->[-1] == @$copy. What's the trick?
Update: Okay, I've got it. That's pretty clever. Here's my proof that it works.
Lemma: for every $n ≥ 1, there is some permutation of (1..$n+1) that takes more flips than any permutation of (1..$n).
Proof. Let @a be a permutation of (1..$n) that takes as many flips as possible, say it takes $flip flips; and consider the permutation ($n+1, reverse @a). Clearly this takes ($flip+1) flips.
Now, if $copy->[-1] == @$copy then the last element can never be moved by the flipping, and so this will take the same number of flips as @$copy[0..$#$copy-1]. By the Lemma, this is less than the maximum number of flips.