i found the culprit operation which costs the most time:
if($num & $shift)
i'd say it's down to having to add so many bits to $shift once $num starts to skyrocket. it should have been implemented in a binary/packed format, with bit patterns of the same size.

btw: you can cut down on the huge numbers to start with by generating that same bit pattern (1..24), repeating it, and just adding leftmost 1 bits as required. you can also precalc the "anded" values and leave out the & altogether. with the other fixes to make it more perlish (the for loops), the prog should run in the order of a couple of minutes max. i got a run time of 6 minutes just by taking out the "if($num & $shift)" and the "else".
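to show what i mean by a packed format, here's a rough sketch. it is not the original prog: the 24-bit width, the 0-based bit positions and the names bit_is_set / any_common_bit are my own assumptions for illustration. the bits live in a fixed-size string, so a test costs the same however big the number would have been.

```perl
use strict;
use warnings;

# Assumption: a 24-bit pattern, packed into 3 bytes (not the original program's data).
my $nbits = 24;
my $bits  = "\0" x ($nbits / 8);

# Set a few example bits; positions are 0-based here.
vec($bits, $_, 1) = 1 for 0, 5, 23;

# Stand-in for "if ($num & $shift)" where $shift would be 1 << $pos.
# Hypothetical helper name.
sub bit_is_set {
    my ($packed, $pos) = @_;
    return vec($packed, $pos, 1);
}

# Same-size patterns can also be ANDed as strings, byte by byte.
# Hypothetical helper name; both arguments must be packed strings of equal length.
sub any_common_bit {
    my ($x, $y) = @_;
    return ( ($x & $y) =~ /[^\0]/ ) ? 1 : 0;
}

my $mask = "\0" x ($nbits / 8);
vec($mask, 5, 1) = 1;

my @set = grep { bit_is_set($bits, $_) } 0 .. $nbits - 1;
print "set bits: @set\n";                                # set bits: 0 5 23
print "mask hit: ", any_common_bit($bits, $mask), "\n";  # mask hit: 1
```

vec() handles single bit tests, and the string form of & handles whole patterns at once. whether this drops straight into the original loop depends on how $num is built up, which i haven't checked.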