How much memory is being shared between your child processes? I.e. what does 'top' report in the 'SHARED' column?
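If they are not sharing much memory, are your use LWP; and use WWW::Mechanize; statements 'outside' your while loop (for example, at the beginning of your program), so that the modules are loaded once in the parent before any fork? Children forked after that point can share the memory those modules occupy instead of each loading a private copy.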
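Here is a minimal sketch of that layout, assuming a Unix-like system where fork() shares pages copy-on-write and that both CPAN modules are installed. run_workers and its arguments are names made up for illustration, not anything from your program. Note that use runs at compile time wherever it appears in the file, so the case that really hurts is loading modules with require (or a string eval) after the fork.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load the heavy modules once, in the parent, before any fork.
use LWP;
use WWW::Mechanize;

# Hypothetical helper: fork one child per job, run $work->($job) in it,
# and wait for every child. Returns how many children exited with status 0.
sub run_workers {
    my ($work, @jobs) = @_;
    my @pids;
    for my $job (@jobs) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: the module code is already in memory, inherited
            # from the parent, so nothing is loaded again here.
            my $ok = eval { $work->($job); 1 };
            exit($ok ? 0 : 1);
        }
        push @pids, $pid;
    }
    my $succeeded = 0;
    for my $pid (@pids) {
        waitpid($pid, 0);
        $succeeded++ if $? == 0;   # $? holds the status of the child just reaped
    }
    return $succeeded;
}
```

You would call it with something like run_workers(sub { my $mech = WWW::Mechanize->new; $mech->get($_[0]) }, @urls). How much actually stays shared depends on the OS and on how much each child writes to memory afterwards, so treat the numbers top reports as the real answer.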
I'm on OS X and hardly know how to use top on this craptop, so I checked my memory usage through Activity Monitor. The root process shows 29 MB, and everything else (up to 20 forks or so) shows 6.5-7.5 MB real memory plus 73 MB virtual. So should I not bother with threads and simply optimize this forked code?