in reply to Optimizing a regex

If you know everything you need is in the top few lines, and you _know_ those lines total less than (say) 16k, then Malkavian's suggestion will help most:
    read(DATAFILE, $_, 16384);
    ($pt) = /pagetitle.*?"(.*?)"/mi;
    ($pc) = /category.*?"(.*?)"/im;
This gets you the first of each, if there is one, with no loop at all. (Your code tests each variable on every pass through the loop to make sure it's not the second entry for that item. Do you need to do that?)
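For anyone who wants to try the idea without the real file, here is a minimal standalone sketch. The sample data and its `Name = "value"` layout are my assumption (I don't know what your file actually looks like), and an in-memory filehandle stands in for DATAFILE. The regexes are the ones above, minus /m, which makes no difference here since neither pattern uses ^ or $.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the real data file.
my $sample = <<'END';
PageTitle = "Widgets"
Category = "Hardware"
Category = "Ignored second entry"
END

# Open the string as an in-memory filehandle so the sketch runs standalone.
open(my $fh, '<', \$sample) or die "Cannot open sample: $!";

# One read of at most 16384 bytes; no line-by-line loop.
my $buf = '';
my $got = read($fh, $buf, 16384);
defined $got or die "read failed: $!";
close($fh);

# A match in list context returns its captures; without /g only the
# first match is found, so a second entry is never looked at.
my ($pt) = $buf =~ /pagetitle.*?"(.*?)"/i;
my ($pc) = $buf =~ /category.*?"(.*?)"/i;

print "title=",    (defined $pt ? $pt : '(none)'), "\n";
print "category=", (defined $pc ? $pc : '(none)'), "\n";
```

Note that `.` does not match a newline without /s, so the keyword and its quoted value have to be on the same line for either pattern to match. If an item is missing, its variable is simply left undefined.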
