I don't think you'll be able to come up with a general way to do this, since each site marks up its pages differently. You'll have to repeat the work you did for www.ntu.edu.sg/sce/staffacad.asp for each site, writing one sub per site that extracts and returns all the information.
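One way to organize the per-site subs is a dispatch table keyed by site. This is only a rough sketch: the sub names, the hash name, the returned fields, and above all the regex are placeholders I made up, since I don't know the real markup of any of your pages.

```perl
use strict;
use warnings;

# Hypothetical extractor for one site. It takes the page HTML and returns a
# list of hashrefs. The pattern assumes an invented table layout; replace it
# with whatever matches the real page.
sub extract_ntu_sce {
    my ($html) = @_;
    my @staff;
    while ($html =~ m{<td class="name">(.*?)</td>\s*<td class="pubs">(.*?)</td>}gs) {
        push @staff, { name => $1, publications => $2 };
    }
    return @staff;
}

# Dispatch table: one entry per site, each pointing at that site's sub.
my %extractor_for = (
    'www.ntu.edu.sg/sce/staffacad.asp' => \&extract_ntu_sce,
);

# Look up the sub for a URL and run it, or die if none has been written yet.
sub extract_staff {
    my ($url, $html) = @_;
    (my $key = $url) =~ s{^https?://}{};    # drop the scheme, if any
    my $sub = $extractor_for{$key}
        or die "No extractor written for $url\n";
    return $sub->($html);
}
```

Adding a new site then means writing one more sub and adding one line to the hash, while the calling code stays the same.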
If your problem with extracting the publication text is stripping out the HTML from the text, look at HTML::Parser.
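For the stripping itself, something along these lines should work with HTML::Parser (a CPAN module, not in the core distribution). Treat it as a minimal sketch: strip_html is a name I picked, and collapsing whitespace at the end is my own assumption about what you want the result to look like.

```perl
use strict;
use warnings;
use HTML::Parser;

# Return only the text content of an HTML fragment.
sub strip_html {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' passes the handler text with entities already decoded.
        text_h      => [ sub { $text .= shift }, 'dtext' ],
    );
    # Skip the contents of script and style elements entirely.
    $parser->ignore_elements(qw(script style));

    $parser->parse($html);
    $parser->eof;

    # Collapse runs of whitespace and trim both ends.
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

print strip_html('<p>Tan, A. <i>Parsing</i> web pages.</p>'), "\n";
```

You would call this on the publication field after your site-specific sub has cut it out of the page.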
Best of luck to you.