I am looking to extract URL patterns from a given site.
Example: http://www.perlmonks.org/index.pl?node_id=629153 is a valid question-answer node,
whereas http://www.perlmonks.org/index.pl?node=Recently%20Active%20Threads is not. The pattern here is that a URL matching node_id=\d+ is a valid question-answer node. Extracting patterns of this kind from a given site would help me determine the nature of each link. I would like to do this site-wide, automatically.
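
To make the idea concrete, here is a rough sketch in Python of one way such patterns might be derived. It assumes the interesting variation is in the query string: each parameter value is generalized to \d+ if it is all digits and to .* otherwise, and the resulting patterns are counted across the crawled URLs. The function names url_pattern and count_patterns are made up for illustration, and collecting the URLs themselves (the crawl) is not shown.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def url_pattern(url):
    """Reduce a URL to a coarse pattern: its path plus generalized query parameters."""
    parts = urlsplit(url)
    generalized = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        # Assumption: all-digit values are IDs; anything else is free text.
        shape = r"\d+" if re.fullmatch(r"\d+", value) else ".*"
        generalized.append(f"{key}={shape}")
    if not generalized:
        return parts.path
    # Sort so that parameter order does not produce distinct patterns.
    return parts.path + "?" + "&".join(sorted(generalized))


def count_patterns(urls):
    """Count how many of the given URLs fall under each pattern."""
    return Counter(url_pattern(u) for u in urls)


if __name__ == "__main__":
    sample = [
        "http://www.perlmonks.org/index.pl?node_id=629153",
        "http://www.perlmonks.org/index.pl?node=Recently%20Active%20Threads",
    ]
    for pattern, count in count_patterns(sample).most_common():
        print(count, pattern)
```

Patterns that occur many times site-wide, such as /index.pl?node_id=\d+, are then candidates for a "kind" of page. Deciding which pattern means "question-answer node" would still need some labeling or inspection of the page content; this sketch only groups the URLs.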