#!perl

=head1 NAME

reshtml - extract image (and other resource) URLs from an HTML document

=head1 SYNOPSIS

B<reshtml> [B<-b> I<url>] [I<file>...]

=head1 DESCRIPTION

Parses an HTML document, extracts the URLs of all images and other
resources in it, and prints them one per line.  Currently the URLs of
images, default style sheets, and favicons are collected.  Scripts,
optional style sheets, ordinary hyperlinks and other header links,
applets, Netscape low-resolution image previews, refresh targets,
frames, and iframes are ignored, though some of this could change in
the future.

The HTML documents are read from the files whose names are given on
the command line, or from STDIN if no name is given.

Repeated URLs are printed only once, though no effort is made to
recognize equivalent URLs.

=head1 OPTIONS

=over

=item B<-b> I<url>

Qualify relative URLs using I<url> as the base.  Note that a base URL
given in the HTML document (with the B<base> tag) is always used this
way, whether or not you give this switch.  If no base URL is known but
relative URLs are found, they are output as is, with a warning.  Use
B<-b .> to silence this warning.

=item B<-i> I<listfile>

Read I<listfile> for a list of URLs and download filenames.  The
filenames in the second column give the names of the HTML files to
read and parse; the URLs in the first column are used only as the base
URLs.  This option excludes giving filenames or base URLs on the
command line.

The listfile has the same format as the listfile for L, making it
easier to process HTML files you have downloaded with that utility.

=item B<-P> I<dir>

Interpret filenames as relative to directory I<dir>.  This is most
useful with B<-i>, but can be used otherwise too.

=item B<-v>

Print the names of files as they are parsed.

=back

=cut
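
=pod

=head1 EXAMPLES

A few illustrative invocations (the filenames and base URL here are
hypothetical, not part of the tool's distribution):

    reshtml page.html

    reshtml -b http://www.example.com/dir/ page.html

    reshtml -i list.txt -P downloads/

The first prints the resource URLs found in F<page.html>, warning if
relative URLs are found.  The second resolves relative URLs against
the given base (unless the document itself contains a B<base> tag,
which always takes precedence).  The third reads base URLs and
filenames from F<list.txt>, looking for the files under F<downloads/>.

=cut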