#!perl

=head1 NAME

reshtml - extract image (and other resource) URLs from an HTML document

=head1 SYNOPSIS

B<reshtml> [B<-b> I<url>] [I<file>...]

=head1 DESCRIPTION

Parses an HTML document, extracts the URLs of all images and other
resources in it, and prints them one per line.  Currently the URLs of
images, default style sheets, and favicons are collected.  Scripts,
optional style sheets, ordinary hyperlinks and other header links,
applets, Netscape low-resolution image previews, refresh targets,
frames, and iframes are ignored, though some of this could change in
the future.

The HTML documents are read from the files whose names are given on
the command line, or from STDIN if no name is given.

Repeated URLs are printed only once, though no effort is made to
recognize equivalent URLs.

=head1 OPTIONS

=over

=item B<-b> I<url>

Qualify relative URLs using I<url> as the base.  Note that a base URL
given in the HTML document (with the B<base> tag) is always used this
way, whether or not you give this switch.  If no base URL is known but
relative URLs are found, they are output as is, with a warning.  Use
B<-b .> to silence this warning.

=item B<-i> I<listfile>

Read I<listfile> for a list of URLs and download filenames.  The
filenames in the second column give the names of the HTML files to
read and parse; the URLs in the first column are used only as the base
URLs.  This option excludes giving filenames or base URLs on the
command line.

The listfile has the same format as the listfile for L, making it
easier to process HTML files you have downloaded with that utility.

=item B<-P> I<dir>

Interpret filenames as relative to directory I<dir>.  This is most
useful with B<-i>, but can be used otherwise too.

=item B<-v>

Print the names of files as they are parsed.

=back

=cut
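
=pod

=head1 EXAMPLES

A few illustrative invocations (the filenames and base URL here are
hypothetical, not part of the tool's distribution):

    reshtml page.html

    reshtml -b http://www.example.com/dir/ page.html

    reshtml -i list.txt -P downloads/

The first prints the resource URLs found in F<page.html>, warning if
relative URLs are found.  The second resolves relative URLs against
the given base (unless the document itself contains a B<base> tag,
which always takes precedence).  The third reads base URLs and
filenames from F<list.txt>, looking for the files under F<downloads/>.

=cut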