oaklander has asked for the wisdom of the Perl Monks concerning the following question:
My script does not pick up the link text when it has embedded HTML:
<A HREF="path/name"><FONT SIZE=-1>path name</FONT></A>
but will pick it up without the embedded HTML:
<A HREF="path/name">path name</A>
Here is the Perl script:

    $/ = "";
    $raw = "";
    $linktext = "";
    %atts = ();

    while (<>) {
        while (/<A\s([^>]+)>([^<]+)<\/A>/ig) {
            $raw = $1;
            $linktext = $2;
            $linktext =~ s/[\s]*\n/ /g;
            while ($raw =~ /([^\s=]+)\s*=\s*("([^"]+)"|[^\s]+\s*)/ig) {
                if (defined $3) {
                    $atts{ uc($1) } = $3;
                } else {
                    $atts{ uc($1) } = $2;
                }
                print '-' x 15;
                print "\nLink text: $linktext\n";
                foreach $key ("HREF", "NAME", "TITLE", "REL", "REV", "TARGET") {
                    if (exists($atts{$key})) {
                        $atts{$key} =~ s/[\s]*\n/ /g;
                        print " $key: $atts{$key}\n";
                    }
                }
                %atts = ();
            }
        }
    }

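
For illustration, here is a minimal sketch of one way the link-text pattern could be loosened so that embedded tags no longer block the match. It is not from the thread: the helper name `extract_links` and the sample input are hypothetical, and it assumes anchors are not nested and that no attribute value contains a literal `>`. The original `([^<]+)` stops at the first `<`, which is why a link whose text starts with `<FONT ...>` is skipped; the sketch captures everything up to the closing `</A>` and then strips the inner tags.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns a list of
# [attribute string, link text] pairs found in $html.
sub extract_links {
    my ($html) = @_;
    my @links;
    # (.*?) is non-greedy and, with /s, can span newlines and embedded tags,
    # unlike ([^<]+), which cannot match anything containing '<'.
    while ($html =~ /<A\s([^>]+)>(.*?)<\/A>/isg) {
        my ($raw, $linktext) = ($1, $2);
        $linktext =~ s/<[^>]+>//g;   # drop embedded tags such as <FONT ...>
        $linktext =~ s/\s+/ /g;      # collapse runs of whitespace and newlines
        $linktext =~ s/^ | $//g;     # trim a leading or trailing space
        push @links, [$raw, $linktext];
    }
    return @links;
}

# Sample input: the two anchors from the question.
my $sample = join "\n",
    '<A HREF="path/name"><FONT SIZE=-1>path name</FONT></A>',
    '<A HREF="path/name">path name</A>';

for my $link (extract_links($sample)) {
    print "Link text: $link->[1]\n";
    print "  Attributes: $link->[0]\n";
}
```

The attribute string in `$link->[0]` can be fed to the same attribute loop the script already uses. Regex-based extraction remains fragile on real-world HTML, so a dedicated parser module from CPAN such as HTML::TokeParser is usually the safer choice when one is available.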
Replies are listed 'Best First'.

Re: Extracting information
by quent (Beadle) on Jan 09, 2002 at 20:12 UTC
by oaklander (Acolyte) on Jan 09, 2002 at 23:00 UTC

Re: HTML Parsing
by BazB (Priest) on Jan 09, 2002 at 20:20 UTC

Re: Extracting information
by fuzzysteve (Beadle) on Jan 09, 2002 at 20:03 UTC