in reply to Re^2: getting LWP and HTML::TokeParser to run
in thread getting started with LWP and HTML::TokeParser
Update:
You can of course parse the HTML content of the search results with regex, but this is a mess...
my (@hrefs) = $mech->content =~ m|COMPLETEHREF=http://www.kultus-bw.de/did_abfrage/detail.php\?id=\d+|g;
print "$_\n" foreach @hrefs;    # there are 5081 of these

# these COMPLETEHREFs can be appended to a main URL like this:
my $example_url = 'http://www.kultusportal-bw.de/servlet/PB/menu/1188427/index.html?COMPLETEHREF=http://www.kultus-bw.de/did_abfrage/detail.php?id=04146900';

Then things get hairy, and you will want to whip out some of that HTML parser voodoo to parse the resulting table. Also, the character encodings aren't consistent: for example, the page has a literal ä, but ü is apparently coded as an HTML entity instead.
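For the table-parsing step, here is a minimal sketch of what that parser voodoo could look like with HTML::TokeParser. The sample markup, the school and town names, and the helper name table_rows are all made up for illustration; the real detail pages will have a different structure, so treat this as a starting point rather than working scraper code. The useful part is that get_trimmed_text decodes entities, so a literal ä and an entity-coded ü both come out as plain characters.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical stand-in for a detail page (assumption: the real markup differs).
# One cell uses an entity (&uuml;), the other a literal a-umlaut (\x{e4}),
# to mimic the inconsistent encoding described above.
my $html = <<"HTML";
<table>
<tr><td>Schule</td><td>Grundschule M&uuml;hlacker</td></tr>
<tr><td>Ort</td><td>Bad S\x{e4}ckingen</td></tr>
</table>
HTML

# Hypothetical helper: returns one array ref of cell texts per <tr>.
sub table_rows {
    my ($doc) = @_;
    my $p = HTML::TokeParser->new(\$doc) or die "cannot create parser";
    my @rows;
    while ($p->get_tag('tr')) {
        my @cells;
        # Walk forward until this row closes, collecting each <td>.
        while (my $t = $p->get_tag('td', '/tr')) {
            last if $t->[0] eq '/tr';
            # get_trimmed_text decodes entities and collapses whitespace.
            push @cells, $p->get_trimmed_text('/td');
        }
        push @rows, \@cells;
    }
    return @rows;
}

binmode STDOUT, ':encoding(UTF-8)';
print join(' | ', @$_), "\n" for table_rows($html);
```

In a real run you would pass $mech->content (or the decoded content of each detail page) to table_rows instead of the sample string.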