Here's some example code that parses the BeOS Central headlines page into an array of hash refs and dumps the result:
#!/usr/bin/perl -w

use strict;
use LWP::Simple qw(get);
use Data::Dumper qw(Dumper);

# Define the columns to parse out
my @columns = qw(
    headline
    year month day
    hour minute second
    url
    description
);

# Generate a regex to fetch the column data
my $regex = join '\n', (
    '([^\n]+)',
    '(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})',
    '([^\n]+)',
    '(.*?)\s+',    # match everything, except the last bit of whitespace
);

# Get the web page; get() returns undef on failure
my $text = get('http://www.beoscentral.com/headlines.php');
defined $text or die "Could not fetch the headlines page\n";

my @rows;
foreach my $record (split "%%\n", $text) {
    my %row;
    # Skip any record that does not match the expected layout
    @row{@columns} = ($record =~ /^$regex$/so) or next;
    push @rows, \%row;
}

print Dumper(\@rows);

__END__
I am sure the regex could be written in a faster, cleaner, or more elegant way, but the answer eludes me at this time.
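One possible alternative, offered only as a sketch: instead of one large regex, split each record into at most four lines and match just the timestamp. The sample data and the parse_record helper below are hypothetical; the record layout (headline, timestamp, URL, description, with records separated by "%%\n") is inferred from the regex above, not from the site's documentation.

```perl
use strict;
use warnings;

# Hypothetical sample data in the layout the regex above implies.
my $text = join "%%\n",
    "First headline\n2001-02-03 04:05:06\nhttp://example.com/1\nSome description.\n",
    "Second headline\n2001-02-04 10:20:30\nhttp://example.com/2\nTwo-line\ndescription.\n";

my @columns = qw( headline year month day hour minute second url description );

# Hypothetical helper: returns a hash ref, or nothing if the record is malformed.
sub parse_record {
    my ($record) = @_;

    # A limit of 4 keeps any newlines inside the description intact.
    my ($headline, $stamp, $url, $description) = split /\n/, $record, 4;
    return unless defined $description;

    # Only the timestamp needs a regex.
    my @date = $stamp =~ /^(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})$/
        or return;

    # Drop the trailing whitespace, as the original regex does.
    $description =~ s/\s+\z//;

    my %row;
    @row{@columns} = ($headline, @date, $url, $description);
    return \%row;
}

my @rows = map { parse_record($_) } split /%%\n/, $text;
```

Whether this is actually faster would need benchmarking; the main gain is that each step is easier to read and to debug on its own.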
Update: Removed the /g modifier from the regex. It was a useless addition to the regex in this case.
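To illustrate why /g changes nothing here, a small sketch with a made-up record and a simplified pattern: because the pattern is anchored with ^ and $ (and /m is not used), it can match at most once, so the list of captures is the same with or without /g.

```perl
use strict;
use warnings;

# Made-up record and simplified pattern, for illustration only.
my $record = "alpha 1\n";

my @without_g = $record =~ /^(\w+)\s(\d+)\s+$/s;
my @with_g    = $record =~ /^(\w+)\s(\d+)\s+$/sg;

# Both lists hold the same two captures.
my $same = "@without_g" eq "@with_g";
```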
In reply to "(dkubb) Re: (2) Parsing News from a Site Backend" by dkubb, in thread "Parsing News from a Site Backend" by Segfault