This at least makes it easier to keep track of each site's peculiarities, and to limit the number of executable lines you need to actually work through all the sites. (Maybe you need a slightly more elaborate structure if you're pulling "price" and "CAS" from the same site; maybe you can see the way to go given this example.)

    my %regs = (
        "site1.com" => [ "price", qr{prices: <table>(.+?)</table>}is ],
        "site2.com" => [ "cas",   qr{cas: <b>(\d+-\d{2}-\d+)}is ],
        ...
    );
    ...
    foreach my $site (keys %regs) {
        ...
        # fetch data into $pagecontent...
        my ($key, $reg) = @{ $regs{$site} };   # dereference the [key, regex] pair
        $thisHash{$key} = $1 if $pagecontent =~ $reg;
    }
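For the "slightly more elaborate structure" case, here is a minimal sketch of one way it could look, assuming each site maps to a hash of field name => regex instead of a single pair. The site names, patterns, sample page, and the extract_fields helper are all made up for illustration; adjust them to what the real pages contain.

```perl
use strict;
use warnings;

# Hypothetical site names and patterns, for illustration only.
my %regs = (
    "site1.com" => {
        price => qr{prices: <table>(.+?)</table>}is,
        cas   => qr{cas: <b>(\d+-\d{2}-\d+)}is,
    },
    "site2.com" => {
        cas => qr{cas: <b>(\d+-\d{2}-\d+)}is,
    },
);

# Apply every pattern registered for $site to $pagecontent and
# return a hash ref holding only the fields that matched.
sub extract_fields {
    my ($site, $pagecontent) = @_;
    my %found;
    my $fields = $regs{$site} or return \%found;   # unknown site: nothing found
    for my $key (keys %$fields) {
        $found{$key} = $1 if $pagecontent =~ $fields->{$key};
    }
    return \%found;
}

# Stand-in for a fetched page.
my $page = 'prices: <table><tr><td>12.50</td></tr></table> cas: <b>64-17-5</b>';
my $data = extract_fields("site1.com", $page);
print "$_ => $data->{$_}\n" for sort keys %$data;
```

The loop over sites stays as short as before; only the per-site table grows when a site offers more than one field.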
I can't imagine doing this any more compactly, since it depends heavily on specific knowledge of how each site formats its price lists, etc. It would be hard to generalize any further unless all the sites somehow managed to present their information in roughly the same way, which seems implausible. (It's not always a given that you can use regexes for this sort of thing at all -- many folks here would suggest that you use HTML::TokeParser or somesuch, which might not be a bad idea... Do at least look at HTML::TokeParser::Simple; it may make things a lot easier and give you a level of "abstraction" (generality) that will be useful.)
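To give a feel for the HTML::TokeParser::Simple route, here is a rough sketch that collects the text inside every <b> element. It assumes the module is installed from CPAN (it is not part of core Perl), and the sample HTML and the bold_texts helper are invented for this example, not taken from any real site.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;   # CPAN module, not in core Perl

# Made-up page fragment standing in for fetched content.
my $html = '<p>cas: <b>64-17-5</b></p><p>name: <b>ethanol</b></p>';

# Return the text found inside each <b>...</b> element, in document order.
sub bold_texts {
    my ($content) = @_;
    my $parser = HTML::TokeParser::Simple->new(\$content);
    my (@texts, $in_bold);
    while (my $token = $parser->get_token) {
        if ($token->is_start_tag('b')) {
            $in_bold = 1;
            push @texts, '';            # start a new entry for this element
        }
        elsif ($token->is_end_tag('b')) {
            $in_bold = 0;
        }
        elsif ($in_bold && $token->is_text) {
            $texts[-1] .= $token->as_is;
        }
    }
    return @texts;
}

print "$_\n" for bold_texts($html);
```

The token loop does not care about whitespace, attribute order, or tag case, which is where hand-written regexes tend to break first.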
BTW, I noticed that your example in this reply referred to "$1", though your regexes did not contain any parens. That won't work: without capturing parentheses, a successful match leaves $1 undefined.
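A quick way to see the difference for yourself, using a made-up sample string (not your actual data):

```perl
use strict;
use warnings;

my $text = "cas: <b>64-17-5</b>";   # made-up sample input

# No capturing parentheses: the match succeeds, but $1 is not set.
my $without = $text =~ /cas: <b>\d+-\d{2}-\d+/ ? $1 : "no match";

# With capturing parentheses: $1 holds the captured text.
my $with = $text =~ /cas: <b>(\d+-\d{2}-\d+)/ ? $1 : "no match";

print defined $without
    ? "without parens: $without\n"
    : "without parens: \$1 is undef\n";
print "with parens: $with\n";
```

Both patterns match the same text; only the second one gives you anything to store.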
In reply to Re: Re: Re: Oft encountered regex problem
by graff
in thread Oft encountered regex problem
by GermanHerman