Yes, this is quite a different sort of problem from the initial example that started this thread. If you have a small number of distinct sites that you're scanning, and you are reasonably confident that each site follows its own pattern consistently, then you can try keeping the appropriate regexes for price extraction in their own hash, keyed by web-site name -- something like:

my %regs = ( "site1.com" => [ "price", qr{prices: <table>(.+?)</table>}is ],
             "site2.com" => [ "cas", qr{cas: <b>(\d+-\d{2}-\d+)}is ],
          ...
            );
...
foreach my $site (keys %regs) {
    ...  # fetch data into $pagecontent...
    my ($key,$reg) = @{ $regs{$site} };  # dereference this site's [ key, regex ] pair
    $thisHash{$key} = $1 if $pagecontent =~ $reg;
}

This at least makes it easier to keep track of each site's peculiarities, and to limit the number of executable lines you need to actually work through all the sites. (Maybe you need a slightly more elaborate structure, if you're pulling "price" and "CAS" from the same site; maybe you can see the way to go given this example.)
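For what it's worth, here is one way the "slightly more elaborate structure" might look: a hash of hashes, so that each site can carry several named patterns. This is only a sketch. The site names, the canned page content in %pages (standing in for the real fetch step) and the extract_fields helper are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sites and page layouts, for illustration only.
# Each site maps field names to the regex that captures that field.
my %regs = (
    "site1.com" => {
        price => qr{prices: <table>(.+?)</table>}is,
        cas   => qr{cas: <b>(\d+-\d{2}-\d+)}is,
    },
    "site2.com" => {
        cas => qr{cas: <b>(\d+-\d{2}-\d+)}is,
    },
);

# Stand-in for the real fetch step: canned content so the sketch runs offline.
my %pages = (
    "site1.com" => "prices: <table><tr><td>12.50</td></tr></table> cas: <b>50-00-0</b>",
    "site2.com" => "CAS: <b>64-17-5</b>",
);

# Apply every pattern for one site; keep only the fields that matched.
sub extract_fields {
    my ($patterns, $pagecontent) = @_;
    my %found;
    for my $key (keys %$patterns) {
        $found{$key} = $1 if $pagecontent =~ $patterns->{$key};
    }
    return \%found;
}

my %results;
for my $site (sort keys %regs) {
    $results{$site} = extract_fields($regs{$site}, $pages{$site});
}

for my $site (sort keys %results) {
    for my $key (sort keys %{ $results{$site} }) {
        print "$site $key: $results{$site}{$key}\n";
    }
}
```

The loop itself stays the same size no matter how many sites or fields you add; only the %regs table grows.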

I can't imagine doing this any more compactly, since it does depend heavily on specific knowledge about how each site formats its price lists, etc. It would be hard to generalize any further unless all the sites somehow managed to present their information in roughly the same way, which seems implausible. (It's not always a given that you can use regexes for this sort of thing at all -- many folks here would suggest that you use HTML::TokeParser or somesuch, which might not be a bad idea... Do look at least at HTML::TokeParser::Simple; it may make things a lot easier and give you a level of "abstraction" (generality) that will be useful.)

BTW, I noticed that your example in this reply was referring to "$1", though your regexes did not contain any parens. That would be wrong: $1 is only set by a capturing group, so without parens it will not hold the text you matched.
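A quick way to see the difference for yourself (the sample string here is made up):

```perl
use strict;
use warnings;

my $text = "cas: <b>50-00-0</b>";   # made-up sample input

# No capturing parens: the match succeeds, but $1 is left undefined.
my $no_parens;
if ($text =~ /\d+-\d{2}-\d+/) {
    $no_parens = $1;
}

# Same pattern with parens: $1 holds the captured text.
my $with_parens;
if ($text =~ /(\d+-\d{2}-\d+)/) {
    $with_parens = $1;
}

print "without parens: ", (defined $no_parens ? $no_parens : "undef"), "\n";
print "with parens:    ", (defined $with_parens ? $with_parens : "undef"), "\n";
```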

In reply to Re: Re: Re: Oft encountered regex problem by graff
in thread Oft encountered regex problem by GermanHerman
