in reply to Key/Value pair from GET

If that print isn't showing up, the problem is likely that your regex is failing. Your regex could be failing because the data isn't quite in the format you expect. You would be better off being more liberal with your expression, which right now is limited to "word" characters (letters, digits and underscore) by virtue of \w.
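To illustrate the difference, here is a small sketch run against a made-up anchor tag. The HTML string and the \w+ pattern are assumptions on my part, standing in for your real data and your original regex:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line; the real page may look different
my $html = '<a href="/products/hand-held.html" class="rightnav">Hand-helds & PDAs</a>';

# \w+ only accepts letters, digits and underscore, so it stops dead
# on the "/", "-" and "." in the URL and the match fails outright
my @strict = $html =~ m/href="(\w+)"\s+class="rightnav">(\w+)</;

# Negated character classes accept anything up to the delimiter
my @liberal = $html =~ m/href="([^"]*)"\s+class="rightnav">([^<]*)</;

print "strict:  ", (@strict ? "[@strict]" : "no match"), "\n";
print "liberal: [$liberal[0]] and [$liberal[1]]\n";
```

The strict pattern captures nothing, which is exactly the situation where a print inside the while loop never fires.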

Your second print is likely not showing up because there are no "pairs", since none were added in the first loop.

A quick fix looks something like this:
#!/usr/bin/perl -w
use strict;
use LWP::Simple;

my %categories;
my $page = get('http://www.inshift.com/Products.html');

while ($page=~m/href="([^"]*)"\s+class="rightnav">([^>]*)</sg)
{
    $categories{$1} = $2;
    print "[$1] and [$2]\n"; # Note "\n"
}

foreach my $idkey (keys %categories)
{
    print "$idkey,$categories{$idkey}\n";
}
A few notes on the changes: A minor error, really.

Still empty...here is a live and working example
by inblosam (Monk) on Jun 06, 2002 at 07:44 UTC
    I added your changes (thanks for the tips...helps a lot) but I still get nothing from my print statements. Here is a working example of the page and text I am really trying to get:
    #!perl
    use LWP::Simple;
    use strict;
    use warnings;

    #get the values out
    my %categories = ();
    my $page = get 'http://www.handango.com/PlatformSoftware.jsp?platformId=1&siteId=1&zsortParams=true';
    while ($page=~m/class="smallprint">([^"]*)"\s+siteId=([^>]*)</sg) {
        $categories{$1} = $2;
        print "[$1] and [$2]\n";
    }

    foreach my $idkey (keys %categories) {
        print "$idkey,$categories{$idkey}\n";
    }

    The URL seems to work just fine from a browser, so that doesn't seem to be the problem. Thanks for your help.

    Michael Jensen
    michael at inshift.com
    http://www.inshift.com
      Remember, regex matching is case-sensitive by default: 'A' and 'a' are about as alike as '~' and 'â' unless you specify otherwise. It's almost always a good idea to include /i in a regexp that can be subject to random influences, such as users. I noticed a few "SmallPrint" entries in your HTML.
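A quick way to see what /i buys you. This is only a sketch, and the two tag strings are invented examples of the capitalization variants, not lines copied from your page:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two hypothetical spellings of the same tag
my @tags = (
    '<span class="smallprint">',
    '<SPAN CLASS="SmallPrint">',
);

# grep in scalar context returns the number of matching elements
my $sensitive   = grep { /class="smallprint"/  } @tags;
my $insensitive = grep { /class="smallprint"/i } @tags;

print "without /i: $sensitive of ", scalar(@tags), " matched\n";
print "with /i:    $insensitive of ", scalar(@tags), " matched\n";
```

Without /i only the all-lowercase tag matches, so every "SmallPrint" entry would silently drop out of your hash.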

      Before I get to that, let's just take this one step at a time. Here is, I believe, an example of the data you are trying to parse:
      <a href="PlatformSoftwareSection.jsp?siteId=1&jid=94DDB69B3747X42D738A8A4E54CDD8A4&platfor
      mId=1&amp;special=&amp;bySection=1&amp;sectionId=2167&amp;catalog=1&amp;title=FireViewer+Videos+%26+Images
      ">
      <span class="smallprint">E-Books & Document Readers</span></a>,
      Here's something that might do the job:
      #!/usr/bin/perl -w
      use strict;
      use LWP::Simple;

      my %categories; # No need for '= ()'
      my $page = get('http://www.handango.com/PlatformSoftware.jsp?platformId=1&siteId=1&zsortParams=true');

      while ($page =~ /
              sectionId=(\d+)               # Section ID (all digits)
              [^>"]+">                      # Remainder of param and tag
              \s+                           # Some whitespace
              <span\s+class="smallprint">   # SPAN tag
              ([^<]*)                       # "Stuff" up to next tag
              <                             # Start of next tag
          /xig)
      {
          $categories{$1} = $2;
          print "[$1] and [$2]\n";
      }

      foreach my $idkey (keys %categories)
      {
          print "$idkey,$categories{$idkey}\n";
      }
      You'll note I took the liberty of redefining your regex completely. In this case, I'm scooping the "sectionId" variable (numeric only) followed by any amount of "stuff", then grabbing the non-tagged content of the 'span' tag. It works, as best as I can tell, but isn't very adaptable.

      This lack of adaptability makes this program 'brittle' (translation: liable to break completely because of a small change in input), which only qualifies it for use as a Quick Hack. I'd hate to think that this piece of code would still be in use more than six months from now. It makes a lot of assumptions about the input, and that's not good. You can assume things won't change in the HTML in the next few hours or days, or even weeks, but any time-frame longer than that is really going out on a limb.
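To make 'brittle' concrete, here is a rough sketch that runs the same pattern over a few variations of one link. The HTML strings are shortened, hypothetical variants of the sample above, and count_categories is just a helper name I made up for the demonstration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: counts how many times the pattern matches
sub count_categories {
    my ($html) = @_;
    my $count = 0;
    $count++ while $html =~ /
        sectionId=(\d+) [^>"]+"> \s+ <span\s+class="smallprint"> ([^<]*) <
    /xig;
    return $count;
}

# Each entry is [label, html]; only the first mirrors the layout the regex expects
my @variants = (
    [ 'expected layout',
      qq{<a href="Section.jsp?sectionId=2167&amp;catalog=1">\n<span class="smallprint">E-Books</span></a>} ],
    [ 'no whitespace',
      qq{<a href="Section.jsp?sectionId=2167&amp;catalog=1"><span class="smallprint">E-Books</span></a>} ],
    [ 'single quotes',
      qq{<a href='Section.jsp?sectionId=2167&amp;catalog=1'>\n<span class='smallprint'>E-Books</span></a>} ],
    [ 'extra attribute',
      qq{<a href="Section.jsp?sectionId=2167&amp;catalog=1">\n<span id="x" class="smallprint">E-Books</span></a>} ],
);

for my $variant (@variants) {
    my ($label, $html) = @$variant;
    printf "%-16s %d match(es)\n", $label, count_categories($html);
}
```

Only the first variant matches. Any of the other three is a change a site could make without altering how the page looks in a browser.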

      If this is a long-term thing, I'd suggest doing it properly, perhaps by using HTML::Parser or HTML::LinkExtor and some more robust code that can handle slight changes in formatting better.
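As a starting point, something along these lines might work with HTML::Parser (version 3 API, installed from CPAN). Treat it as an untested-against-the-live-site sketch: extract_categories is a name I invented, and it assumes each category is a span with class "smallprint" nested inside a link whose href carries a sectionId parameter, as in the sample above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns a hash ref of sectionId => category name
sub extract_categories {
    my ($html) = @_;
    my (%categories, $section_id, $in_span);
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            if ($tag eq 'a') {
                # Remember the section ID of the link we are inside, if any
                ($section_id) = ($attr->{href} // '') =~ /sectionId=(\d+)/i;
            }
            elsif ($tag eq 'span' && lc($attr->{class} // '') eq 'smallprint') {
                $in_span = 1;
                $text    = '';
            }
        }, 'tagname, attr' ],
        # dtext hands us the text with entities already decoded
        text_h => [ sub { $text .= $_[0] if $in_span }, 'dtext' ],
        end_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'span' && $in_span) {
                $categories{$section_id} = $text if defined $section_id;
                $in_span = 0;
            }
            elsif ($tag eq 'a') {
                undef $section_id;
            }
        }, 'tagname' ],
    );

    $parser->parse($html);
    $parser->eof;
    return \%categories;
}

# Shortened, hypothetical variant of the sample link, with mixed-case markup
my $sample = '<a href="Section.jsp?siteId=1&amp;sectionId=2167&amp;catalog=1">'
           . '<SPAN CLASS="SmallPrint">E-Books &amp; Document Readers</SPAN></a>';

my $categories = extract_categories($sample);
print "$_,$categories->{$_}\n" for sort keys %$categories;
```

Because the parser reports tag names in lowercase and hands over attributes as a hash, this version does not care about whitespace between tags, attribute order, quoting style or capitalization. It still assumes the span sits inside the link, so it is more robust rather than bulletproof.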
        Thanks everybody! The new regexp was the ticket. Works great now, and I will modify it to handle it if the page starts coming out different. I appreciated the tips on working with hashes and regexp. Thanks!

        Michael Jensen
        michael at inshift.com
        http://www.inshift.com