One way to tidy things up a bit would be to stuff all of your if conditions into a hash, then do a single if check. First, I put the statement with all the states into a temporary script:

    use warnings;
    use strict;
    use Data::Dumper;

    # Slurp the whole DATA section, so the statement may span several lines
    my $str = do { local $/; <DATA> };

    # Capture every quoted two-letter code and use it as a hash key
    my %states = map {$_ => 1} $str =~ /'([A-Z]{2})'/g;

    print Dumper \%states;

    __DATA__
    if($link->text ne 'AK' && $link->text ne 'KY' && $link->text ne 'AS'
    && $link->text ne 'MA' && $link->text ne 'MI' && $link->text ne 'CO'
    && $link->text ne 'DC' && $link->text ne 'GA' && $link->text ne 'IN'
    && $link->text ne 'MD' && $link->text ne 'CT' && $link->text ne 'AR'
    && $link->text ne 'ID' && $link->text ne 'IL' && $link->text ne 'CA'
    && $link->text ne 'AL' && $link->text ne 'ME' && $link->text ne 'DE'
    && $link->text ne 'GU' && $link->text ne 'FL' && $link->text ne 'IA'
    && $link->text ne 'LA' && $link->text ne 'HI' && $link->text ne 'KS'
    && $link->text ne 'AZ')

From the output, copy the following part and paste it back into the original script in the form of a hash:

    'IN' => 1, 'FL' => 1, 'MD' => 1, 'MA' => 1, 'GU' => 1,
    'DE' => 1, 'ID' => 1, 'KS' => 1, 'IA' => 1, 'LA' => 1,
    'KY' => 1, 'ME' => 1, 'AR' => 1, 'HI' => 1, 'AK' => 1,
    'GA' => 1, 'MI' => 1, 'AZ' => 1, 'CO' => 1, 'DC' => 1,
    'AS' => 1, 'CA' => 1, 'IL' => 1, 'AL' => 1, 'CT' => 1

Original script:

    ...
    my %states = (
        'IN' => 1, 'FL' => 1, 'MD' => 1, 'MA' => 1, 'GU' => 1,
        'DE' => 1, 'ID' => 1, 'KS' => 1, 'IA' => 1, 'LA' => 1,
        'KY' => 1, 'ME' => 1, 'AR' => 1, 'HI' => 1, 'AK' => 1,
        'GA' => 1, 'MI' => 1, 'AZ' => 1, 'CO' => 1, 'DC' => 1,
        'AS' => 1, 'CA' => 1, 'IL' => 1, 'AL' => 1, 'CT' => 1
    );

Now your if statement reduces to the following. You can also see at a glance which states are listed, and add or remove one without scrolling through a great big statement:

if (! $states{$link->text}){ ... }

With all that said, I'd probably put the list of states into an array so it takes up less vertical space, and create the hash from the array instead:

    my @state_abbrs = qw( IL MI AZ CT ... CO DC AS GA ... );
    my %states = map {$_ => 1} @state_abbrs;

    if (! $states{$link->text}){
        ...
    }
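For anyone who wants to try the idea without the crawler, here's a minimal standalone sketch. The short abbreviation list and the @link_texts array are made up for illustration; the plain strings just stand in for whatever $link->text returns in the real script.

```perl
use strict;
use warnings;

# Abbreviations to skip (a short hypothetical subset, not the full list)
my @state_abbrs = qw( AK KY AS MA MI );

# Build the lookup hash: each abbreviation becomes a key with a true value
my %states = map { $_ => 1 } @state_abbrs;

# Hypothetical link texts standing in for $link->text
my @link_texts = ( 'AK', 'Home', 'MI', 'Contact' );

# Keep only the texts that are NOT keys in the lookup hash
my @kept = grep { !$states{$_} } @link_texts;

print "$_\n" for @kept;
```

This prints Home and Contact, since AK and MI are found in the hash. The single hash lookup replaces the whole chain of ne comparisons, and changing the list only means editing the qw().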

In reply to Re: Web Crawling using Perl by stevieb
in thread Web Crawling using Perl by ckj
