One way to tidy things up a bit would be to stuff all of your if conditions into a hash, then do a single if check. First, I put the statement with all the states into a temporary script:
    use warnings;
    use strict;
    use Data::Dumper;

    # Slurp the whole DATA section, so the statement may span several lines
    my $str = do { local $/; <DATA> };

    # Capture every quoted two-letter code and use it as a hash key
    my %states = map {$_ => 1} $str =~ /'([A-Z]{2})'/g;

    print Dumper \%states;

    __DATA__
    if($link->text ne 'AK' && $link->text ne 'KY' && $link->text ne 'AS'
    && $link->text ne 'MA' && $link->text ne 'MI' && $link->text ne 'CO'
    && $link->text ne 'DC' && $link->text ne 'GA' && $link->text ne 'IN'
    && $link->text ne 'MD' && $link->text ne 'CT' && $link->text ne 'AR'
    && $link->text ne 'ID' && $link->text ne 'IL' && $link->text ne 'CA'
    && $link->text ne 'AL' && $link->text ne 'ME' && $link->text ne 'DE'
    && $link->text ne 'GU' && $link->text ne 'FL' && $link->text ne 'IA'
    && $link->text ne 'LA' && $link->text ne 'HI' && $link->text ne 'KS'
    && $link->text ne 'AZ')
From the output, copy the following part and paste it back into the original script in the form of a hash:
    'IN' => 1, 'FL' => 1, 'MD' => 1, 'MA' => 1, 'GU' => 1,
    'DE' => 1, 'ID' => 1, 'KS' => 1, 'IA' => 1, 'LA' => 1,
    'KY' => 1, 'ME' => 1, 'AR' => 1, 'HI' => 1, 'AK' => 1,
    'GA' => 1, 'MI' => 1, 'AZ' => 1, 'CO' => 1, 'DC' => 1,
    'AS' => 1, 'CA' => 1, 'IL' => 1, 'AL' => 1, 'CT' => 1
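Perl does not keep hash keys in any fixed order, so the Dumper output will likely come out in a different order each time you run the temporary script. If you'd rather paste an alphabetized list, something like the following should do it. This is only a sketch: `states_from_statement` is a helper name I made up for illustration, and the sample statement is shortened.

```perl
use strict;
use warnings;
use Data::Dumper;

# Print hash keys in sorted order so the output is stable between runs
$Data::Dumper::Sortkeys = 1;
# Use a simple, fixed indentation style for the dump
$Data::Dumper::Indent = 1;

# Hypothetical helper: pull every quoted two-letter code out of a statement
# and return a hash reference keyed by those codes
sub states_from_statement {
    my ($statement) = @_;
    my %states = map { $_ => 1 } $statement =~ /'([A-Z]{2})'/g;
    return \%states;
}

# Shortened sample statement, for illustration only
my $sample = q{if($link->text ne 'KY' && $link->text ne 'AK' && $link->text ne 'AS')};

print Dumper(states_from_statement($sample));
```

With `Sortkeys` set, the keys are dumped in sorted order, which makes the pasted hash easier to scan later.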
Original script:
    ...

    my %states = (
        'IN' => 1, 'FL' => 1, 'MD' => 1, 'MA' => 1, 'GU' => 1,
        'DE' => 1, 'ID' => 1, 'KS' => 1, 'IA' => 1, 'LA' => 1,
        'KY' => 1, 'ME' => 1, 'AR' => 1, 'HI' => 1, 'AK' => 1,
        'GA' => 1, 'MI' => 1, 'AZ' => 1, 'CO' => 1, 'DC' => 1,
        'AS' => 1, 'CA' => 1, 'IL' => 1, 'AL' => 1, 'CT' => 1
    );
Now your if statement can be reduced to the following. You can also see at a glance which states are listed, and add or remove one without scrolling through a great big statement:

    if (! $states{$link->text}){
        ...
    }
With all that said, I'd probably put the list of states into an array so it takes up less vertical space, and create the hash from the array instead:
    my @state_abbrs = qw(
        IL MI AZ CT ... CO DC AS GA ...
    );

    my %states = map {$_ => 1} @state_abbrs;

    if (! $states{$link->text}){
        ...
    }
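If you want to try the lookup on its own before wiring it into the crawler, here is a small self-contained sketch. The list is shortened on purpose, `is_unlisted` is a hypothetical helper name, and plain strings stand in for `$link->text`, so treat it as an illustration rather than a drop-in.

```perl
use strict;
use warnings;

# Shortened list for illustration; the real list would hold every code to skip
my @state_abbrs = qw( AK KY AS MA MI );

# Build the lookup hash: each abbreviation becomes a key with a true value
my %states = map { $_ => 1 } @state_abbrs;

# Hypothetical helper: true when the text is NOT one of the listed codes
sub is_unlisted {
    my ($text) = @_;
    return !$states{$text};
}

# Plain strings stand in for $link->text here
for my $text (qw( AK NY MI TX )) {
    print "$text\n" if is_unlisted($text);
}
```

This prints NY and TX, the two values that are not in the list. A hash lookup costs the same no matter how many states you add, where the chained `ne` comparisons get slower and harder to read with each one.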
In reply to Re: Web Crawling using Perl by stevieb, in thread Web Crawling using Perl by ckj