in reply to My code sucks, please help me understand why.

It would help a lot to know how your input data looks like (a few lines of each file would be enough), and if PIDS or FIDS is the larger one.

I guess your logic problem (or one of them, at least) is this line:

if ($fid =~ /$pidCan/) {

Which interprets $pidCan as a regex, not just as a constant string to search for.

You can make that a literal search by writing if ($fid =~ /\Q$pidCan\E/) instead.

And yes, chances are that your program can be made much faster, but for that it would really help to know the structure of your input data.

There are also some stylistic weaknesses in your code - for example it's bad practice to declare all variables up front - that way they can be seen in the whole program. It's easier to avoid errors though if you keep the scope of each variable small. For example $result can and should be declared inside the loop:

foreach $pid (@pids) { my $result; $pid =~ /\|(.*)$/; $result = $1; push (@pidCans, $result); }

Or even better avoided completely:

foreach $pid (@pids) { $pid =~ /\|(.*)$/; push @pidCans, "$1"; }
Perl 6 - links to (nearly) everything that is Perl 6.

Replies are listed 'Best First'.
Re^2: My code sucks, please help me understand why.
by mirage4d (Novice) on Oct 14, 2009 at 17:39 UTC

    Thanks for the quick response moritz. Sample lines listed below:

    Lines from PIDS (this is the smaller (~16k record) file:

    dn: cn=*****,ou=users,|810221480 dn: cn=*****,ou=users,|810039655 dn: cn=*****,ou=users,|810086196 dn: cn=*****,ou=users,|810008482 dn: cn=*****,ou=users,|810224329
    Lines from FIDS (larger, ~240k records):
    810000001;08/17/1957 810000002;12/02/1975 810000003;12/22/1982 810000004;11/01/1967 810000005;02/07/1981 810000006;12/27/1967 810000007;05/09/1981 810000008;05/14/1976 810000009;10/24/1981 810000010;11/17/1943

    I'm trying to match on the 810* ids. Thanks for your help.

      Since you can extract the IDs easily for each file, you can put the IDs from PIDS into a hash, and query that while iterating over FIDS:

      my %ids; while (my $line = <PIDS>) { chomp $line; my (undef, $id) = split /\|/, $line; $ids{$id} = 1; } while (my $line = <FIDS>) { my ($id, $rest) = split /;/, $line, 2; if ($ids{$id}) { print "Found id '$id' in line $line"; } }
      Perl 6 - links to (nearly) everything that is Perl 6.
        Thanks to both of you for your responses. I've implemented both solutions in addition to modifying the regex in my original script as recommended (thanks moritz), but still I get no matches. Really scratching my head over this one. I am beginning to wonder if there could be something amiss with my input that I am just not seeing. Any further suggestions would be much appreciated. Thanks again for your help.
      my @pids; { open(my $pids_fh, '<', $pids_qfn) or die("Can't open PIDS file \"$pids_qfn\": $!\n"); push @pids, /\|(\d+)$/ while <$pids_fh>; } my $pids_pat = map qr/$_/, join '|', #map quotemeta, # We're only dealing with digits @pids; open(my $fids_fh, '<', $fids_qfn) or die("Can't open FIDS file \"$fids_qfn\": $!\n"); while (<$fids_fh>) { print if /^$pids_pat;/; }

      If you need extra speed and your Perl is older than 5.10, change

      my $pids_pat = map qr/$_/, join '|', #map quotemeta, # We're only dealing with digits @pids;
      to
      use Regexp::List qw( ); my $pids_pat = Regexp::List->new()->list2re(@pids);

      5.10 already does the optimisation Regexp::List does.

      An alternative to me previous reply would be to use a hash instead of a regular expression
      my %pids; { open(my $pids_fh, '<', $pids_qfn) or die("Can't open PIDS file \"$pids_qfn\": $!\n"); while (<$pids_fh>) { my ($pid) = /\|(\d+)$/ or next; ++$pids{$pid}; } } open(my $fids_fh, '<', $fids_qfn) or die("Can't open FIDS file \"$fids_qfn\": $!\n"); while (<$fids_fh>) { my ($pid) = /^(\d+);/ or next; print if $pids{$pid}; }