in reply to Re: My code sucks, please help me understand why.
in thread My code sucks, please help me understand why.

Thanks for the quick response moritz. Sample lines listed below:

Lines from PIDS (this is the smaller (~16k record) file:

dn: cn=*****,ou=users,|810221480 dn: cn=*****,ou=users,|810039655 dn: cn=*****,ou=users,|810086196 dn: cn=*****,ou=users,|810008482 dn: cn=*****,ou=users,|810224329
Lines from FIDS (larger, ~240k records):
810000001;08/17/1957 810000002;12/02/1975 810000003;12/22/1982 810000004;11/01/1967 810000005;02/07/1981 810000006;12/27/1967 810000007;05/09/1981 810000008;05/14/1976 810000009;10/24/1981 810000010;11/17/1943

I'm trying to match on the 810* ids. Thanks for your help.

Replies are listed 'Best First'.
Re^3: My code sucks, please help me understand why.
by moritz (Cardinal) on Oct 14, 2009 at 17:56 UTC

    Since you can extract the IDs easily for each file, you can put the IDs from PIDS into a hash, and query that while iterating over FIDS:

    my %ids; while (my $line = <PIDS>) { chomp $line; my (undef, $id) = split /\|/, $line; $ids{$id} = 1; } while (my $line = <FIDS>) { my ($id, $rest) = split /;/, $line, 2; if ($ids{$id}) { print "Found id '$id' in line $line"; } }
    Perl 6 - links to (nearly) everything that is Perl 6.
      Thanks to both of you for your responses. I've implemented both solutions in addition to modifying the regex in my original script as recommended (thanks moritz), but still I get no matches. Really scratching my head over this one. I am beginning to wonder if there could be something amiss with my input that I am just not seeing. Any further suggestions would be much appreciated. Thanks again for your help.
        I don't understand what's wrong.

        It's clear that the example data you gave us doesn't give any matches, but if I modify it a bit I do get a match.

        Code:

        use strict; use warnings; use autodie; open my $pids, '<', 'pids'; my %ids; while (my $line = <$pids>) { chomp $line; my (undef, $id) = split /\|/, $line; $ids{$id} = 1; } close $pids; open my $fids, '<', 'fids'; while (my $line = <$fids>) { my ($id, $rest) = split /;/, $line, 2; if ($ids{$id}) { print "Found id '$id' in line $line"; } } close $fids;

        FIDS:

        810000001;08/17/1957 810000002;12/02/1975 810000003;12/22/1982 810000004;11/01/1967 810086196;02/07/1981 810000006;12/27/1967 810000007;05/09/1981 810000008;05/14/1976 810000009;10/24/1981 810000010;11/17/1943

        pids (unchanged):

        dn: cn=*****,ou=users,|810221480 dn: cn=*****,ou=users,|810039655 dn: cn=*****,ou=users,|810086196 dn: cn=*****,ou=users,|810008482 dn: cn=*****,ou=users,|810224329
        Perl 6 - links to (nearly) everything that is Perl 6.
Re^3: My code sucks, please help me understand why. (regex)
by ikegami (Patriarch) on Oct 14, 2009 at 17:46 UTC
    my @pids; { open(my $pids_fh, '<', $pids_qfn) or die("Can't open PIDS file \"$pids_qfn\": $!\n"); push @pids, /\|(\d+)$/ while <$pids_fh>; } my $pids_pat = map qr/$_/, join '|', #map quotemeta, # We're only dealing with digits @pids; open(my $fids_fh, '<', $fids_qfn) or die("Can't open FIDS file \"$fids_qfn\": $!\n"); while (<$fids_fh>) { print if /^$pids_pat;/; }

    If you need extra speed and your Perl is older than 5.10, change

    my $pids_pat = map qr/$_/, join '|', #map quotemeta, # We're only dealing with digits @pids;
    to
    use Regexp::List qw( ); my $pids_pat = Regexp::List->new()->list2re(@pids);

    5.10 already does the optimisation Regexp::List does.

Re^3: My code sucks, please help me understand why. (hash)
by ikegami (Patriarch) on Oct 14, 2009 at 17:54 UTC
    An alternative to me previous reply would be to use a hash instead of a regular expression
    my %pids; { open(my $pids_fh, '<', $pids_qfn) or die("Can't open PIDS file \"$pids_qfn\": $!\n"); while (<$pids_fh>) { my ($pid) = /\|(\d+)$/ or next; ++$pids{$pid}; } } open(my $fids_fh, '<', $fids_qfn) or die("Can't open FIDS file \"$fids_qfn\": $!\n"); while (<$fids_fh>) { my ($pid) = /^(\d+);/ or next; print if $pids{$pid}; }