Re^2: My code sucks, please help me understand why.

Thanks for the quick response, moritz. Sample lines are listed below:

Lines from PIDS (this is the smaller file, ~16k records):

dn: cn=*****,ou=users,|810221480
dn: cn=*****,ou=users,|810039655
dn: cn=*****,ou=users,|810086196
dn: cn=*****,ou=users,|810008482
dn: cn=*****,ou=users,|810224329

Lines from FIDS (larger, ~240k records):

810000001;08/17/1957
810000002;12/02/1975
810000003;12/22/1982
810000004;11/01/1967
810000005;02/07/1981
810000006;12/27/1967
810000007;05/09/1981
810000008;05/14/1976
810000009;10/24/1981
810000010;11/17/1943

I'm trying to match on the 810* ids. Thanks for your help.

Re^3: My code sucks, please help me understand why. by moritz (Cardinal) on Oct 14, 2009 at 17:56 UTC
Since you can extract the IDs easily for each file, you can put the IDs from PIDS into a hash, and query that while iterating over FIDS:

```perl
my %ids;
while (my $line = <PIDS>) {
    chomp $line;
    my (undef, $id) = split /\|/, $line;
    $ids{$id} = 1;
}

while (my $line = <FIDS>) {
    my ($id, $rest) = split /;/, $line, 2;
    if ($ids{$id}) {
        print "Found id '$id' in line $line";
    }
}
```

Perl 6 - links to (nearly) everything that is Perl 6.
Re^4: My code sucks, please help me understand why. by mirage4d (Novice) on Oct 15, 2009 at 15:32 UTC
Thanks to both of you for your responses. I've implemented both solutions in addition to modifying the regex in my original script as recommended (thanks moritz), but still I get no matches. Really scratching my head over this one. I am beginning to wonder if there could be something amiss with my input that I am just not seeing. Any further suggestions would be much appreciated. Thanks again for your help.
Re^5: My code sucks, please help me understand why. by moritz (Cardinal) on Oct 15, 2009 at 17:09 UTC
I don't understand what's wrong. It's clear that the example data you gave us doesn't give any matches, but if I modify it a bit I do get a match.

Code:

```perl
use strict;
use warnings;
use autodie;

open my $pids, '<', 'pids';
my %ids;
while (my $line = <$pids>) {
    chomp $line;
    my (undef, $id) = split /\|/, $line;
    $ids{$id} = 1;
}
close $pids;

open my $fids, '<', 'fids';
while (my $line = <$fids>) {
    my ($id, $rest) = split /;/, $line, 2;
    if ($ids{$id}) {
        print "Found id '$id' in line $line";
    }
}
close $fids;
```

FIDS:

```
810000001;08/17/1957
810000002;12/02/1975
810000003;12/22/1982
810000004;11/01/1967
810086196;02/07/1981
810000006;12/27/1967
810000007;05/09/1981
810000008;05/14/1976
810000009;10/24/1981
810000010;11/17/1943
```

pids (unchanged):

```
dn: cn=*****,ou=users,|810221480
dn: cn=*****,ou=users,|810039655
dn: cn=*****,ou=users,|810086196
dn: cn=*****,ou=users,|810008482
dn: cn=*****,ou=users,|810224329
```
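Since the same logic finds a match on hand-edited data, one thing worth ruling out is invisible characters in the real files. For example, if PIDS has Windows line endings, `chomp` removes the `\n` but leaves a `\r` on the end of every id, so no hash key would ever equal an id from FIDS. That is only a guess about the input; nothing in the thread confirms it. A minimal sketch that makes such characters visible and strips them before the lookup (the helper names `show_raw` and `clean_id` and the simulated line are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: return the string with every byte outside the
# printable, non-space ASCII range escaped, so a stray "\r" or a
# trailing space becomes visible.
sub show_raw {
    my ($s) = @_;
    (my $out = $s) =~ s/([^\x21-\x7e])/sprintf '\\x%02x', ord $1/ge;
    return $out;
}

# Hypothetical helper: keep only the first run of digits in an id field.
sub clean_id {
    my ($field) = @_;
    return unless defined $field;
    my ($id) = $field =~ /(\d+)/;
    return $id;
}

# Simulated PIDS line with a Windows line ending (an assumption,
# not taken from the real file).
my $line = "dn: cn=*****,ou=users,|810086196\r\n";
chomp $line;                             # removes "\n" only
my (undef, $raw) = split /\|/, $line;

print "raw id:   '", show_raw($raw), "'\n";   # the \r shows up as \x0d
print "clean id: '", clean_id($raw), "'\n";
```

If the raw ids in the real files show anything after the digits, storing `clean_id($id)` as the hash key (and cleaning the FIDS id the same way) should be enough to make the lookup work.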
Re^3: My code sucks, please help me understand why. (regex) by ikegami (Patriarch) on Oct 14, 2009 at 17:46 UTC
```perl
my @pids;
{
    open(my $pids_fh, '<', $pids_qfn)
        or die("Can't open PIDS file \"$pids_qfn\": $!\n");
    push @pids, /\|(\d+)$/ while <$pids_fh>;
}

# List assignment: in scalar context map would return a count,
# not the compiled pattern.
my ($pids_pat) =
    map qr/$_/,
    join '|',
    #map quotemeta,  # We're only dealing with digits
    @pids;

open(my $fids_fh, '<', $fids_qfn)
    or die("Can't open FIDS file \"$fids_qfn\": $!\n");
while (<$fids_fh>) {
    print if /^$pids_pat;/;
}
```

If you need extra speed and your Perl is older than 5.10, change

```perl
my ($pids_pat) =
    map qr/$_/,
    join '|',
    #map quotemeta,  # We're only dealing with digits
    @pids;
```

to

```perl
use Regexp::List qw( );
my $pids_pat = Regexp::List->new()->list2re(@pids);
```

Perl 5.10 already does the optimisation Regexp::List does.
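To see what the combined pattern does without touching any files, here is a small self-contained sketch of the same idea. It uses in-memory lines modelled on the sample data (with one FIDS id changed so that something matches); the array names are invented for the example and are not part of the code above.

```perl
use strict;
use warnings;

# Stand-ins for the two files (assumed sample data, not the real input).
my @pid_lines = (
    "dn: cn=*****,ou=users,|810086196\n",
    "dn: cn=*****,ou=users,|810008482\n",
);
my @fid_lines = (
    "810000004;11/01/1967\n",
    "810086196;02/07/1981\n",
);

# Pull the trailing digits after the pipe from each PIDS line.
my @pids = map { /\|(\d+)$/ } @pid_lines;

# Build one alternation, e.g. 810086196|810008482, and compile it once.
my $alternation = join '|', @pids;
my $pids_pat    = qr/$alternation/;

# Keep only the FIDS lines whose leading id is one of the PIDS ids.
my @matches = grep { /^$pids_pat;/ } @fid_lines;
print @matches;    # prints the 810086196 line only
```

With ~16k ids the alternation gets long, which is why the note about Perl versions before 5.10 matters there.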
Re^3: My code sucks, please help me understand why. (hash) by ikegami (Patriarch) on Oct 14, 2009 at 17:54 UTC
An alternative to my previous reply would be to use a hash instead of a regular expression:

```perl
my %pids;
{
    open(my $pids_fh, '<', $pids_qfn)
        or die("Can't open PIDS file \"$pids_qfn\": $!\n");
    while (<$pids_fh>) {
        my ($pid) = /\|(\d+)$/
            or next;
        ++$pids{$pid};
    }
}

open(my $fids_fh, '<', $fids_qfn)
    or die("Can't open FIDS file \"$fids_qfn\": $!\n");
while (<$fids_fh>) {
    my ($pid) = /^(\d+);/
        or next;
    print if $pids{$pid};
}
```