Large File Parsing

RobertCraven has asked for the wisdom of the Perl Monks concerning the following question:

Re: Large File Parsing by jwkrahn (Abbot) on Jan 03, 2010 at 04:25 UTC
`my @items; foreach my $key (keys %hash){ push(@items,$key); }`
Or simply: `my @items = keys %hash;`
`my $rxMatchItems; { local $" = q{\|}; $rxMatchItems = qr{(?:@items)}; }`
Or simply: `my $rxMatchItems = do { local $" = q{\|}; qr{(?:@items)} };`
Because your `%hash` is empty your pattern match becomes:
`$ perl -le'my @items; my $rxMatchItems = do { local $" = q{\|}; qr{(?:@items)} }; print $rxMatchItems'`
`(?-xism:(?:))`
And the pattern `(?-xism:(?:))` will match everything.
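The empty-list case described above can be caught before any file is read. The following is only a minimal sketch: `build_matcher` is a hypothetical helper name, the IDs are sample values, and it assumes the intent is a plain alternation over the items.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build an alternation regex,
# but refuse an empty list, since that would yield a match-everything pattern.
sub build_matcher {
    my @items = @_;
    die "no items to match; the pattern would match every line\n" unless @items;
    my $alternation = join '|', map { quotemeta($_) } @items;
    return qr{(?:$alternation)};
}

# With no items the interpolated pattern is just (?:), which matches any string.
my $empty_rx = do { my @none; local $" = '|'; qr{(?:@none)} };
my $empty_matches = ('any line at all' =~ $empty_rx) ? 1 : 0;

my $rx   = build_matcher(qw(P40303 Q99436));
my $hit  = ('xx P40303 yy' =~ $rx) ? 1 : 0;
my $miss = ('xx A00000 yy' =~ $rx) ? 1 : 0;

# Calling the helper with an empty list dies instead of returning a regex.
my $died = eval { build_matcher(); 1 } ? 0 : 1;

print "empty=$empty_matches hit=$hit miss=$miss died=$died\n";
```

Failing early like this is a design choice; skipping the filter or printing a warning would work as well, depending on what an empty ID list should mean.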
Re^2: Large File Parsing by Anonymous Monk on Jan 03, 2010 at 05:36 UTC
`my $rxMatchItems = do { local $" = q{\|}; qr{(?:@items)} };` Oh noes :) `my $rxMatchItems = join '\|', map quotemeta, @items; $rxMatchItems = qr/$rxMatchItems/;`
Re^3: Large File Parsing by johngg (Canon) on Jan 03, 2010 at 11:19 UTC
I'm guessing from the variable name and the use of quoting constructs that the OP grabbed that bit of code from one of my solutions, probably one where `@items` contained values known not to need quotemeta'ing. Invariable application of quotemeta without any consideration of whether it is necessary is just another form of cargo cult programming. I don't think we can tell from the OP's code whether it is required or not. Even if it is required, the do block construct is as valid as using join. `my $rxMatchItems = do { local $" = q{\|}; qr{(?:@{ [ map quotemeta, @items ] })}; };`
Cheers, JohnGG
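One detail of the separator used in the snippets above is worth checking: inside a Perl regex, `\|` matches a literal pipe character, whereas an unescaped `|` is the alternation operator. A minimal sketch of the difference, assuming the intent is "match any one of the items" and using sample IDs:

```perl
use strict;
use warnings;

my @items = qw(P40303 Q99436);    # sample IDs for illustration only

# Separator as written in the thread: backslash + pipe.
# In a Perl regex this means "a literal | character".
my $rx_literal = do { local $" = q{\|}; qr{(?:@items)} };

# Plain pipe: a real alternation. quotemeta is harmless for these IDs
# and protects against items that contain regex metacharacters.
my @escaped = map { quotemeta($_) } @items;
my $rx_alt  = do { local $" = q{|}; qr{(?:@escaped)} };

my $line = 'xx P40303 yy';
my $literal_hits_line   = ($line =~ $rx_literal) ? 1 : 0;
my $alt_hits_line       = ($line =~ $rx_alt)     ? 1 : 0;
my $literal_hits_joined = ('P40303|Q99436' =~ $rx_literal) ? 1 : 0;

print "literal-on-line=$literal_hits_line alt-on-line=$alt_hits_line ",
      "literal-on-joined=$literal_hits_joined\n";
```

The backslashed form only matches text that contains the IDs joined by literal pipes, so it would keep almost nothing from a real data file.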
Re: Large File Parsing by educated_foo (Vicar) on Jan 03, 2010 at 06:44 UTC
These lines, especially the last, clearly show that you have copy-pasted something you don't understand: `use warnings; use strict; use Data::Dumper;` Not to mention the fact that your script doesn't even run -- where is `%hash` defined?
Re^2: Large File Parsing by afoken (Chancellor) on Jan 03, 2010 at 09:25 UTC
Please don't read the previous posting as "using strict and warnings is nonsense"; the opposite is true: always use strict and warnings (except in rare situations like Perl golf). The use of `Data::Dumper` is nonsense here, and it really looks like cargo cult.
Alexander
-- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^3: Large File Parsing by Jenda (Abbot) on Jan 03, 2010 at 11:32 UTC
You always add in `use Data::Dumper;` whenever you want to debug-print a data structure and remove it as soon as you remove all debug prints that need it? Really? I quite often leave it there knowing that sooner or later I'll need it again. Sometimes the holy war against cargo culting is a bit cargo cultish.
Jenda
Enoch was right! Enjoy the last years of Rome.
Re^4: Large File Parsing by RobertCraven (Sexton) on Jan 03, 2010 at 23:24 UTC
Re^4: Large File Parsing by educated_foo (Vicar) on Jan 03, 2010 at 18:21 UTC
Re^5: Large File Parsing by Jenda (Abbot) on Jan 03, 2010 at 21:15 UTC
Re^2: Large File Parsing by RobertCraven (Sexton) on Jan 03, 2010 at 23:20 UTC
The hash is populated from a DB.
Re: Large File Parsing by Marshall (Canon) on Jan 03, 2010 at 23:20 UTC
"I have to parse a 9GB textfile, I only want to keep lines containing certain strings (UniProt IDs, like P40303 or Q99436)."
I would think that the first thing is to decide whether you even need to write any kind of program or not (Perl or otherwise)! I figure you are on a Unix type machine. There is a standard program that does what you want called "grep". Type "man grep", "man egrep" at the command line to get some hints. "grep P40303 *.datafile" will output all lines containing P40303 in all files ending in ".datafile". But if you must, here is some Perl code...
    #!/usr/bin/perl -w
    use strict;
    my @items = qw (P40303 Q99436 X1234 W9765543);
    my $regex = join ("\|",@items);   # "\|" in double quotes is just "|"
    print "$regex\n"; # to see what this does
    # put something like this in your "grep"
    # P40303\|Q99436\|X1234\|W976554
    while (<>)
    {
       print if m/$regex/;
    }
    __END__
    Perl 5.10 is pretty smart. I think that the /o option is not
    necessary here. I don't think more complex syntaxes are either.
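Since the OP's IDs already live in a hash, another option is to skip the big alternation and look up each word on the line in that hash instead. This is a different technique from the regex shown above, offered only as a sketch: `%wanted`, `line_is_wanted` and `filter_lines` are hypothetical names, the sample data is made up, and it assumes the IDs appear in the file as whole words.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash the OP loads from a database.
my %wanted = map { $_ => 1 } qw(P40303 Q99436);

# True if any word-like token on the line is one of the wanted IDs.
sub line_is_wanted {
    my ($line) = @_;
    while ($line =~ /(\w+)/g) {
        return 1 if exists $wanted{$1};
    }
    return 0;
}

# Copy matching lines from one filehandle to another, one line at a time,
# so memory use does not grow with the size of the input.
sub filter_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} $line if line_is_wanted($line);
    }
    return;
}

# Small in-memory demonstration instead of a 9GB file.
my $sample = "a P40303 b\nnothing here\nQ99436\tx\nXP40303 no\n";
my $kept   = '';
open my $in,  '<', \$sample or die "cannot open sample: $!";
open my $out, '>', \$kept   or die "cannot open buffer: $!";
filter_lines($in, $out);
close $out;
print $kept;
```

For a real run, `filter_lines(\*STDIN, \*STDOUT)` would read the file from standard input. Note that, unlike a substring match, this lookup does not keep the `XP40303` line, because the ID there is part of a longer word.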
Re: Large File Parsing by Anonymous Monk on Jan 03, 2010 at 04:15 UTC
"The runtime is endless, could anyone recommend me a better way?"
Probably because your regex is nonsense
Re^2: Large File Parsing by RobertCraven (Sexton) on Jan 03, 2010 at 23:14 UTC
Harsh, but helped