Searching Line by Line Multiple Times

Dr.Avocado has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Searching Line by Line Multiple Times by Joost (Canon) on Aug 01, 2007 at 19:43 UTC
This is just dangerous and wrong: `open(SuitDoc,<STDIN>) \|\| die "Cannot open file";` [download] Why it's wrong: it evaluates STDIN in list context so all lines in STDIN get appended as parameters to open(). It's also the wrong approach if you want to iterate over the supplied list of files multiple times, since usually you can't seek back on STDIN. Why it's dangerous: if STDIN contains something like ">/etc/passwd" or any other "magic" character it can cause the script to overwrite files or execute unsuspected programs. Use the 3 argument form of open to be sure what you're doing and check if the open() call succeeds: `open my $filehandle,"<",$filepath or die "Can't open $filepath for reading: $!"` You could store the content of STDIN in an array and iterate over that multiple times, but it's going to be quicker to check each line for each kind of suit and store the counts for each suit in 4 variables (or hash values) and print out the results after you've read all files. It only takes a couple of bytes to store all that info and you only have to read through all input files 1 time instead of 4. Note that disk IO is going to be by far the limiting factor in performance for this kind of task. "What should it profit a man, if he should win a flame war, yet lose his cool?"	[reply] [d/l] [select]
Re^2: Searching Line by Line Multiple Times by ikegami (Patriarch) on Aug 02, 2007 at 02:22 UTC
You're wrong in saying it's wrong. It's in scalar context. Ugly? yes. Dangerous? No more than using `<>`. Despite that, your advice is good. `>perl -e"print prototype('CORE::open') *;$@` [download]	[reply] [d/l] [select]
Re: Searching Line by Line Multiple Times by Limbic~Region (Chancellor) on Aug 01, 2007 at 23:11 UTC
Dr.Avocado, I am not commenting on your existing implementation as others have already done so. There are typically 3 ways that you would accomplish this: Go through the file once printing any line matching any suite Go through the file once pushing each matching line into an appropriate buffer (4 in your case) and then processing each buffer Loop through the file as many times as you have things to match While I think it is likely a bad idea, you seem to want to know how to do number 3. See seek. Cheers - L~R	[reply]
Re: Searching Line by Line Multiple Times by snopal (Pilgrim) on Aug 01, 2007 at 19:48 UTC
After going through your <STDIN> handle once, you are now at the EOL position, meaning no more data can be read from your handle. You would be better off reading <STDIN> contents into an ARRAY and then looping through that ARRAY in your foreach.	[reply]
Re: Searching Line by Line Multiple Times by liverpole (Monsignor) on Aug 02, 2007 at 03:58 UTC
Hi Dr.Avocado, Another suggestion for simplifying code ... Instead of: `my @suits = ("hearts","clubs","spades","diamonds");` [download] You may find qw useful: `my @suits = qw(hearts clubs spades diamonds);` [download] I've often found qw to be a big help in writing one-time use (ie. "throwaway") code, when applying one or more algorithms to a set of files: `use strict; use warnings; # Set @files to the result of doing something like # "ls -1" in Linux, or DIR/b in Windows (DOS) # my @files = qw( a.txt b.txt c.txt d.txt e.txt f.txt ); foreach (@files) { do_something_with_this_file($_); }` [download] s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/	[reply] [d/l] [select]
Re: Searching Line by Line Multiple Times by graff (Chancellor) on Aug 02, 2007 at 02:16 UTC
I certainly don't disagree with the previous replies -- they are all true -- but my suggestions are a little different... Instead of this (which is considered harmful): `open(SuitDoc,<STDIN>) \|\| die "Cannot open file"; ... while ($line = <SuitDoc>) { ...` [download] you should just go with the standard idiom: `while (<>) { # read from file name(s) in @ARGV or from pipe ...` [download] Apart from that, you've got your nesting the wrong way around. The outer loop should be over the lines that you read from your input stream, and the inner loop (applied to each line) should be over your patterns. But actually, you don't need an inner loop -- you can let the regex do most of the work: `use strict; # get into this habit my $suits = 'hearts\|clubs\|spades\|diamonds'; while (<>) { if ( /($suits)/ ) { print $1; } }` [download]	[reply] [d/l] [select]
Re: Searching Line by Line Multiple Times by akho (Hermit) on Aug 02, 2007 at 13:19 UTC
`use strict; my %suits; while (<>) { $suits{$1}++ if /(hearts\|clubs\|spades\|diamonds)/; } print ($_ . "\n") x $suits{$_} for (keys %suits);` [download] All criticism above is extremely useful. I'd also like to add that `print` stringifies its arguments anyway, so `"$suit"` is useless; it may also be harmful later in the development cycle.	[reply] [d/l] [select]