Re^2: Piping many individual files into a single perl script

Re^3: Piping many individual files into a single perl script by BrowserUk (Patriarch) on Sep 28, 2008 at 14:27 UTC
You need to detect the end of each individual file, print your results for that file, and reset the counts. See the explanation of `eof(ARGV)` in perlfunc:

    #!C:\Perl
    use strict;

    BEGIN{ @ARGV=map glob, @ARGV; }

    open(RES, ">>results.txt") or die "Cannot open results.txt: $!";
    print RES "File Number A A% B B% Null Null%\n";

    my $A = 0; #these three lines set my initial counts at zero
    my $B = 0;
    my $null = 0;
    my $filenum = 0;

    while( <> ){
        chomp($_);
        if ($_ eq "stringa"){ $A++;}
        elsif ($_ eq "stringb"){ $B++;}
        else { $null++; }

        if( eof( ARGV ) ) { ## true after the end of each individual file
            my $popa = sprintf( '%.2f', $A / 1000 );
            my $popb = sprintf( '%.2f', $B / 1000 );
            my $popnull = sprintf( '%.2f', $null / 1000 );
            $filenum++; #Add one to my filenumber (no "my" here, or the count restarts at 1 for every file)
            print RES "$filenum $A $popa $B $popb $null $popnull\n";
            $A = $B = $null = 0; ## Reset counts for the next file
        }
    }

As I mentioned above, if OS X is a *nix-like system, you probably don't need the `@ARGV = map glob, @ARGV` as the shell will take care of that for you. (Though it probably won't do any harm.)

Also, in your code you have several places where you do: `... my $var = ....; my $var = sprintf ... $var; ...`

If you are running with strict and warnings, you should be getting messages of the form: `"my" variable $var masks earlier declaration in same scope at ....` Don't ignore them; they are there for a purpose.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
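For anyone who wants to try the end-of-file technique in isolation, here is a minimal sketch under a few assumptions: the sub name `count_per_file` and the sample file names and contents are made up for illustration, and the rows are returned rather than appended to `results.txt`. It is not the OP's script, just the `eof(ARGV)` idea wrapped so it can be run on throwaway files.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: returns one row [ file, A, B, null ] per input file.
sub count_per_file {
    my @files = @_;
    local @ARGV = @files;    # <> reads the files named in @ARGV
    my @rows;
    my ( $count_a, $count_b, $null ) = ( 0, 0, 0 );
    while ( my $line = <> ) {
        chomp $line;
        if    ( $line eq 'stringa' ) { $count_a++ }
        elsif ( $line eq 'stringb' ) { $count_b++ }
        else                         { $null++ }

        if ( eof(ARGV) ) {    # last line of the current file was just read
            push @rows, [ $ARGV, $count_a, $count_b, $null ];
            ( $count_a, $count_b, $null ) = ( 0, 0, 0 );    # reset for next file
        }
    }
    return @rows;
}

# Demo on two throwaway files (made-up names and contents).
my $dir    = tempdir( CLEANUP => 1 );
my %sample = (
    'one.txt' => "stringa\nstringb\nother\n",
    'two.txt' => "stringa\nstringa\n",
);
for my $name ( keys %sample ) {
    open my $fh, '>', "$dir/$name" or die "cannot write $dir/$name: $!";
    print {$fh} $sample{$name};
    close $fh or die "cannot close $dir/$name: $!";
}

my @rows = count_per_file( map { "$dir/$_" } sort keys %sample );
printf "%s A=%d B=%d null=%d\n", @$_ for @rows;
```

Note that `eof()` with empty parentheses means something different: it is true only at the end of the very last file, so it would produce a single row of grand totals. An empty input file contributes no lines, so this sketch emits no row for it.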
Re^4: Piping many individual files into a single perl script by broomduster (Priest) on Sep 28, 2008 at 14:47 UTC
I was about to post a reply with similar suggestions. The only new things I can add: on a Mac, the `BEGIN{ @ARGV=map glob, @ARGV; }` is definitely not required. I also noticed that the OP is using `#!C:\Perl` as the shebang line. On a Mac this should be `#!/usr/bin/perl` for an Apple-supplied Perl, but it will need a different path for a user-installed Perl (e.g., from MacPorts or elsewhere).
Re^5: Piping many individual files into a single perl script by BrowserUk (Patriarch) on Sep 28, 2008 at 15:30 UTC
"I also noticed that the OP is using `#!C:\Perl` as the shebang line."

I missed that++. That's a very strange choice for someone using a Mac (which doesn't have drive letters?), and it wouldn't work on any *nix variant I'm aware of.
Re^3: Piping many individual files into a single perl script by graff (Chancellor) on Sep 28, 2008 at 15:22 UTC
I'm a little confused. You said you are running on macosx, but your code starts with: `#!C:\Perl`

That makes no sense, and it means that you can only run the script with a command line like this: `perl path/file_name_of_script arg1 ...` (where the "path/" part is only needed if the script is not in your shell's current working directory).

I would use this as the initial "shebang" line: `#!/usr/bin/perl` because macosx is really a unix OS, and in unix, perl is typically found in the /usr/bin/ directory; macosx definitely has perl in that location. With that as the shebang line, and after running the shell command "chmod +x file_name_of_script", the script becomes available as a shell command: `path/file_name_of_script arg1 ...` where the "path/" part is only needed if your shell PATH variable does not include the directory where the script is stored.

As for your question about iterating over a list of file names, a method that I find useful goes like this: the perl script expects as input a list of file names, loads those into an array, and then iterates over the array. At each iteration, if there's a problem with the file or its contents, issue a warning and skip on to the next file in the list; e.g.:

    #!/usr/bin/perl
    use strict;
    use Getopt::Long;

    my $Usage = "Usage: $0 [-p path] filename.list\n  or: ls [path] | $0 [-p path]\n";
    my $path = '.';
    die $Usage unless ( GetOptions( 'p=s' => \$path ) and -d $path );
    die $Usage if (( @ARGV and !-f $ARGV[0] ) or ( @ARGV==0 and -t ));
    # need file name args or pipeline input

    my @file_list = <>;  # read all input as a list of file names
    chomp @file_list;    # get rid of line-feeds

    for my $name ( @file_list ) {
        my $file = "$path/$name";
        if ( ! -f $file ) {
            warn "input value '$file' does not seem to be a data file; skipped\n";
            next;
        }
        if ( ! open( I, "<", $file )) {
            warn "open failed for input file '$file'; skipped\n";
            next;
        }
        ...
    }

There are already very good shell command tools for creating a list of file names ("ls", "find"), and for filtering lists ("grep"), so I'm inclined not to rewrite those things in a perl script that is supposed to process a list of file names. The exception to that rule is when the script is really intended for a specific task that always involves a specific location and/or filter for getting its list of file names to work on, because in that case, I'd rather not have to repeat the selection process on the command line every time I run the script.
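The warn-and-skip loop described above can be sketched as a small sub. This is only an illustration under stated assumptions: `lines_per_file` is a made-up name, counting lines stands in for whatever per-file processing the elided part of the loop does, and lexical filehandles plus `$!` in the warning are my substitutions rather than anything from the post.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: returns ( \%line_counts, \@skipped_names ).
sub lines_per_file {
    my ( $path, @names ) = @_;
    my ( %lines, @skipped );
    for my $name (@names) {
        my $file = "$path/$name";
        unless ( -f $file ) {
            warn "input value '$file' does not seem to be a data file; skipped\n";
            push @skipped, $name;
            next;
        }
        my $fh;
        unless ( open $fh, '<', $file ) {
            warn "open failed for input file '$file': $!; skipped\n";
            push @skipped, $name;
            next;
        }
        my $n = 0;
        while ( my $line = <$fh> ) { $n++ }    # placeholder for real processing
        close $fh;
        $lines{$name} = $n;
    }
    return ( \%lines, \@skipped );
}

# Demo: one real throwaway file and one name that does not exist.
my $dir = tempdir( CLEANUP => 1 );
open my $out, '>', "$dir/data.txt" or die "cannot write $dir/data.txt: $!";
print {$out} "stringa\nstringb\nstringa\n";
close $out or die "cannot close $dir/data.txt: $!";

my ( $lines, $skipped ) = lines_per_file( $dir, 'data.txt', 'missing.txt' );
print "$_: $lines->{$_} lines\n" for sort keys %$lines;
print "skipped: @$skipped\n";
```

Returning the skipped names as well as the results makes it easy to report at the end how many inputs were ignored, which the warnings alone do not summarize.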
Re^3: Piping many individual files into a single perl script by apl (Monsignor) on Sep 28, 2008 at 11:52 UTC
$A and $B are the running totals for all of the files. You either need to make them arrays (indexed by file), or you need to print the totals when you reach the end of a file (after which, you would reset the variables to zero).
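The first option (keeping separate totals per file instead of resetting) could look roughly like the sketch below. Assumptions: it uses a hash keyed by file name rather than an array indexed by file number, and `tally_by_file` plus the sample files are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: returns { file name => { A => n, B => n, null => n } }.
sub tally_by_file {
    my @files = @_;
    local @ARGV = @files;    # <> reads the files named in @ARGV
    my %tally;
    while ( my $line = <> ) {
        chomp $line;
        my $kind = $line eq 'stringa' ? 'A'
                 : $line eq 'stringb' ? 'B'
                 :                      'null';
        $tally{$ARGV}{$kind}++;    # $ARGV names the file currently being read
    }
    return \%tally;
}

# Demo on two throwaway files (made-up names and contents).
my $dir    = tempdir( CLEANUP => 1 );
my %sample = (
    'one.txt' => "stringa\nstringb\nother\n",
    'two.txt' => "stringa\nstringa\n",
);
for my $name ( keys %sample ) {
    open my $fh, '>', "$dir/$name" or die "cannot write $dir/$name: $!";
    print {$fh} $sample{$name};
    close $fh or die "cannot close $dir/$name: $!";
}

my $tally = tally_by_file( map { "$dir/$_" } sort keys %sample );
for my $file ( sort keys %$tally ) {
    my $t = $tally->{$file};
    printf "%s A=%d B=%d null=%d\n", $file, map { $t->{$_} // 0 } qw(A B null);
}
```

This variant holds every file's totals in memory until the end, which is convenient for sorting or summarizing afterwards; printing and resetting at each end of file keeps memory use constant instead.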
Re^3: Piping many individual files into a single perl script by blazar (Canon) on Sep 29, 2008 at 14:45 UTC
I like BrowserUk's solution, except that I'd probably rewrite it (I mostly didn't like the chained `if`-`elsif`s) in a manner similar to this (untested):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.010;

    BEGIN{ @ARGV=map glob, @ARGV }

    say "File Number A A% B B% Null Null%";

    my $default = ''; # set to something sensible, the empty string seems good.
    my @allowed = (qw/stringa stringb/, $default);
    my (%count, $filenum);
    @count{@allowed}=(0) x @allowed; # start at zero so the first file prints cleanly

    while(<>) {
        chomp;
        $count{$_ ~~ @allowed ? $_ : $default}++;
        if (eof) {
            $filenum++;
            say "$filenum ", join ' ' =>
                map { my $x=$count{$_}; $x, sprintf('%.2f', $x/1000) } @allowed;
            @count{@allowed}=(0) x @allowed;
        }
    }
    __END__

I threw in some 5.10-isms in the course of doing so, but it wouldn't be terribly different with pre-5.10 `exists`.

`--` ~~If you can't understand the incipit, then please check the IPB Campaign.~~
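For comparison, the `exists`-based variant alluded to above might be sketched like this. It is an assumption about what that variant would look like, not code from the post: the lookup hash `%is_allowed` and the sub `tally_lines` are hypothetical names, and the sub works on a list of already-chomped lines so it can be tried without any input files.

```perl
use strict;
use warnings;

my $default    = '';                               # catch-all category
my @allowed    = ( qw(stringa stringb), $default );
my %is_allowed = map { $_ => 1 } @allowed;         # replaces the smartmatch test

# Hypothetical helper: tally one file's worth of already-chomped lines.
sub tally_lines {
    my @lines = @_;
    my %count = map { $_ => 0 } @allowed;          # every category starts at zero
    $count{ exists $is_allowed{$_} ? $_ : $default }++ for @lines;
    return %count;
}

my %count = tally_lines(qw(stringa stringb stringa something_else));
print join( ' ', map { $count{$_} } @allowed ), "\n";    # prints "2 1 1"
```

A hash lookup also avoids scanning `@allowed` for every input line, which matters little with three categories but scales better if the list of recognised strings grows.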