PerlMonks  

Using expressions in arrays for pattern matching

by Mike_76 (Novice)
on Dec 01, 2003 at 16:16 UTC (#311302=perlquestion)

Mike_76 has asked for the wisdom of the Perl Monks concerning the following question:

I'm new to programming and have been learning Perl for the last few months. I've searched your archives and honestly tried to figure this out on my own, but after a few days of banging my head against it I've given up.

I'm searching the current directory for files with a .RPT extension, then storing the names in an array. I then want to open each file as well as write to an OUTPUT file (with the same name but a .TXT extension.) While I have the INPUT file open I want to search on a pattern and write the matching pattern to the OUTPUT file. The expressions I want to match are stored in an array params, which is generated from the file params.txt in the current directory.

The code is below.

```perl
#!/apps/bin/perl
use warnings;
use strict;

#search directory for files ending with .RPT and store the names in an array
my @files=glob("*.RPT");
my $i;
my $x;

#for each file in array open and search for string
for $i (@files){
    open INPUT, "$i" or die "Couldn't open the file. $!";
    open OUTPUT, "> $i\.dat";
    print OUTPUT "$i\n";
    print OUTPUT "\n Parameter Value\n";
    print OUTPUT "-------------------------\n";
    while (<INPUT>){
        #open file containing expressions to search for
        open FILE, "params.txt" or die "Couldn't open the file. $!";
        my @params=<FILE>;
        for $x (@params) {
            if (/\s$x\s*\d*/){
                $x=$&;
                print OUTPUT "$x\n";
            }
        }
    }
}
```
I'd really appreciate any help you can offer.

Re: Using expressions in arrays for pattern matching
by duff (Parson) on Dec 01, 2003 at 16:58 UTC

    Presumably the params.txt file doesn't change while you're iterating over the *.RPT files, so you should take that bit out of the loop.

    You also say

    I then want to open each file as well as write to an OUTPUT file (with the same name but a .TXT extension.)
    but in your code, you give the output file a .dat suffix. Oh, and you aren't checking the return value of your open for the OUTPUT filehandle.

    If any item in @params contains characters that are special in regular expressions, make sure you really mean for them to be treated as regex syntax.
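    To illustrate, here is a minimal sketch; the parameter name Temp(C) and the sample line are made up. Interpolated as-is, the parentheses become a capture group, so the pattern no longer matches the literal text. Wrapping the variable in \Q...\E (equivalent to calling quotemeta) makes Perl match it literally.

```perl
use strict;
use warnings;

# Hypothetical parameter that contains regex metacharacters.
my $param = 'Temp(C)';
my $line  = ' Temp(C) 42';

# Interpolated raw, "(C)" is a capture group, so this pattern
# looks for "TempC" and does not match the literal text.
my $raw_matches = $line =~ /\s$param\s*\d*/ ? 1 : 0;

# \Q...\E escapes the metacharacters, so the name is matched literally.
my $quoted_matches = $line =~ /\s\Q$param\E\s*\d*/ ? 1 : 0;

print "raw: $raw_matches, quoted: $quoted_matches\n";
```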

    Are you sure that your regular expression for each element of @params is correct? You want to match exactly one whitespace character, followed by a single element of the @params array, optional whitespace, then optional digits? Judging from the heading in your output, you should probably capture the parameter and those optional digits separately. Here's an untested rewrite:

```perl
#!/usr/bin/perl
use warnings;
use strict;

open PARAMS, "params.txt" or die "Couldn't open the file. $!";
chomp(my @params = <PARAMS>);    # strip newlines so they don't become part of each pattern
close PARAMS;

my @files = glob("*.RPT");
for my $i (@files) {
    open INPUT, "$i" or die "Couldn't open the file. $!";
    open OUTPUT, ">$i.dat" or die "Couldn't write to $i.dat - $!";
    print OUTPUT "$i\n\n";
    print OUTPUT " Parameter Value\n";
    print OUTPUT "-------------------------\n";
    while (<INPUT>) {
        for my $x (@params) {
            next unless /\s($x)\s*(\d*)/;
            print OUTPUT "$1 $2\n"; # probably use printf() here
        }
    }
    close INPUT;
    close OUTPUT;
}
```

    If the things in @params are just words (i.e., no regular expression special characters), you probably want to make a single regular expression out of them ahead of time and rewrite the inner loop like so:

```perl
...
my $paramre = join '|', @params;
...
while (<INPUT>) {
    next unless /\s($paramre)\s*(\d*)/;
    print OUTPUT "$1 $2\n";
}
...
```

    Of course, even if they contain regular expression special characters you might want to do this, but be really careful about it. :-)

      If the things in @params are just words (i.e., no regular expression special characters), you probably want to make a single regular expression out of them ahead of time and rewrite the inner loop like so:
      [ . . . snip . . . ]
      Of course, even if they contain regular expression special characters you might want to do this, but be really careful about it. :-)

      You should be careful even if they don't contain metacharacters.

      In this case, the whitespace helps to anchor the expression, but you shouldn't rely on that. For instance, what if you have both "foo" and "foo\t" in your array? (Which gets matched may depend on the order they appear in the expression.) Be especially wary of changing the rest of the regular expression or trying to re-use $paramre in another one.
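      One way to guard against that ordering problem, assuming the entries are literal strings rather than regexes, is to escape each one and put longer entries first, so a name that is a prefix of another can't win just by appearing earlier. The helper build_param_re and the sample names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build one alternation from literal parameter names.
# Each name is escaped with quotemeta, and longer names are tried first
# so "TempMax" is preferred over its prefix "Temp".
sub build_param_re {
    my @params = @_;
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @params;
    return qr/(?:$alt)/;
}

my $re = build_param_re('Temp', 'TempMax');
my ($hit) = ' TempMax 99' =~ /\s($re)/;
print "$hit\n";    # prints "TempMax"
```

      Without the sort, the alternation would be Temp|TempMax and the same line would report "Temp", because Perl accepts the first alternative that succeeds at a given position.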

      Also, you would need to add a /g modifier and loop over the matches to get the same behavior as the original.
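      A rough sketch of the /g version, with made-up parameter names and input line: used as a while condition, m//g resumes where the previous match on the same string ended, so every parameter on the line is reported, not just the first.

```perl
use strict;
use warnings;

# Hypothetical parameter names and one input line containing several of them.
my @params  = ('Temp', 'Pressure');
my $paramre = join '|', map { quotemeta($_) } @params;
my $line    = ' Temp 42 Pressure 7';

# In scalar context //g continues from the end of the last match,
# so each loop iteration finds the next parameter on the line.
my @found;
while ($line =~ /\s($paramre)\s*(\d*)/g) {
    push @found, "$1 $2";
}
print "$_\n" for @found;
```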

      -sauoq
      "My two cents aren't worth a dime.";
      
Re: Using expressions in arrays for pattern matching
by blokhead (Monsignor) on Dec 01, 2003 at 16:57 UTC
    What exactly isn't working? Is it not printing any matches to the output file? Your matches are probably failing because of the newlines at the ends of the patterns. Make sure you chomp the list of patterns read from params.txt. Also, for efficiency, I recommend reading the list of patterns outside the main loop; you don't need to re-read params.txt for every input line. Something like this (untested):
```perl
open my $params => "params.txt" or die $!;
chomp (my @params = <$params>);
close $params;

for my $filename (glob "*.RPT") {
    open my $output => ">$filename.dat" or die $!;
    print $output "$filename\n\nParameter Value\n-----------\n";
    open my $input => $filename or die $!;
    while (<$input>) {
        for my $param (@params) {
            print $output "$&\n" if /\s$param\s*\d*/
        }
    }
}
```
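    A small demonstration of the newline problem, using a hypothetical parameter and report line: a line read with <FH> keeps its trailing "\n", and once interpolated into a pattern that newline has to match literally.

```perl
use strict;
use warnings;

# A pattern read from a file with <FH> still ends in a newline.
my $param = "Temp\n";
my $line  = " Temp 42\n";

# The pattern now requires a newline right after "Temp", so it fails here.
my $before = $line =~ /\s$param\s*\d*/ ? 1 : 0;

chomp $param;    # strip the trailing newline
my $after = $line =~ /\s$param\s*\d*/ ? 1 : 0;

print "before chomp: $before, after chomp: $after\n";
```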

    blokhead

Re: Using expressions in arrays for pattern matching
by Art_XIV (Hermit) on Dec 01, 2003 at 16:59 UTC

    The following demonstrates a way that you can handle the multiple selection criteria:

```perl
use strict;

my @expressions = qw(fee fi fo fum);
my $expression = join '|', @expressions;

while (<DATA>) {
    chomp;
    print "$_\n" if /$expression/;
}

__DATA__
The project was foobar
I really fumbled that one
I need a vacation
Finally! I found the bug
This is a sample
```

    The pipes inserted by join act as alternation, so the regex matches any line that contains any one of the words: an 'or' in regular-expression terms.

    As a side note, there is nothing wrong with globbing, but you may want to check out the File::Find module.
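    For comparison, a sketch of the File::Find approach, assuming you want every file ending in .RPT under a starting directory. Note that find() descends into subdirectories, which glob("*.RPT") does not. The sub name find_reports is made up for this example.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every *.RPT file under $dir, recursively.
sub find_reports {
    my ($dir) = @_;
    my @reports;
    find(sub {
        # find() chdirs into each directory, so $_ is the bare file name;
        # $File::Find::name is the path including the starting directory.
        push @reports, $File::Find::name if -f $_ && /\.RPT\z/;
    }, $dir);
    return sort @reports;
}

print "$_\n" for find_reports('.');
```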

    Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"
Re: Using expressions in arrays for pattern matching
by podian (Scribe) on Dec 01, 2003 at 17:01 UTC
    I can point out two things for you to explore more:

    a) you are opening params.txt file again and again in the for loop. You should move it out.

    b) you are opening two files (INPUT and FILE), one after the other. So which file does the following if statement operate on?

     if (/\s$x\s*\d*/){
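    As it happens, while (<INPUT>) puts each line in $_, and slurping params.txt into @params in list context leaves $_ alone, so the match does run against the INPUT line; but that is easy to lose track of. A minimal sketch that reads into a named variable instead, assuming the parameters are already chomped literal strings. matching_lines is a hypothetical helper, and an in-memory filehandle stands in for a real .RPT file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of an open filehandle that
# contain any of the given parameter names. Reading into a named
# lexical ($line) instead of $_ makes it obvious what is being matched.
sub matching_lines {
    my ($in, @params) = @_;
    my @hits;
    while (my $line = <$in>) {
        chomp $line;
        for my $param (@params) {
            if ($line =~ /\s\Q$param\E\s*\d*/) {
                push @hits, $line;
                last;    # one hit per line is enough
            }
        }
    }
    return @hits;
}

# In-memory "file" used instead of a real report.
my $report = " Temp 42\n Pressure 7\n Nothing here\n";
open my $in, '<', \$report or die "Can't open in-memory file: $!";
print "$_\n" for matching_lines($in, 'Temp', 'Pressure');
close $in;
```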

Re: Using expressions in arrays for pattern matching
by yosefm (Friar) on Dec 01, 2003 at 22:20 UTC

    I think the major issues have been covered, so I'll just add one important thing that could save you a lot of time in the future. Since you say you are a new programmer, you're lucky to learn about this early and pick up good habits.

    What I'm talking about is coding readably. It might sound unimportant, but believe me, when your programs start getting longer, even you won't be able to quickly remember what you wanted to do in lines 20-35 after working for a while on lines 315-377, not to mention in different files. If you work in a team, you'll cause even more casualties.

    The first and foremost is to give meaningful names to variables. $i is customarily a loop counter, but if your loop gets longer, nobody will be able to tell that $i holds the file name without some serious reading, sometimes not even you. You don't have to use Java-like names like $fileNameVariableIUseTemporarily :) but it's a lot better to write:

```perl
foreach my $param (@params) { ... }
foreach my $file (@files) { ... }
```

    instead of

```perl
foreach $x (@params) { ... }
foreach $i (@files) { ... }
```

    Secondly, indentation. OK, I can follow your small script here, but what if I get code with 3-4 nesting levels over 40-50 lines? You'll soon start banging your head against the wall if your indentation fails you...

    Even worse, you sometimes get a block of closing braces - like the last four lines of your script. In really big blocks I comment them so I'll know what ends what (just like good old BASIC where putting the counter name at the end of a FOR loop was standard syntax).

    Phew... pretty long. But you'll thank me later :) I've had more than my fair share of unreadable code, and I'm trying to save you the bother.

    And now for something completely different: why

```perl
$x=$&;
print OUTPUT "$x\n";
```
    when you can simply write print OUTPUT "$&\n"; instead?

    That's it. Have fun learning programming!

      All good points on style. I'll mention perltidy here, as it will do many of these things for you automagically. I like editing a largish script, paying no heed to indentation and such, running it through perltidy and having pretty code come out.

      thor

        I think, though, that it's just as important to be good with readability while coding as after you're done. I know I don't program perfectly, and I know I've caught quite a few errors by indenting as-I-type and such things.


        Who is Kayser Söze?
Re: Using expressions in arrays for pattern matching
by melora (Scribe) on Dec 01, 2003 at 21:57 UTC
    How about this? I know it's not fancy, but I did test it.
```perl
#!/usr/bin/perl
# file : mike.pl
use warnings;
use strict;

#search directory for files ending with .RPT and store the names in an array
my @files=glob("*.RPT");
my $i;
my $x;
my $readline;

#open file containing expressions to search for
open FILE, "params.txt" or die "Couldn't open the file. $!";
my @params=<FILE>;
close FILE; # remember to close what you open

#for each file in array open and search for string
foreach $i (@files){
    open INPUT, "$i" or die "Couldn't open the file. $!";
    my $thisout = substr($i, 0, index($i, ".")) . ".txt"; # form the name of the output .txt file
    open OUTPUT, ">$thisout" or die "Couldn't write to $thisout: $!";
    while ($readline = <INPUT>){ # for each line in the input file,
        foreach $x (@params) { # for each parameter in the list
            chomp($x); # get rid of any newline nonsense that might be there
            if ($readline =~ m/$x/){ # if the param is in the line, print the line.
                chomp($readline);
                print OUTPUT "$readline\n";
            }
        }
    }
    close INPUT; # close what you open
    close OUTPUT; # close what you open
}
```
      Nice, but I have one nitpick:

      my $thisout = substr($i, 0, index($i, ".")) . ".txt"
      What about names like "my.file.rpt" (pretty common style on *nix)?

      Since we are already through with $i (let's call it $file from now on), we can do:

```perl
$file =~ s/\.rpt$/.txt/i; #case insensitive
open OUTPUT, ">$file";
```
        I like that better, thanks for the nitpick!
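      A sketch of that substitution wrapped in a hypothetical helper, output_name, which copies the name before substituting so the original stays usable. It anchors with \z rather than $ so only the true end of the string counts. One caveat: a name that doesn't end in .rpt comes back unchanged, so a caller should check for that before opening the result for writing, or it would clobber the input file.

```perl
use strict;
use warnings;

# Hypothetical helper: map "name.rpt" to "name.txt", replacing only the
# final extension so names like "my.file.rpt" keep their inner dots.
sub output_name {
    my ($file) = @_;
    (my $out = $file) =~ s/\.rpt\z/.txt/i;    # case-insensitive, anchored at the end
    return $out;
}

print output_name('my.file.rpt'), "\n";    # prints "my.file.txt"
print output_name('DAILY.RPT'), "\n";      # prints "DAILY.txt"
```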
Re: Using expressions in arrays for pattern matching
by ysth (Canon) on Dec 02, 2003 at 00:51 UTC
    Where you have:
    open INPUT, "$i" or die "Couldn't open the file. $!";
    you may want to say open INPUT, "< $i" or, better yet, open INPUT, "<", $i instead, in case you get files whose names begin with |, <, or >.
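    A minimal sketch of three-argument open with a lexical filehandle, wrapped in a hypothetical read_lines helper. Because the mode is its own argument, Perl never looks inside the file name for <, > or |, and it doesn't trim leading or trailing whitespace from the name the way the two-argument form does.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line of a report file.
# The '<' mode is passed separately, so the file name is used verbatim.
sub read_lines {
    my ($file) = @_;
    open my $in, '<', $file or die "Couldn't open '$file': $!";
    my @lines = <$in>;
    close $in;
    return @lines;
}

for my $file (glob '*.RPT') {
    my @lines = read_lines($file);
    printf "%s: %d lines\n", $file, scalar @lines;
}
```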

Node Type: perlquestion [id://311302]
Approved by Limbic~Region
Front-paged by Limbic~Region