GoldfishOfDoom has asked for the wisdom of the Perl Monks concerning the following question:

Dear PerlMonks,

I am completely new (apologies in advance for any politically incorrect terminology) to PERL and am trying to write a script that searches a file for a pattern then generates a new file based on a specific match from each pattern.

Using the match from the first file, I need to search a second file for a pattern that contains that match; however, since there are two files and two while statements, the variable that I generated using the match in the first while statement must stay in its own loop. Is there any way to use the first REGEX match to generate the second REGEX statement?

Here's a sample of the code. I'm sure there are a ton of ways to clean this up, but at the moment, my biggest concern is finding a way to use the match from the first while's REGEX in the second while and REGEX. Any help is greatly appreciated. I've only been working with PERL for two days (with no previous programming experience) so this is all relatively new to me.

    use warnings;
    use strict;

    # Variables
    my $pkt_file = $ARGV[0];
    my $dat_file = $ARGV[1];

    # Read data file
    open (IN_FILE, "<", $pkt_file) or die "ERROR Opening $pkt_file (Error = $!)\n";
    print "Opening: $pkt_file for read...\n";

    # Search IN_FILE for Packet Headers
    while (<IN_FILE>) {
        my $line_in = $_;
        if (($line_in =~ /(\w+) (\w+)/)) {
            open (OUT_FILE, ">>", "$2_tlmval.txt") or die "ERROR Opening file (ERROR = $!)\n";
            print "Opening $2_tlmval.txt\n";
            print OUT_FILE " \n";
            print OUT_FILE "$2:\n"; #prints CSTOL Label
            print OUT_FILE " \n";
            my $pkt = "$2";
            my $out_file = "$2_tlmval.txt";

            # Close OUT_FILE
            close(OUT_FILE);
            print "Closing: OUT_FILE\n";

            #Open data_list file for read
            open (DAT_FILE, "<", $dat_file) or die "ERROR Opening $dat_file (Error = $!)\n";
            print "Opening: $dat_file for read...\n";

            # Reopen OUT_FILE for write
            open (OUT_FILE, ">>", $out_file) or die "ERROR Opening $out_file (Error = $!)\n";
            print "Re-opening: $out_file for write...\n";

            #Search DATA_LIST file
            while (<DAT_FILE>) {
                if (($line_in =~ /($pkt) (\w+) = (\d+)/)) {
                    print OUT_FILE "check $pkt $2 vs $3\n";
                }
            }
        }
    }

    # Close IN_FILE
    close(IN_FILE);
    print "Closing: IN_FILE\n";

    #Close OUT_FILE
    close (OUT_FILE);
    print "Closing: OUT_FILE\n";

    # Close DAT_FILE
    close(DAT_FILE);
    print "Closing: DAT_FILE\n";

Replies are listed 'Best First'.
Re: How to use REGEX Match Variable from one 'While' String in Another 'While' String
by AnomalousMonk (Archbishop) on May 07, 2014 at 02:53 UTC

    In addition to the changes suggested by AnonyMonk above, I find it is wise to capture regex 'capture' variables and get them nailed down as soon as possible. This is because the evaluation of any further regex will probably cause the contents of these variables to... well, vary. (You say, "But I'm not using any other regex and never will." Famous Last Words. Nail them down. And, in fact, you do have another regex, the one in the second, nested while-loop. If you were ever to add a statement after the second while-loop that made reference to the $1 $2 $3 $n capture variables expecting that they would retain the values they had after execution of the first regex, you would be unpleasantly surprised. Nail them down.)
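A minimal sketch of why copying the captures early matters, assuming made-up sample strings; `demo_clobber` is a hypothetical helper and not part of the original reply. It saves `$2` from a first match, runs a second successful match, and reports both values.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): runs two matches in a row and
# returns what was saved from the first and what $2 holds after the second.
sub demo_clobber {
    my ($first, $second) = @_;
    my ($saved, $after);
    if ($first =~ /(\w+) (\w+)/) {
        $saved = $2;                        # copy the capture immediately
        if ($second =~ /(\w+) = (\d+)/) {
            $after = $2;                    # $2 now belongs to the second match
        }
    }
    return ($saved, $after);
}

my ($saved, $after) = demo_clobber('HDR PKT_A', 'temp = 42');
print "saved from first match: $saved; \$2 after second match: $after\n";
```

With these sample strings the saved copy is still `PKT_A`, while `$2` inside the inner block reads `42`.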

    Maybe something like this. (Since you don't seem to use $1 anywhere, I leave it out of account.) (Note the /g modifier on the m//xmsg regexes.) (Untested)

        my $pkt_file = ...;
        my $dat_file = ...;
        ...
        open (my $fh_pkt, '<', $pkt_file) or die "ERROR Opening $pkt_file (Error = $!)\n";
        ...
        while (defined(my $line_in = <$fh_pkt>)) {
            if (my ($pkt) = $line_in =~ m{ \w+ \s+ (\w+) }xmsg) {
                my $out_file = "${pkt}_tlmval.txt";
                open (my $fh_out, '>>', $out_file) or die "ERROR Opening '$out_file' (ERROR = $!)\n";
                ...
                open (my $fh_data, '<', $dat_file) or die "...";
                while (defined(my $line_in = <$fh_data>)) {
                    if (my ($foo, $bar) = $line_in =~ m{ \Q$pkt\E \s+ (\w+) \s+ = \s+ (\d+) }xmsg) {
                        print $fh_out "check '$pkt' '$foo' vs '$bar'\n";
                    }
                }
            }
        }

    Note that the $line_in lexical variable in the nested while-loop is guaranteed to be completely separate and isolated from the lexical of the same name in the outer loop. Even so, I think I would prefer to use different names for these two variables just for the sake of self-documentation and maintainability.
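The scoping claim can be checked with a small sketch. It assumes array data in place of filehandles, and `outer_values_after_inner_loop` is a hypothetical name chosen for illustration. The inner loop declares its own `$line_in`, and the outer one is read again after the inner loop finishes.

```perl
use strict;
use warnings;

# Hypothetical helper: walks two lists with nested while-loops that both
# declare "my $line_in", and records the outer value after the inner loop.
sub outer_values_after_inner_loop {
    my ($outer_ref, $inner_ref) = @_;
    my @outer = @{$outer_ref};
    my @seen;
    while (defined(my $line_in = shift @outer)) {
        my @inner = @{$inner_ref};
        while (defined(my $line_in = shift @inner)) {
            # This $line_in is a new lexical that shadows the outer one
            # only inside this block.
        }
        push @seen, $line_in;    # the outer lexical is untouched
    }
    return @seen;
}

print join(',', outer_values_after_inner_loop(['a', 'b'], ['x', 'y'])), "\n";
```

The returned list holds only the outer values, which is why the shared name works, although distinct names remain easier to read.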

      AnomalousMonk,

      Thank you for the additional comments. I'm always looking for ways to improve and make things flexible for any 'unexpected' changes or additions. This was extremely helpful in cleaning up my script.

      I appreciate you taking the time to respond.

Re: How to use REGEX Match Variable from one 'While' String in Another 'While' String
by Anonymous Monk on May 06, 2014 at 22:27 UTC
    Is there any way to use the first REGEX match to generate the second REGEX statement?

    Yes, there is, and I see you're already doing this in your regular expression /($pkt) (\w+) = (\d+)/. If, for example, $pkt is "abc.def", the pattern will be the equivalent of /(abc.def) (\w+) = (\d+)/. If you don't want the dot and any other characters to be interpreted as special regex characters, you can write your regex as /(\Q$pkt\E) (\w+) = (\d+)/, and the previous example regex becomes /(abc\.def) (\w+) = (\d+)/ (see quotemeta).
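To see the difference `\Q...\E` makes, here is a short sketch using the `abc.def` value from the paragraph above. The helper names `match_raw` and `match_quoted` and the sample lines are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helpers: the same pattern with and without \Q...\E.
sub match_raw {
    my ($pkt, $line) = @_;
    return $line =~ /($pkt) (\w+) = (\d+)/ ? 1 : 0;       # "." matches any character
}

sub match_quoted {
    my ($pkt, $line) = @_;
    return $line =~ /(\Q$pkt\E) (\w+) = (\d+)/ ? 1 : 0;   # "." matches only a literal dot
}

my $pkt = 'abc.def';
for my $line ('abc.def foo = 1', 'abcXdef foo = 1') {
    printf "%-16s raw=%d quoted=%d\n", $line, match_raw($pkt, $line), match_quoted($pkt, $line);
}
```

The unquoted pattern also accepts `abcXdef`, because the dot is treated as a wildcard; the quoted pattern accepts only the literal `abc.def`.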

    If this isn't working, after a first glance I will venture a guess as to why: your inner while loop is matching against $line_in instead of the current line from DAT_FILE, is that correct? If you want to match against the current line from DAT_FILE, it's enough to change that line to if (/($pkt) (\w+) = (\d+)/) {
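A sketch of the inner loop matching against the line it has just read, assuming an in-memory filehandle in place of DAT_FILE and invented sample data. `checks_for` is a hypothetical helper, and it uses a named loop variable instead of `$_` so the match target is explicit.

```perl
use strict;
use warnings;

# Hypothetical helper: scans data text for lines belonging to $pkt and
# returns the "check" lines the original script would print.
sub checks_for {
    my ($pkt, $data_text) = @_;
    # Read from a string instead of a real file (assumption for the demo).
    open(my $fh, '<', \$data_text) or die "Cannot open in-memory data: $!";
    my @checks;
    while (defined(my $data_line = <$fh>)) {
        # Match the line just read from the data file, not the outer line.
        if ($data_line =~ /\Q$pkt\E (\w+) = (\d+)/) {
            push @checks, "check $pkt $1 vs $2";
        }
    }
    close($fh);
    return @checks;
}

my $data = "PKT_A temp = 42\nPKT_B volt = 5\nPKT_A rate = 7\n";
print "$_\n" for checks_for('PKT_A', $data);
```

Only the two `PKT_A` lines produce a check; matching against the outer loop's line instead would test the same unchanged string on every pass.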

    If that's not the problem, it would help if you could provide more information on your question: some sample input, the desired program output, any error messages you might be getting, and a description of what the program is supposed to be doing and how that's different from what it's actually doing.

    As you said, there are other ways your code could be cleaned up (for example, it looks like you're opening and closing the file "$2_tlmval.txt" as OUT_FILE twice in a row, when you could just keep the file open), but first things first :-)

      Ah! Much easier than I was expecting. Thank you very much for the help! Seems like removing the $line_in did the trick. The regex match only had alphanumeric characters and underscores, so special characters weren't an issue.

      I also removed the extra open and close of the file. Thank you for the suggestion! I really appreciate you taking the time to respond.

        The regex match only had alphanumeric characters and underscores...

        If your $pkt_file doesn't contain regular expressions, it's probably better to use \Q$pkt\E, because that'll prevent your program from doing unexpected things if $pkt does someday for whatever reason contain special characters. Similar to what AnomalousMonk wrote below about $1 $2 $3 etc. it's a "future-proofing" measure.