Yakup has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I'm trying to parse a config with brackets file on Linux, but I want to remove commented and blank lines from the input before the actual parsing. It is part of a project that is written in bash and Python and must run on a system that is quite old (RHEL 5, with perl-5.8.8) and limited (no internet connectivity), so I can't use modules and have to "reinvent the wheel".

At the moment I have code that works but is wasteful (it opens and closes a file twice). I first remove unwanted comments and blanks with something like this:

my $data;
open my $FH, '<', $config or die "Cannot open $config: $!\n";
open my $HF, '>', $uncommented or die "Cannot open $uncommented: $!\n";
$data = <$FH>;
while (<$FH>) {
    next if /^\s*#|^$/;
    print $HF "$_";
}
close $FH;
close $HF;

Or with a one-liner:

 perl -ne 'print unless /^\s*#|^$/' config > uncommented

And then I process the uncommented file:

my $data;
open my $FH, '<', $config or die "Cannot open $config: $!\n";
local $/ = undef;
$data = <$FH>;
while ($data =~ m/\{([^}]*)\}/gx) {
    print "$1\n";
}
close $FH;

I'm not able to do both with a single file opening. Either the regex doesn't match anything, or comments and blanks are not stripped before parsing.

I understand that the problem lies in the input record separator "$/", which must be set to undef for the multiline pattern match, but must be a newline for the per-line match (removing comments and blanks).

I wonder if there is an elegant way to use both sequentially while opening the file only once.

Thank you in advance for any tips

Yakup

Replies are listed 'Best First'.
Re: Match pattern per line and after another one as multiline in single file opening
by kennethk (Abbot) on Feb 14, 2017 at 19:42 UTC
    You are right that there is unnecessary disk access going on. There are cute/clever ways to do this (e.g. piping in from grep), but the easiest way would be to swap your print statement with a concatenation:

    open my $FH, '<', $config or die "Cannot open $config: $!\n";
    my $data = <$FH>;    # Dump the line
    $data = '';
    while (<$FH>) {
        next if /^\s*#|^$/;
        $data .= $_;
    }
    close $FH;
    while ($data =~ m/\{([^}]*)\}/gx) {
        print "$1\n";
    }

    Incidentally, you have a $data = <$FH>; peppered into your first chunk. Are you meaning to dump the first line of the file? I've included that behavior, but it smells buggy to me.

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Match pattern per line and after another one as multiline in single file opening
by tybalt89 (Monsignor) on Feb 14, 2017 at 21:20 UTC
    my $data;
    open my $FH, '<', $config or die "Cannot open $config: $!\n";
    local $/ = undef;
    s/^\s*#.*\n|^\n//gm for $data = <$FH>;
    while ($data =~ m/\{([^}]*)\}/gx) {
        print "$1\n";
    }
    close $FH;

    Since you did not provide a test file, it's untested...

      TIMTOWTDI, but I would suggest you replace:

      my $data;
      open my $FH, '<', $config or die "Cannot open $config: $!\n";
      local $/ = undef;
      s/^\s*#.*\n|^\n//gm for $data = <$FH>;

      with

      open my $FH, '<', $config or die "Cannot open $config: $!\n";
      local $/ = undef;
      my $data = <$FH>;
      $data =~ s/^\s*#.*\n|^\n//gm;

      The use of the for loop on a value you expect to be scalar confuses casual perusal, and you've declared the variable (my) at a different spot than where you initialize it. If you want to go compound, there's always

      (my $data = <$FH>) =~ s/^\s*#.*\n|^\n//gm;

      but that feels crowded to me. If I were actually writing this, I would do:

      my $data = do {
          local $/ = undef;
          open my $FH, '<', $config or die "Cannot open $config: $!\n";
          <$FH>;
      };
      $data =~ s/^\s*#.*\n|^\n//gm;

      Slurping in a do block keeps the filehandle tightly scoped and keeps the localization of the input record separator actually local. And note, even then, I personally prefer keeping the processing separate from the import.
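Since no test data was posted, here is a minimal sketch of the do-block slurp run against made-up input. The sub name slurp_uncommented and the sample text are assumptions for illustration only, and an in-memory filehandle (a reference to a string) stands in for the real config file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): slurp the whole input in one
# read, then strip comment lines and empty lines from the resulting string.
sub slurp_uncommented {
    my ($path) = @_;
    my $data = do {
        local $/ = undef;    # slurp mode; restored when the block exits
        open my $FH, '<', $path or die "Cannot open $path: $!\n";
        <$FH>;
    };
    $data =~ s/^\s*#.*\n|^\n//gm;
    return $data;
}

# Made-up sample input. Passing a reference to a string makes open()
# read from memory instead of from a file on disk.
my $sample = "# comment\n\nserver {\n  # inner comment\n  port 80\n}\n";

my $text = slurp_uncommented(\$sample);
while ($text =~ m/\{([^}]*)\}/g) {
    print "$1\n";
}
```

Because $/ is localized inside the do block, any later line-by-line read in the same script still sees the default newline separator.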


      Update: Fixed typos; haukex++. Some days you just shouldn't post code.



Re: Match pattern per line and after another one as multiline in single file opening
by Laurent_R (Canon) on Feb 14, 2017 at 22:34 UTC
    Hi,

    it's a bit difficult to be sure what you're trying to do without having some sample input data.

    Try perhaps this:

    open my $FH, '<', $config or die "Cannot open $config: $!\n";
    my $data = <$FH>;    # removing a header line? or what?
    while (my $line = <$FH>) {
        next if $line =~ /^\s*#|^$/;
        print "$1\n" while $line =~ m/\{([^}]*)\}/gx;
    }
    close $FH;

    Untested because of the absence of test data.

      I read the description as he needs the [^}] to match through newlines, and so there needs to be a slurp before the regex. But your point about lack of test data is very appropriate.



        Yeah, I may have missed or overlooked the sentence about multiline match and I really wondered why the OP code was localizing $/, but given that the OP speaks about a config file and given the regexes used for matching the data, I am still not really convinced that there is an actual need for multiline match.

        But it's impossible to say either way without the input data.
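To make the disagreement concrete, here is a small sketch on invented data (the OP posted none). The helper names blocks_per_line and blocks_slurped are hypothetical. It applies the thread's brace regex once per line and once to the whole string.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the brace regex to each line separately.
sub blocks_per_line {
    my ($text) = @_;
    my @found;
    for my $line (split /^/m, $text) {    # split into lines, keeping newlines
        push @found, $line =~ m/\{([^}]*)\}/g;
    }
    return @found;
}

# Hypothetical helper: apply the brace regex to the whole string at once.
sub blocks_slurped {
    my ($text) = @_;
    my @found = $text =~ m/\{([^}]*)\}/g;
    return @found;
}

# Made-up input: one block on a single line, one block spanning two lines.
my $sample = "{one}\n{two\nthree}\n";

my @per_line = blocks_per_line($sample);
my @slurped  = blocks_slurped($sample);
print "per line: ", scalar(@per_line), " block(s)\n";
print "slurped:  ", scalar(@slurped),  " block(s)\n";
```

On this input the per-line version finds only the single-line block, while the slurped version finds both. Whether the real config has blocks that span lines is exactly what the missing sample data would settle.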

Re: Match pattern per line and after another one as multiline in single file opening
by Marshall (Canon) on Feb 15, 2017 at 00:31 UTC
    You say: "I'm trying to parse a config with brackets file on Linux", but you do not present an example file.

    It would be extremely helpful to show an example config file and also to explain what information you are trying to get out of it.

    Generating a single string from the config file, which is then parsed, may or may not be appropriate. Parsing line by line may or may not be more appropriate.

    Below I demo a rather generic algorithm for parsing the sections within the file line by line. Without any spec, I just made up data for the input file. This code can be adapted to process a "generic {} file". The code allows for the idea of a "root" unnamed {section}.

    Do not mistake brevity for efficiency. Also, I have no idea what you want to do with the data within the {config} sections.

    There are of course many methods to implement this type of code.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %section_lines;

    while (my $line = <DATA>)
    {
        next if $line =~ /^\s*$/;   # skip blank lines
        next if $line =~ /^\s*#/;   # skip comments
        chomp $line;
        if ($line =~ m/\{([^}]*)\}/)
        {
            process_section($1);
        }
        else
        {
            # Could be in a "root" un-named section? Or
            # Could be some junk, not a comment, not a line within
            # a section? Shouldn't happen, but maybe it does?
            print STDERR "Skipping Illegal line: \'$line\'\n";
        }
    }

    sub process_section
    {
        my $section_name = shift;

        # allow for a blank section data (no lines within it)
        # the existence of such a thing could have meaning?
        $section_lines{$section_name} = [] if (!$section_lines{$section_name});

        while (my $line = <DATA>)
        {
            next if $line =~ /^\s*$/;   # skip blank lines ??
            next if $line =~ /^\s*#/;   # skip comments ??

            if ($line =~ m/\{([^}]*)\}/)   # new section detected...
            {
                process_section($1);
            }
            else
            {
                $line =~ s/^\s*//;   # trim leading space
                $line =~ s/\s*$//;   # trim trailing space (inc EOL)

                # I have no idea of what processing is needed here.
                # This just adds a line to the section that is
                # being parsed.
                push @{$section_lines{$section_name}}, $line;
            }
        }
    }
    print Dumper \%section_lines;

    =This Program Prints:
    Skipping Illegal line: 'this a bogus line, not in a named section'
    Skipping Illegal line: 'is there a "root" un-named section possible?'
    $VAR1 = {
              'section 2' => [],
              'section 3' => [
                               'xyzzy = 57',
                               'this line might mean something?'
                             ],
              'section 1' => [
                               'a =2',
                               'b =something'
                             ]
            };
    =cut

    __DATA__
    # Please show a "real" file here, just a guess...
    # this is comment
    this a bogus line, not in a named section
    is there a "root" un-named section possible?
    {section 1}
    a =2
    b =something

    {section 2}
    # some comment embedded in section
    {section 3}
    # comment
    xyzzy = 57
    this line might mean something?

Re: Match pattern per line and after another one as multiline in single file opening
by BillKSmith (Monsignor) on Feb 14, 2017 at 23:26 UTC
    Use the concept of your one-liner to do everything.
    #!perl -p
    use strict;
    use warnings;
    if ( /^\s*[#\n]/ or !s/.*\{([^}]+).*$/$1/ ) {
        $_ = <>;
        last if !defined $_;
        redo;
    }

    Clearly not tested on real data.

    Bill
        In that view, I would use only the OP's second program. All comments can be removed with a single substitution: $data =~ s/^\s*#.*?\n//msg;
        Bill
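A quick check of that single substitution on made-up input, since the thread has no sample file. The wrapper name strip_comments and the sample text are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical wrapper: apply the single substitution to a copy of the text.
sub strip_comments {
    my ($data) = @_;
    $data =~ s/^\s*#.*?\n//msg;
    return $data;
}

# Made-up sample: a comment line, a blank line, and a block that
# contains an indented comment.
my $sample = "# top\n\nkey {\n  # inner\n  val\n}\n";
print strip_comments($sample);
```

In this sketch both comment lines are removed, but the blank line that is not directly followed by a comment stays in the string. That should not matter for the brace match that follows, though it does differ from the substitution with the extra ^\n alternative shown earlier in the thread.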