GroundZero has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to avoid ugly code. I have a working solution below, followed by a commented-out version of what I wish I could make my code look like. I am including a sample input file and the expected output. For the application I am writing there will be up to thousands of these types of flags, and I am looking for a better way to parse the data. Any help is much appreciated.
INPUT FILE =
# comment
# comment
ON Mon, Tues,
Fri,
Sat
# comment
other stuff

EXPECTED OUTPUT =
$on = Mon,Tues,Fri,Sat

CODE =
#!/usr/bin/perl -w

$infile = "test.txt";
$linecounter = 0;
$inon = 0;
$hold_on = "";
open(INFILE, $infile) || die "Can't open $infile : $!\n";
MAIN:while(<INFILE>)
{
    $linecounter++;
    #printf("File Line %s = %s",$linecounter, $_);
    if ( /^\*/ || /^\#/ ) # Get rid of useless comments in file.
    {
        next MAIN;
    }
    chop;
    if ( /^on/i || $inon eq "1" )
    {
        undef($tmpon);
        undef(@junk);
        if ( $inon eq "0" )
        {
            @junk = split(' ', $_, 2);
            $tmpon = $junk[1];
            $tmpon =~ s/\s//g;
        }
        else
        {
            $tmpon = $_;
        }
        $len = length($tmpon) - 1;
        $last_char = substr($tmpon, $len, 1);
        #printf("Last Char = %s\n", $last_char);
        if ( $last_char eq "," )
        {
            $hold_on = "$hold_on$tmpon";
            $inon = 1;
            next MAIN;
        }
        else
        {
            $on = "$hold_on$tmpon";
            $inon = 0;
            $hold_on = "";
            next MAIN;
        }
    }
    #other code
}
printf("\$on = %s\n", $on);

WISHFUL CODE =
#$infile = "test.txt";
#$linecounter = 0;
#open(INFILE, $infile) || die "Can't open $infile : $!\n";
#MAIN:while(<INFILE>)
#{
#    $linecounter++;
#    printf("File Line %s = %s",$linecounter, $_);
#    if ( /^\*/ || /^\#/ ) # Get rid of useless comments in file.
#    {
#        next MAIN;
#    }
#    chop;
#    if ( /^on/i )
#    {
#        hold_on = "";
#        undef(@junk);
#        @junk = split(' ', $_, 2);
#        $hold_on = $junk[1];
#        $len = length($hold_on) - 1;
#        $last_char = substr($hold_on, $len, 1);
#        while ( $last_char eq "," )
#        {
#            $read_line = readln(<INFILE>);
#            $read_line =~ chop($read_line);
#            $hold_on = "$hold_on$read_line";
#            $len = length($hold_on) - 1;
#            $last_char = substr($hold_on, $len, 1);
#        }
#        $hold_on =~ s/\s//g;
#    }
#    #other code
#}
#printf("\$on = %s\n", $on);

Replies are listed 'Best First'.
Re: How do you read the next line in a file while in a loop?
by blakem (Monsignor) on Aug 31, 2001 at 02:36 UTC
    Perhaps something like this is what you want......
    #!/usr/bin/perl -w
    use strict;

    my (%options,$curroption);
    while(<DATA>) {          # read from __DATA__ filehandle below
        chomp;               # get rid of trailing newlines
        next if /^\s*#/;     # skip if the first nonspace char is '#'.
        # set the curroption if the first item in the line is in all caps
        $curroption = $1 if s/^([A-Z]+) //;
        # make $curroption a hash key whose value is an array ref containing
        # the various options we found.
        push(@{$options{$curroption}},$_) for (split(/,\s*/));
    }

    # print the datastructure which is a HoA (Hash of Arrays);
    for my $option (keys %options) {
        my @vals = @{$options{$option}};
        print "$option -- ", join (',',@vals), "\n";
    }

    __DATA__
    # comment
    # comment
    ON Mon, Tues,
    Fri,
    Sat
    # comment
    other stuff
    Output:
    ON -- Mon,Tues,Fri,Sat,other stuff

    -Blake

      Nice answer, ++blakem! Let me make some small remarks - your code has slight differences to the specifications ...

      • next if /^\s*[#*]/; # comments start with # or *
      • $curroption = $1 if s/^([A-Z]+) //; GroundZero is matching the option case-insensitively, but doing the same here would break your code, and you would have to resort to the comma-at-end-of-line solution suggested by grinder. As I understood the problem, the options (like on, except, ...) are known beforehand. In that case your elegant solution is easily fixable:
        # at top of program
        my @options = qw/on except some more options/;
        my $opt_string = join '|', @options;
        my $opt_pattern = qr/^($opt_string)\s+/i;

        # instead of your line
        $curroption = $1 if s/$opt_pattern//;
        and everything should be fine - as long as the option keywords are not allowed as values for any of the options.
      • push accepts a list: push @{$options{$curroption}}, split(/,\s*/);
      • Instead of the for loop I'd use a while (each) construct:
        while (my ($option, $val_ref) = each %options) {
            print "$option -- ", join (',',@$val_ref), "\n";
        }

      -- Hofmator
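Hofmator's remarks can be put together in one small script. The following is only a sketch under stated assumptions, not blakem's or Hofmator's actual code: the keyword list (on, except), the sub name parse_options, and the sample lines are hypothetical and chosen to mirror the data discussed in this thread. It assumes, as Hofmator notes, that a keyword never appears as the first word of a value line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical keyword list; the thread only mentions "on" and "except".
my @keywords    = qw/on except/;
my $opt_string  = join '|', @keywords;
my $opt_pattern = qr/^($opt_string)\s+/i;    # keyword, case-insensitive

# Takes an array ref of lines, returns a hash of arrays keyed by
# the lowercased keyword.
sub parse_options {
    my ($lines) = @_;
    my (%options, $curroption);
    for my $line (@$lines) {
        my $text = $line;
        chomp $text;
        next if $text =~ /^\s*[#*]/;          # comments start with # or *
        $text =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        $curroption = lc $1 if $text =~ s/$opt_pattern//;
        next unless defined $curroption;      # ignore data before the first keyword
        # push accepts a list, so no inner loop is needed
        push @{ $options{$curroption} }, grep { length } split /\s*,\s*/, $text;
    }
    return \%options;
}

my @sample = (
    "# comment\n",
    "ON Mon, Tues,\n",
    "Fri\n",
    "* another comment\n",
    "EXCEPT Sun,\n",
    "Mon\n",
);
my $options = parse_options(\@sample);
for my $option (sort keys %$options) {
    print "$option = ", join(',', @{ $options->{$option} }), "\n";
}
```

With the sample lines above this prints "except = Sun,Mon" followed by "on = Mon,Tues,Fri". The keys are sorted only to make the output order predictable; each (or keys) returns them in no particular order.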

      Thanks Blake. I like your code; it looks really close to what I want, but I don't want ', other stuff' in the result, because it would be another section of data. The input would be like:
      ON Mon, Tues, Fri
      EXCEPT Sun, Mon
      and I would want output like:
      on = Mon,Tues,Fri
      except = Sun,Mon
      Maybe I am not explaining myself well.
        Have you tried replacing the data after __DATA__ with the data you have provided here? I think it already works that way....

        -Blake

Re: How do you read the next line in a file while in a loop?
by grinder (Bishop) on Aug 31, 2001 at 02:50 UTC
    Wow, what an awesome amount of code! I don't think you've sat down and figured out the algorithm. What do you really want the script to do? The way I understand it (and I may have misinterpreted it, but bear with me)... You have a text file that contains data and comments. You want to strip out the comments. The data you are interested in is introduced by the word "on" (case-insensitive). It may span several lines; if so, a trailing comma indicates that there's more to get on the subsequent line.

    Now here's the part that you haven't thought through... how are you going to know when to stop? The way I see it, you grab the line with "on" and isolate the part that interests you. Once you have this, then for that line and any subsequent line you read, if what you have ends in a comma, you want to get the next line. Eventually, you will append a line that does not have a trailing comma, in which case you stop.

    Once you know how things are supposed to behave, the code usually follows naturally.

    #! /usr/bin/perl -w
    use strict;

    my $want_next_line = 0;
    my $on = '';
    while( <DATA> ) {
        chomp;            # never use chop
        s/\s*#.*$//;      # discard comments
        next unless $_;   # go to next line if nothing left
        if( /^on (.*)$/i ) {
            $on = $1;
        }
        elsif( $want_next_line ) {
            $on .= $_;
        }
        $want_next_line = $on =~ /,$/ ? 1 : 0;
    }
    $on =~ s/\s//g;
    print "\$on = $on\n";

    __DATA__
    INPUT FILE =
    # comment
    # comment
    ON Mon, Tues,
    Fri,
    Sat
    # comment
    other stuff

    As you can see, if you can state the problem clearly, the code usually falls out nicely.

    --
    g r i n d e r
      Thanks Grinder, this is exactly what I was looking for. I was probably using fuzzy logic, i.e. I had thought out the algorithm but did not code it as clearly. Thanks again for your help.
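The "never use chop" comment in grinder's code deserves a quick illustration. This is a minimal sketch; the helper names chopped and chomped are made up for the demonstration. chop always removes the last character, whatever it is, while chomp removes only a trailing newline (strictly, a trailing copy of $/). The difference bites on the last line of a file that has no final newline.

```perl
use strict;
use warnings;

# Hypothetical helpers: each works on a copy and returns the result.
sub chopped { my ($s) = @_; chop $s;  return $s }
sub chomped { my ($s) = @_; chomp $s; return $s }

# A line that ends in a newline: both give the same result.
print chopped("Sat\n"), "|", chomped("Sat\n"), "\n";   # prints Sat|Sat

# A last line with no trailing newline: chop eats real data.
print chopped("Sat"), "|", chomped("Sat"), "\n";       # prints Sa|Sat
```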
Re: How do you read the next line in a file while in a loop?
by demerphq (Chancellor) on Aug 31, 2001 at 05:41 UTC
    Hi. If I understand you, what you want to do is find a key in a file (in this case on) which has various comma/space-delimited values spread over possibly many lines? If you can define where your data must end (as it seems you can from the info you provided), then you basically just cat the lines together, e.g. (_ are spaces!):

    this_is_part_of_the_data__
    so_is_this, #not a quote,
    me_too.__And_there_are_lots_of_endings
    musn't_forget_me!
    on Mon______,______Tues__,___
    Weds,
    #comment
    *different comment
    Fri
    #comment

    turns into:
    this_is_part_of_the_data__ so_is_this, #not a quote,me_too.__And_there_are_lots_of_endings
    musn't_forget_me!
    on Mon______,______Tues__,___ Weds, Fri

    until you have what can only be the end of a line, in this case one that does not match ,\s*$. Then you see if the beginning matches a key, then you split on the spaces and commas and use the results as you need.

    It was a bit ambiguous whether you had lots of sequences like this, each with a different key, or not. So I assumed you did.
    I take it you come from a C++ background?
    :-)

    Yves

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    my $text=<<'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';
    INPUT FILE =
    # comment
    # comment
    ON Mon, Tues,
    Fri,
    Sat
    # comment
    other stuff
    EXPECTED OUTPUT =
    $on = Mon,Tues,Fri,Sat
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    my %hash=(on=>undef, people=>undef);
    my $line="";
    #open IN,$0;                # uncomment me and replace the filehandle with
    SCAN:while(<DATA>) {        # IN and it still works!
        next if /^\s*[*#]/;     # line comments begin with * or #
        chomp;
        $line.=$_;              # continue the line
        next if /,\s*$/;        # not done
        my $found;
        KEY:foreach my $key (keys %hash) {
            if (!defined($hash{$key}) && $line=~/^\s*$key\s*/i) {
                $found=$key;
                last KEY;
            }
        }
        next SCAN unless $found;
        $line=~s/^\s*$found\s*//i;
        $hash{$found}=[split(/\s*,\s*/,$line)];   # split away
        $line="";
    }
    # make printing easier
    $"=",";
    $,=$\="\n";
    print "$text--OUTPUT--\n";
    $"="','";
    foreach my $key (keys %hash) {
        next unless $hash{$key};
        print "#For Key '$key'";
        print "my \@$key=qw('@{$hash{$key}}');\n";
    }
    print "Hope this helps\n","Yves\n",":-)\n";
    __DATA__
    # comment
    # comment
    ON Mon, Tues,
    Fri,
    Sat
    # comment
    #INPUT FILE =
    ## comment
    ## comment
    People Mary, Bill, Jenny, Petra , Isolde, Joe
    #ON Mon, Tues,
    #Fri,
    #Sat
    ## comment
    #WISHFUL CODE =
    ##$infile = "test.txt";
    ##$linecounter = 0;
    ##open(INFILE, $infile) || die "Can't open $infile : $!\n";
    ##MAIN:while(<INFILE>)
    #{
    ##    $linecounter++;
    ##    printf("File Line %s = %s",$linecounter, $_);
    ##    if ( /^\*/ || /^\#/ ) # Get rid of useless comments in file.
    ##    {
    ##        next MAIN;
    ##    }
    ##    chop;
    ##    if ( /^on/i )
    ##    {
    ##        hold_on = "";
    ##        undef(@junk);
    ##        @junk = split(' ', $_, 2);
    ##        $hold_on = $junk[1];
    ##        $len = length($hold_on) - 1;
    ##        $last_char = substr($hold_on, $len, 1);
    ##        while ( $last_char eq "," )
    ##        {
    ##            $read_line = readln(<INFILE>);
    ##            $read_line =~ chop($read_line);
    ##            $hold_on = "$hold_on$read_line";
    ##            $len = length($hold_on) - 1;
    ##            $last_char = substr($hold_on, $len, 1);
    ##        }
    ##        $hold_on =~ s/\s//g;
    ##    }
    ##    #other code
    #}
    ##printf("\$on = %s\n", $on);
Re: How do you read the next line in a file while in a loop?
by traveler (Parson) on Aug 31, 2001 at 02:29 UTC
    Maybe you are missing something, or maybe I am. To read a line from INFILE in Perl, use $line = <INFILE>; You don't need the readln() in your sample; just use <INFILE>.

    HTH --traveler

      Thanks, traveler. I may be missing something, but I don't think I want to load a file that could be 50 MB into a string. Am I thinking incorrectly here?
        You are not reading the entire file. <> reads a line at a time, just as it does in your while. (Perhaps that is why it is called the "line input operator" <grin>) In list context (e.g. when assigned to an array) it can read the entire file into an array, but in scalar context it reads a single line. It looked as though your file was multiple lines (based on seeing your while and your wished-for readln). If it is, then this should work.

        HTH, --traveler
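traveler's point can be turned into a runnable version of the "wishful code" from the question. The following is a sketch under stated assumptions rather than anyone's posted solution: the sub name read_on is hypothetical, an in-memory filehandle stands in for test.txt, and comment lines between continuation lines are assumed to be skippable. The key step is my $next = <$fh>; inside the outer while loop; in scalar context it reads exactly one more line, so the file is never loaded whole.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reads the first "ON" record from an open filehandle and returns it
# with all whitespace removed (undef if there is no such record).
sub read_on {
    my ($fh) = @_;
    my $on;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*[#*]/;            # skip comment lines
        chomp $line;
        next unless $line =~ /^on\s+(.*)/i;     # wait for the ON keyword
        $on = $1;
        # A trailing comma means the record continues on the next line.
        while ($on =~ /,\s*$/) {
            my $next = <$fh>;                   # scalar context: one line only
            last unless defined $next;          # end of file
            next if $next =~ /^\s*[#*]/;        # comment inside the record
            chomp $next;
            $on .= $next;
        }
        $on =~ s/\s//g;
        last;                                   # stop after the first record
    }
    return $on;
}

# An in-memory file stands in for test.txt.
my $data = "# comment\nON Mon, Tues,\nFri,\nSat\n# comment\nother stuff\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
print "\$on = ", read_on($fh), "\n";
close $fh;
```

With the sample data this prints "$on = Mon,Tues,Fri,Sat". Compared with grinder's state-flag loop, the nested read keeps the continuation logic in one place, at the cost of having to check for end of file inside the inner loop.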