ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:

Hello community,
I am running a test whose output has the following format:
    -date- -version-
    -more-info-can-be-more-than-one-line-
    ------------------------------------------------------------------
    status id -more-parameters- name
    ------------------------------------------------------------------
    ... .... ... ....
    ------------------------------------------------------------------

Notice that it's a kind of table with information. An example of this table:
    date:17/4/2000 version: 4.1.0
    -more info-
    -more info-
    ------------------------------------------------------------------
    Status id not important info (two columns) name
    -----------------------------------------------------------------
    Run 1234 @ ... foo1
    Wait 54123 @ ... foo2
    -----------------------------------------------------------------

I would like to write the following two functions, each of which gets an 'id' number as its argument:
1. Given the current id as an argument, say "1220", I would like to get an array of (id,status,name) (or just scalars, it doesn't matter) for the line with the smallest id that is larger than the argument. In the above example, it would be (1234,Run,foo1) because there is no smaller number than 1234 and larger than 1220. If the table has no lines at all, the result should be undefined.
2. (The first function is more important to me.) I would like to get a hash whose keys are the ids and whose values are their statuses. Maybe this would also make the first function easier to write.

What I tried: I combined regexes and loops to iterate over the output and extract the information I need, but I failed. I think regexes are the key to solving this problem.
I am looking for the cleanest way possible.
I hope my question is clear (if not, I would be glad to give more details).

Re: Parsing output with a special format -- switch and yardstick
by Discipulus (Canon) on Jun 05, 2018 at 08:49 UTC
    Hello ovedpo15

    Hoping I'm understanding the following statement correctly:

    > because there is no smaller number than 1234 and larger than 1220

    You can use a switch (do something only if the switch is on) and a yardstick/touchstone (do something only if a comparison succeeds). In loops the exit conditions generally go first, but in this case the switch condition must come first, followed by all the exit conditions as usual.

    use strict;
    use warnings;

    my $arg = 1220;
    my $switch;
    my @result;

    # here the touchstone or yardstick
    # start the comparison with a reasonably big number
    $result[0] = 10 ** 6;

    while (<DATA>){
        $switch = 1 if /^Status\s+id.*name$/;
        next unless $switch;
        next if /^[\-]+$/;
        chomp;
        my @fields = split /\s+/;
        if (
            $fields[1] =~ /^\d+$/ and
            $fields[1] > $arg     and
            $fields[1] < $result[0]
        ){
            @result = @fields[1,0,-1];
        }
    }

    # reset if nothing found
    undef @result unless $result[1];

    print join ',',@result;

    __DATA__
    date:17/4/2000 version: 4.1.0
    -more info-
    -more info-
    ------------------------------------------------------------------
    Status id not important info (two columns) name
    -----------------------------------------------------------------
    Run 1234 @ ... foo1
    Err 1235 @ ... foo1
    Wait 54123 @ ... foo2
    -----------------------------------------------------------------

    L*

    PS: I add this here to answer your questions below.

    About not starting with a reasonably big number: "know your data!" is a very good principle, so I cannot help much here. You can use the biggest number Perl can handle on your platform.

    About @fields[1,0,-1]: it is an array slice. Note that Perl is happy with negative indexes, so (as said in the CB, but also here for posterity) it means the second, the first and the last elements.
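One possible way to avoid the starting number altogether is to leave the result empty and accept the first candidate unconditionally. The following is only a sketch under that assumption, not part of the code above: the helper name next_row is hypothetical, and the sample lines are passed in as a list instead of being read from DATA.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (id, status, name) for the smallest id
# greater than $arg, or an empty list if no such row exists.
sub next_row {
    my ($arg, @lines) = @_;
    my @best;        # stays empty until a candidate is found
    my $in_table;    # the "switch": on once the header row has been seen
    for my $line (@lines) {
        $in_table = 1 if $line =~ /^Status\s+id.*name$/;
        next unless $in_table;
        next if $line =~ /^-+$/;
        my @fields = split ' ', $line;
        next unless @fields >= 3 and $fields[1] =~ /^\d+$/;
        my $id = $fields[1];
        next unless $id > $arg;
        # First candidate is always accepted; later ones only if smaller.
        @best = @fields[1, 0, -1] if !@best or $id < $best[0];
    }
    return @best;
}

my @sample = split /\n/, <<'END';
date:17/4/2000 version: 4.1.0
-more info-
------------------------------------------------------------------
Status id not important info (two columns) name
-----------------------------------------------------------------
Run 1234 @ ... foo1
Err 1235 @ ... foo1
Wait 54123 @ ... foo2
-----------------------------------------------------------------
END

print join(',', next_row(1220, @sample)), "\n";
```

Because the "nothing found yet" state is an empty array rather than a sentinel value, no knowledge of the largest possible id is needed.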

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      It is a very nice solution, thank you.
      Is it somehow possible not to start with a reasonably big number?  $result[0] = 10 ** 6;
      I don't know the maximum id the output can contain.
      By the way, how does  @result = @fields[1,0,-1];  work? What is the '-1' at the end?
        ... @result = @fields[1,0,-1]; how does it work?

        Please see Slices in perldata.

        what is the '-1' in the end?

        It's an array index. (Update: See Subscripts in perldata.) In Perl, negative indices access array elements from the end of the array:

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my @ra = qw(first second some other stuff penultimate ultimate);
         ;;
         printf qq{$_ } for @ra[ 1, 0, -2, -1 ];
        "
        second first penultimate ultimate


        Give a man a fish:  <%-{-{-{-<

        Would it be easier to find only the smallest id number after the current id number argument, so that the function returns only the id and not an array?
        In the example above it would return only 1234. How should I implement it?
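One way this could be done is to collect the ids and take the minimum of those larger than the argument, using min from the core List::Util module. This is a sketch only: the helper name next_id is hypothetical, and the pattern assumes that every table row starts with a status word followed by a numeric id, which may not hold for the real output.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: smallest id strictly greater than $arg, or undef.
sub next_id {
    my ($arg, @lines) = @_;
    # Assumed row shape: "Run 1234 @ ... foo1" (status word, then numeric id).
    my @ids = map { /^\S+\s+(\d+)\s/ ? $1 : () } @lines;
    # min returns undef when given an empty list.
    return min grep { $_ > $arg } @ids;
}

my @rows = (
    'Status id not important info (two columns) name',
    'Run 1234 @ ... foo1',
    'Wait 54123 @ ... foo2',
);

my $id = next_id(1220, @rows);
print defined $id ? "$id\n" : "no larger id\n";
```

Returning undef for "nothing found" lets the caller test the result with defined.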
Re: Parsing output with a special format
by hippo (Archbishop) on Jun 05, 2018 at 08:10 UTC

    Part 2 is pretty easy so here's one approach.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my %status;
    while (<DATA>) {
        next unless /@/;
        my @fields = split;
        $status{$fields[1]} = $fields[0];
    }

    use Data::Dumper;
    print Dumper (\%status);

    __DATA__
    date:17/4/2000 version: 4.1.0
    -more info-
    -more info-
    ------------------------------------------------------------------
    Status id not important info (two columns) name
    -----------------------------------------------------------------
    Run 1234 @ ... foo1
    Wait 54123 @ ... foo2
    -----------------------------------------------------------------

    This should give you enough to work on regarding point 1.
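As a hedged illustration of how point 1 might be built on such a hash, the sketch below stores the status and name per id and then picks the first key, in numeric order, that exceeds the argument. The helper name next_entry and the id => [status, name] layout are assumptions made for this example, not something from the original post.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @lines = split /\n/, <<'END';
Status id not important info (two columns) name
-----------------------------------------------------------------
Run 1234 @ ... foo1
Wait 54123 @ ... foo2
-----------------------------------------------------------------
END

my %row;    # id => [status, name]
for (@lines) {
    next unless /@/;            # same row filter as above
    my @fields = split;
    $row{ $fields[1] } = [ @fields[0, -1] ];
}

# Hypothetical helper: (id, status, name) for the smallest id above $arg,
# or an empty list if there is none.
sub next_entry {
    my ($rows, $arg) = @_;
    my $id = first { $_ > $arg } sort { $a <=> $b } keys %$rows;
    return defined $id ? ($id, @{ $rows->{$id} }) : ();
}

my @found = next_entry(\%row, 1220);
print join(',', @found), "\n";
```

Sorting the keys on every call is fine for a small table; for many lookups it may be worth sorting once and keeping the list.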

      I use the `system` command:
      my $test_output = system($cmd);
      How can I iterate over its output like you did with DATA?
      Maybe I should use: my @test_output = system($cmd);?
        my $test_output = system($cmd);

        That won't work, since system returns the exit code and does not capture the command's output. I would recommend capture or capturex from IPC::System::Simple, which can give you an array of lines (my @lines = capture($cmd);). Otherwise, e.g. if the command's output is too large to be captured all at once, you could use a piped open, as I showed here along with other possible solutions.
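A minimal sketch of the piped-open variant, using only core Perl, might look like the following. The command here is a stand-in (the running perl itself printing two sample rows) so that the example is self-contained; in practice the list in @cmd would be replaced by the real test command and its arguments.

```perl
use strict;
use warnings;

# Stand-in command (an assumption for this sketch): the current perl
# interpreter prints two rows shaped like the table in the question.
my @cmd = ($^X, '-e', 'print "Run 1234 @ ... foo1\nWait 54123 @ ... foo2\n"');

# List-form piped open: reads the command's STDOUT without going via a shell.
open my $out, '-|', @cmd or die "Cannot run '$cmd[0]': $!";

my %status;    # id => status
while (my $line = <$out>) {
    next unless $line =~ /@/;
    my @fields = split ' ', $line;
    $status{ $fields[1] } = $fields[0];
}

# close returns false if the command exited with a non-zero status.
close $out or die "Command failed, wait status: $?";

print "$_ => $status{$_}\n" for sort { $a <=> $b } keys %status;
```

The loop body is the same as when reading from DATA; only the source of the lines changes.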