String parsing

by hotshot (Prior)
on Jan 05, 2004 at 10:22 UTC ( [id://318803]=perlquestion )

hotshot has asked for the wisdom of the Perl Monks concerning the following question:

Hello all!

I have a variable that can hold one of the following strings:
"OFF"
"SUCCESS: (abc)"
"ERROR(1): disk number (27) crashed at (11:03)"
"WARNING(1): system is rebooting"
These strings are output generated by a script I run. As you can see, the first word is the status itself; the first parentheses (before the colon) hold the exit status, which does not always exist; and after the colon is a string with variables in parentheses.
I need to extract the status, the exit status (if it exists), and the variables (if any), and display a new string with placeholders for the variables taken from the output above.
How can I easily extract all I need from the string output by the script (with a regexp or something)? In the end I need an array holding the status followed by the exit status in the first entry (for example: OFF, ERROR2, WARNING1), and the variables in the next entries, for example:
@neededResultEx1 = ('OFF');
@neededResultEx2 = ('SUCCESS', 'abc');
@neededResultEx3 = ('ERROR1', '27', '11:03');

Replies are listed 'Best First'.
Re: String parsing
by Abigail-II (Bishop) on Jan 05, 2004 at 10:54 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;

    $" = ", ";

    while (<DATA>) {
        /^(\w+)(?:[(](\d+)[)])?:?/g or next;
        my ($status, $exit) = ($1, $2);
        my (@vars) = /\G[^(]*[(]([^)]*)[)]/g;
        print "Status: $status; ";
        print "Exit: $exit; " if defined $exit;
        print "Variables [@vars]" if @vars;
        print "\n";
    }

    __DATA__
    OFF
    SUCCESS: (abc)
    ERROR(1): disk number (27) crashed at (11:03)
    WARNING(1): system is rebooting

    Status: OFF;
    Status: SUCCESS; Variables [abc]
    Status: ERROR; Exit: 1; Variables [27, 11:03]
    Status: WARNING; Exit: 1;


Re: String parsing
by Zaxo (Archbishop) on Jan 05, 2004 at 10:54 UTC

    Your data lines are different enough that each type needs its own parser. A hash of coderefs ("dispatch table") will do that nicely,

    my $parser = {
        OFF      => sub {()},
        SUCCESS  => sub { local $_ = shift; /\((\w*)\)/ },  # call these in list context!
        ERROR1   => sub {
            local $_ = shift;
            /disk number \((\d+)\) crashed at \((\d+:\d+)\)/;
        },
        WARNING1 => sub { local $_ = shift; /(\w.*)$/; }
    };

    sub parse_line {
        local $_ = shift;
        my ($key, $data) = split ':', $_, 2;
        $key =~ tr/()//d;
        ($key, $parser->{$key}->($data));
    }
    parse_line() should be called in list context, too.
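
The list-context remark matters because a Perl match hands back its captures only in list context; in scalar context it just reports success or failure. A minimal sketch of the difference (the helper name `captures_of` is made up for illustration and is not part of the reply above):

```perl
use strict;
use warnings;

# Hypothetical helper: the match is evaluated in the caller's context.
sub captures_of {
    my ($text) = @_;
    return $text =~ /\((\d+)\) crashed at \((\d+:\d+)\)/;
}

my $line = 'disk number (27) crashed at (11:03)';

# List context: the match returns the captured groups.
my @vars = captures_of($line);
print "list:   @vars\n";

# Scalar context: the same match only reports success, not the captures.
my $ok = captures_of($line);
print "scalar: $ok\n";
```

Assigning the result to an array, or wrapping the call in parentheses on the left-hand side, is enough to get the captures.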

    After Compline,

      Your data lines are different enough that each type needs its own parser.
      Uhm, no, as shown in several other replies.
      A hash of coderefs ("dispatch table") will do that nicely,
      Actually, your solution is very inflexible. It can't even deal with:
      WARNING(2): system is rebooting
      (only the exit value is different from the original). It'll produce an error, as Perl will try to use an undefined value as a code reference.
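
The failure described here is what happens when a dispatch-table lookup finds no entry and the resulting undef is called as a code reference. One way to guard against that, sketched with hypothetical names (`%handler`, `$default`, `dispatch`) and a generic fallback that is an assumption rather than part of either reply:

```perl
use strict;
use warnings;

# Hypothetical dispatch table keyed by status plus exit value.
my %handler = (
    WARNING1 => sub { my ($data) = @_; $data =~ /(\w.*)$/ },
);

# Assumed fallback: collect anything found in parentheses.
my $default = sub {
    my ($data) = @_;
    return defined $data ? $data =~ /\(([^)]*)\)/g : ();
};

sub dispatch {
    my ($line) = @_;
    my ($key, $data) = split /:/, $line, 2;
    $key =~ tr/()//d;
    # Fall back instead of calling an undefined value as a code reference.
    my $code = $handler{$key} || $default;
    return ($key, $code->($data));
}

print join('/', dispatch('WARNING(2): system is rebooting')), "\n";
print join('/', dispatch('ERROR(1): disk number (27) crashed at (11:03)')), "\n";
```

With the fallback in place, an unknown key such as WARNING2 yields just the key instead of a runtime error.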


Re: String parsing
by tachyon (Chancellor) on Jan 05, 2004 at 11:07 UTC

    Hard to do robustly with a regex

    while(<DATA>) {
        chomp;
        my ( @bits, $rest );
        ( $bits[0], $rest ) = split ':', $_, 2;
        $bits[0] =~ tr/()//d;
        push @bits, ($rest =~ m!\(([^)]+)!g) if $rest;
        print "@bits\n";
    }

    __DATA__
    OFF
    SUCCESS: (abc)
    ERROR(1): disk number (27) crashed at (11:03)
    WARNING(1): system is rebooting

    Which gives you

    OFF
    SUCCESS abc
    ERROR1 27 11:03
    WARNING1



      Don't you just wish that repeat counts would repeat enclosed captures? Then you could use

      m[ ( ^ [^(:]+ ) (?: (?: \( ( \d+ ) \) )? : (?: .*? \( ( [^)]+ ) \) ){1,} .* )? $ ]x;
      to grab repetitive elements instead of having to do (or rather not do) things with built-in limits, like this.
      #! perl -slw
      use strict;

      while( <DATA> ) {
          chomp;
          my @bits = grep{ defined } m[
              ( ^ [^(:]+ )
              (?:
                  (?: \( ( \d+ ) \) )? :
                  (?: .*? \( ( [^)]+ ) \)
                      (?: .*? \( ( [^)]+ ) \)
                          (?: .*? \( ( [^)]+ ) \)
                              (?: .*? \( ( [^)]+ ) \) )?
                          )?
                      )?
                  )?
                  .*
              )?
              $
          ]x;
          print join'/', @bits;
      }

      =Output

      P:\test>junk
      OFF
      SUCCESS/abc
      ERROR/1/27/11:03
      WARNING/1
      TEST/255/this/that/the other/and this

      =cut

      __DATA__
      OFF
      SUCCESS: (abc)
      ERROR(1): disk number (27) crashed at (11:03)
      WARNING(1): system is rebooting
      TEST(255): (this) (that) (the other) (and this) (but not this)
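
The wish above comes from how Perl treats a capture group inside a repeated group: each repetition overwrites the previous capture, so only the last one survives. A small sketch of that behavior next to the usual global-match alternative (the sub names `last_capture` and `all_captures` are illustrative only):

```perl
use strict;
use warnings;

# A capture inside a repeated group keeps only its final repetition.
sub last_capture {
    my ($s) = @_;
    my ($last) = $s =~ /^(?:.*?\(([^)]+)\))+/;
    return $last;
}

# A global match in list context returns every repetition instead.
sub all_captures {
    my ($s) = @_;
    my @all = $s =~ /\(([^)]+)\)/g;
    return @all;
}

my $rest = ' (this) (that) (the other)';
print "repeated group: ", last_capture($rest), "\n";
print "global match:   ", join('/', all_captures($rest)), "\n";
```

The global match has no built-in limit on the number of parenthesized items, which is why the split-then-`/g` replies in this thread avoid the nesting shown above.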

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail

Re: String parsing
by ysth (Canon) on Jan 05, 2004 at 12:10 UTC
    my $tempstr = $str;
    # get rid of () around status
    $tempstr =~ s/^([^(:]+)\(([^)]+)\)/$1$2/;
    # grab the status and anything in parentheses
    @needed = $tempstr =~ /(^[^:]+|(?<=\()[^)]+(?=\)))/g;
    (It's really been quite a lookbehind kind of day :)

    Update: or do it the other way around:

    @needed = $_ =~ /(^[^:]+|(?<=\()[^)]+(?=\)))/g;
    $needed[0] =~ y/()//d;
Re: String parsing
by Hena (Friar) on Jan 05, 2004 at 10:57 UTC
    Well, I wouldn't do it with one command. But this should work.
    # this is the original string
    $string="";
    # then split on first ':'
    ($begin,$end)=split (/:/,$string,2);
    # remember every () separately
    # this assumes that between () there are no more (), eg not (27(b))
    while ($end=~m/\((.+?)\)/g) {
        push (@result,$1);
    }
    # add first
    unshift (@result,$begin);
    # if wanted to remove () from first
    # check if need to escape
    $result[0]=~s/[()]//g;
    Update: fix typo.
      My solution is in the same realm.
      my($begin, $end) = split /:/, $string, 2;
      $begin =~ tr/()//d;
      @result = $begin;
      push @result, $end =~ m/\((.+?)\)/g;
Re: String parsing
by Anonymous Monk on Jan 05, 2004 at 10:54 UTC
    What have you tried so far? What exactly are you having trouble with (where is your brain block)?

Node Type: perlquestion [id://318803]
Approved by Corion