balt has asked for the wisdom of the Perl Monks concerning the following question:

Oh enlightened ones, I have this string of a thing that needs parsing. Parts of it vary in length, but there are some anchors to go by. This example code sums it all up. Can anyone suggest a more elegant (= regexish and perlish) way to parse this? My method works, but it's not pretty. This is all about parsing backwards and forwards from a known middle anchor. Valid assumptions are:
- only a single space separates fields
- the second field can contain several space-separated words
- the fourth field comprises everything remaining to the right. :-)
#!/usr/bin/perl -w
use strict;

my $string = "firstfield second field KNOWNWORD alwaysOneWord then one or more make the last field";

# Field 1: the first space-separated word.
my $field1 = ( split / /, $string)[0];

# Field 2: everything between field 1 and the space in front of KNOWNWORD.
my $field2 = substr($string, length("$field1 "), index($string, " KNOWNWORD") - length("$field1 "));

# Split what follows KNOWNWORD; $temp[0] is empty because that text starts with a space.
my @temp = split / /, substr($string, index($string, "KNOWNWORD") + length("KNOWNWORD"));
my $field3 = $temp[1];
my $field4 = join ' ', @temp[2..$#temp];

print "field1: $field1\n";
print "field2: $field2\n";
print "field3: $field3\n";
print "field4: $field4\n";
This prints the expected result:
field1: firstfield
field2: second field
field3: alwaysOneWord
field4: then one or more make the last field
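For comparison, here is a sketch that keeps the index/substr idea but cuts the string only once, at the anchor, and then lets split's LIMIT argument keep the multi-word fields together. It assumes the anchor occurs exactly once and is surrounded by single spaces; the sub name parse_around_anchor is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: cut $string at the first " $anchor " and return
# (field1, field2, field3, field4), or an empty list if the anchor is missing.
sub parse_around_anchor {
    my ($string, $anchor) = @_;
    my $sep = " $anchor ";                 # the anchor plus its single surrounding spaces
    my $pos = index($string, $sep);
    return if $pos < 0;                    # anchor not found

    my $left  = substr($string, 0, $pos);                  # text before the anchor
    my $right = substr($string, $pos + length($sep));      # text after the anchor

    # Field 1 is the first word on the left; field 2 is the rest of the left side.
    my ($field1, $field2) = split / /, $left, 2;
    # Field 3 is the first word on the right; field 4 is everything after it.
    my ($field3, $field4) = split / /, $right, 2;
    return ($field1, $field2, $field3, $field4);
}

my $string = "firstfield second field KNOWNWORD alwaysOneWord then one or more make the last field";
my @fields = parse_around_anchor($string, 'KNOWNWORD');
print "field$_: $fields[$_ - 1]\n" for 1 .. 4;
```

Swapping index for rindex would anchor on the last occurrence instead, which matters only if KNOWNWORD can also turn up inside the second field.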

Re: Help perlifying this string parse-o-rama
by hdb (Monsignor) on Oct 22, 2015 at 12:48 UTC

    split takes an optional third argument, LIMIT, that caps the number of fields it splits into. It can be used like this:

    my @fields = map { split / /, $_, 2 } split /\s+KNOWNWORD\s+/, $string;
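A small sketch of what LIMIT changes: without it, split breaks the text at every space; with LIMIT 2 it stops after the first separator, so the last piece keeps its remaining spaces. The outer split cuts at the anchor and the inner one cuts each half once. The wrapper name split_fields is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "firstfield second field KNOWNWORD alwaysOneWord then one or more make the last field";

# Without a LIMIT, every space starts a new field.
my @words = split / /, "then one or more";
print scalar(@words), " pieces\n";                 # 4 pieces

# With LIMIT 2, split stops after the first separator; the remainder stays intact.
my ($head, $tail) = split / /, "then one or more", 2;
print "$head|$tail\n";                             # then|one or more

# The reply's approach: cut at the anchor, then split each half at most once.
sub split_fields {
    my ($s) = @_;
    return map { split / /, $_, 2 } split /\s+KNOWNWORD\s+/, $s;
}
print join('|', split_fields($string)), "\n";
```

If KNOWNWORD is absent, the outer split returns the whole string as a single piece and only two fields come back, so checking for four results is a cheap sanity test.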
Re: Help perlifying this string parse-o-rama
by choroba (Cardinal) on Oct 22, 2015 at 12:27 UTC
    A regexish solution:
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $string = 'firstfield second field KNOWNWORD alwaysOneWord then one or more make the last field';
    my (@fields) = $string =~ / ^ (\S+) [ ] (.*?) [ ] KNOWNWORD [ ] (\S+) [ ] (.*) /x;
    for my $i (0 .. $#fields) {
        print "$i:$fields[$i]\n";
    }

    Note that the lazy (.*?) makes the pattern anchor on the first KNOWNWORD it finds, so the fields come out wrong if KNOWNWORD is also one of the words in the second field.
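To make that caveat concrete, this sketch runs the lazy pattern and a greedy variant against a made-up string in which KNOWNWORD appears twice. The lazy (.*?) takes the shortest possible second field and so anchors on the first occurrence; the greedy (.*) takes the longest and anchors on the last. Neither is right in general: which one you want depends on whether the known word can appear in field 2 or in field 4.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input in which KNOWNWORD also appears inside the second field.
my $string = 'first a KNOWNWORD b KNOWNWORD third the rest';

# Lazy (.*?): shortest second field, so the match anchors on the FIRST KNOWNWORD.
my @lazy   = $string =~ / ^ (\S+) [ ] (.*?) [ ] KNOWNWORD [ ] (\S+) [ ] (.*) /x;
# Greedy (.*): longest second field, so the match anchors on the LAST KNOWNWORD.
my @greedy = $string =~ / ^ (\S+) [ ] (.*)  [ ] KNOWNWORD [ ] (\S+) [ ] (.*) /x;

print 'lazy:   ', join('|', @lazy),   "\n";   # first|a|b|KNOWNWORD third the rest
print 'greedy: ', join('|', @greedy), "\n";   # first|a KNOWNWORD b|third|the rest
```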

Re: Help perlifying this string parse-o-rama
by BrowserUk (Patriarch) on Oct 22, 2015 at 12:26 UTC

    Relatively straightforward:

    $s = "firstfield second field KNOWNWORD alwaysOneWord then one or more make the last field";;
    @fields = $s =~ m[(\S+)\s(.+)\sKNOWNWORD\s(\S+)\s(.+)];;
    print join '|', @fields;;
    firstfield|second field|alwaysOneWord|then one or more make the last field
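Outside an interactive shell, the same pattern can be wrapped so that a line which fails to match is reported rather than silently leaving @fields empty: a failed match in list context returns the empty list. The sub name parse_line and the warning text are hypothetical. Because (.+) is greedy, this pattern anchors on the last KNOWNWORD if the word occurs more than once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the reply's pattern.
# Returns the four fields, or an empty list (with a warning) when the line does not match.
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ m[(\S+)\s(.+)\sKNOWNWORD\s(\S+)\s(.+)];
    warn "no match: $line\n" unless @fields;
    return @fields;
}

for my $line (
    'firstfield second field KNOWNWORD alwaysOneWord then one or more make the last field',
    'a line without the anchor',
) {
    my @fields = parse_line($line) or next;   # skip lines that did not parse
    print join('|', @fields), "\n";
}
```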
