cochrasc has asked for the wisdom of the Perl Monks concerning the following question:

Enlightened ones, I'm trying to separate strings into 7 elements. The problem is if an element contains a space or is blank, it is surrounded by {}, so I can't split on spaces. There is no uniformity to which elements are enclosed by brackets.

Here are 3 lines of actual data as samples:
aotone 1 {FixServer AOT1} FixServer {-F FxAOT1} {} 1
FxACH 2 achfxsvr FixServer {-F FxACH} routex 0
ESIS 2 {ESISAdministrative Server} java -Did=esisadmin routex 0
My thought was to replace any spaces within brackets with the dummy characters #%, split on spaces, discard the brackets, then replace #% with spaces again. That would be a lot of work, and I'm sure there's a better way; I'm just too much of a noob to know what that is.

Any help would be appreciated.
Thanks!

2005-09-13 Retitled by Arunbear, as per Monastery guidelines
Original title: 'There's gotta be a better way...'

Re: Parsing spaces and curly braces
by Roy Johnson (Monsignor) on Sep 12, 2005 at 19:34 UTC
    If you don't have to worry about nesting braces, you can extract the strings like so:
    @matches = /(\{[^}]+\}|\S+)/g;
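
As a rough illustration, the pattern above can be run against the three sample lines from the question. This is only a sketch: the names @lines, @rows and @fields are made up for the example, and the brace-stripping step is an assumption about what the OP wants to do with each field.

```perl
use strict;
use warnings;

# The three sample lines from the original question.
my @lines = (
    'aotone 1 {FixServer AOT1} FixServer {-F FxAOT1} {} 1',
    'FxACH 2 achfxsvr FixServer {-F FxACH} routex 0',
    'ESIS 2 {ESISAdministrative Server} java -Did=esisadmin routex 0',
);

my @rows;
for my $line (@lines) {
    # Either a braced group (no nesting) or a run of non-whitespace.
    # An empty {} fails the first alternative and is picked up by \S+.
    my @fields = $line =~ /(\{[^}]+\}|\S+)/g;

    # Assumed post-processing: drop the surrounding braces,
    # so an empty {} becomes an empty string.
    s/^\{(.*)\}$/$1/ for @fields;

    push @rows, \@fields;
    print scalar(@fields), " fields: ", join('|', @fields), "\n";
}
```

Each sample line should come out as seven fields, with the spaces inside the braced fields preserved.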

    Caution: Contents may have been coded under pressure.

      Cases not specified by the OP behave as follows:

      "xxx{yyy zzz}" is split into "xxx{yyy" and "zzz}".
      "{xxx yyy}zzz" is split into "{xxx yyy}" and "zzz".

      @matches = /(\{[^}]*\}|[^{\s]+)/g; might work better (with * inside the braces, so that an empty {} is still matched as one token):

      "xxx{yyy zzz}" is split into "xxx" and "{yyy zzz}".
      "{xxx yyy}zzz" is split into "{xxx yyy}" and "zzz".

      That did it...thanks, Roy.
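
The difference between the two second alternatives discussed above (\S+ versus [^{\s]+) can be checked with a small script. This is a sketch only: the sub names are hypothetical, and both patterns here use * inside the braces, which is an adjustment made for the example so that an empty {} counts as one token.

```perl
use strict;
use warnings;

# Hypothetical helpers; they differ only in the second alternative.
sub split_nonspace { my ($s) = @_; return $s =~ /(\{[^}]*\}|\S+)/g }
sub split_nonbrace { my ($s) = @_; return $s =~ /(\{[^}]*\}|[^{\s]+)/g }

for my $input ('xxx{yyy zzz}', '{xxx yyy}zzz', '{}') {
    print "input:     $input\n";
    print "  \\S+      -> ", join(' | ', split_nonspace($input)), "\n";
    print "  [^{\\s]+  -> ", join(' | ', split_nonbrace($input)), "\n";
}
```

With \S+, a word that runs straight into an opening brace swallows the brace and the group is broken at the space; excluding { from the second alternative stops the word at the brace instead.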
Re: Parsing spaces and curly braces
by dragonchild (Archbishop) on Sep 12, 2005 at 19:44 UTC
    A little work on Text::xSV in the _get_row() and _get_quoted() subroutines plus another parameter that specifies the quoting mechanism should get you what you want.

    The only thing you didn't say is what happens if the curlies occur within a curlied section.


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Parsing spaces and curly braces
by QM (Parson) on Sep 12, 2005 at 19:36 UTC
    You probably want a real parser. Off the top of my head, I'd go with a 2-pass approach (assuming no nested curlies):
    my @tokens;
    while (my $line = <>) {
        chomp($line);
        while ( length( $line ) ) {
            # Match against $line (not $_) and keep the captured text.
            if ( my ($not_curly) = $line =~ /^([^{]+)/ ) {
                push @tokens, process_not_curly($not_curly);
                $line = adjust_line($line, $not_curly);
            }
            # [^}]* rather than [^}]+ so that an empty {} is accepted.
            elsif ( my ($curly) = $line =~ /^\{([^}]*)\}/ ) {
                push @tokens, process_curly($curly);
                $line = adjust_line($line, $curly);
            }
            else {
                die "can't get here, ";
            }
        }
    }
    and you should be able to work out process_not_curly, process_curly, and adjust_line.

    Update: After further review, Roy Johnson has the right idea; I need more coffee; I still like my (re)title.

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      Your parser is really just a tokenizer. m/.../gc lends itself well to writing tokenizers, as seen in the following snippet:

      while (my $line = <>) {
          chomp($line);
          my @tokens;
          foreach ($line) {
              # Each match resumes at \G, where the previous one stopped.
              /\G ( [^{\s]+ )/xgc   && do { push(@tokens, $1); redo };
              /\G ( \{[^}]*\} )/xgc && do { push(@tokens, $1); redo };
              /\G \s+/xgc           && redo;
          }
          # ...process split line...
      }

      Cases not specified by the OP behave as follows:

      "xxx{yyy zzz}" is split into "xxx" and "{yyy zzz}".
      "{xxx yyy}zzz" is split into "{xxx yyy}" and "zzz".

      You probably want a real parser.

      That's really only necessary if the braces can be nested. If not, a regular expression can handle it, as for example in the above-posted solution.
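
If the braces could be nested, one possible approach (not proposed by anyone in this thread) is a recursive pattern, which needs Perl 5.10 or later. The sketch below is an assumption-laden example: tokenize_nested and the group name braced are hypothetical, and unbalanced input simply stops the tokenizing early rather than raising an error.

```perl
use strict;
use warnings;

# Hypothetical helper: tokenize a line whose braced fields may nest.
sub tokenize_nested {
    my ($line) = @_;
    my @tokens;
    while ( $line =~ /\G \s*
                      (                              # $1: one whole token
                          (?<braced>                 # a balanced {...} group
                              \{ (?: [^{}]++ | (?&braced) )* \}
                          )
                        | [^{}\s]+                   # or a bare word
                      )/xgc )
    {
        push @tokens, $1;
    }
    # Anything left after pos($line) could not be tokenized
    # (for example, an unbalanced brace).
    return @tokens;
}

print join(' | ', tokenize_nested('a {b {c d} e} f {}')), "\n";
```

The (?&braced) construct re-enters the named group, so each inner {...} is consumed as a unit before the outer closing brace is matched.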