Parsing spaces and curly braces

cochrasc has asked for the wisdom of the Perl Monks concerning the following question:

Enlightened ones, I'm trying to separate strings into 7 elements. The problem is if an element contains a space or is blank, it is surrounded by {}, so I can't split on spaces. There is no uniformity to which elements are enclosed by brackets.

Here are 3 lines of actual data as samples:

aotone 1 {FixServer AOT1} FixServer {-F FxAOT1} {} 1
FxACH 2 achfxsvr FixServer {-F FxACH} routex 0
ESIS 2 {ESISAdministrative Server} java -Did=esisadmin routex 0

My thought was to replace any spaces within brackets with dummy characters #%, split on spaces, discard brackets, then replace #% back with spaces. That would be a lot of work, and I'm sure there's a better way; I'm just too much of a noob to know what it is.

Any help would be appreciated.
Thanks!
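
For comparison, the placeholder approach described in the question can be sketched as follows. This is only an illustration under stated assumptions: the sub names `mask_spaces` and `split_with_placeholder` are hypothetical, a NUL character stands in for the `#%` dummy, braces are assumed not to nest, and only literal spaces inside braces are masked.

```perl
use strict;
use warnings;

# Assumption: a NUL character never occurs in the real data.
my $NUL = "\0";

# Replace the spaces inside one braced group with the placeholder.
sub mask_spaces {
    my ($group) = @_;
    $group =~ s/ /$NUL/g;
    return $group;
}

sub split_with_placeholder {
    my ($line) = @_;
    # Step 1: mask the spaces inside every {...} group (no nesting assumed).
    $line =~ s/(\{[^}]*\})/mask_spaces($1)/ge;
    # Step 2: split on whitespace; the braces keep "{}" alive as a field.
    my @fields = split ' ', $line;
    for my $field (@fields) {
        $field =~ s/^\{(.*)\}$/$1/s;   # Step 3: discard the enclosing braces
        $field =~ s/$NUL/ /g;          # Step 4: restore the masked spaces
    }
    return @fields;
}

my @fields = split_with_placeholder(
    'aotone 1 {FixServer AOT1} FixServer {-F FxAOT1} {} 1');
print scalar(@fields), " fields: ", join('|', @fields), "\n";
```

The braces have to stay in place until after the split; otherwise the empty `{}` field would vanish.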

2005-09-13 Retitled by Arunbear, as per Monastery guidelines
Original title: 'There's gotta be a better way...'

Re: Parsing spaces and curly braces by Roy Johnson (Monsignor) on Sep 12, 2005 at 19:34 UTC
If you don't have to worry about nesting braces, you can extract the strings like so: `@matches = /(\{[^}]+\}|\S+)/g;`
Caution: Contents may have been coded under pressure.
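
As a rough illustration, here is that kind of alternation (a braced group, or else a run of non-space characters) applied to the three sample lines from the question. The sub name `split_fields` and the brace-stripping step are additions for this sketch, not part of the reply, and braces are assumed not to nest.

```perl
use strict;
use warnings;

# Split one record into fields: a braced group, or else a run of
# non-space characters. Assumes braces never nest.
sub split_fields {
    my ($line) = @_;
    my @fields = $line =~ /(\{[^}]+\}|\S+)/g;
    # Strip one pair of enclosing braces; "{}" becomes the empty string.
    s/^\{(.*)\}$/$1/ for @fields;
    return @fields;
}

my @samples = (
    'aotone 1 {FixServer AOT1} FixServer {-F FxAOT1} {} 1',
    'FxACH 2 achfxsvr FixServer {-F FxACH} routex 0',
    'ESIS 2 {ESISAdministrative Server} java -Did=esisadmin routex 0',
);

for my $sample (@samples) {
    my @fields = split_fields($sample);
    print scalar(@fields), ': ', join('|', @fields), "\n";
}
```

Each sample line yields seven fields. The empty `{}` is picked up by the `\S+` alternative, since `[^}]+` needs at least one character between the braces.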
Re^2: Parsing spaces and curly braces by ikegami (Patriarch) on Sep 12, 2005 at 20:09 UTC
Cases not specified by the OP behave as follows:
"`xxx{yyy zzz}`" is split into "`xxx{yyy`" and "`zzz}`".
"`{xxx yyy}zzz`" is split into "`{xxx yyy}`" and "`zzz`".
`@matches = /(\{[^}]+\}|[^{\s]+)/g;` might work better:
"`xxx{yyy zzz}`" is split into "`xxx`" and "`{yyy zzz}`".
"`{xxx yyy}zzz`" is split into "`{xxx yyy}`" and "`zzz`".
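
One input from the OP's sample data is worth checking against patterns of this shape: the empty `{}` field. The sketch below compares a `+` quantifier inside the braces with a `*` quantifier when bare words are not allowed to start with `{`. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = 'aotone 1 {FixServer AOT1} FixServer {-F FxAOT1} {} 1';

# With +, the braces must enclose at least one character.
my @with_plus = $line =~ /(\{[^}]+\}|[^{\s]+)/g;

# With *, an empty "{}" is matched as a single token.
my @with_star = $line =~ /(\{[^}]*\}|[^{\s]+)/g;

print join('|', @with_plus), "\n";
print join('|', @with_star), "\n";
```

With `+`, the empty `{}` cannot match the braced alternative and the bare-word alternative refuses to start at `{`, so the match restarts one character later and yields a lone `}`. With `*`, the field comes back as `{}`.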
Re^2: Parsing spaces and curly braces by cochrasc (Initiate) on Sep 12, 2005 at 19:58 UTC
That did it...thanks, Roy.
Re: Parsing spaces and curly braces by dragonchild (Archbishop) on Sep 12, 2005 at 19:44 UTC
A little work on Text::xSV in the _get_row() and _get_quoted() subroutines plus another parameter that specifies the quoting mechanism should get you what you want. The only thing you didn't say is what happens if the curlies occur within a curlied section.
My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Parsing spaces and curly braces by QM (Parson) on Sep 12, 2005 at 19:36 UTC
You probably want a real parser.

Off the top of my head, I'd go with a 2-pass approach (assuming no nested curlies):

    while (my $line = <>) {
        chomp($line);
        my @tokens;
        while ( length( $line ) ) {
            # Match against $line and capture in list context.
            if ( my ($not_curly) = $line =~ /^([^{]+)/ ) {
                push @tokens, process_not_curly($not_curly);
                $line = adjust_line($line, $not_curly);
            }
            # * rather than + so that an empty {} also matches.
            elsif ( my ($curly) = $line =~ /^\{([^}]*)\}/ ) {
                push @tokens, process_curly($curly);
                $line = adjust_line($line, $curly);
            }
            else {
                die "can't get here, ";
            }
        }
    }

and you should be able to work out process_not_curly, process_curly, and adjust_line.

Update: After further review, Roy Johnson has the right idea; I need more coffee; I still like my (re)title.

-QM
--
Quantum Mechanics: The dreams stuff is made of
Re^2: Parsing spaces and curly braces by ikegami (Patriarch) on Sep 12, 2005 at 20:02 UTC
Your parser is really just a tokenizer. `m/.../gc` lends itself well to writing tokenizers, as seen in the following snippet:

    while (my $line = <>) {
        chomp($line);
        my @tokens;
        foreach ($line) {
            /\G ( [^{\s]+ )/xgc   && do { push(@tokens, $1); redo };
            /\G ( \{[^}]*\} )/xgc && do { push(@tokens, $1); redo };
            /\G \s+/xgc           && redo;
        }
        # ...process split line...
    }

Cases not specified by the OP behave as follows:
"`xxx{yyy zzz}`" is split into "`xxx`" and "`{yyy zzz}`".
"`{xxx yyy}zzz`" is split into "`{xxx yyy}`" and "`zzz`".
Re: Parsing spaces and curly braces by jonadab (Parson) on Sep 12, 2005 at 19:44 UTC
"You probably want a real parser."
That's really only necessary if the braces can be nested. If not, a regular expression can handle it, as in the solution posted above.
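
To illustrate the nested case, here is a minimal sketch of a hand-written scanner that tracks brace depth character by character. The sub name `split_nested` and the choice to die on unbalanced braces are assumptions made for this example. The OP never said whether nesting occurs or how it should be treated.

```perl
use strict;
use warnings;

# Split a line on whitespace, except inside (possibly nested) braces.
# Braces are kept in the returned tokens.
sub split_nested {
    my ($line) = @_;
    my (@tokens, $current);
    my $depth = 0;
    for my $ch (split //, $line) {
        if ($ch eq '{') {
            $depth++;
            $current .= $ch;
        }
        elsif ($ch eq '}') {
            die "unbalanced '}' in: $line\n" if $depth == 0;
            $depth--;
            $current .= $ch;
        }
        elsif ($ch =~ /\s/ && $depth == 0) {
            # Whitespace outside braces ends the current token, if any.
            push @tokens, $current if defined $current;
            undef $current;
        }
        else {
            $current .= $ch;
        }
    }
    die "unclosed '{' in: $line\n" if $depth > 0;
    push @tokens, $current if defined $current;
    return @tokens;
}

print join('|', split_nested('a {b {c d} e} f {} g')), "\n";
```

Whitespace only separates tokens at depth zero, so `{b {c d} e}` comes back as one token and `{}` survives as its own field.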