A common problem that I face in parsing complex data is needing to split the data on an unquoted value. For example, consider the following text.
this is some text. A period (".") usually terminates a statement. But not if it's quoted. Regardless of whether or not single quotes, '.', are used.
It would be nice to be able to split that into 4 individual records, but just splitting on a period won't work. This problem is general enough that it would be nice to create a "super split" that splits data into discrete elements, but only where the delimiter meets certain more complex conditions (such as not being quoted, in this case).
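To make the failure concrete, here is a quick check of what a plain split does with the sample text. The variable names are only for illustration.

```perl
use strict;
use warnings;

# The sample text from above, copied verbatim.
my $data = q{this is some text. A period (".") usually terminates a statement. But not if it's quoted. Regardless of whether or not single quotes, '.', are used.};

# Naive approach: split on every period, quoted or not.
my @records = split /\./, $data;

# The two quoted periods are treated as delimiters too,
# so this yields 6 fragments instead of the 4 records we want.
printf "%d records\n", scalar @records;
print "[$_]\n" for @records;
```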
I haven't seen a module that offers this general functionality but it's possible I missed something. Can anyone offer suggestions? Something for the specific case would be fine, but a general purpose solution would be awesome.
Update: after reading the replies, a different strategy occurs to me. Supplying an "unless" option would be helpful.
    use Regexp::Common;
    use Data::Record;    # doesn't exist

    my $record = Data::Record->new(
        split  => qr/\./,
        unless => $RE{quoted},
    );
    my @data = $record->split($data);
Internally, it would be a bit inefficient, since it would have to read all of the data at once. It would then go through the data, find every piece of text that matches the "split" regex inside a match for the "unless" regex, and replace it with a unique token that does not match the split regex. After that, it could simply split the data, iterate over the resulting records, and replace the tokens with the original text. I believe Filter::Simple used a similar strategy.
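The token strategy just described might be sketched roughly as follows. This is only a sketch under some assumptions: `split_unless` is a hypothetical helper, not the API of any existing module; the token format is made up; and the "unless" pattern is a hand-written stand-in for a quoted-string regex. The stand-in refuses a single quote that directly follows or precedes a word character, because a plain quoted-string pattern would likely take the apostrophe in "it's" as an opening quote.

```perl
use strict;
use warnings;

# Hypothetical helper: split $data on $split, except where the
# match for $split sits inside a match for $unless.
sub split_unless {
    my ( $data, $split, $unless ) = @_;

    # Build a placeholder token that does not occur in the data.
    my $n = 0;
    my $token;
    do { $token = "\x00SPLIT" . $n++ . "\x01" } while index( $data, $token ) >= 0;
    die "token matches the split pattern\n" if $token =~ $split;

    # Inside each protected region, swap every split match for the
    # token and remember the original text in order of appearance.
    my @saved;
    my $mask = sub {
        my ($chunk) = @_;
        $chunk =~ s/($split)/push @saved, $1; $token/ge;
        return $chunk;
    };
    $data =~ s/($unless)/$mask->($1)/ge;

    # Now a plain split is safe; afterwards put the saved text back.
    my @records = split $split, $data;
    for my $record (@records) {
        $record =~ s/\Q$token\E/shift @saved/ge;
    }
    return @records;
}

my $data = q{this is some text. A period (".") usually terminates a statement. But not if it's quoted. Regardless of whether or not single quotes, '.', are used.};

# Assumed stand-in for a "quoted" pattern (see the note above).
my $quoted = qr/"[^"]*"|(?<!\w)'[^']*'(?!\w)/;

my @records = split_unless( $data, qr/\./, $quoted );
print "[$_]\n" for @records;    # 4 records, quoted periods intact
```

Note that the records keep their leading whitespace and lose the terminating period, just as with a regular split.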
Cheers,
Ovid