in reply to Re: Improved regexp sought
in thread Improved regexp sought

Point taken, merlyn.

I have a file containing multiple lines. Each line consists of a variable number of variable-length fields separated by a + character, and each line is terminated by a ' character. Sometimes a field contains a ' character; if so, the ' is preceded by a question mark. Here are some example lines:

0010+2+O?'Reilly'
023++++234+35+White+++17+'
g?'day mate+++'

I want to break each line up into its constituent fields. I can do it with brute force, but would prefer elegance.

Thanks
Myomancer

Re^3: Improved regexp sought
by duff (Parson) on Oct 27, 2004 at 14:39 UTC

    Mayhap you want to take a multi-step approach.

    $string =~ s/'$//;
    $string =~ s/\?'/'/g;
    @fields = split /\+/, $string;

    I want to break each line up into its constituent fields. I can do it with brute force, but would prefer elegance.

    I usually choose "working" over "not working" :-)
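
One detail of the multi-step approach may matter for this data: by default `split` discards trailing empty fields, so a record such as `g?'day mate+++'` would come back as a single field. The following is a minimal sketch, not duff's code, that passes a limit of -1 to keep them. The helper name `split_record` is hypothetical, and the sketch assumes every record really does end with the unescaped terminator.

```perl
use strict;
use warnings;

# Hypothetical helper: split one terminated record into its fields.
# Assumes the record always ends with the unescaped ' terminator.
sub split_record {
    my ($record) = @_;
    chomp $record;                          # tolerate a trailing newline
    $record =~ s/'\z//;                     # drop the record terminator
    my @fields = split /\+/, $record, -1;   # -1 keeps trailing empty fields
    s/\?'/'/g for @fields;                  # unescape ?' to a plain '
    return @fields;
}

my @fields = split_record("g?'day mate+++'");
print scalar(@fields), " fields: ", join('|', @fields), "\n";
```

Unescaping after the split rather than before makes no difference here, because + is never escaped in the format described; it simply keeps each step working on one field at a time.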

Re^3: Improved regexp sought
by diotalevi (Canon) on Oct 27, 2004 at 14:54 UTC

    use Text::CSV_XS. I guessed that your lines were terminated with apostrophe + newline. Alter the code to fit.

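    use Text::CSV_XS;

    # '?' is the escape character described in the question
    my $parser = Text::CSV_XS->new( {
        eol         => "'\n",
        escape_char => "?",
        sep_char    => "+",
    } );

    # $fh is assumed to be an already-opened filehandle
    while ( my $line = <$fh> ) {
        $parser->parse( $line );
        print join( ", ", $parser->fields ) . "\n";
    }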
Re^3: Improved regexp sought
by Limbic~Region (Chancellor) on Oct 27, 2004 at 14:43 UTC
    myomancer,
    I can do it with brute force, but would prefer elegance

    I hope you aren't confusing conciseness with elegance. They are not always related. See the following:

    my $str = "0010+2+O?'Reilly'";
    my @field = map {s/\?'/'/g; $_ } split /\+/ , substr($str,0, (length $str) - 1);
    print "[$_]$field[$_]\n" for 0 .. $#field;

    IMO, the code would be more elegant broken out into multiple lines - perhaps with comments.

    Cheers - L~R
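
As an illustration of what "broken out into multiple lines" might look like, here is one possible sketch of the same three operations with a comment on each. The intermediate variable names `$body` and `@raw` are assumptions introduced for readability and do not come from the post above.

```perl
use strict;
use warnings;

my $str = "0010+2+O?'Reilly'";

# Step 1: drop the trailing ' that terminates the record.
my $body = substr $str, 0, length($str) - 1;

# Step 2: split on the literal + separator.
my @raw = split /\+/, $body;

# Step 3: turn each escaped ?' back into a plain ',
# working on a copy so @raw is left untouched.
my @field = map { (my $f = $_) =~ s/\?'/'/g; $f } @raw;

print "[$_]$field[$_]\n" for 0 .. $#field;
```

The behaviour is intended to match the one-statement version; the only change is that each step is named and can be inspected on its own.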