The best place to read up about Perl 6 regexes is the specification itself.
You mused:
While I suspect this has something to do with '\0' terminated strings in C, I don't fully understand what's happening.
No, it's not anything to do with C string terminators.
The problem with your previous version was that you were matching an optional comma at the end of each field and then replacing it with a definite "\037" every time. So, for the last field in each record (which, of course, isn't followed by a comma), you were nevertheless appending an unwanted "\037".
The global substitution would then loop one last time, matching a final zero-character field (because of the (?<a>[^,]*) alternative, which can match nothing). The substitution on that empty field then causes a second unnecessary "\037" to be appended.
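Both extra separators can be reproduced with a short sketch. The sub name buggy_convert and its pattern are a hypothetical reconstruction of the earlier approach (the original code isn't quoted in this reply), assuming it matched each field with an optional trailing comma and always replaced it with "\037". The list-context match shows the final zero-length field that //g finds at the end of the string.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the earlier approach: every field,
# including the last one, is followed by "\037" whether or not a comma
# was actually matched.
sub buggy_convert {
    my ($line) = @_;
    $line =~ s{ (?<a> [^,]*) ,? }{ $+{a} . chr 31 }gxe;
    return $line;
}

# In list context the same kind of pattern returns three fields for "a,b":
# "a", "b", and a final empty field matched at the end of the string.
my @fields = ("a,b" =~ /([^,]*),?/g);
print scalar(@fields), " fields\n";

# Make "\037" visible: the result ends with two unwanted separators,
# one after "b" and one for the empty trailing field.
(my $shown = buggy_convert("a,b")) =~ s/\037/<US>/g;
print "$shown\n";    # a<US>b<US><US>
```

The zero-length match is permitted because the previous match ("b") was not zero-length; perlre only forbids two zero-length matches in a row at the same position.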
You could fix that by rewriting your original version something like this:
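    use strict;
    use warnings;

    open my $csv_fh, '<', 'input.csv'  or die "Can't open input.csv: $!";
    open my $tff_fh, '>', 'output.tff' or die "Can't open output.tff: $!";

    # A field is either "quoted" (and may contain commas) or unquoted
    my $field = qr{ " (?<field> [^"]* ) " | (?<field> [^,"]* ) }x;

    while (my $line = <$csv_fh>) {
        # Append a unit separator (chr 31) only if a comma followed the field
        $line =~ s{ $field (?<comma> ,?) }
                  { $+{field} . ($+{comma} && chr 31) }gxe;

        # Replace the trailing newline with a record separator (chr 30)
        $line =~ s{\n}{chr 30}xe;

        print {$tff_fh} $line;
    }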
This version still matches the optional comma each time, but now only appends a "\037" if there actually was a comma. This means there are no extra separators to remove once the line is complete.
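To try the corrected loop without creating files, one option is to wrap it in a sub that reads from and writes to in-memory filehandles (open on a scalar reference). The sub name csv_to_tff and the sample data are hypothetical; the field pattern and the two substitutions are the same as in the version above.

```perl
use strict;
use warnings;

# Hypothetical wrapper: converts CSV text held in a string and returns the
# converted text, using in-memory filehandles instead of real files.
sub csv_to_tff {
    my ($csv_text) = @_;
    my $tff_text = '';
    open my $csv_fh, '<', \$csv_text or die "Can't read CSV string: $!";
    open my $tff_fh, '>', \$tff_text or die "Can't write TFF string: $!";

    # A field is either "quoted" (and may contain commas) or unquoted
    my $field = qr{ " (?<field> [^"]* ) " | (?<field> [^,"]* ) }x;

    while (my $line = <$csv_fh>) {
        # Append chr 31 only when a comma was actually consumed
        $line =~ s{ $field (?<comma> ,?) }
                  { $+{field} . ($+{comma} && chr 31) }gxe;
        # Replace the trailing newline with chr 30
        $line =~ s{\n}{chr 30}xe;
        print {$tff_fh} $line;
    }
    close $tff_fh;
    return $tff_text;
}

my $tff = csv_to_tff(qq{a,b\n"x,y",z\n});

# Make the control characters visible for display
(my $shown = $tff) =~ s/\037/<US>/g;
$shown =~ s/\036/<RS>/g;
print "$shown\n";    # a<US>b<RS>x,y<US>z<RS>
```

The trailing zero-length match still happens at the end of each line, but it now contributes an empty string instead of a stray "\037".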
Note that I also removed the chomp and replaced it with an explicit substitution of the trailing newline. I felt that this highlights the transformation more clearly than did your clever (but subtle and "at-a-distance") use of $\.
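The two styles can be compared with a small sketch. It assumes the earlier version did a chomp and then set $\ to chr 30 so that print supplied the record separator; the sub names via_ors and via_substitution are hypothetical, and in-memory filehandles stand in for output.tff.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the chomp/$\ style: strip the newline and
# let print() append the record separator through $\.
sub via_ors {
    my ($line) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "Can't open in-memory handle: $!";
    chomp $line;
    {
        local $\ = chr 30;    # appended after every print() in this scope
        print {$fh} $line;
    }
    close $fh;
    return $out;
}

# Explicit style: the newline-to-chr(30) change is visible in the data itself
sub via_substitution {
    my ($line) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "Can't open in-memory handle: $!";
    $line =~ s{\n}{chr 30}xe;
    print {$fh} $line;
    close $fh;
    return $out;
}

# Both produce "abc" followed by chr 30
my $same = via_ors("abc\n") eq via_substitution("abc\n");
print $same ? "same\n" : "different\n";
```

The output is identical; the difference is where the behaviour lives. While $\ is set, it applies to every print, including unrelated ones such as debugging output, whereas the substitution only changes the line it is applied to.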
Damian