in reply to Regex Parsing Chars in a Line
Having a field separator character that may appear unescaped within a field seems like a bad idea. If you can discriminate the existing, genuine field separators well enough to convert non-field separator hyphens to underscores for disambiguation, it should be possible instead to convert the true field separators to unambiguous characters as your very first step and maybe preserve a bit more of your sanity:
    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $rec = 'A A Milne - Winnie-The-Pooh and Silver-Bear vol5-12 - Xi Press - Peking (1998)';
     print qq{'$rec'};
     ;;
     my $rx_old_sep = qr{ \s+ - \s+ }xms;
     my $new_sep = '|';
     ;;
     $rec =~ s{ $rx_old_sep }{$new_sep}xmsg;
     print qq{'$rec'};
     "
    'A A Milne - Winnie-The-Pooh and Silver-Bear vol5-12 - Xi Press - Peking (1998)'
    'A A Milne|Winnie-The-Pooh and Silver-Bear vol5-12|Xi Press|Peking (1998)'

The "fixed" file could then be written to disk to await further processing at your leisure.
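Applied to a whole file, the same substitution might look something like the sketch below. It is only an illustration: the in-memory filehandles stand in for real input and output files, the variable names are made up here, and it assumes that '|' never occurs in the data and that every genuine separator is a hyphen with whitespace on both sides.

```perl
use strict;
use warnings;

# Stand-in for the original file: one record per line.
# (In-memory filehandles are used so the sketch is self-contained;
# a real script would open files on disk instead.)
my $input  = "A A Milne - Winnie-The-Pooh and Silver-Bear vol5-12 - Xi Press - Peking (1998)\n";
my $output = '';

open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$output or die "cannot open output: $!";

my $rx_old_sep = qr{ \s+ - \s+ }xms;   # hyphen with whitespace on both sides
my $new_sep    = '|';                  # assumed never to appear in the data

while (my $line = <$in>) {
    chomp $line;                       # keep the newline away from \s+
    $line =~ s{ $rx_old_sep }{$new_sep}xmsg;
    print {$out} "$line\n";
}

close $in;
close $out or die "cannot close output: $!";

print $output;
```

If '|' can occur in the data, pick some other character (or a multi-character string) that cannot.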
Or, again assuming existing field separators are sufficiently unambiguous, just split each record to an array as the very first step and do all processing on the array elements.
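A minimal sketch of the split-first approach, using the same sample record. The field names ($author, $title and so on) are guesses for illustration only; the original question may label them differently.

```perl
use strict;
use warnings;

my $rec = 'A A Milne - Winnie-The-Pooh and Silver-Bear vol5-12 - Xi Press - Peking (1998)';

# Split only on a hyphen that has whitespace on both sides;
# hyphens inside a field (Winnie-The-Pooh, vol5-12) are left alone.
my @fields = split /\s+-\s+/, $rec;

# Hypothetical field names, for illustration only.
my ($author, $title, $publisher, $place_year) = @fields;

print "'$_'\n" for @fields;
```

Any later cleanup (hyphens to underscores or whatever) can then be done per element without ever touching a separator.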
Give a man a fish: <%-{-{-{-<
Replies are listed 'Best First'.
Re^2: Regex Parsing Chars in a Line
by kel (Sexton) on Nov 27, 2019 at 16:49 UTC
by AnomalousMonk (Archbishop) on Nov 27, 2019 at 19:15 UTC