perl197 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a file in which I've deliberately replaced certain text with a | to help load the file into a table. As the last field can contain actual |'s as part of a command sequence, I want to replace any | after the 9th occurrence with the word pipe so the data loads into a varchar field. It works for the 10th occurrence, but it somehow skips over the 11th and treats the 12th as the 11th. I.e., "|some command with the 10th | along with the 11th | along with the 12th |" results in "|some command with the 10th pipe along with the 11th | along with the 12th pipe ". Is there something inherent in my approach that keeps it from accurately finding the 11th occurrence of a |? ...a humble novice.

# open the file that had 9 |'s deliberately set.
open($fh, '<:encoding(UTF-8)', $outfilename)
    or die "Could not open file '$outfilename' $!";
open(OUT, ">$outfileload")
    or die "Unable to open $outfileload for writing: $!\n";
# replace the 10th, 11th, and 12th |'s with pipe so the field can be
# loaded into an expanded varchar field without treating the data as
# more than one field.
while ($row = <$fh>) {
    my $pos1 = 10;
    $row =~ s/(\|)/!--$pos1 ? ' pipe ' : $1/ge;
    my $pos2 = 11;
    $row =~ s/(\|)/!--$pos2 ? ' pipe ' : $1/ge;
    my $pos3 = 12;
    $row =~ s/(\|)/!--$pos3 ? ' pipe ' : $1/ge;
    chomp $row;
    print OUT "$row\n";
}
print "doneagain\n";

Re: replace nth occurrence of |
by toolic (Bishop) on Sep 24, 2014 at 14:42 UTC
    Here is an explanation of why your code works the way it does and how to make it do what you want (with minimal changes). Your 1st substitution (pos1) is successful, as you've noted: it replaces the 10th | with " pipe ". That leaves one fewer | on the line, so the original 11th | is now the 10th. For the 2nd and 3rd substitutions you therefore again want to replace the 10th |, not the 11th or 12th:
    use warnings;
    use strict;
    while (my $row = <DATA>) {
        my $pos1 = 10;
        $row =~ s/(\|)/!--$pos1 ? ' pipe ' : $1/ge;
        my $pos2 = 10;
        $row =~ s/(\|)/!--$pos2 ? ' pipe ' : $1/ge;
        my $pos3 = 10;
        $row =~ s/(\|)/!--$pos3 ? ' pipe ' : $1/ge;
        chomp $row;
        print "$row\n";
    }
    print "doneagain\n";
    __DATA__
    1|2|3|4|5|6|7|8|9|a|b|c|d|e|f|g|h|i

    Outputs:

    1|2|3|4|5|6|7|8|9|a pipe b pipe c pipe d|e|f|g|h|i
    doneagain

      Thanks much; that solves my problem and gets me to where I need to be. I'll experiment with the single pass as well. Thanks again, sir monks!

Re: replace nth occurrence of |
by LanX (Saint) on Sep 24, 2014 at 14:31 UTC
    I'd use split with a LIMIT and join if I were you.
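    A minimal sketch of the split-with-LIMIT idea, assuming every line has exactly 10 fields and the last one is the free-text command (the question asks for any | after the 9th to become pipe). The helper name protect_last_field is hypothetical. With a LIMIT of 10, split stops after the 9th |, so everything after it, including any further |'s, stays together in the last field, where a plain s///g can rewrite them before join puts the line back together.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first 9 | as field separators and turn
# every | after the 9th into ' pipe ' (assumes 10 fields per line).
sub protect_last_field {
    my ($line) = @_;
    # LIMIT 10: split at most 9 times; the 10th field keeps any further |
    my @fields = split /\|/, $line, 10;
    # Only a full 10-field line can carry extra | in its last field
    $fields[-1] =~ s/\|/ pipe /g if @fields == 10;
    return join '|', @fields;
}

for my $row ('1|2|3|4|5|6|7|8|9|a|b|c|d|e', '1|2|3') {
    print protect_last_field($row), "\n";
}
```

    The first sample prints 1|2|3|4|5|6|7|8|9|a pipe b pipe c pipe d pipe e and the short line comes back unchanged. Unlike rewriting the 10th, 11th and 12th | specifically, this covers any number of |'s in the command field.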

    Regarding your code: I don't understand your regexes, and you didn't provide sample data... (or maybe you need to format it more readably)

    update

    You are applying multiple /g regexes in a row: after the 10th pipe is substituted, the 11th pipe becomes the 10th, so better do it in one run.
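    To make the shifting visible, here is a small sketch that replays the three passes from the question on the sample line used in this thread and records the line after each pass. The sub name trace_passes is hypothetical; the substitution itself is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: run the question's s///ge once per target
# position and remember the line after each pass.
sub trace_passes {
    my ($row, @targets) = @_;
    my @after;
    for my $n (@targets) {
        my $pos = $n;
        # !--$pos is true only when the countdown hits 0 (the $n-th |)
        $row =~ s/(\|)/!--$pos ? ' pipe ' : $1/ge;
        push @after, $row;
    }
    return @after;
}

my @lines = trace_passes('1|2|3|4|5|6|7|8|9|a|b|c|d|e|f|g|h|i', 10, 11, 12);
print "$_\n" for @lines;
```

    The passes print ...|9|a pipe b|c|d|e|f|g|h|i, then ...|9|a pipe b|c pipe d|e|f|g|h|i, then ...|9|a pipe b|c pipe d|e pipe f|g|h|i: because each pass removes a |, the second pass lands on the original 12th | and the third on the original 14th.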

    What is !- supposed to mean???

    update

    It took me a while to understand that you are negating a decremented counter, !(--$pos) ...

    If you just swapped the two branches of the ternary operator, you wouldn't need any negation.
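    A sketch of that swap for a single target position (the name replace_nth is hypothetical): --$pos evaluates to 0, which is false, exactly at the n-th |, so the false branch of the ternary can do the replacement and no ! is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the $n-th | with ' pipe '.
sub replace_nth {
    my ($row, $n) = @_;
    my $pos = $n;
    # The countdown is non-zero (true) for every other |, so keep it ($1);
    # it reaches 0 (false) at the $n-th |, and that one becomes ' pipe '.
    $row =~ s/(\|)/--$pos ? $1 : ' pipe '/ge;
    return $row;
}

print replace_nth('1|2|3|4|5|6|7|8|9|a|b|c|d', 10), "\n";
```

    This prints 1|2|3|4|5|6|7|8|9|a pipe b|c|d.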

    Better consider checking the range in just one single run, with $count reset to 0 for each line: s/(\|)/(++$count >= 10 and $count <= 12) ? ' pipe ' : $1/ge
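    A fuller sketch of the one-run approach, with the counter reset for every line and the range inclusive (occurrences 10 through 12). The helper replace_range and its $first/$last parameters are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: in a single s///ge, replace the $first-th through
# $last-th | (inclusive) with ' pipe ' and leave all others alone.
sub replace_range {
    my ($row, $first, $last) = @_;
    my $count = 0;    # fresh counter for each line
    $row =~ s/(\|)/(++$count >= $first and $count <= $last) ? ' pipe ' : $1/ge;
    return $row;
}

for my $row ('1|2|3|4|5|6|7|8|9|a|b|c|d|e', '1|2|3|4|5|6|7|8|9|a|b') {
    print replace_range($row, 10, 12), "\n";
}
```

    Because every | is counted in the same pass, nothing shifts between replacements; a line with only 10 |'s simply gets its 10th one replaced.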

    split and join are still the readable way to do it.

    Cheers Rolf

    (addicted to the Perl Programming Language and ☆☆☆☆ :)

Re: replace nth occurrence of |
by AnomalousMonk (Archbishop) on Sep 24, 2014 at 16:06 UTC

    Alternatively, with Perl versions 5.10+:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use 5.010;
    ;;
    my $data = '1|2|3|4|5|6|7|8|9|a|b|c|d|e|f|g|h|i';
    print qq{'$data'};
    ;;
    local our $n;
    $data =~ s{ \| (?(?{ ++$n < 10 || $n > 12 }) (*FAIL)) }{ pipe }xmsg;
    print qq{'$data'};
    "
    '1|2|3|4|5|6|7|8|9|a|b|c|d|e|f|g|h|i'
    '1|2|3|4|5|6|7|8|9|a pipe b pipe c pipe d|e|f|g|h|i'

    Update: Or, without version restrictions but with more math:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $data = '1|2|3|4|5|6|7|8|9|a|b|c|d|e|f|g|h|i';
    print qq{'$data'};
    ;;
    use constant O => 10;
    use constant T => 3;
    ;;
    my $p = $data =~ tr/|//;
    my $n = $p - O - T + 1;
    my $m = $n + T - 1;
    $data =~ s{ [|] (?= (?: [^|]* [|]){$n,$m} [^|]* \z) }{ pipe }xmsg;
    print qq{'$data'};
    "
    '1|2|3|4|5|6|7|8|9|a|b|c|d|e|f|g|h|i'
    '1|2|3|4|5|6|7|8|9|a pipe b pipe c pipe d|e|f|g|h|i'
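    The arithmetic in the second one-liner works like this: the k-th | of a line with $p pipes has $p - k more pipes after it, so occurrences $first through $first + $count - 1 are exactly the |'s followed by between $p - $first - $count + 1 and $p - $first further |'s. A sketch wrapping that in a hypothetical sub, with guards (not in the one-liner) for lines that have too few |'s:

```perl
use strict;
use warnings;

# Hypothetical helper: replace $count pipes starting at the $first-th,
# using the tr/// count plus a lookahead on the pipes that follow.
sub replace_by_lookahead {
    my ($data, $first, $count) = @_;
    my $p = ($data =~ tr/|//);    # total number of | on the line
    my $m = $p - $first;          # pipes after the $first-th |
    return $data if $m < 0;       # line has fewer than $first pipes
    my $n = $m - $count + 1;      # pipes after the last target |
    $n = 0 if $n < 0;             # line ends before the last target
    $data =~ s{ [|] (?= (?: [^|]* [|]){$n,$m} [^|]* \z) }{ pipe }xmsg;
    return $data;
}

print replace_by_lookahead('1|2|3|4|5|6|7|8|9|a|b|c|d|e|f|g|h|i', 10, 3), "\n";
```

    For the sample line $p is 17, so $m is 7 and $n is 5, the same values the one-liner computes.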

Re: replace nth occurrence of |
by GotToBTru (Prior) on Sep 24, 2014 at 14:45 UTC

    Using the embedded counter in the regex is too clever for its own good. And why use a delimiter that appears in the data? That is just making your job harder than it needs to be.

    1 Peter 4:10

      I'm parsing data in a log file where the last data appended to each line is a Unix command-line entry. As commands submitted by users may or may not include a | (or any other command-line character), I wanted to replace my chosen delimiter within the command text with a benign value so that it doesn't make the bulk load into the table error out. By and large there will rarely be even one |, never mind 3, but I wanted to cover my bets just in case.

Re: replace nth occurrence of |
by clueless newbie (Curate) on Sep 24, 2014 at 19:05 UTC

    If you started with 12 and worked down to 10

    while ($row = <$fh>) {
        my $pos1 = 12;
        $row =~ s/(\|)/!--$pos1 ? ' pipe ' : $1/ge;
        my $pos2 = 11;
        $row =~ s/(\|)/!--$pos2 ? ' pipe ' : $1/ge;
        my $pos3 = 10;
        $row =~ s/(\|)/!--$pos3 ? ' pipe ' : $1/ge;
        chomp $row;
        print OUT "$row\n";
    }

    it should work.
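    A sketch of the same high-to-low idea as a loop, so the target positions need not be spelled out three times (the name replace_high_to_low is hypothetical). The descending order is what makes it work: ' pipe ' contains no |, so rewriting a later | never changes how many |'s come before an earlier one.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the |'s at the given positions, working
# from the highest position down so earlier positions are unaffected.
sub replace_high_to_low {
    my ($row, @positions) = @_;
    for my $n (sort { $b <=> $a } @positions) {
        my $pos = $n;
        $row =~ s/(\|)/!--$pos ? ' pipe ' : $1/ge;
    }
    return $row;
}

print replace_high_to_low('1|2|3|4|5|6|7|8|9|a|b|c|d|e', 10, 11, 12), "\n";
```

    This prints 1|2|3|4|5|6|7|8|9|a pipe b pipe c pipe d|e, and the positions may be passed in any order.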