in reply to Unknown numbers show up

Hello Vkhaw, and welcome to the Monastery!

For each line in the output file, you are running a series of substitutions. For example, for this line:

WL0,BL10,1708

you run:

command line = 'perl -pi.bak -e s/WL0,BL1/WL0,BL1,1708/g; CombineDie1Die2.txt'

and later

command line = 'perl -pi.bak -e s/WL0,BL10/WL0,BL10,1708/g; CombineDie1Die2.txt'

and you expect the second substitution to be applied to the line. But the first substitution finds a match, and so replaces it:

WL0,BL10,1708
*******          s/WL0,BL1/WL0,BL1,1708/g

(The match is marked by asterisks.) This results in:

WL0,BL1,17080,1708

and the second substitution (the one you want) never matches, because the line no longer contains WL0,BL10.
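The clash can be reproduced in a few lines of Perl. This is only a minimal sketch: it applies the two substitutions in-process to a copy of the sample line instead of going through the one-liners, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# The sample line from the output file.
my $line = 'WL0,BL10,1708';

# First substitution: WL0,BL1 is a prefix of WL0,BL10, so it matches here.
(my $after_first = $line) =~ s/WL0,BL1/WL0,BL1,1708/g;

# Second substitution, run on the already-modified line:
# the text WL0,BL10 is gone, so nothing is replaced.
(my $after_second = $after_first) =~ s/WL0,BL10/WL0,BL10,1708/g;

print "$after_first\n";     # WL0,BL1,17080,1708
print "$after_second\n";    # WL0,BL1,17080,1708 (unchanged)
```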

One way to fix this problem would be to re-order the matches so that the longer matches occur first. Another way would be to add a look-ahead assertion to match a comma:

s/WL0,BL1(?=,)/WL0,BL1,1708/g;
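As a quick check of the look-ahead, here is a small sketch run against the sample line and against a second, hypothetical line (WL0,BL1,42 is invented for the demonstration and does not come from your data):

```perl
use strict;
use warnings;

my $pattern_line = 'WL0,BL10,1708';   # the sample line
my $other_line   = 'WL0,BL1,42';      # hypothetical line that really starts with WL0,BL1

# In WL0,BL10 the key WL0,BL1 is followed by "0", not ",", so the look-ahead fails.
(my $untouched = $pattern_line) =~ s/WL0,BL1(?=,)/WL0,BL1,1708/g;

# Here the key is followed by a comma, so the substitution is applied.
# The look-ahead does not consume the comma; it stays in the line.
(my $changed = $other_line) =~ s/WL0,BL1(?=,)/WL0,BL1,1708/g;

print "$untouched\n";   # WL0,BL10,1708
print "$changed\n";     # WL0,BL1,1708,42
```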

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: Unknown numbers show up
by Vkhaw (Novice) on Apr 12, 2015 at 15:49 UTC
    Dear Athanasius,

    I would like to thank you for explaining in detail why the extra numbers show up.

    However, I still do not really know how to solve the issue. As I am dealing with 2 files of 8 million lines each, I have to use your second proposal, the assertion that matches the comma.

    As this is my first write-up, I am not really sure how to edit this particular code:  $command_line = "perl -pi\.bak -e s\/".$search."\/".$replace."\/g\; CombineDie1Die2.txt";

    into what you mentioned:  s/WL0,BL1(?=,)/WL0,BL1,1708/g;

    I mean, how or where exactly should I insert the (?=,) part?

    Regards, Vkhaw

      Like this:

      my $command_line = 'perl -pi.bak -e s/' . $search . '(?=,)/' . $replace . '/g; CombineDie1Die2.txt';
      #                                                    ^^^^^ Add this

      (See “Look-Around Assertions” in perlre#Extended-Patterns.)

      By the way, there is another problem with your code: the command switch -pi.bak makes a backup of the target file each time a substitution is applied. This means that CombineDie1Die2.txt.bak ends up almost the same as CombineDie1Die2.txt (only the final substitution is omitted). This is almost certainly not what you want. It provides yet another reason to follow flexvault’s advice and remove the nested calls to Perl one-liners from within your script. Much better to re-cast the logic and use Perl’s substitution facilities, etc., directly, rather than invoking a new Perl interpreter millions of times!
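      To give a rough idea of what "directly" might look like, here is a sketch only. It assumes that every key has the form WLn,BLn at the start of a line, and it uses a hypothetical lookup table %insert_for with made-up values and made-up sample lines; your real data may well need a different pattern.

      ```perl
      use strict;
      use warnings;

      # Hypothetical lookup table: key => value to insert after the key.
      my %insert_for = (
          'WL0,BL1'  => '1708',
          'WL0,BL10' => '1709',
      );

      # Insert the looked-up value after the key at the start of a line.
      # \d+ is greedy and the look-ahead demands a comma or end of line,
      # so WL0,BL1 can never match inside WL0,BL10.
      sub add_value {
          my ($line) = @_;
          $line =~ s{^(WL\d+,BL\d+)(?=,|$)}
                    { exists $insert_for{$1} ? "$1,$insert_for{$1}" : $1 }e;
          return $line;
      }

      # Hypothetical sample lines; a real script would read them from the file.
      my @lines = ('WL0,BL10,5', 'WL0,BL1,7', 'WL9,BL9,3');
      my @fixed = map { add_value($_) } @lines;

      print "$_\n" for @fixed;
      ```

      Each line is then handled by one hash lookup inside a single Perl process, so no extra interpreter is started and no backup file is rewritten for every substitution.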

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        Thank you, it helps.

        Yes, I will try to change the script further to make it more efficient. The .bak file was created because I was using the one-liner command, and without the .bak backup it shows an error.