Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm having a problem when trying to use inline replace to substitute a couple of fields in a file. An example file would be
1,AB499,Joe.Bloggs@mysite.com,MY_SERVER_ENV,sales,/opt/backup/MY_SERVE +R_ENV,sales.data,1,dbase,Apr 25 2008 3:25PM,Apr 25 2008 3:30PM,comple +ted 2,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 3,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 4,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 5,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 6,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 7,XX777,myserver,Anon.Person@mysite.com,business,/nfs/busback/upload/i +ncident,10,dbase,none,none,none 8,XX777,myserver,Anon.Person@mysite.com,business,/nfs/busback/upload/i +ncident,10,dbase,Aug 13 2010 8:30AM,Aug 13 2010 9:00AM,completed
What I'm trying to do is to replace the contents of fields 10 and 11 with those passed to the perl utility but only for the line which matches the id (the first field) whilst maintaining the rest of the record as is and the other records in the file as is.

Here is a snippet from the code .

$requestid = 0 unless $requestid; $enddate = "none" unless $enddate; $state = "none" unless $state; if ($update) { local $^I = ".bak"; # in place editing with backup fil +e @ARGV = qw(/test/info.dat); while (<>) { chomp; my @array = split /,/,$_; if ($array[0] eq $requestid) { if ($enddate) { s/$array[9]/$enddate/g + ;} if ($state) { s/$array[10]/$state/g ;} print; print "\n" ; } else { print join(",",@array); print "\n"; } } }
Now the problem. An example is if I pass the utility the enddate (Aug 13 2010 9:30AM) and state (failed) for a requestid of 3 it's doing this to the file.
1,AB499,Joe.Bloggs@mysite.com,MY_SERVER_ENV,sales,/opt/backup/MY_SERVE +R_ENV,sales.data,1,dbase,Apr 25 2008 3:25PM,Apr 25 2 008 3:30PM,completed 2,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 3,AB499,Aug 13 2010 9:30AM,Somebody.Admin@mysite.com,Aug 13 2010 9:30A +M,Aug 13 2010 9:30AM,1,dbase,Aug 13 2010 9:30AM,Aug 13 2010 9:30AM,Aug 13 2010 9:30AM 4,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 5,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 6,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 7,XX777,myserver,Anon.Person@mysite.com,business,/nfs/busback/upload/i +ncident,10,dbase,none,none,none 8,XX777,myserver,Anon.Person@mysite.com,business,/nfs/busback/upload/i +ncident,10,dbase,Aug 13 2010 8:30AM,Aug 13 2010 9:00 AM,completed
If I then run the utility again using the same params it then produces this the next time around.
1,AB499,Joe.Bloggs@mysite.com,MY_SERVER_ENV,sales,/opt/backup/MY_SERVE +R_ENV,sales.data,1,dbase,Apr 25 2008 3:25PM,Apr 25 2 008 3:30PM,completed 2,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 3,AB499,failed,Somebody.Admin@mysite.com,failed,failed,1,dbase,failed, +failed,failed 4,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 5,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 6,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 7,XX777,myserver,Anon.Person@mysite.com,business,/nfs/busback/upload/i +ncident,10,dbase,none,none,none 8,XX777,myserver,Anon.Person@mysite.com,business,/nfs/busback/upload/i +ncident,10,dbase,Aug 13 2010 8:30AM,Aug 13 2010 9:00 AM,completed
The default for most of the fields is none. The substitute command is obviously not working the way I wanted it to.I have no idea what is happening here . I'd be grateful for any help or pointers to writing better perl code !

Replies are listed 'Best First'.
Re: Problem with inline replace
by jethro (Monsignor) on Aug 13, 2010 at 13:23 UTC

    You are using the regex substitution operator the wrong way at a place where a simple assignment is sufficient. Change to this:

    if ($array[0] eq $requestid) { if ($enddate) { $array[9]= $enddate;} if ($state) { $array[10]=$state;} } print join(",",@array); print "\n";

    PS: The following does the same but is probably faster (and might be what you intented with your version):

    if ($array[0] eq $requestid) { if ($enddate) { $array[9]= $enddate;} if ($state) { $array[10]=$state;} print join(",",@array); print "\n"; } else { print; print "\n"; }
      Your observation that assignment rather than using substitution is exactly on track! Great job!

      As far as performance goes, my experience is that split is an "expensive thing" CPU wise. A regex that tests whether a line starts with X is a relatively inexpensive thing. I have also found that the join() operation is very performant - i.e. joining 12 things together is way faster than splitting one thing into 12 things. Anyway I think the performance enhancement here is to not do a split at all unless you have to - my code is shown below which uses a regex to decide whether to split or not split.

      This is great thanks. When you say the p.s. version is faster can you elaborate ?

        It might be faster because the join function is only called on the one line where you want to change something. So join is called once for the file instead of once for each line in the file

        But you might not notice anything of the speedup because your CPU is usually waiting for your hard disc when writing a file. So it does the join while waiting anyway. But at least the CPU will be cooler with the PS version then

        Note, the differences I am talking about are minimal. If you don't have to write files in the gigabytes, there won't be any noticable changes, your script would be just a bit closer to perfection ;-)

Re: Problem with inline replace
by Marshall (Canon) on Aug 15, 2010 at 04:49 UTC
    Since you asked for pointers, I recoded your routine below. One major thing is that my code contains fewer levels of indentation and a lot fewer "if" statements.

    I presume that you will make some sort of subroutine that might update multiple lines at once. Here I will just concentrate on the parts that you have shown.

    Get your "enddate" and "state" "ducks in order" before you start looping. In other words, don't bury some data validation stuff (of your sub's input data) deep in the guts of what it is doing. Do it early so that you KNOW that your new data is valid.

    Below I used the ||= operator. This tests "truthfulness" and if a variable doesn't evaluate to "true" then you get what is on the right hand side. Probably here the not so well known //= operator would be better, this tests for "definedness" instead of "truthness". So whatever you give me is ok, unless it is "undefined". Your $x = xyz unless $x is just fine also. I am just showing other ways to do that.

    In the main loop, I use a regex to decide if this is a line that I'm really interested in or not? I want to make that decision easily and quit further processing. Using a regex here: 1) is faster than splitting into all the component parts of the data record, 2) allows for comment (# lines) or 3) blank lines to just be skipped. Often files that are edited by humans or generated with "copy-n-paste" can have a trailing blank line that causes some error message.

    Once it is clear that we are one the "right" line, only "straightline" code is used, no if's or branches. We know for example that $enddate is an valid value because all that was done before we got here!

    I would steer you away from cryptic, seldom used things like local $^I = ".bak";. Explicitly open an output file and use that filehandle for printing the result.

    #!/usr/bin/perl -w use strict; my $requestid = 3; my $enddate = 'Aug 13 2010 9:30AM'; #field 9 my $state = 'failed'; #field 10 $requestid ||= 0; #or possibly use the //= operator $enddate ||= "none"; $state ||= "none"; while (<DATA>) { unless (/^$requestid,/) { print; next;} #if (!/^$requestid,/) { print; next;} #same thing if you want chomp; my @fields = split (/,/,$_); $fields[10] = $enddate; $fields[11] = $state; print join(",",@fields), "\n"; } =prints for line 3 ... others are unchanged 3,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,Aug + 13 2010 9:30AM,failed =cut __DATA__ 1,AB499,Joe.Bloggs@mysite.com,MY_SERVER_ENV,sales,/opt/backup/MY_SERVE +R_ENV,sales.data,1,dbase,Apr 25 2008 3:25PM,Apr 25 2008 3:30PM,comple +ted 2,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 3,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 4,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 5,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 6,AB499,none,Somebody.Admin@mysite.com,none,none,1,dbase,none,none,non +e 7,XX777,myserver,Anon.Person@mysite.com,business,/nfs/busback/upload/i +ncident,10,dbase,none,none,none 8,XX777,myserver,Anon.Person@mysite.com,business,/nfs/busback/upload/i +ncident,10,dbase,Aug 13 2010 8:30AM,Aug 13 2010 9:00AM,completed