in reply to In place search and replace with a hash

Some other thoughts on the OPed code.

my ($key, $value) = split /\s/, $line;
next LINE if not $key;

split returns its input string ($line in this case) if it cannot split according to the given pattern, so  $key will almost always have a true value no matter what happens. Better to use
    my ($k, $v) = split /\s/, $line;
    next LINE if not defined $v;
I use defined to avoid false negatives from  '0' and  '' (the empty string), both of which evaluate as boolean false. I also use the variable names  $k $v because I don't like using variaable names that are the same as Perl built-in functions, etc.

$hash{$key} = $value;

The hash  %hash is presumably a package global. Better to use a pre-declared my lexical. (Better in general to avoid globals.)

chomp (%hash);

The chomp is done on all the values of the hash on every iteration through the while-loop. This does no harm, but needlessly burns cycles. chomp can be done just once on the hash after it is populated, i.e., after the while-loop.

Update: This code doesn't seem to really do anything:

@file_array =<*.txt> or die $!; foreach $file (@file_array) { open FH, "$file", or die $!; while (<FH>) { if (/(\S+):(\S+).*\n/) { s/$1/$hash{'$1'}_$2/g; } } close FH; }
To be sure, it opens a bunch of files and reads and potentially alters each line of each file... but then what? The (potentially) altered line is never written out to any file. Is this some sort of tied filehandle (see, e.g., Tie::File)? There's no indication of this if so. Also, I don't understand what you mean by "... I want to change the files in place." As I understand the phrase, this can be done only if the replacement strings are guaranteed to be exactly the same length as the sub-strings that are replaced. Again, no indication this is the case.

Further Update: The process of "editing a file" (that has not been read into a tied array) on a line-by-line basis generally goes something like this (every step is assumed to be checked for success, i.e., it produces no error code):

  1. open file to be edited (input file) in read mode ('<').
  2. Generate temporary name of output file (usually based on input file name) and open for write ('>')
  3. Until all lines are read from input file, read input file line-by-line, alter line as appropriate, write line to output file.
  4. Close input file. Close output file.
  5. Do whatever is necessary to be absolutely sure the newly-written output file is intact.
  6. Change name of input file to some backup name, e.g., 'input.txt.bak'.
  7. If the input file name change produces no errors, rename output file to original name of input file. If the output file rename is successful, the "backup" input file may be deleted; often, this file is kept in existence "just in case."
Even if it looks a lot simpler than this, this is more or less what is happening "behind the scenes." Can you present some code that resembles this process?


Give a man a fish:   <%-(-(-(-<

Replies are listed 'Best First'.
Re^2: In place search and replace with a hash
by hkates (Novice) on Dec 28, 2014 at 03:29 UTC

    Anomalous, you're right. As I wrote in my original post, I have never actually used s/foo/bar other than in a one liner. I was looking for a way to find and replace strings in a file without writing to a new output file and when the replacement strings are not the same length as the sub-strings that are replaced. Clearly I am not doing this the correct way, and if there is a correct way, I haven't been able to find/understand it yet. What I want is to open a file and if a line is:

    'nc7-52:p39|read1|1|19|296|0.0642'

    and change it to:

    C_fraterna_nc7-52

    using a hash where $hash{nc7-52}=C_fraterna_nc7-52

    Update: From this example it looks like it would be easy to print to output, but some lines in the file will not be modified and I still want to preserve them.

    sorry, this thread may be a bit out of control since I had too many problems to begin with

      From this example it looks like it would be easy to print to output ...

      As a first step, create a small test input file and just print the altered and unaltered lines to standard output, e.g.,
          print $my_new_line_what_is_just_the_way_i_want_it;
      When this limited, test output looks ok, start to think about the process of opening a new file and writing to that file instead of STDOUT. When that looks ok, start thinking about all the input/output file verification and renaming that has to happen. (I'm sure there are modules to facilitate the "edit in place" process, but I don't know what they are off-hand.)


      Give a man a fish:   <%-(-(-(-<

Re^2: In place search and replace with a hash
by hkates (Novice) on Dec 28, 2014 at 03:54 UTC
    I think I understand your editing a file instructions, and if so I was really over-complicating things trying to do an "in place" replacement. Now I am trying something like this:
    open FH, "<infile.txt" or die $!; open OUT, ">outfile.txt"; while (<FH>) { if (/(\S+):(\S+).*\n/) { $var1 = "$1"; $var2 = "$2"; print OUT "$var1_$var2\n" } elsif (/(.*)\n/) { print OUT "$1\n"; } } close FH;

    This does what I want in terms of preserving the input files non matching lines, but I am still getting a "use of uninitialized value" error, one for every line in the input file:

    Use of uninitialized value $var1_ in concatenation (.) or string at test_perl.pl line 42

      print OUT "$var1_$var2\n"

      The  '_' (underscore) character is a valid identifier, i.e., variable name, symbol. Therefore, the two variables you're using are  $var1_ (note trailing _), which is not defined, and  $var2, which is. Had you been using strictures as well as warnings, Perl would not have allowed this little oversight (see strict and warnings). The proper way to write this statement (if you really need the _) is
          print OUT "${var1}_$var2\n";
      (Note that I've added a ; at the end; the absence of this statement terminator caused no syntactic problem in the given code, but would have eventually — trust me.)

      open FH, "<infile.txt" or die $!;
      open OUT, ">outfile.txt";

      Update: You seem to be back-sliding. In contrast to your OPed code, this code uses global filehandles, two-parameter open, and does not check the status of the output file open. Tsk, tsk! IMHO, the OPed code used better practices.


      Give a man a fish:   <%-(-(-(-<

        Ah, thank you! I don't know why I was trying to use "_". Totally unnecessary. Now that you have helped me to fix most of these basic problems, I am back to the original problem of replacing the hash keys with the hash values. The hash is created with:
        use strict; use warnings; open my $fh, '<', 'hash_test.txt' or die "Cant open file $!"; LINE: while (my $line = <$fh>) { my ($key, $value) = split /\s/, $line; next LINE if not $key; $hash{$key} = $value; chomp (%hash); while ( my( $key, $value ) = each %hash ) } close $fh;

        Then the hash keys are searched for (and replaced with their associated values if found) in the input file using

        open FH, "<infile.txt" or die $!; open OUT, ">outfile.txt"; while (<FH>) { if (/(\S+):(\S+).*\n/) { $var1 = "$1"; $var2 = "$2"; print OUT "$hash{$var1} $var2\n"; } elsif (/(.*)\n/) { print OUT "$1\n"; } } close FH;

        This results in the error

        Use of uninitialized value within %hash in concatenation (.) or string at test_perl.pl line 41, <FH> line 5

        UPDATE I think the problem was that I had initialized my %hash inside a while loop. I moved it up to the beginning of the program, and it worked!

        Thanks Anomalous, I see what you mean by backsliding- I had been testing the code with a single file to control for errors with the loop (and avoid being overwhelmed by error messages). Here is the updated code that works. I see other responses suggesting better ways to do this and potential problems with the hash, and I will address those as well. But at the moment, this works. Thanks for all your help!

        #!/usr/bin/perl use strict; use warnings; my %hash = (); open my $fh, '<', 'sample_name_ID_hash.txt' or die "Cant open file $!" +; LINE: while (my $line = <$fh>) { my ($key, $value) = split /\s/, $line; next LINE if not $key; $hash{$key} = $value; chomp (%hash); } close $fh; @file_array = <*.nex> or die $!; #find all the files foreach $file (@file_array) { my $prefix = $file; $prefix =~ s/\.nex//; open FH, "$file", or die $!; print "found $file with prefix $prefix\n"; open OUT, ">out$prefix.nex" or die $!; while (<FH>) { if (/^\s+\'(\S+):(\S\d+)\|\w+\|(\d)\S+\'(.*)\n/) { $var1 = "$1"; $var2 = "$2"; $var3 = "$3"; $var4 = "$4"; print OUT "'$hash{$var1}_${var2}_$var3'$var4\n"; } elsif (/(.*)\n/) { print OUT "$1\n"; } } }
      if (/(\S+):(\S+).*\n/) { $var1 = "$1"; $var2 = "$2"; print OUT "$var1_$var2\n" } elsif (/(.*)\n/) { print OUT "$1\n"; } }

      Ok, once you get that working (i.e., fix  $var1_ problem), try reducing the big if-statement to a single  s/// substitution (untested):
          s{ (\S) : (\S) }{${1}_$2}xmsg;
      (which just replaces every : with an _), followed by a print of the (possibly altered) line (held in  $_ in this code) to the output filehandle.

      Update: This reply may have been superseded by your intervening post, but what the heck...


      Give a man a fish:   <%-(-(-(-<