biscardi has asked for the wisdom of the Perl Monks concerning the following question:

This is my problem: I have a tab-delimited file (file_A.txt) like this:

    a1  b1  c1  d1
    a2  b2  c2  d2
    a3  b3  c3  d3
    a4  b4  c4  d4

Starting from the top of the file, I need to:

1. get the value in the 3rd column (c1)

2. search in a second file (file_B.txt, not tab delimited and quite messy) all the matches for it.

3. when a match is found, I would like to append to the current value (c1), the value of 4th column (d1) in file_A.txt, separated by a space.

4. go back to the first file (file_A.txt), get the value in the 3rd column of the second row (c2), do another round of searching, and insert the value of d2 in the second file (file_B.txt).

5. go ahead with the search and replace until the end of file_A.txt is reached.

I am an absolute newbie and I put together this code, which does not work very well.

    #!/usr/bin/perl
    open (datafile, "file_A.txt");
    @fileinput = split("\t", <datafile>);
    for ($i = 2; $i <=200;){
        open(OF, "file_B.txt"); #file_B.txt contains the original file
        open(NF, ">file_B_out.txt"); #file_B_out.txt contains the processed output
        while ($line = <OF>) {
            print "$fileinput[$i]\n";
            print "$i\n";
            $line =~ s/$fileinput[$i]/$fileinput[$i+1]/g;
            #print $line;
            print NF $line;
        }
        $i=$i+4;
    }
    close(NF);
    close(OF);

1. I don't know how to tell Perl to loop until the end of file_A.txt, so I have just used a for statement.

2. The search and replace routine does not work and I don't understand why.

Any suggestions, and possibly an example, would be really appreciated.

thanks.

Replies are listed 'Best First'.
Re: Get data from a file to search and replace within a second file
by almut (Canon) on Mar 23, 2010 at 00:19 UTC

    I took the liberty of slightly simplifying your spec by using a lookup table. The output should be what I suppose you want, i.e. all occurrences of c1 etc. replaced with "c1 d1" etc.:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $fname_A = "file_A.txt";
        my $fname_B = "file_B.txt";
        my $fname_B_out = "file_B_out.txt";

        my %subst;  # lookup table

        open (my $data_fh, "<", $fname_A) or die "Couldn't open '$fname_A': $!";
        while (<$data_fh>) {
            chomp;
            my ($find, $add) = (split /\t/)[2,3];
            $subst{$find} = $add;
            # print "$find => $add\n";  # debug
        }

        my $search = join "|", map quotemeta, keys %subst;

        open (my $in_fh, "<", $fname_B) or die "Couldn't open '$fname_B': $!";
        open (my $out_fh, ">", $fname_B_out) or die "Couldn't open '$fname_B_out' for writing: $!";

        while (my $line = <$in_fh>) {
            $line =~ s/($search)/$1 $subst{$1}/g;
            print $out_fh $line;
        }

    The output might not be what you want in case the search strings aren't unique (because of the lookup hash), or if the substitutions aren't independent, such as when c1 => c2 in the first run through the file, c2 => d2 in the second run, etc. (because of doing it all in one go).
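To make the second caveat concrete, here is a minimal sketch that contrasts the one-pass alternation with a run of sequential passes. The rules in `@rules` are invented for illustration (the text appended for c1 happens to contain c2), and the sub names `one_pass` and `sequential` are hypothetical, not part of any reply in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, deliberately overlapping rules: the text appended for c1
# is c2, which is itself a search key.
my @rules = ( [ 'c1', 'c2' ], [ 'c2', 'd2' ] );

# One pass: a single alternation, so text inserted by a replacement is
# never rescanned.
sub one_pass {
    my ($text) = @_;
    my %subst;
    $subst{ $_->[0] } = $_->[1] for @rules;
    my $search = join '|', map quotemeta, keys %subst;
    $text =~ s/($search)/$1 $subst{$1}/g;
    return $text;
}

# Sequential passes: each rule sees the result of the previous one.
sub sequential {
    my ($text) = @_;
    for my $rule (@rules) {
        my ($find, $add) = @$rule;
        $text =~ s/\Q$find\E/$find $add/g;
    }
    return $text;
}

print one_pass("x c1 y"),   "\n";   # x c1 c2 y
print sequential("x c1 y"), "\n";   # x c1 c2 d2 y
```

Which of the two results is wanted depends on the data; if the search strings never appear in the appended values, both approaches should agree.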

Re: Get data from a file to search and replace within a second file
by GrandFather (Saint) on Mar 22, 2010 at 23:59 UTC

    First a few suggestions:

    • Always use strictures (use strict; use warnings;)
    • use the three parameter version of open and always check the result.
    • use lexical file handles (declared with my)
    • use the Perl version of the for loop, not the C version
    • avoid opening and closing the same file multiple times

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Fake up a couple of files
        my $file_a = <<TXT;
        a1\tb1\tc1\td1
        a2\tb2\tc2\td2
        a3\tb3\tc3\td3
        TXT

        my $file_b = <<TXT;
        starting form the top of the file I need 1. to get the the value in the 3rd column (c1)
        2. search in a second file (file_B.txt, not tab delimited and quite messy) all the matches for it.
        3. when a match is found, I would like to append to the current value (c1), the value of 4th column (d1) in file_A.txt, separated by a space.
        4. go back to the first file (file_A.txt), get the the value in the 3rd column in the second row (c2) and do another round of search and insert the value of d2 in the second file (file_B.txt).
        TXT

        # Now the 'real' work - \$file_b treats $file_b as a file
        open my $inB, '<', \$file_b or die "Can't open file_b: $!";
        my $fileBStr = do {local $/; <$inB>}; # Slurp in all of file_b
        close $inB;

        open my $inA, '<', \$file_a or die "Can't open file_a: $!";
        while (<$inA>) {
            chomp;
            my @parts = split /\t/;
            next if @parts < 4;
            $fileBStr =~ s/\b $parts[2] \b/$parts[2] $parts[3]/xgm;
        }
        close $inA;

        print $fileBStr;

    Prints:

        starting form the top of the file I need 1. to get the the value in the 3rd column (c1 d1)
        2. search in a second file (file_B.txt, not tab delimited and quite messy) all the matches for it.
        3. when a match is found, I would like to append to the current value (c1 d1), the value of 4th column (d1) in file_A.txt, separated by a space.
        4. go back to the first file (file_A.txt), get the the value in the 3rd column in the second row (c2 d2) and do another round of search and insert the value of d2 in the second file (file_B.txt).

    Reading the file you are editing into memory is fine unless its size is hundreds of megabytes. For very large files you probably need to turn the loop inside out - read all the edit information from file a and store that in memory, then read file b a line at a time and apply all the edits to the current line before saving it and moving on to the next.
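The "inside out" variant described above could look roughly like the following sketch. It keeps the in-memory file handles from the example so that it runs on its own; the sample data and the helper names `load_edits` and `apply_edits` are assumptions made for illustration, and real code would open file_A.txt, file_B.txt and an output file instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-ins for the real files (sample data invented for illustration).
my $file_a = "a1\tb1\tc1\td1\na2\tb2\tc2\td2\n";
my $file_b = "first (c1)\nsecond (c2) and c1 again\n";

# Read every edit from file A into memory, keeping file order.
sub load_edits {
    my ($fh) = @_;
    my @edits;
    while (my $row = <$fh>) {
        chomp $row;
        my @parts = split /\t/, $row;
        next if @parts < 4;              # skip short or blank rows
        push @edits, [ @parts[2, 3] ];   # [ search string, text to append ]
    }
    return @edits;
}

# Apply all edits, in order, to a single line of file B.
sub apply_edits {
    my ($line, @edits) = @_;
    for my $edit (@edits) {
        my ($find, $add) = @$edit;
        $line =~ s/\b\Q$find\E\b/$find $add/g;
    }
    return $line;
}

open my $in_a, '<', \$file_a or die "Can't open file A: $!";
my @edits = load_edits($in_a);
close $in_a;

# Only one line of file B is held in memory at a time.
open my $in_b, '<', \$file_b or die "Can't open file B: $!";
while (my $line = <$in_b>) {
    print apply_edits($line, @edits);
}
close $in_b;
```

Because the edits are stored as an ordered list rather than a hash, repeated search strings in file A are applied once per row, just as in the slurping version.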


    True laziness is hard work
Re: Get data from a file to search and replace within a second file
by toolic (Bishop) on Mar 23, 2010 at 00:23 UTC
    One approach is to read the 3rd and 4th columns of file A into a hash, then for each line of file B, loop through the hash keys, making the substitutions.
        use strict;
        use warnings;

        my $fhi;
        my %data;

        open $fhi, '<', 'file_A.txt' or die "can not open file file_A.txt: $!";
        while (<$fhi>) {
            chomp;
            my @cols = split /\t/;
            $data{$cols[2]} = "@cols[2..3]";
        }
        close $fhi;

        open $fhi, '<', 'file_B.txt' or die "can not open file file_B.txt: $!";
        open my $fho, '>', 'file_B_out.txt' or die "can not open file file_B_out.txt: $!";
        while (<$fhi>) {
            for my $k (keys %data) {
                s/$k/$data{$k}/g;
            }
            print $fho $_;
        }
        close $fho;

    One functional flaw with your solution is that you keep overwriting your output file every time you open it for output. Thus, you lose the results of your previous substitution.

    Update: I like almut's $search string better than my for loop.

      Dear all,

      thanks for your suggestions. I will need a little time to "digest" them.

      1. GrandFather: I have a hard time telling your suggested code apart from some of the comments. I will try to disentangle it and get back to you.

      2. I tried the code from almut. It works, but it does not distinguish between c1 and c11. In other words, when c11 is found, d1 is inserted inside it, and the final result is "c1 d11".

      3. I will take a look at toolic's suggestion later.

      Thanks for your help.

        2. I tried the code from almut. It works, but it does not distinguish between c1 and c11. In other words, when c11 is found, d1 is inserted inside it, and the final result is "c1 d11".
        My solution also does not distinguish between c1 and c11. You can add \b anchors, as GrandFather has (see perlre):
        s/\b$k\b/$data{$k}/g;
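
The same anchors can be combined with the single alternation from almut's reply. The sketch below is one way to do that, assuming the search strings begin and end with word characters (otherwise `\b` will not behave as intended); the `%subst` contents and the sub name `annotate` are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table: search string => text to append.
my %subst = ( c1 => 'd1', c11 => 'd11' );

# Escape each key and try longer keys first. With the \b anchors the
# ordering is not strictly required, but it keeps the behaviour
# predictable if the anchors are ever dropped.
my $search = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %subst;
my $re = qr/\b($search)\b/;

sub annotate {
    my ($line) = @_;
    $line =~ s/$re/$1 $subst{$1}/g;
    return $line;
}

print annotate("c1 and c11\n");    # c1 d1 and c11 d11
```

Using `quotemeta` also protects against search strings that contain regex metacharacters such as `.` or `+`.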