Get data from a file to search and replace within a second file

biscardi has asked for the wisdom of the Perl Monks concerning the following question:

This is my problem: I have a tab-delimited file (file_A.txt) like this:

a1    b1    c1    d1
a2    b2    c2    d2
a3    b3    c3    d3
a4    b4    c4    d4

Starting from the top of the file, I need to:

1. get the value in the 3rd column (c1).

2. search a second file (file_B.txt, not tab delimited and quite messy) for all matches of it.

3. when a match is found, append to the matched value (c1) the value of the 4th column (d1) in file_A.txt, separated by a space.

4. go back to the first file (file_A.txt), get the value in the 3rd column of the second row (c2), and do another round of search, inserting the value of d2 in the second file (file_B.txt).

5. continue the search and replace until the end of file_A.txt is reached.

I am an absolute newbie and I put together this code, which does not work very well.

#!/usr/bin/perl
open (datafile, "file_A.txt");
@fileinput = split("\t", <datafile>);
    
for ($i = 2; $i <=200;){
    open(OF, "file_B.txt"); #file_B.txt contains the original file 
    open(NF, ">file_B_out.txt"); #file_B_out.txt contains the processed output
    while ($line = <OF>) {
    print "$fileinput[$i]\n";
    print "$i\n";
    $line =~ s/$fileinput[$i]/$fileinput[$i+1]/g;
    #print $line;
    print NF $line;
    
    }
$i=$i+4;
}
close(NF);
close(OF);

1. I don't know how to tell perl to loop until the end of file_A.txt, so I have just used a for statement.

2. The search and replace routine does not work and I don't understand why.

Any suggestions, and possibly an example, would be really appreciated.

thanks.


Replies are listed 'Best First'.
Re: Get data from a file to search and replace within a second file by almut (Canon) on Mar 23, 2010 at 00:19 UTC
I took the liberty of slightly simplifying your spec, using a lookup table. The output should be what I suppose you want, i.e. replace all occurrences of `c1` etc. with `c1 d1` etc.:

#!/usr/bin/perl
use strict;
use warnings;

my $fname_A = "file_A.txt";
my $fname_B = "file_B.txt";
my $fname_B_out = "file_B_out.txt";

my %subst;  # lookup table

open (my $data_fh, "<", $fname_A) or die "Couldn't open '$fname_A': $!";
while (<$data_fh>) {
    chomp;
    my ($find, $add) = (split /\t/)[2,3];
    $subst{$find} = $add;
    # print "$find => $add\n";  # debug
}

my $search = join "\|", map quotemeta, keys %subst;

open (my $in_fh, "<", $fname_B) or die "Couldn't open '$fname_B': $!";
open (my $out_fh, ">", $fname_B_out) or die "Couldn't open '$fname_B_out' for writing: $!";

while (my $line = <$in_fh>) {
    $line =~ s/($search)/$1 $subst{$1}/g;
    print $out_fh $line;
}

The output might not be what you want in case the search strings aren't unique (because of the lookup hash), or if the substitutions aren't independent, such as when c1 => c2 in the first run through the file, c2 => d2 in the second run, etc. (because of doing it all in one go).
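To make the last caveat concrete, here is a minimal sketch. It is not from almut's post: the rule list and the sub names `one_pass` and `sequential` are made up for illustration. It contrasts a single pass using a lookup table with one pass per rule, for two rules where the second one searches for what the first one appends:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rules as [find, add] pairs. The second rule searches for
# the very string the first rule appends, so they are not independent.
my @rules = ( [ 'c1', 'c2' ], [ 'c2', 'd2' ] );

# Single pass: lookup table plus one alternation over all search strings.
sub one_pass {
    my ($text, @rules) = @_;
    my %subst  = map { $_->[0] => $_->[1] } @rules;
    my $search = join '|', map { quotemeta($_) } keys %subst;
    $text =~ s/($search)/$1 $subst{$1}/g;
    return $text;
}

# One pass per rule, in the order the rules appear.
sub sequential {
    my ($text, @rules) = @_;
    for my $rule (@rules) {
        my ($find, $add) = @$rule;
        $text =~ s/\Q$find\E/$find $add/g;
    }
    return $text;
}

print one_pass('see c1', @rules), "\n";
print sequential('see c1', @rules), "\n";
```

With these rules the single pass leaves `see c1 c2`, while the rule-by-rule version goes on to turn the appended `c2` into `c2 d2`, giving `see c1 c2 d2`. Which of the two is wanted depends on the spec.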
Re: Get data from a file to search and replace within a second file by GrandFather (Saint) on Mar 22, 2010 at 23:59 UTC
First a few suggestions:

1. Always use strictures (use strict; use warnings;).
2. Use the three parameter version of open and always check the result.
3. Use lexical file handles (declared with my).
4. Use the Perl version of the for loop, not the C version.
5. Avoid opening and closing the same file multiple times.

#!/usr/bin/perl
use strict;
use warnings;

# Fake up a couple of files
my $file_a = <<TXT;
a1\tb1\tc1\td1
a2\tb2\tc2\td2
a3\tb3\tc3\td3
TXT

my $file_b = <<TXT;
starting form the top of the file I need 1. to get the the value in the 3rd column (c1)
2. search in a second file (file_B.txt, not tab delimited and quite messy) all the matches for it.
3. when a match is found, I would like to append to the current value (c1), the value of 4th column (d1) in file_A.txt, separated by a space.
4. go back to the first file (file_A.txt), get the the value in the 3rd column in the second row (c2) and do another round of search and insert the value of d2 in the second file (file_B.txt).
TXT

# Now the 'real' work - \$file_b treats $file_b as a file
open my $inB, '<', \$file_b or die "Can't open file_b: $!";
my $fileBStr = do {local $/; <$inB>}; # Slurp in all of file_b
close $inB;

open my $inA, '<', \$file_a or die "Can't open file_a: $!";
while (<$inA>) {
    chomp;
    my @parts = split /\t/;
    next if @parts < 4;
    $fileBStr =~ s/\b $parts[2] \b/$parts[2] $parts[3]/xgm;
}
close $inA;

print $fileBStr;

Prints:

starting form the top of the file I need 1. to get the the value in the 3rd column (c1 d1)
2. search in a second file (file_B.txt, not tab delimited and quite messy) all the matches for it.
3. when a match is found, I would like to append to the current value (c1 d1), the value of 4th column (d1) in file_A.txt, separated by a space.
4. go back to the first file (file_A.txt), get the the value in the 3rd column in the second row (c2 d2) and do another round of search and insert the value of d2 in the second file (file_B.txt).

Reading the file you are editing into memory is fine unless its size is hundreds of megabytes. For very large files you probably need to turn the loop inside out - read all the edit information from file a and store that in memory, then read file b a line at a time and apply all the edits to the current line before saving it and moving on to the next.

True laziness is hard work
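The "inside out" variant described above could look roughly like this. It is only a sketch under a few assumptions: the sub names `read_edits` and `apply_edits` are invented here, the demo uses in-memory strings in place of file_A.txt and file_B.txt, and the `\b` anchors assume that every search string begins and ends with a word character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the edits from the tab-delimited input: column 3 is the search
# string, column 4 the text to append. Returns [find, add] pairs in file order.
sub read_edits {
    my ($fh) = @_;
    my @edits;
    while (my $line = <$fh>) {
        chomp $line;
        my @parts = split /\t/, $line;
        next if @parts < 4;    # skip short or blank lines
        push @edits, [ @parts[2, 3] ];
    }
    return @edits;
}

# Copy $in to $out one line at a time, applying every edit to each line.
sub apply_edits {
    my ($in, $out, @edits) = @_;
    while (my $line = <$in>) {
        for my $edit (@edits) {
            my ($find, $add) = @$edit;
            $line =~ s/\b\Q$find\E\b/$find $add/g;
        }
        print {$out} $line;
    }
    return;
}

# Demo data (made up); for real use, open the actual files instead.
my $file_a = "a1\tb1\tc1\td1\na2\tb2\tc2\td2\n";
my $file_b = "first (c1) and c11\nthen c2\n";
my $result = '';

open my $in_a, '<', \$file_a or die "Can't open file_a: $!";
my @edits = read_edits($in_a);
close $in_a;

open my $in_b, '<', \$file_b or die "Can't open file_b: $!";
open my $out,  '>', \$result or die "Can't open output: $!";
apply_edits($in_b, $out, @edits);
close $out;
close $in_b;

print $result;
```

Only the edit list and the current line are held in memory, so the size of file b no longer matters. The edits are kept in an array rather than a hash so that they are applied in the order they appear in file a.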
Re: Get data from a file to search and replace within a second file by toolic (Bishop) on Mar 23, 2010 at 00:23 UTC
One approach is to read the 3rd and 4th columns of file A into a hash, then for each line of file B, loop through the hash keys, making the substitutions.

use strict;
use warnings;

my $fhi;
my %data;
open $fhi, '<', 'file_A.txt' or die "can not open file file_A.txt: $!";
while (<$fhi>) {
    chomp;
    my @cols = split /\t/;
    $data{$cols[2]} = "@cols[2..3]";
}
close $fhi;

open $fhi, '<', 'file_B.txt' or die "can not open file file_B.txt: $!";
open my $fho, '>', 'file_B_out.txt' or die "can not open file file_B_out.txt: $!";
while (<$fhi>) {
    for my $k (keys %data) {
        s/$k/$data{$k}/g;
    }
    print $fho $_;
}
close $fho;

One functional flaw with your solution is that you keep overwriting your output file every time you open it for output. Thus, you lose the results of your previous substitution.

Update: I like almut's `$search` string better than my for loop.
Re^2: Get data from a file to search and replace within a second file by biscardi (Initiate) on Mar 23, 2010 at 15:48 UTC
Dear all, thanks for your suggestions. I will need a little time to "digest" them.

1. GrandFather: I have a hard time telling your suggested code apart from some of the comments. I will try to disentangle things and get back to you.

2. I tried the code from almut. It works, but it does not distinguish between c1 and c11. In other words, when c11 is found, d1 is inserted inside it, and the final result is "c1 d11".

3. I will take a look at toolic's suggestion later.

Thanks for your help.
Re^3: Get data from a file to search and replace within a second file by toolic (Bishop) on Mar 23, 2010 at 16:41 UTC
> 2. I tried the code from almut. It works but it will not distinguish between c1 and c11. In other words when c11 is found d1 is added within the 11. The final result is "c1 d11"

My solution also does not distinguish between c1 and c11. You can add `\b` anchors, as GrandFather has (see perlre):

s/\b$k\b/$data{$k}/g;
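For what it's worth, the effect of the `\b` anchors can be checked with a small test like the one below. It is a sketch with a made-up lookup table, and the sub names `build_search`, `append_unanchored` and `append_anchored` are invented for the example. It applies the single-alternation approach with and without anchors to a string containing both c11 and c1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one alternation from the keys of a lookup table (hash ref).
# Sorting only makes the pattern identical on every run.
sub build_search {
    my ($subst) = @_;
    return join '|', map { quotemeta($_) } sort keys %$subst;
}

# No anchors: a short key can match inside a longer token.
sub append_unanchored {
    my ($text, $subst) = @_;
    my $search = build_search($subst);
    $text =~ s/($search)/$1 $subst->{$1}/g;
    return $text;
}

# \b on both sides: only whole "words" match, so c1 no longer matches in c11.
sub append_anchored {
    my ($text, $subst) = @_;
    my $search = build_search($subst);
    $text =~ s/\b($search)\b/$1 $subst->{$1}/g;
    return $text;
}

# Hypothetical table: search string => text to append.
my %subst = ( c1 => 'd1', c11 => 'd11' );

print append_unanchored('c11 and c1', \%subst), "\n";
print append_anchored('c11 and c1', \%subst), "\n";
```

The first line printed is `c1 d11 and c1 d1`, which is the reported problem. The second is `c11 d11 and c1 d1`. Note that `\b` only helps when the search strings start and end with word characters (letters, digits or underscore).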
Re^4: Get data from a file to search and replace within a second file by educated_foo (Vicar) on Mar 24, 2010 at 14:21 UTC