Re^2: Search and Replace in one file based upon contents of another

Re^3: Search and Replace in one file based upon contents of another by Marshall (Canon) on Oct 26, 2016 at 18:08 UTC
Based upon your meager information, here is a framework of code (untested).

    #!/usr/bin/perl
    use strict;
    use warnings;

    #### untested ####

    # Build the translation table: NAT'ed IP => real IP
    open my $fh_transfile, '<', "filename"
        or die "unable to open translation file $!";
    my %table;
    while (my $line = <$fh_transfile>) {
        my ($nat_ip, $real_ip) = split ' ', $line;
        $table{$nat_ip} = $real_ip;
    }
    close $fh_transfile;

    open my $fh_bigfile, '<', "filename"
        or die "unable to open big input file $!";
    open my $fh_out, '>', "filenameout"
        or die "unable to open big output file $!";
    while (my $line = <$fh_bigfile>) {
        chomp $line;
        # limit of -1 keeps trailing empty fields so the output has the same columns
        my @tokens = split ',', $line, -1;
        # only translate when field 8 exists and has an entry in the table
        if (defined $tokens[8] and exists $table{$tokens[8]}) {
            $tokens[8] = $table{$tokens[8]};
        }
        print $fh_out join(",", @tokens), "\n";
    }

Note: If the CSV file can have commas within a field (like "Smith,Sr"), then you will need a module to help with the parsing. A simple split on comma will not work! I of course was not able to test with real data. I'm sure I've made some error or some detail is being missed, but this is the general idea.

Update: I got a msg asking why there is a "chomp $line;" in the second while loop and not in the first. I'll put the answer here as others may have the same question...

The purpose of chomp() is to remove the line ending, represented as "\n" in Perl. If there were no chomp() in the second while loop, the last element of @tokens would still include the line ending after the split on ','.

In the first while loop, `split ' ',$line;` does NOT mean split on the "space" character. This is a special case coded into Perl: it splits on any sequence of whitespace characters (space, tab, form feed, end of line) after skipping leading whitespace. So in the first while loop, the `split ' ',$line;` removes the line ending because it is included in the set of things to split upon. A chomp() before that split would not hurt, but it is not necessary.
The difference between `split /\s+/, $line;` and `split ' ', $line;` is that in the second version, any whitespace at the front of the line is removed, while in the first version, leading whitespace would cause the first element of @tokens to be a null field. Easier demo'ed than further attempts at English explanations:

    use strict;
    use warnings;
    use Data::Dumper;

    my $line = " X Y \tZ A \n";

    my @tokens = split ' ', $line;
    print Dumper \@tokens;

    @tokens = split /\s+/, $line;
    print Dumper \@tokens;

    __END__
    $VAR1 = [          #split ' ' version
              'X',     #note ending removed
              'Y',
              'Z',
              'A'
            ];
    $VAR1 = [          #split /\s+/ version
              '',      #note ending removed
              'X',
              'Y',
              'Z',
              'A'
            ];

I also added "my" to file handle open statements.
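For the chomp() question above, here is a minimal sketch of what happens to the last field when a line is split on ',' with and without chomp(). The sample line "a,b,c\n" is made up for illustration and is not taken from the original log file.

```perl
use strict;
use warnings;

# Made-up sample line; a real log line would have more fields.
my $line = "a,b,c\n";

# Without chomp, the newline stays attached to the last field.
my @raw = split ',', $line;

# With chomp, the line ending is removed before splitting.
chomp(my $clean = $line);
my @tokens = split ',', $clean;

printf "without chomp: last field has %d characters\n", length $raw[-1];
printf "with chomp:    last field has %d characters\n", length $tokens[-1];
```

If the last field were the one being looked up in the hash, the stray newline would make the lookup miss, which is the reason for the chomp() in the second loop.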
Re^3: Search and Replace in one file based upon contents of another by tybalt89 (Monsignor) on Oct 26, 2016 at 18:00 UTC
It's hard to give an example of the hash unless you give us examples of your two files... The better your examples match your real data, the better the example program will work.
Re^3: Search and Replace in one file based upon contents of another by kcott (Archbishop) on Oct 27, 2016 at 13:47 UTC
G'day EntropyDF,

Welcome to the Monastery.

"File 1 is a CSV seperated log file. For this need the IP address is always in split 8 of the file."

Do not try to write your own CSV parsing code. This is fraught with all sorts of problems and is one wheel that definitely does not need to be reinvented. Use Text::CSV for this task.

"However if I could not rely on that it would be good as there are other log types this coudl apply to."

Abstract your code into a subroutine and pass it the wanted field number. As you've chosen to show no code, I'm in no position to indicate how you'd modify such code to achieve this. The field number could come from any number of places, such as command line argument, config file, database, etc.; it would presumably be associated with whatever "other log types" refers to.

-- Ken
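As a rough sketch of that suggestion (not code from the original poster), the following combines Text::CSV with a subroutine that takes the wanted field number. The name translate_field, the sample line, and the IP addresses in %table are all hypothetical; only new, parse, fields, combine, string and error_diag are actual Text::CSV methods. It assumes Text::CSV is installed from CPAN.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: replace the value in column $field_index (0-based)
# of one CSV line when that value is a key in the lookup table %$table.
sub translate_field {
    my ($csv, $table, $field_index, $line) = @_;

    $csv->parse($line)
        or die "cannot parse CSV line: " . $csv->error_diag;
    my @fields = $csv->fields;

    if (defined $fields[$field_index]
        and exists $table->{ $fields[$field_index] }) {
        $fields[$field_index] = $table->{ $fields[$field_index] };
    }

    $csv->combine(@fields)
        or die "cannot build CSV line: " . $csv->error_diag;
    return $csv->string;
}

my $csv   = Text::CSV->new({ binary => 1 });
my %table = ('10.0.0.5' => '192.0.2.17');    # made-up NAT => real mapping

# The quoted first field contains a comma, which a plain split would break.
print translate_field($csv, \%table, 1, '"Smith,Sr",10.0.0.5,ok'), "\n";
```

For the log format described in the question, the call would pass 8 as the field number; another log type would simply pass a different number, taken from a command line argument or config file.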