Re: Filtering Output from two files

Re^2: Filtering Output from two files by vighneshmufc (Acolyte) on Feb 04, 2018 at 11:55 UTC
I am actually new to scripting languages, and I didn't quite follow what you meant in step 2.
Re^3: Filtering Output from two files by roboticus (Chancellor) on Feb 04, 2018 at 14:25 UTC
vighneshmufc: He meant that the following lines are inside of a loop, like this:

    # read file1 into a %hash
    ... code to do that here ...

    # inside a loop
    while (my $line = <$file2>) {
        # read file2 line by line
        ... this was done in the loop condition above ...

        # split the $line to @fields at \|
        # if the first $fields[0] exists in the %hash,
        # print the whole $line to file 3
    }

This is a relatively common question, so LanX gave you the outline of a good solution to the problem. A frequent mistake is to try to read both files inside the loop, giving one of two bad outcomes:

Either the first file is completely read in the first pass of the loop, so the code can only find a single match if it happens to be the first line in the second file, or the code re-opens the first file each time, and therefore can find all the matches, but runs extremely slowly⁽¹⁾ because it reads the first file completely for each line in the second file.

(1) Extremely slowly in the relative sense--for small files you may not notice it. But if your files get large enough, you'll wonder why such a fast computer is so freakin' slow.

...roboticus

When your only tool is a hammer, all problems look like your thumb.
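The outline above can be sketched as follows. This is only a minimal sketch, assuming the first file holds one key per line and the second file holds pipe-delimited records whose first field is the key. The sub name `filter_by_keys` and the choice to pass filehandles as arguments are illustrative and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: reads keys from $keys_fh, then copies every line of
# $data_fh whose first pipe-delimited field is a known key to $out_fh.
# Returns the number of lines written.
sub filter_by_keys {
    my ($keys_fh, $data_fh, $out_fh) = @_;

    # Pass 1: read the key file exactly once, into a hash.
    my %seen;
    while (my $key = <$keys_fh>) {
        chomp $key;
        $seen{$key} = 1;
    }

    # Pass 2: read the data file line by line, one hash lookup per line.
    my $matches = 0;
    while (my $line = <$data_fh>) {
        chomp $line;
        my @fields = split /\|/, $line;
        next unless @fields && $seen{ $fields[0] };   # skip empty or unknown
        print {$out_fh} "$line\n";
        $matches++;
    }
    return $matches;
}
```

With real files, open the three handles with three-argument `open` (two for reading, one for writing) and pass them in. Because the key file is read only once, the total work grows with the sum of the two file sizes rather than their product.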
Re^4: Filtering Output from two files by vighneshmufc (Acolyte) on Feb 04, 2018 at 15:09 UTC
The file has around 7 lakh (700,000) lines, so yeah :p
Re^5: Filtering Output from two files by AnomalousMonk (Archbishop) on Feb 04, 2018 at 16:18 UTC
Re^5: Filtering Output from two files by LanX (Saint) on Feb 04, 2018 at 17:45 UTC
Re^6: Filtering Output from two files by AnomalousMonk (Archbishop) on Feb 04, 2018 at 18:35 UTC
Some notes below your chosen depth have not been shown here
Re^3: Filtering Output from two files by LanX (Saint) on Feb 04, 2018 at 12:17 UTC
Searching for "Perl read file line by line" led me to http://perlmaven.com/open-and-read-from-files

The first code example there already shows it, though $line is named $row there.

I'm not going to give you complete code, because I want to help you to learn Perl. :)

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Wikisyntax for the Monastery
Re^4: Filtering Output from two files by vighneshmufc (Acolyte) on Feb 04, 2018 at 15:06 UTC
Thanks a lot. Could you suggest some tutorials to start with? I am very new to this, so please suggest beginner-level ones :p
Re^5: Filtering Output from two files by marto (Cardinal) on Feb 04, 2018 at 16:03 UTC
Re^3: Filtering Output from two files by hippo (Archbishop) on Feb 04, 2018 at 12:38 UTC
See Files and I/O in perlintro.
Re^2: Filtering Output from two files by Anonymous Monk on Feb 05, 2018 at 11:50 UTC
    use strict;
    use warnings;
    use Data::Dumper;

    my $file1 = 'file1';
    my $file2 = 'file2';

    #reading file1 into a hash
    my %hash=();
    open (my $fh,'<',$file2) or die $!;
    while(my $line=<$fh>)
    {
        chomp $line;
        $hash{line}=1;
        print Dumper %hash;
    }
    close $fh;

    #reading file2 line by line
    open (my $fh2,'<',$file1) or die $!;
    while (my $row = <$fh2>) {
        chomp $row;
        my @fields = split(/\|/, $row);
        print $row if exists $hash{$fields[0]};
    }
    close $fh2;
Re^3: Filtering Output from two files by LanX (Saint) on Feb 05, 2018 at 12:10 UTC
You have at least one bug: you forgot the sigil $ in `line` once. But yes, this was the basic idea.

Also, dumping inside the loop is costly. Furthermore, you might want to use `<code>` tags next time. :)

Minor nitpick: when setting the value to 1, you don't need `exists` anymore. :)

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Wikisyntax for the Monastery
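A small sketch of the two points above, with made-up keys `A1` and `B2` standing in for lines read from a file. The sub names are hypothetical. Without the sigil, `$hash{line}` uses the literal string `line` as the key on every iteration, so the hash ends up with a single entry regardless of the input.

```perl
use strict;
use warnings;

# Buggy version: the key is the bareword 'line', not the contents of $line.
sub load_keys_buggy {
    my %hash;
    for my $line (@_) {
        $hash{line} = 1;
    }
    return %hash;
}

# Fixed version: each input string becomes its own key.
sub load_keys {
    my %hash;
    $hash{$_} = 1 for @_;
    return %hash;
}

my %buggy = load_keys_buggy('A1', 'B2');
my %fixed = load_keys('A1', 'B2');

print join(',', sort keys %buggy), "\n";   # prints "line"
print join(',', sort keys %fixed), "\n";   # prints "A1,B2"

# Every stored value is 1 (true), so a plain truth test works as the lookup;
# "exists" would only be required if false values could be stored.
print "found A1\n" if $fixed{A1};
```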
Re^4: Filtering Output from two files by vighneshmufc (Acolyte) on Feb 05, 2018 at 15:00 UTC
It was me who posted anonymously, since we aren't allowed to log in at the university. I didn't get where I missed the "sigil $ in line once". Furthermore, I am getting errors such as "Use of uninitialized value" at line 34, for the variables $row and $fields on the penultimate line. And when I place `print $row if exists $hash{$fields[0]};` after the loop, it gives the error "Perl requires explicit package name". I am really a novice at this; if you could guide me, I would learn more.
Re^5: Filtering Output from two files by LanX (Saint) on Feb 05, 2018 at 15:10 UTC
Re^4: Filtering Output from two files by Anonymous Monk on Feb 06, 2018 at 04:17 UTC
    use strict;
    use warnings;
    use Data::Dumper;

    my $file1 = 'file1';
    my $file2 = 'file2';

    #reading file1 into a hash
    my %hash;
    open (my $fh,'<',$file2) or die $!;
    while(my $line=<$fh>)
    {
        chomp $line;
        $hash{$line}=1;
        print Dumper %hash;
    }
    close $fh;

    #reading file2 line by line
    open (my ($fh2),'<',$file1) or die $!;
    while (my ($row) = <$fh2>) {
        chomp $row;
        # next if $row =~ /^\s*$/;
        my (@fields) = split(/\|/, $row);
        print $row if exists $hash{$fields[0]};
    }
    close $fh2;

I'm not getting any output, and when I print Dumper %hash I get loads of $VAR entries. What am I doing wrong?
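Two details in the code above can be shown in isolation. This is a hedged sketch, not the reply that was given in the thread, and the sub names are hypothetical. First, `Dumper %hash` receives the hash flattened into a list of keys and values, and labels each element separately as `$VAR1`, `$VAR2`, and so on. Passing a reference dumps it as one structure. Second, `my ($row) = <$fh>` puts the read in list context. All remaining lines are read at once, only the first is kept, and the loop body runs a single time.

```perl
use strict;
use warnings;
use Data::Dumper;

my %hash = (A1 => 1);

# Flattened hash: one $VARn per key and per value.
print Dumper(%hash);

# Hash reference: a single $VAR1 holding the whole structure.
print Dumper(\%hash);

# List assignment in the loop condition: slurps the file, keeps line 1 only.
sub rows_list_context {
    my ($fh) = @_;
    my @rows;
    while (my ($row) = <$fh>) {
        chomp $row;
        push @rows, $row;
    }
    return @rows;
}

# Scalar assignment in the loop condition: one line per iteration.
sub rows_scalar_context {
    my ($fh) = @_;
    my @rows;
    while (my $row = <$fh>) {
        chomp $row;
        push @rows, $row;
    }
    return @rows;
}
```

Called on a three-line file, `rows_list_context` returns only the first line, while `rows_scalar_context` returns all three. Note also that `print $row` after `chomp` emits no newline, so any matches run together on one output line.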
Re^5: Filtering Output from two files by roboticus (Chancellor) on Feb 06, 2018 at 05:09 UTC
Re^6: Filtering Output from two files by Anonymous Monk on Feb 06, 2018 at 06:04 UTC
Re^2: Filtering Output from two files by Anonymous Monk on Feb 06, 2018 at 11:38 UTC
    #!/nairvigv/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $file1 = 'BBIDs.txt';
    my $file2 = 'fixedincomeTransparency.out.px.derived.updates';

    #reading file1 into a hash
    my %hash;
    my @fields;
    open (my $fh,'<',$file1) or die $!;
    while(my $line=<$fh>)
    {
        chomp $line;
        next if $line =~ /^\s$/;
        $hash{$line}=1;
    }
    #print Dumper(\%hash);
    close $fh;

    open (my ($fh2),$file2) or die $!;
    while (my $row = <$fh2>) {
        chomp $row;
        print "$row\n";
        next if $row =~ /^\s$/;
        my (@fields) = split(/\|/, $row);
        print "$fields[0]\n ";
        if (exists $hash{$fields[0]})
        {
            print "$row\n";
        }
    }
    close $fh2;

Hello, this works for a file with few lines, e.g. 10. However, when I run it on a big file, say about 700k lines, nothing happens. Any idea what could be causing this?
Re^3: Filtering Output from two files by LanX (Saint) on Feb 06, 2018 at 11:48 UTC
Due to the broken indentation it's hard to read, and I won't go into details.

Have a look at http://perl.plover.com/FAQs/Buffering.html and the Basic debugging checklist.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Wikisyntax for the Monastery
Re^4: Filtering Output from two files by vighneshmufc (Acolyte) on Feb 06, 2018 at 12:25 UTC
Sorry, I will indent properly from next time onwards. Basically, what's happening is that the big files have whitespace before and after each entry, so I have to remove it when I store each entry as a key.
Re^5: Filtering Output from two files (updated) by LanX (Saint) on Feb 06, 2018 at 12:39 UTC
Re^4: Filtering Output from two files by vighneshmufc (Acolyte) on Feb 06, 2018 at 15:52 UTC
    open (my ($fh2),$file2) or die $!;
    while (my $row = <$fh2>) {
        chomp $row;
        print "$row\n";
        next if $row =~ /^\s$/;
        my (@fields) = split(/\|/, $row);

So I have to add `$row =~ s/\s$//` after `next if $row =~ /^\s*$/;` and it will work? I can't really check from here, since I am at home and would have to wait another 12 hours.
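A quick way to check the idea without the real files is sketched below, using a made-up row. Note that `s/\s$//` removes only one trailing whitespace character and leaves leading whitespace alone, and that `/^\s$/` matches a line holding exactly one whitespace character, so it does not skip empty lines the way `/^\s*$/` does. The `trim` helper is hypothetical. The sketch trims the key field itself rather than the whole row, on the assumption that the padding surrounds each entry.

```perl
use strict;
use warnings;

# Hypothetical helper: strips all leading and trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Hypothetical helper: returns the trimmed first pipe-delimited field,
# or undef for a blank line.
sub key_of {
    my ($row) = @_;
    return undef if $row =~ /^\s*$/;       # empty or whitespace-only line
    my @fields = split /\|/, $row;
    return undef unless @fields;           # row consisting only of pipes
    return trim($fields[0]);
}

my $row = "  A1  |x|y \t";
(my $one_char = $row) =~ s/\s$//;          # drops only the final tab
print "[$one_char]\n";                     # prints "[  A1  |x|y ]"
print "[", key_of($row), "]\n";            # prints "[A1]"
```

The same `trim` would be applied to each line of the key file before storing it in the hash, so that both sides of the lookup use whitespace-free keys.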
Re^5: Filtering Output from two files by hippo (Archbishop) on Feb 06, 2018 at 16:36 UTC