In reply to: Filtering Output from two files

Overview:

You'll need open, readline, chomp, split, exists, print and while loops for this.
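As a small illustration of how some of these pieces fit together on a single line, here is a sketch with made-up IDs and a made-up pipe-delimited line (the `|` delimiter is taken from the outline further down the thread). Note that split takes a regex, and a bare `|` means alternation there, so it has to be escaped:

```perl
use strict;
use warnings;

# Hypothetical IDs, as if read from file1 into a hash
my %wanted = map { $_ => 1 } ('ABC123', 'XYZ789');

# A hypothetical pipe-delimited line, as if read from file2
my $line = "ABC123|some|other|fields\n";
chomp $line;                        # strip the trailing newline

# '|' is a regex metacharacter, so escape it: /\|/
my @fields = split /\|/, $line;

# Look up the first field in the hash
if (exists $wanted{$fields[0]}) {
    print "$line\n";
}
```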

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Wikisyntax for the Monastery

Re^2: Filtering Output from two files
by vighneshmufc (Acolyte) on Feb 04, 2018 at 11:55 UTC
    I am actually new to scripting languages; I didn't quite follow what you meant in step 2.

      vighneshmufc:

      He meant that the following lines are inside of a loop, like this:

      # read file1 into a %hash
      ... code to do that here ...

      # inside a loop
      while (my $line = <$file2>) {
          # read file2 line by line
          ... this was done in the loop condition above ...

          # split the $line to @fields at |

          # if the first $fields[0] exists in the %hash,
          #   print the whole $line to file 3
      }
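Filled in, the outline might look roughly like the sketch below. It is one possible reading, not LanX's own code: the sub name filter_by_ids is hypothetical, in-memory strings stand in for file1, file2 and file3 so the example runs anywhere, and it assumes one key per line in file1 and the key in the first |-separated field of file2.

```perl
use strict;
use warnings;

# Hypothetical helper: read keys from $ids_fh, then copy every line of
# $data_fh whose first '|'-separated field is a known key to $out_fh.
sub filter_by_ids {
    my ($ids_fh, $data_fh, $out_fh) = @_;

    # Step 1: read file1 into a hash, once (one key per line)
    my %wanted;
    while (my $id = <$ids_fh>) {
        chomp $id;
        next if $id =~ /^\s*$/;         # skip blank lines
        $wanted{$id} = 1;
    }

    # Step 2: read file2 line by line (scalar context: one line per pass)
    my $count = 0;
    while (my $line = <$data_fh>) {
        chomp $line;
        my @fields = split /\|/, $line;
        next unless @fields;            # blank line: nothing to look up
        if (exists $wanted{$fields[0]}) {
            print {$out_fh} "$line\n";  # whole line goes to file3
            $count++;
        }
    }
    return $count;
}

# Demo with in-memory "files" (made-up data)
my $ids  = "A1\nC3\n";
my $data = "A1|x|1\nB2|y|2\nC3|z|3\n";
my $out  = '';
open my $ids_fh,  '<', \$ids  or die $!;
open my $data_fh, '<', \$data or die $!;
open my $out_fh,  '>', \$out  or die $!;
my $matched = filter_by_ids($ids_fh, $data_fh, $out_fh);
close $out_fh;
print $out;
```

For real files, open them with `open(my $fh, '<', 'file1') or die $!` and `open(my $out, '>', 'file3') or die $!` and pass those handles instead.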

      This is a relatively common question, so LanX gave you the outline of a good solution to the problem.

      A frequent mistake is to try to read *both* files inside the loop, giving one of two bad outcomes:

      • Either the first file is completely read in the first pass of the loop, so the code can only find a single match if it happens to be the first line in the second file, or
      • the code re-opens the first file each time, and therefore can find all the matches, but runs extremely slowly(1) because it reads the first file completely for each line in the second file.

      (1) Extremely slowly in the relative sense--for small files you may not notice it. But if your files get large enough, you'll wonder why such a fast computer is so freakin' slow.
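A minimal sketch of the first bad outcome, with made-up in-memory data: the ID handle is opened once but read inside the loop over the second file, so it is already at end-of-file after the first pass and later lines never get compared.

```perl
use strict;
use warnings;

my $ids  = "B2\nC3\n";                 # made-up contents of file1
my $data = "A1|x\nB2|y\nC3|z\n";       # made-up contents of file2
open my $ids_fh,  '<', \$ids  or die $!;
open my $data_fh, '<', \$data or die $!;

my @matches;
while (my $line = <$data_fh>) {
    chomp $line;
    my ($key) = split /\|/, $line;
    # Only the first outer pass gets any lines here; afterwards $ids_fh is at EOF
    while (my $id = <$ids_fh>) {
        chomp $id;
        push @matches, $line if $id eq $key;
    }
}
print scalar(@matches), " match(es)\n";
```

B2 and C3 appear in both inputs, yet nothing matches, because the first data line (A1) used up the whole ID file. Reopening the ID file on every outer pass would find them, at the cost of re-reading it once per line, which is the second bad outcome; reading it into a hash once avoids both.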

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

        The file has around 7 lakh (700,000) lines, so yeah :p
        Thanks a lot! Please suggest some tutorials to start with; I am very new to this, so beginner-level ones :p
Re^2: Filtering Output from two files
by Anonymous Monk on Feb 05, 2018 at 11:50 UTC
    use strict;
    use warnings;
    use Data::Dumper;
    my $file1 = 'file1';
    my $file2 = 'file2';
    #reading file1 into a hash
    my %hash=();
    open (my $fh,'<',$file2) or die $!;
    while(my $line=<$fh>)
    {
        chomp $line;
        $hash{line}=1;
        print Dumper %hash;
    }
    close $fh;

    #reading file2 line by line
    open (my $fh2,'<',$file1) or die $!;
    while (my $row = <$fh2>) {
        chomp $row;

        my @fields = split(/\|/, $row);
        print $row if exists $hash{$fields[0]};
    }
    close $fh2;
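      You have at least one bug: you forgot the sigil $ in $hash{line}=1 (it should be $hash{$line}=1), but yes, this was the basic idea.

      And calling Dumper inside the loop is costly, since it dumps the whole hash again for every line read.

      Furthermore, you might want to use <code> tags next time. :)

      Minor nitpick: since every value is set to 1 (true), you don't need exists anymore; a plain if $hash{$fields[0]} does the job. :)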
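Applying those remarks to the hash-building part gives something like the sketch below. In-memory data stands in for the real file, and $Data::Dumper::Sortkeys is set only to make the dump order predictable.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order in the dump

my $ids = "A1\nC3\n";          # made-up stand-in for the key file
open my $fh, '<', \$ids or die $!;
my %hash;
while (my $line = <$fh>) {
    chomp $line;
    $hash{$line} = 1;          # with the sigil: $line, not the bareword line
}
close $fh;

# Dump once, after the loop, and pass a reference to get a single $VAR1
print Dumper(\%hash);

# Every value is 1 (true), so a plain lookup works without exists
for my $key ('A1', 'B2') {
    print "$key matches\n" if $hash{$key};
}
```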

      Cheers Rolf

        It was me who posted anonymously, since we aren't allowed to log in at the university.
        I didn't get where I missed the "sigil $".
        Furthermore, I am getting warnings such as "Use of uninitialized value" at line 34, for the variables $row and $fields in the penultimate line.
        And when I place print $row if exists $hash{$fields[0]}; after the loop, it gives the error "... requires explicit package name".
        I am really a novice at this; if you could guide me, I would learn more.
        use strict;
        use warnings;
        use Data::Dumper;
        my $file1 = 'file1';
        my $file2 = 'file2';
        #reading file1 into a hash
        my %hash;
        open (my $fh,'<',$file2) or die $!;
        while(my $line=<$fh>)
        {
            chomp $line;
            $hash{$line}=1;
            print Dumper %hash;
        }
        close $fh;
        #reading file2 line by line
        open (my ($fh2),'<',$file1) or die $!;
        while (my ($row) = <$fh2>) {
            chomp $row;
            # next if $row =~ /^\s*$/;
            my (@fields) = split(/\|/, $row);
            print $row if exists $hash{$fields[0]};
        }
        close $fh2;

        I am not getting any output, and when I print Dumper %hash I get loads of $VAR lines.
        What am I doing wrong?
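One thing worth checking in this version (an observation on the code above, not a reply from the original thread): while (my ($row) = <$fh2>) puts the readline in list context, so the first call reads every remaining line, keeps only the first in $row, and the next call returns an empty list, which ends the loop after one pass. The parentheses around $fh2 are harmless, but the ones around $row are not. The code also still reads $file2 into the hash and loops over $file1, the reverse of what the comments say. As for the many $VARs, Dumper %hash flattens the hash into a list, so each key and each value gets its own $VARn; Dumper(\%hash) prints a single structure. The sketch below compares the two readline forms on made-up data:

```perl
use strict;
use warnings;

my $data = "first\nsecond\nthird\n";   # made-up file contents

# List assignment: reads ALL remaining lines at once, keeps the first
open my $fh, '<', \$data or die $!;
my $list_iterations = 0;
while (my ($row) = <$fh>) {
    $list_iterations++;                # body runs only once
}
close $fh;

# Scalar assignment: reads one line per iteration
open $fh, '<', \$data or die $!;
my $scalar_iterations = 0;
while (my $row = <$fh>) {
    $scalar_iterations++;
}
close $fh;

print "list context: $list_iterations, scalar context: $scalar_iterations\n";
```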
Re^2: Filtering Output from two files
by Anonymous Monk on Feb 06, 2018 at 11:38 UTC
    #!/nairvigv/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;
    my $file1 = 'BBIDs.txt';
    my $file2 = 'fixedincomeTransparency.out.px.derived.updates';
    #reading file1 into a hash
    my %hash;
    my @fields;
    open (my $fh,'<',$file1) or die $!;
    while(my $line=<$fh>)
    {
        chomp $line;
        next if $line =~ /^\s*$/;
        $hash{$line}=1;
    }
    #print Dumper(\%hash);
    close $fh;
    open (my ($fh2),$file2) or die $!;
    while (my $row = <$fh2>)
    {
        chomp $row;
        print "$row\n";
        next if $row =~ /^\s*$/;
        my (@fields) = split(/\|/, $row);
        print "$fields[0]\n ";
        if (exists $hash{$fields[0]})
        {
            print "$row\n";
        }
    }
    close $fh2;
    Hello, this works for a file with only a few lines (about 10). However, when I run it on a big file of about 700k lines, nothing happens. Any idea what could be causing this?
        Sorry, I will use indentation from now on. Basically, what's happening is that the big files have whitespace before and after each entry, so I have to remove it before storing each entry as a hash key.
        open (my ($fh2),$file2) or die $!;
        while (my $row = <$fh2>) {
            chomp $row;
            print "$row\n";
            next if $row =~ /^\s*$/;
            my (@fields) = split(/\|/, $row);
        So I have to add $row =~ s/\s*$//; after next if $row =~ /^\s*$/; and it will work? I can't really check from here, since I am at home and would have to wait another 12 hours.
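Judging only from the description above, probably not on its own: s/\s*$// strips trailing whitespace from the whole row, but leading whitespace would still end up in $fields[0], and the keys read from the first file need the same cleanup so that both sides compare equal. A hedged sketch, assuming whitespace can pad the IDs in the first file as well as the rows and first field of the second (trim is a hypothetical helper, and the data is made up):

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace (incl. newline)
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

my $ids  = "  A1 \nC3\t\n";                    # made-up padded IDs
my $data = " A1 |x|1\nB2|y|2\n  C3|z|3  \n";   # made-up padded rows

# Clean the keys when building the hash
open my $ids_fh, '<', \$ids or die $!;
my %wanted;
while (my $line = <$ids_fh>) {
    my $id = trim($line);              # also removes the newline, like chomp
    next if $id eq '';
    $wanted{$id} = 1;
}
close $ids_fh;

# Clean the row and the looked-up field the same way
open my $data_fh, '<', \$data or die $!;
my @hits;
while (my $row = <$data_fh>) {
    $row = trim($row);
    next if $row eq '';
    my @fields = split /\|/, $row;
    push @hits, $row if exists $wanted{ trim($fields[0]) };
}
close $data_fh;

print "$_\n" for @hits;
```

Trimming $fields[0] rather than only the row end also covers padding around the first | separator.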