ostra has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

My objective is to search a database stored in a text file called qqq.txt for values listed in another text file called exp.txt. When there is a match, the script should print the text associated with the value that is split from the database record. When I run this script it only returns one match. I expected 3 matches because the database contains 3 values and exp.txt has the same 3 values. It's as if the script stops after one iteration. Any suggestions on how to improve the code to resolve this issue would be greatly appreciated. Thank you. And I do realize I did not include "or die" in the open statements. Ostra

    open(INDB, "exp.txt");
    open(DATA, "qqq.txt");
    while(<INDB>) {
        $search = $_;
        chomp($search);
        seek(INDB, 0, 0);
        while(<DATA>) {
            $therec = $_;
            chomp($therec);
            ($ma,$id ) = split(/\t/, $therec);
            if($id eq $search){
                print " $ma\n ";
            }
        }
    }

Replies are listed 'Best First'.
Re: Threading two text files
by LanX (Saint) on May 14, 2013 at 22:40 UTC
    some suggestions:

    • seek DATA: you are resetting the wrong filehandle
    • rename DATA, it already has a meaning in Perl
    • write while (my $search = <INDB> ) {
    • use lexical filehandles, i.e. '$indb' instead of INDB
    • open explicitly with 3 parameters and catch errors: open my $indb, "<", "exp.txt" or die "can't open exp.txt $!"
    • if speed matters consider reading both files first, if you put the data of the second into a hash you can check much faster.
    • please use proper indentation; your code is hard to read, and people won't help if they don't understand it
    • and of course use strict and use warnings
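A minimal sketch of the original nested-loop approach with the suggestions above applied. The sample data and the in-memory filehandles are assumptions made only so the sketch runs on its own; with real files, open "exp.txt" and "qqq.txt" instead.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for exp.txt and qqq.txt.
my $exp_text = "a\nc\n";
my $qqq_text = "1\ta\n11\ta\n2\tb\n333\tc\n";

# Lexical filehandles, 3-argument open, errors caught.
open my $indb, '<', \$exp_text or die "can't open exp data: $!";
open my $db,   '<', \$qqq_text or die "can't open qqq data: $!";

my @matches;
while (my $search = <$indb>) {
    chomp $search;

    # Rewind the database handle, not the search handle, so that
    # every search value scans the database from the top.
    seek $db, 0, 0 or die "can't rewind database: $!";

    while (my $rec = <$db>) {
        chomp $rec;
        my ($ma, $id) = split /\t/, $rec;
        push @matches, $ma if defined $id && $id eq $search;
    }
}

print "$_\n" for @matches;
```

This rescans the database once per search value, so it is fine for small files but grows slowly as both files get larger.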

    HTH =)

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      > if speed matters consider reading both files first, if you put the data of the second into a hash you can check much faster.

      like this

      use strict;
      use warnings;
      use Data::Dump qw/pp/;

      open my $qqq,"<","qqq.txt" or die "Open qqq failed $!";
      open my $exp,"<","exp.txt" or die "Open exp failed $!";

      my @exp = <$exp>;
      chomp @exp;

      my %qqq;
      while (<$qqq>) {
          my ($value,$key) = split /\s+/;
          push @{$qqq{$key}},$value;
      }

      #pp \%qqq,\@exp;

      print "$_: @{$qqq{$_}}\n" for @exp;

      out

      a: 1 11
      c: 333

      Cheers Rolf

      ( addicted to the Perl Programming Language)

      Thank you for the help. The issue was definitely having INDB rather than DATA in the seek function. I saw that other PerlMongers caught this as well. I will definitely look at your version of the code (as well as the other Mongers' code) and compare it with my version. But all I am trying to do is search a number of values against a database and display the matches. The code I wrote for this worked well after the correction. The hash will not work for me as my database will contain more than one key value pair. Again, thank you for your input. Ostra

        > The hash will not work for me as my database will contain more than one key value pair.

        Sure it does! =)

        It's a HoA (hash of arrays) holding all values per key, check the example output and uncomment pp to see the data structure.

        Cheers Rolf

        ( addicted to the Perl Programming Language)

Re: Threading two text files
by ww (Archbishop) on May 14, 2013 at 23:33 UTC
    If you're going to use seek, read the documentation, the first graf of which explains how to increment the position:
     seek FILEHANDLE,POSITION,WHENCE
            Sets FILEHANDLE's position, just like the "fseek" call of
            "stdio". FILEHANDLE may be an expression whose value gives the
            name of the filehandle. The values for WHENCE are 0 to set the
            new position *in bytes* to POSITION; 1 to set it to the current
            position plus POSITION; and 2 to set it to EOF plus POSITION,
            typically negative. For WHENCE you may use the constants
            "SEEK_SET", "SEEK_CUR", and "SEEK_END" (start of the file,
            current position, end of the file) from the Fcntl module.
            Returns 1 on success, false otherwise.
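As a small illustration of the WHENCE values quoted above, here is a sketch using the Fcntl constants. The three-line in-memory data is an assumption made for the example; any seekable filehandle behaves the same way.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_END);

# Hypothetical data; an in-memory filehandle keeps the sketch self-contained.
my $text = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$text or die "can't open in-memory data: $!";

my $first = <$fh>;                   # read the first line
my $pos_after_first = tell $fh;      # byte offset just past that line

seek $fh, 0, SEEK_END or die "seek failed: $!";
my $size = tell $fh;                 # offset of EOF, i.e. length in bytes

seek $fh, 0, SEEK_SET or die "seek failed: $!";
my $again = <$fh>;                   # back at the start: first line again

print "after first line: $pos_after_first, size: $size\n";
print "re-read: $again";
```

Checking the return value of seek is worthwhile because it returns false on failure, for example on a handle that cannot be repositioned.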

    Assuming you really intend to do something like this:

    #!/usr/bin/perl
    use 5.016;
    use Data::Dumper;

    #1033562 (and id num_qqq.txt, idnum_exp.txt)

    =head file 1033562_exp.txt

    exp.txt 0
    foo 3
    bar 1
    table 3
    quux 3
    fail 2

    file 1033562_qqq.txt

    qqq.txt 0
    fail 2
    nope 1
    foo 3
    insert 1
    bar 1
    quux 3
    table 3
    tambourine 2
    fred 14

    =cut

    open(INDB, "1033562_exp.txt") or die "Can't open exp file, $!";
    open(QQQ, "1033562_qqq.txt") or die "Can't open data file, $!";

    my (@search, @therecs);

    while(<INDB>) {
        my $search = $_;
        chomp($search);
        say "\$search at Ln28: $search";
        push @search, $search;
    }
    # Rewind only after the loop: seeking to 0 inside it would re-read line 1 forever.
    seek(INDB, 0, 0);
    print "\n\n";

    while(<QQQ>) {
        my ($ma,$id);
        my $therec = $_;
        say "Both elements of \$therec at Ln36: $therec";
        chomp($therec);
        ($ma,$id ) = split(/\t/, $therec);
        push @therecs, (" $ma " . "| $id |");
        my $Qpos=tell QQQ;    # current byte offset in QQQ
        say "\n\t POS in QQQ: $Qpos \n";
    }

    say "\n \t array search next:";
    say Dumper @search;
    say "\n \t Array \@therecs next:";
    say Dumper @therecs;

    Identifying the matches is left as an exercise to the SOPW. A %hash might be one approach; so too might what you originally suggested but didn't implement: walking the arrays in parallel. Both are well documented in threads here in the Monastery.


    If you didn't program your executable by toggling in binary, it wasn't really programming!

Re: Threading two text files
by kcott (Archbishop) on May 15, 2013 at 03:09 UTC

    G'day ostra,

    You've provided no example data. Here's my guess at what it might look like (based on your code):

    $ cat qqq.txt
    a
    c
    $ cat exp.txt
    1	a
    11	a
    2	b
    333	c

    Here's a solution using Tie::File:

    $ perl -Mstrict -Mwarnings -Mautodie=:all -E '
        use Tie::File;
        tie my @exp, q{Tie::File}, q{exp.txt};
        tie my @qqq, q{Tie::File}, q{qqq.txt};
        for my $search (@qqq) {
            my @results =
                map { $search eq $_->[1] ? $_->[0] : () }
                map { [ split /\t/ ] }
                @exp;
            say "$search: @results";
        }
    '
    a: 1 11
    c: 333

    Some additional notes:

    • 'And I do realize I did not have "or die" text ...' — consider using autodie.
    • If you're likely to have duplicate search strings (e.g. from a UI Search function rather than a file), Memoize might be useful to avoid duplicate searches.
    • Think about the volume of data you're dealing with and Benchmark to identify potentially good or bad solution options.
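To illustrate the Memoize note above, here is a rough sketch. The search_db function, the @db records and the $scans counter are hypothetical names invented for this example; only memoize() itself comes from the module.

```perl
use strict;
use warnings;
use Memoize;

# Hypothetical in-memory database of [value, key] records.
my @db = ([1, 'a'], [11, 'a'], [2, 'b'], [333, 'c']);
my $scans = 0;    # counts how often the database is really scanned

# Hypothetical search routine: returns the values for a key as one string.
sub search_db {
    my ($key) = @_;
    $scans++;
    return join ' ', map { $_->[0] } grep { $_->[1] eq $key } @db;
}

# Cache results per argument, so repeated keys skip the scan.
memoize('search_db');

# Duplicate search strings, always called in scalar context
# (Memoize caches list and scalar context separately).
my @results = map { scalar search_db($_) } qw(a c a a);

print "$_\n" for @results;
print "database scans: $scans\n";
```

Four lookups are made but only the two distinct keys trigger a scan. Caching like this only helps if the underlying data does not change between calls.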

    -- Ken

Re: Threading two text files
by NetWallah (Canon) on May 14, 2013 at 22:02 UTC
    seek DATA instead of INDB.

    Update - ignore this - too many things wrong with the logic.

    Will update later if I have time, and others have not corrected.

                 "I'm fairly sure if they took porn off the Internet, there'd only be one website left, and it'd be called 'Bring Back the Porn!'"
            -- Dr. Cox, Scrubs

Re: Threading two text files
by Laurent_R (Canon) on May 14, 2013 at 22:44 UTC

    I also think your logic is probably wrong, although I am not entirely sure of what you want to do.

    You probably want to read your reference (config) data once and load it into a hash (or some other data structure), and then read the data and match it against the hash.
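One way this could look, as a minimal sketch: the search values go into a hash and the database is then read once. The sample data and the in-memory filehandles are assumptions for illustration; with real files, open "exp.txt" and "qqq.txt" instead.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for exp.txt and qqq.txt.
my $exp_text = "a\nc\n";
my $qqq_text = "1\ta\n11\ta\n2\tb\n333\tc\n";

# Step 1: read the search values once and load them into a hash.
open my $exp, '<', \$exp_text or die "can't open exp data: $!";
my %wanted;
while (my $line = <$exp>) {
    chomp $line;
    $wanted{$line} = 1;
}
close $exp;

# Step 2: read the database once and match each record against the hash.
open my $qqq, '<', \$qqq_text or die "can't open qqq data: $!";
my @matches;
while (my $rec = <$qqq>) {
    chomp $rec;
    my ($ma, $id) = split /\t/, $rec;
    next unless defined $id && $wanted{$id};
    push @matches, "$id: $ma";
}
close $qqq;

print "$_\n" for @matches;
```

Each file is read exactly once and no seek is needed, because the hash lookup replaces the inner loop.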