Threading two text files

ostra has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Threading two text files by LanX (Saint) on May 14, 2013 at 22:40 UTC
Some suggestions:

- `seek DATA`: you are resetting the wrong filehandle
- rename `DATA`, it already has a meaning in Perl
- write `while (my $search = <INDB> ) {`
- use lexical filehandles, i.e. `$indb` instead of `INDB`
- open explicitly with 3 parameters and catch errors: `open my $indb, "<", "exp.txt" or die "can't open exp.txt $!"`
- if speed matters, consider reading both files first; if you put the data of the second into a hash you can check much faster
- please use proper indentation; your code is hard to read, and people won't help if they don't understand it
- and of course `use strict` and `use warnings`

HTH =)

Cheers Rolf

( addicted to the Perl Programming Language)
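The first few suggestions can be combined into a minimal sketch, assuming the original script looped over search terms and rescanned a tab-separated database for each one. The sample data, the in-memory filehandles, and the helper name `find_matches` are hypothetical and only there to keep the example self-contained; real code would open the files on disk instead.

```perl
use strict;
use warnings;

# Hypothetical sample data; in-memory strings stand in for the real files.
my $terms_text = "a\nc\n";
my $db_text    = "1\ta\n11\ta\n2\tb\n333\tc\n";

# For each search term, rescan the database and collect matching values.
sub find_matches {
    my ($terms_fh, $db_fh) = @_;
    my @out;
    while (my $search = <$terms_fh>) {
        chomp $search;
        # Rewind the handle that is about to be re-read, not some other one.
        seek($db_fh, 0, 0) or die "can't rewind database: $!";
        my @values;
        while (my $rec = <$db_fh>) {
            chomp $rec;
            my ($value, $key) = split /\t/, $rec;
            push @values, $value if defined $key && $key eq $search;
        }
        push @out, "$search: @values";
    }
    return @out;
}

# Lexical filehandles, three-argument open, and an error check on each open.
open my $terms_fh, '<', \$terms_text or die "can't open search terms: $!";
open my $db_fh,    '<', \$db_text    or die "can't open database: $!";

my @report = find_matches($terms_fh, $db_fh);
print "$_\n" for @report;
```

This rescans the database once per search term, which is fine for small inputs but is exactly the cost the hash suggestion avoids.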
Re^2: Threading two text files by LanX (Saint) on May 15, 2013 at 10:43 UTC
> if speed matters consider reading both files first, if you put the data of the second into a hash you can check much faster.

like this

```perl
use strict;
use warnings;
use Data::Dump qw/pp/;

open my $qqq,"<","qqq.txt" or die "Open qqq failed $!";
open my $exp,"<","exp.txt" or die "Open exp failed $!";

my @exp = <$exp>;
chomp @exp;

my %qqq;

while (<$qqq>) {
    my ($value,$key) = split /\s+/;
    push @{$qqq{$key}},$value;
}

#pp \%qqq,\@exp;

print "$_: @{$qqq{$_}}\n" for @exp;
```

out

```
a: 1 11
c: 333
```

Cheers Rolf

( addicted to the Perl Programming Language)
Re^2: Threading two text files by ostra (Novice) on May 15, 2013 at 17:51 UTC
Thank you for the help. The issue was definitely not having `<INDB>` in the seek function. I saw that other PerlMongers caught this as well. I will definitely look at your version of the code (as well as the other Mongers' code) and compare it with my version. But all I am trying to do is search a number of values against a database and display the matches. The code I wrote for this worked well after the correction. The hash will not work for me as my database will contain more than one key value pair. Again, thank you for your input.

Ostra
Re^3: Threading two text files by LanX (Saint) on May 15, 2013 at 22:02 UTC
> The hash will not work for me as my database will contain more than one key value pair.

Sure it does! =)

It's a HoA (hash of arrays) holding all values per key, check the example output and uncomment `pp` to see the data structure.

Cheers Rolf

( addicted to the Perl Programming Language)
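To make the hash-of-arrays point concrete, here is a small sketch that builds such a structure and dumps it. The helper name `build_index` and the sample records are hypothetical, and the core module Data::Dumper is used here in place of `pp` so that nothing needs installing.

```perl
use strict;
use warnings;
use Data::Dumper;

# Build a hash of arrays from "value key" lines: every key maps to an
# array reference holding all values seen for that key.
sub build_index {
    my @lines = @_;
    my %index;
    for my $line (@lines) {
        my ($value, $key) = split ' ', $line;
        next unless defined $key;          # skip blank or malformed lines
        push @{ $index{$key} }, $value;    # autovivifies the inner array
    }
    return \%index;
}

# Hypothetical sample records, not the original poster's data.
my $index = build_index("1 a", "11 a", "2 b", "333 c");

$Data::Dumper::Sortkeys = 1;   # stable key order for display
print Dumper($index);
print "a has ", scalar @{ $index->{a} }, " values: @{ $index->{a} }\n";
```

A key that occurs several times simply ends up with a longer array, so repeated keys in the database are not a problem for this approach.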
Re: Threading two text files by ww (Archbishop) on May 14, 2013 at 23:33 UTC
If you're going to use seek, read the documentation, the first graf of which explains how to increment the position:

> seek FILEHANDLE,POSITION,WHENCE
>
> Sets FILEHANDLE's position, just like the "fseek" call of "stdio". FILEHANDLE may be an expression whose value gives the name of the filehandle. The values for WHENCE are 0 to set the new position in bytes to POSITION; 1 to set it to the current position plus POSITION; and 2 to set it to EOF plus POSITION, typically negative. For WHENCE you may use the constants "SEEK_SET", "SEEK_CUR", and "SEEK_END" (start of the file, current position, end of the file) from the Fcntl module. Returns 1 on success, false otherwise.

Assuming you really intend to do something like this:

```perl
#!/usr/bin/perl
use 5.016;
use Data::Dumper;

#1033562 (and id num_qqq.txt, idnum_exp.txt)

=head file 1033562_exp.txt

exp.txt 0
foo 3
bar 1
table 3
quux 3
fail 2

file 1033562_qqq.txt

qqq.txt 0
fail 2
nope 1
foo 3
insert 1
bar 1
quux 3
table 3
tambourine 2
fred 14

=cut

open(INDB, "1033562_exp.txt") or die "Can't open exp file, $!";
open(QQQ, "1033562_qqq.txt") or die "Can't open data file, $!";

my (@search, @therecs);

while(<INDB>) {
    my $search = $_;
    chomp($search);
    say "\$search at Ln28: $search";
    push @search, $search;
}
# Rewind only after the loop; a seek inside it would reread the first line forever.
seek(INDB, 0, 0);

print "\n\n";

while(<QQQ>) {
    my ($ma,$id);
    my $therec = $_;
    say "Both elements of \$therec at Ln36: $therec";
    chomp($therec);
    ($ma,$id ) = split(/\t/, $therec);
    push @therecs, (" $ma " . "\| $id \|");
    my $Qpos=tell QQQ;
    say "\n\t POS in QQQ: $Qpos \n";
}

say "\n \t array search next:";
say Dumper @search;
say "\n \t Array \@therecs next:";
say Dumper @therecs;
```

Identifying the matches is left as an exercise to the SOPW. `%hash` might be an approach; so too might what you originally suggest but didn't implement -- walking the arrays in parallel. Both are well documented in threads here in the Monastery.

If you didn't program your executable by toggling in binary, it wasn't really programming!
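The three WHENCE values described in the quoted seek documentation can be tried out with a short sketch. It assumes nothing about the original poster's files: the data lives in a hypothetical in-memory string, and the byte counts in the comments apply only to that string.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_CUR SEEK_END);

# Hypothetical data in an in-memory file, so the sketch needs no files on disk.
my $text = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$text or die "can't open in-memory file: $!";

my $first       = <$fh>;       # reads "alpha\n"
my $after_first = tell $fh;    # byte offset just past the first line

seek($fh, 0, SEEK_SET) or die "seek failed: $!";    # absolute: back to the start
my $again = <$fh>;             # "alpha\n" once more

seek($fh, 5, SEEK_CUR) or die "seek failed: $!";    # relative: skip the 5 bytes of "beta\n"
my $third = <$fh>;             # "gamma\n"

seek($fh, -6, SEEK_END) or die "seek failed: $!";   # from EOF: 6 bytes before the end
my $last = <$fh>;              # "gamma\n" again

chomp($first, $again, $third, $last);
print "offset after first line: $after_first\n";
print "$first $again $third $last\n";
```

Using the named constants rather than bare 0, 1, and 2 makes it harder to rewind by accident when a relative move was intended.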
Re: Threading two text files by kcott (Archbishop) on May 15, 2013 at 03:09 UTC
G'day ostra,

You've provided no example data. Here's my guess at what it might look like (based on your code):

```
$ cat qqq.txt
a
c
$ cat exp.txt
1 a
11 a
2 b
333 c
```

Here's a solution using Tie::File:

```
$ perl -Mstrict -Mwarnings -Mautodie=:all -E '
    use Tie::File;
    tie my @exp, q{Tie::File}, q{exp.txt};
    tie my @qqq, q{Tie::File}, q{qqq.txt};
    for my $search (@qqq) {
        my @results = map { $search eq $_->[1] ? $_->[0] : () } map { [ split /\t/ ] } @exp;
        say "$search: @results";
    }
'
a: 1 11
c: 333
```

Some additional notes:

- 'And I do realize I did not have "or die" text ...': consider using autodie.
- If you're likely to have duplicate search strings (e.g. from a UI Search function rather than a file), Memoize might be useful to avoid duplicate searches.
- Think about the volume of data you're dealing with and Benchmark to identify potentially good or bad solution options.

-- Ken
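The Memoize note can be illustrated with a short sketch. The `lookup` sub, the `$scans` counter, and the sample records are hypothetical and exist only to show that repeated search strings do not trigger repeated scans.

```perl
use strict;
use warnings;
use Memoize;

# Hypothetical records; each is [value, key].
my @records = ([1, 'a'], [11, 'a'], [2, 'b'], [333, 'c']);

my $scans = 0;   # counts how often the data is actually scanned

# Return the matching values for one search term as a single string.
sub lookup {
    my ($search) = @_;
    $scans++;
    return join ' ', map { $_->[0] } grep { $_->[1] eq $search } @records;
}

memoize('lookup');   # later calls with the same argument reuse the cached result

for my $term (qw(a c a a)) {
    my $found = lookup($term);   # always scalar context, so one cache is used
    print "$term: $found\n";
}
print "scans performed: $scans\n";
```

Memoize keeps separate caches for scalar and list context, which is why the sketch always calls `lookup` in scalar context. Caching only pays off if the underlying data does not change between calls.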
Re: Threading two text files by NetWallah (Canon) on May 14, 2013 at 22:02 UTC
~~seek DATA instead of INDB.~~

Update - ignore this - too many things wrong with the logic. Will update later if I have time, and others have not corrected.

"I'm fairly sure if they took porn off the Internet, there'd only be one website left, and it'd be called 'Bring Back the Porn!'" -- Dr. Cox, Scrubs
Re: Threading two text files by Laurent_R (Canon) on May 14, 2013 at 22:44 UTC
I also think your logic is probably wrong, although I am not entirely sure of what you want to do. You probably want to read your reference (config) data once and load it into a hash (or some other data structure), and then read the data and match it against the hash.
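A minimal sketch of that approach follows, assuming the reference data is a list of search keys and the data file holds tab-separated value/key records. Both inputs are hypothetical in-memory strings, and the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical inputs held in memory: search terms (reference data) and records.
my $terms_text   = "a\nc\n";
my $records_text = "1\ta\n11\ta\n2\tb\n333\tc\n";

# Read the reference data once and load it into a hash.
open my $terms_fh, '<', \$terms_text or die "can't open terms: $!";
chomp(my @terms = <$terms_fh>);
close $terms_fh;
my %wanted = map { $_ => 1 } @terms;

# Single pass over the records, keeping those whose key is wanted.
open my $rec_fh, '<', \$records_text or die "can't open records: $!";
my @matches;
while (my $line = <$rec_fh>) {
    chomp $line;
    my ($value, $key) = split /\t/, $line;
    next unless defined $key && exists $wanted{$key};
    push @matches, "$key: $value";
}
close $rec_fh;

print "$_\n" for @matches;
```

Each file is read exactly once, so no seek is needed, and the larger file never has to be held in memory.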