How to work with Arrays of 83,000 values

vaniaul has asked for the wisdom of the Perl Monks concerning the following question:

I have constructed a script which compares four arrays:
First array consists of publication dates of books from reading lists - approx. 2000 values in total
Second array consists of Titles of books from reading lists - approx. 2000 values in total
Third array consists of publication dates of books from catalogue - approx. 83,000 values in total
Fourth array consists of Titles of books from catalogue - approx. 83,000 values in total

The purpose of the script is to compare the values of 2000 book-titles & 2000 publication-dates with 83,000 book-titles and 83,000 publication dates. If publication-dates and book-titles match then it is printed on the screen. Ultimately, I am trying to find which books contained in reading list are contained in the catalogue as well.
---
here is my script:
----
my @readingbooks = ("a b", "c d", "e f", "g h");
my @readingdate = (1991, 1992, 1993, 1994);
my @catbooks = ("one", "two", "three", "client's", "five", "six", "seven", "eight", "a b");
my @catdate = (1980, 1981, 1982, 1994, 1984, 1985, 1986, 1987, 1991);

my $rdbooks; my $rddate; my $cataloguebooks; my $cataloguedate;

for (0..$#readingbooks){

$rdbooks = $readingbooks$_;
$rddate = $readingdate$_;

for (0..$#catbooks){

$cataloguebooks = $catbooks$_;
$cataloguedate = $catdate$_;

if (($rdbooks eq $cataloguebooks) && ($rddate == $cataloguedate)) {
print "$rdbooks \t $cataloguebooks \n";
print "$rddate \t $cataloguedate \n";
print "wow \n\n";

}
}
}

The problem in when I run a small set of data this script works fine, but when I actually run it with the complete data of 2000 to compare with 83,000 then it doesn't work. Can anyone please help what am I doing wrong here, or what process i need to adopt for making it work?

thanks in advance for your time and help!
--
Vani

Comment on How to work with Arrays of 83,000 values

Replies are listed 'Best First'.
Re: How to work with Arrays of 83,000 values by kennethk (Abbot) on Jul 23, 2013 at 17:18 UTC
So, note how your code is now misformatted and actually corrupted in a few spots? In the future, please wrap code in `<code>` tags. See How do I post a question effectively?. Reformatted, your code looks like: #!/usr/bin/perl use strict; use warnings; my @readingbooks = ("a b", "c d", "e f", "g h"); my @readingdate = (1991, 1992, 1993, 1994); my @catbooks = ("one", "two", "three", "client's", "five", "six", "sev +en", "eight", "a b"); my @catdate = (1980, 1981, 1982, 1994, 1984, 1985, 1986, 1987, 1991); my $rdbooks; my $rddate; my $cataloguebooks; my $cataloguedate; for (0..$#readingbooks){ $rdbooks = $readingbooks[$_]; $rddate = $readingdate[$_]; for (0..$#catbooks){ $cataloguebooks = $catbooks[$_]; $cataloguedate = $catdate[$_]; if (($rdbooks eq $cataloguebooks) && ($rddate == $cataloguedat +e)) { print "$rdbooks \t $cataloguebooks \n"; print "$rddate \t $cataloguedate \n"; print "wow \n\n"; } } } [download] A few critiques, since as MidLifeXis points out, you don't tell us how it doesn't work. You've scoped `$rdbooks`, `$rddate`, `$cataloguebooks`, and `$cataloguedate` outside your nested loops, despite the fact that you never use those values outside that scope. While this doesn't cause any trouble for you in this case, it's an easy way to accidentally have data leak outside scope and cause trouble. Part of the reason you have to stash the values in temporary variables is that you don't name your indices. If you swap to something that looks more like `for my $i (0..$#readingbooks){ for my $j (0..$#catbooks){ if (($readingbooks[$i] eq $catbooks[$i])...` [download] it saves typing, reduces complexity, and makes intent more obvious. Almost any time you have nested for loops around an equality test in Perl, you should really be using hashes (or possibly a real database for very large cases). It'll be dramatically faster. Modifying your posted code: #!/usr/bin/perl use strict; use warnings; my @readingbooks = ("a b", "c d", "e f", "g h"); my @readingdate = (1991, 1992, 1993, 1994); my %readingbooks_hash; $readingbooks_hash{$_}++ for @readingbooks; my %readingdate_hash; $readingdate_hash{$_}++ for @readingdate; my @catbooks = ("one", "two", "three", "client's", "five", "six", "sev +en", "eight", "a b"); my @catdate = (1980, 1981, 1982, 1994, 1984, 1985, 1986, 1987, 1991); for my $i (0 .. $#catbooks) { if ($readingbooks_hash{$catbooks[$i]} and $readingdate_hash{$catda +te[$i]}) { print "$catbooks[$i]\t$catdate[$i]\n"; print "wow \n\n"; } } [download] There are a number of additional improvements, particularly swapping from coordinated arrays to arrays of hashrefs (perllol) or even formal objects (Moose), but this should dramatically improve your situation. #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.	[reply] [d/l] [select]
Re: How to work with Arrays of 83,000 values by toolic (Bishop) on Jul 23, 2013 at 17:05 UTC
Since you only showed us your working code, you'll have to debug your broken code on your own. Tip #2 from the Basic debugging checklist (print) would be a good place to start. See also: Writeup Formatting Tips	[reply]
Re: How to work with Arrays of 83,000 values by MidLifeXis (Monsignor) on Jul 23, 2013 at 16:57 UTC
Please explain what you mean by "it doesn't work". Does it emit an error, incorrect output, make your computer erupt in flames, ...? --MidLifeXis	[reply]
Re^2: How to work with Arrays of 83,000 values by vaniaul (Initiate) on Jul 23, 2013 at 17:09 UTC
it gives multiple error messages like: # Might be a runaway multi-line "" string starting on line <line no. e.g. 86055> # Bareword found where operator expected # Missing operator before <the word e.g. Looking?> # String found where operator expected A snapshot of the error messages is shown in image at: http://imgur.com/TmR0BBd&3RdEAVA#1 thanks for your help!	[reply]
Re^3: How to work with Arrays of 83,000 values by Old_Gray_Bear (Bishop) on Jul 23, 2013 at 17:28 UTC
At some point in every Perl class I have taught, I have tell someone "fix your syntax problems before you try to diagnose your logic problems." So, what happens with your code after you fix your errors? As a side note I'd suggest you rethink your design using a hash for the reading-list books and then do a single pass through the catalog file, reporting all matches as you go. The logic will be cleaner and the places where you can get the wrong index go away. (I know you think that the multiple arrays solution is 'cleaner'; Code will prove you wrong. What will happen the first time that the reading-list or the catalog changes? (Nota Bene: You have already demonstrated that problem.) This has all the potential of being a full-time maintenance headache. Not that I am objecting, mind you, cleaning up code like this on contract is what I do for a living....) ---- I Go Back to Sleep, Now. OGB	[reply]
Re^3: How to work with Arrays of 83,000 values by MidLifeXis (Monsignor) on Jul 23, 2013 at 17:14 UTC
Please include a snapshot on this site, not an off-site location. PerlMonks attempts to be self contained so that any data associated with a question is on-site, helping to ensure the long-term validity of the questions. --MidLifeXis	[reply]
Re^3: How to work with Arrays of 83,000 values by ww (Archbishop) on Jul 24, 2013 at 03:54 UTC
"...messages like...." doesn't really cut it. Here are the messages which result from adding strict and warnings to your script and running... perl -c 1045909.pl Scalar found where operator expected at 1045909.pl line 16, near "$rea +dingbooks$_" (Missing operator before $_?) # ie, Ln beginning '$rdbooks = $read +ingbooks$_;' Scalar found where operator expected at 1045909.pl line 17, near "$rea +dingdate$_" (Missing operator before $_?) Scalar found where operator expected at 1045909.pl line 21, near "$cat +books$_" (Missing operator before $_?) Scalar found where operator expected at 1045909.pl line 22, near "$cat +date$_" (Missing operator before $_?) Global symbol "$readingbooks" requires explicit package name at 104590 +9.pl line 16. syntax error at 1045909.pl line 16, near "$readingbooks$_" Global symbol "$readingdate" requires explicit package name at 1045909 +.pl line 17. syntax error at 1045909.pl line 17, near "$readingdate$_" Global symbol "$catbooks" requires explicit package name at D:\_Perl_\ +PMonks\1045909.pl line 21. syntax error at 1045909.pl line 21, near "$catbooks$_" Global symbol "$catdate" requires explicit package name at D:\_Perl_\P +Monks\1045909.pl line 22. syntax error at 1045909.pl line 22, near "$catdate$_" 1045909.pl had compilation errors. [download] And... ohmigosh -- `strict` and `warnings` + actually reading the messages -- comes to the rescue again. If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.	[reply] [d/l] [select]
Re: How to work with Arrays of 83,000 values by Laurent_R (Canon) on Jul 23, 2013 at 22:10 UTC
The problem in when I run a small set of data this script works fine, but when I actually run it with the complete data of 2000 to compare with 83,000 then it doesn't work. Then the problem is possibly in the way you are reading your data from the file and loading it into your internal data structures. It would be useful to see the code that reads the data from the files. Just in case you might be worrying on that, 83,000 entries is a quite small dataset, I am almost certain you don't have problem with the data size, unless you do something very really weird, or unless you are running your program on a Sinclair ZX81 (aka Timex/Sinclair 1000), a Commodore 64 (64 kB RAM) or an Intel 8086 platform with 380 kB memory running on Dos 3.0, dating from 1983, but I very much doubt that Perl has ever been ported on such a box). One additional point: I completely agree with Old Gray bear that hashes are fare more appropriate than arrays for the type of problems you want to solve.	[reply]