santhosh.yamsani has asked for the wisdom of the Perl Monks concerning the following question:

I have two tables in a database which contain URLs (almost one lakh, i.e. 100,000, each).

Both tables may contain the same URLs or different URLs.

So I need to copy these URLs from the two tables into two different files, and then compare those URLs.

So I planned to load the URLs from the database into two different files, but I am getting an "Out of memory!" error.

So initially I tried copying only 100 URLs from the database to each file.

Following is the code:

use DBI;

### The database handles
my $var;
my $var_ex;
my $dbh  = DBI->connect("dbi:ODBC:MSSQL");
my $dbh4 = DBI->connect("dbi:ODBC:MSSQL");

### The statement handles
my $sth = $dbh->prepare("select Top 100 URLName from URL ORDER BY URLID DESC");
$sth->execute();
my $sth4 = $dbh4->prepare("select Top 100 URLName from URL_EXCLUDE");
$sth4->execute();

print "\n starts";

open(URLFHW, ">Url.txt");
while ($var = $sth->fetchrow_array) {
    print URLFHW "$var\n";
}
close(URLFHW);
print "\n 1st file ";

open(URLEXFHW, ">Url_ex.txt");
while ($var_ex = $sth4->fetchrow_array) {
    print URLEXFHW "$var_ex\n";
}
close(URLEXFHW);
print "\n 2nd file";

OUTPUT:

D:\perl> Copy_urls.pl

starts 1st file Out of memory! 2nd file

Result:

Two files (Url.txt, Url_ex.txt) are created with 100 URLs each, but the Perl program gets terminated at the second step itself; it does not execute any code I write after the second step. I have written some script for comparing the URLs after the second step, but it is not executing.

Can anyone help me with this?

!!!!...Thanks in ADVANCE...!!!!

Replies are listed 'Best First'.
Re: Out of memory!
by roboticus (Chancellor) on Aug 11, 2010 at 14:18 UTC

    santhosh.yamsani:

    Why go to the trouble of reading the tables in perl? It sounds like a problem the database is suited to. For example:

    -- Get list of URLs in table 1 not present in table 2
    select * from table_1 where URL not in (select URL from table_2)

    -- Get list of URLs in both tables
    select T1.URL
      from table_1 T1
      join table_2 T2 on T1.URL = T2.URL

    -- Same, but where "description" doesn't match
    select T1.URL
      from table_1 T1
      join table_2 T2 on T1.URL = T2.URL
     where T1.Description != T2.Description

    Moving a ton of data from the database server to perl to do simple comparisons is generally the wrong way to go. Learn SQL enough to do your work in the database when possible.

    Of course, there are times when it's appropriate to use perl to whack entire tables. But in that case, you probably want to use a bulk copy utility (BCP on Sybase and MS SQL Server) to dump the tables to a flat file. It's faster and you won't run out of memory.
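
    For example, here is a rough sketch of driving bcp from Perl (this is not from the post; the database, table, and server names are placeholders you would replace):

    use strict;
    use warnings;

    # Hypothetical example: have SQL Server's bcp utility dump the table
    # straight to a flat file instead of pulling every row through DBI.
    # 'MyDb.dbo.URL' and 'myserver' are placeholder names.
    my @bcp = (
        'bcp', 'MyDb.dbo.URL',   # database.schema.table to export
        'out', 'Url.txt',        # destination flat file
        '-c',                    # character (plain text) mode
        '-S', 'myserver',        # SQL Server instance to connect to
        '-T',                    # trusted (Windows) authentication
    );
    system(@bcp) == 0 or die "bcp failed with status $?";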

    ...roboticus

      Moving a ton of data from the database server to perl to do simple comparisons is generally the wrong way to go. Learn SQL enough to do your work in the database when possible.
      That, especially the second sentence, is only true up to a point. It doesn't scale. Certainly, if you have just one process accessing the database, let the database do the work. But if you have a hundred servers accessing the database, it may actually pay to do as little work as possible in the database.

        JavaFan:

        While I can imagine cases where it's better to do the job on a different machine in perl, I'll stand by my statement in this case. If I have 100 servers accessing the database, I certainly don't want one task sucking up all the network bandwidth by exporting and importing entire tables instead of just the rows it wants. That's why I put the weasel words "simple comparisons" in there. Index scans are normally cheap, especially on frequently-used tables where the index is mostly cached in RAM anyway.

        Having said that, I totally agree with your sentiment that you need to be cognizant of other database users. We definitely don't want someone to be an inconsiderate database user, frivolously consuming resources with no regard to database performance. Large-scale comparisons, for example, shouldn't be a frequent operation anyway. Properly structuring the applications can normally keep mass comparisons down to a minimum, such as for periodic maintenance.

        ...roboticus

      Hi all..!!

      Thanks for your replies.

      Firstly, I did this task in SQL only, but it took 40 to 45 minutes to compare just 1000 URLs, so you can imagine how much time it would take for lakhs of URLs. That is why I am trying to do this task using Perl. Moreover, I have to do this task weekly, so I need to find a permanent solution. Can anyone help with this?

        santhosh.yamsani:

        Have you tried to optimize your SQL? 40-45 minutes is quite a long time to compare 1000 URLs in *any* language, unless you're doing some amazingly complex comparisons. Do you know which statements are taking up your time?

        If you give me a few details about your table structure, indexes, and the SQL that's performing poorly, I can offer you some suggestions on how to make it faster.

        ...roboticus

Re: Out of memory!
by dasgar (Priest) on Aug 11, 2010 at 12:44 UTC

    If I'm reading your code correctly, you're doing the following:

    • Querying database 1
    • Querying database 2
    • Writing to file 1 using query 1
    • Writing to file 2 using query 2
    Also, it looks like your two queries and associated output files are completely independent of each other. In that case, I'd recommend doing the following:
    • Connect and query database 1
    • Write to file 1
    • Use undef on the variables related to database 1
    • Repeat above steps for database 2 and file 2
    This will limit the amount of data in memory and free up memory once you're done with that data; a minimal sketch of that sequence follows below.
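
    The sketch reuses the DSN, query, and file name from the original post purely for illustration; the explicit finish/disconnect/undef between the two halves is the part that frees the memory:

    use strict;
    use warnings;
    use DBI;

    # First query: connect, fetch, write the file
    my $dbh = DBI->connect('dbi:ODBC:MSSQL') or die $DBI::errstr;
    my $sth = $dbh->prepare('select Top 100 URLName from URL ORDER BY URLID DESC');
    $sth->execute();

    open my $fh, '>', 'Url.txt' or die "Cannot open Url.txt: $!";
    while ( my ($url) = $sth->fetchrow_array ) {
        print {$fh} "$url\n";
    }
    close $fh;

    # Release everything tied to the first query before starting the second
    $sth->finish;
    $dbh->disconnect;
    undef $sth;
    undef $dbh;

    # ...then repeat the same connect/write/undef sequence for URL_EXCLUDE
    # and Url_ex.txt.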

      Dasgar('dbi:ODBC:MSSQL', 'Url.txt');
      Dasgar('dbi:ODBC:MSSQL', 'Url_Ex.txt');

      sub Dasgar {
          my ($dbstring, $filename) = @_;
          my $dbh = DBI->connect($dbstring);
          open my $fh, '>', $filename or die "Cannot open $filename: $!";
          # ... run the query and print each row to $fh ...
      }

        Oh no! I've been reduced to a subroutine! :D

        Although I almost suggested a subroutine solution, I opted for a conceptual answer rather than going the code route. However, since he's doing slightly different SQL queries from the databases, I would recommend that the SQL query be another parameter to be passed into the subroutine.
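
        For instance (just extending the illustrative sketch above), the calls might become:

        Dasgar('dbi:ODBC:MSSQL', 'select Top 100 URLName from URL ORDER BY URLID DESC', 'Url.txt');
        Dasgar('dbi:ODBC:MSSQL', 'select Top 100 URLName from URL_EXCLUDE', 'Url_Ex.txt');

        with the subroutine taking ($dbstring, $sql, $filename) and preparing whatever query it is handed.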

        Otherwise, it looks like you've managed to read my mind. Now if only I can get Perl to do that, I wouldn't have to worry about making mistakes in my code.

      Thanks dasgar.

      The out of memory issue is resolved for the first step (copying the URLs to two different txt files). The next step is to compare each URL in the first file with each URL exclusion in the second file and copy the matched URL, along with its exclusion ID, into a new file (matchURL.txt).

      After completion of the second step, we can go for the third step, i.e. updating the ID with the ExclusionID from matchURL.txt.

      The OUT OF MEMORY error is resolved in the first step, but I am getting that error in the second step.

      Code for the second and third steps is given below:

      open (URLFHR, 'Url.txt');
      open (URLEXFHR, "Url_ex.txt");
      while ($ee = <URLFHR>) {
          while ($ee4 = <URLEXFHR>) {
              #print "$ee - $ee4 \n";
              $abc4 = $ee4;
              my $sthID = $dbhID->prepare("select MAX(ID) from DI_URL_EXCLUDE where [RULE] ='$ee4' ");
              $sthID->execute();
              $ID = $sthID->fetchrow_array;
              #print "$ID \n";
              undef( $dbhID );
              undef( $sthID );
              if ($ee4 =~ /^%/) {
                  $abc4 = $ee4;
                  $abc4 =~ s/^%//;    ## first letter
                  # print "$abc4 \n";
              }
              if ($ee4 =~ /%$/) {
                  $abc4 = $ee4;
                  $abc4 =~ s/%$//;    ## Last letter
                  #print "$abc4 \n";
              }
              $ee   = quotemeta( $ee );    # To avoid error (Unmatched ) in regex; marked by <-- HERE ); to escape the special characters
              $abc4 = quotemeta( $abc4 );
              if ( $ee =~ /$abc4/ ) {
                  #print "In comparision of $ee and $ee4, $ID \n";
                  open (SIMILARURLFHW, '>>Similar_Url.txt');
                  print SIMILARURLFHW "$ee\{\|\|\}$ID \n";
                  close(SIMILARURLFHW);
                  print "\n3";
              }
          }
      }
      print "\n4";

      my $a;
      while ($a = <sampleFile>) {
          my $UrlName = substr $a, 0, index($a, '{||}');
          my $EXID    = substr $a, index($a, '{||}') + 4;
          my $sthUPEXID = $dbhUPEXID->prepare("UPDATE DI_URL SET EXCLUSIONID =$EXID where URLName = '$UrlName' "); # Updating EXID in emp with ID in emp4 where emp_fname matches in both
          $sthUPEXID->execute();
      }
        I'm not sure I understood what you want; here is my piece of code, maybe it helps. If the URL exclude list is too big, you can just split it into smaller batches and collect the "match id" results (you may collect some of them multiple times).
        use re 'eval';

        my @url_list = qw( www.gmaisds.% www.gmai% www.go% www.gir% www.girs% );
        my @list;
        my $retval;
        my $id = 0;
        for my $url (@url_list) {
            $url =~ s/(?=[\/\.\?])/\\/g;
            $url =~ s/%/.*/g;
            $url .= '(?{ use re \'eval\'; $retval = ' . $id++ . '})';
            push @list, $url;
        }
        my $str = '^(?:' . join(')|(?:', @list) . ')$';
        my $re = qr($str);
        while (<>) {
            chomp;
            if ($_ =~ $re) {
                print "$_ excluded; id: $retval\n";
            }
        }
Re: Out of memory!
by mje (Curate) on Aug 12, 2010 at 07:45 UTC

    Ignoring the other answers you've had, some of which contain some good advice, I'm still interested as to how you get out of memory with only 100 rows in each select. What is the type of the URLName column and how long is an average field? What versions of DBI and DBD::ODBC are you using?

      For example, suppose the URLs in the URL table are like:

      1) www.google.com

      2) www.gmail.com

      3) www.gangster.com

      4) www.gorilla.com

      5) www.giraffe.com

      6) www.girstapache.com

      and the URLs in the URL_Exclusion table are like:

      1) www.g%

      2) www.gmaisds.%

      3) www.gmai%

      4) www.go%

      5) www.gir%

      6) www.girs%

      As above, I have lakhs of URLs, and I need to find the exact (most specific) match for each URL. For example, www.google.com matches both

      www.g% and www.go%

      so the exact match is www.go%.

      In this way I have to update the ID column in the URL table with the ExclusionId from the URL_Exclusion table.

      For example, as explained above, the Id of www.google.com should be updated with the ExclusionId of www.go%.

      As for why it is taking a long time: the comparison involves the LIKE operator, which compares each URL against all the URLs in the URL_Exclusion table.
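
      In Perl, one way to express that "most specific pattern wins" rule is to compile each % pattern into an anchored regex once and then, of the rules that match, keep the one with the longest literal part. This is only a sketch under that assumption (the rules use % as their only wildcard); the sample data is taken from the lists above:

      use strict;
      use warnings;

      my @rules = ('www.g%', 'www.gmaisds.%', 'www.gmai%', 'www.go%', 'www.gir%', 'www.girs%');

      # Compile each rule once: quotemeta the literal pieces, turn each % into .*
      my @compiled;
      for my $rule (@rules) {
          my $re = join '.*', map { quotemeta } split /%/, $rule, -1;
          (my $literal = $rule) =~ s/%//g;   # the literal part measures how specific the rule is
          push @compiled, { rule => $rule, re => qr/^$re$/, len => length $literal };
      }

      for my $url ('www.google.com', 'www.giraffe.com') {
          # Of all the rules that match, keep the most specific one
          my ($best) = sort { $b->{len} <=> $a->{len} }
                       grep { $url =~ $_->{re} } @compiled;
          printf "%s -> %s\n", $url, $best ? $best->{rule} : '(no match)';
      }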

        What are the column types in the database tables?

        I heard Perl is a powerful tool which can handle this type of scenario, so I am taking this approach of copying the URLs into files, comparing the URLs, and updating the database using Perl.

        But I am unable to move a step forward, as I am getting the OUT OF MEMORY! error.

        Thanks for your previous responses.