in reply to Out of memory!

santhosh.yamsani:

Why go to the trouble of reading the tables in perl? It sounds like a problem the database is suited to. For example:

-- Get list of URLs in table 1 not present in table 2
select * from table_1 where URL not in (select URL from table_2)

-- Get list of URLs in both tables
select T1.URL from table_1 T1 join table_2 T2 on T1.URL = T2.URL

-- Same, but where "description" doesn't match
select T1.URL
  from table_1 T1
  join table_2 T2 on T1.URL = T2.URL
 where T1.Description != T2.Description

Moving a ton of data from the database server to perl to do simple comparisons is generally the wrong way to go. Learn enough SQL to do your work in the database when possible.
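
If the job has to be driven from perl anyway, you can still let the database do the comparison and pull back only the differences. Here's a minimal DBI sketch along those lines; the DSN, login, and table/column names are placeholders (it assumes DBD::Sybase, so adjust for whatever database you're actually on):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Placeholder connection details -- substitute your own driver, server and login.
    my $dbh = DBI->connect('dbi:Sybase:server=MYSERVER;database=mydb',
                           'user', 'password', { RaiseError => 1 });

    # The database does the comparison; perl only sees the differing rows,
    # so memory use stays small no matter how big the tables are.
    my $sth = $dbh->prepare(q{
        select T1.URL
          from table_1 T1
          join table_2 T2 on T1.URL = T2.URL
         where T1.Description != T2.Description
    });
    $sth->execute;

    while (my ($url) = $sth->fetchrow_array) {
        print "Description differs: $url\n";
    }
    $dbh->disconnect;

Fetching a row at a time like this means perl never holds more than one row of the result set in memory.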

Of course, there are times when it's appropriate to use perl to whack entire tables. But in that case, you probably want to use a bulk copy utility (BCP on Sybase and MS SQL Server) to dump the tables to a flat file. It's faster and you won't run out of memory.
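
If you do go that route, a rough sketch of the idea follows. The bcp invocations, file names and column layout here are only examples; the dumps are assumed to be tab-separated and sorted by URL, so the script can walk both files in parallel with only one line from each in memory at a time:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical bcp invocations to produce the dumps (adjust server, login
    # and file names for your site); -c requests character-mode (text) output:
    #   bcp mydb..table_1 out table_1.txt -c -S MYSERVER -U user -P pass
    #   bcp mydb..table_2 out table_2.txt -c -S MYSERVER -U user -P pass
    # Both files are assumed to be tab-separated (URL, Description) and sorted
    # by URL -- run them through sort(1) first if they aren't.

    open my $t1, '<', 'table_1.txt' or die "table_1.txt: $!";
    open my $t2, '<', 'table_2.txt' or die "table_2.txt: $!";

    my $row1 = read_row($t1);
    my $row2 = read_row($t2);

    while ($row1 or $row2) {
        if (!$row2 or ($row1 and $row1->[0] lt $row2->[0])) {
            print "Only in table_1: $row1->[0]\n";
            $row1 = read_row($t1);
        }
        elsif (!$row1 or $row2->[0] lt $row1->[0]) {
            print "Only in table_2: $row2->[0]\n";
            $row2 = read_row($t2);
        }
        else {                              # same URL in both dumps
            print "Description differs: $row1->[0]\n"
                if ($row1->[1] // '') ne ($row2->[1] // '');
            $row1 = read_row($t1);
            $row2 = read_row($t2);
        }
    }

    # Read one line and split it into [URL, Description]; returns undef at EOF.
    sub read_row {
        my $fh = shift;
        defined(my $line = <$fh>) or return;
        chomp $line;
        return [ split /\t/, $line, 2 ];
    }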

...roboticus

Re^2: Out of memory!
by JavaFan (Canon) on Aug 12, 2010 at 11:31 UTC
    Moving a ton of data from the database server to perl to do simple comparisons is generally the wrong way to go. Learn enough SQL to do your work in the database when possible.
    That, especially the second sentence, is only true up to a point. It doesn't scale. Certainly, if you have just one process accessing the database, let the database do the work. But if you have 100 servers accessing the database, it may actually pay to do as little work as possible in the database.

      JavaFan:

      While I can imagine cases where it's better to do the job on a different machine in perl, I'll stand by my statement in this case. If I have 100 servers accessing the database, I certainly don't want one task sucking up all the network bandwidth by exporting and importing entire tables instead of just the rows it wants. That's why I put the weasel words "simple comparisons" in there. Index scans are normally cheap, especially on frequently-used tables where the index is mostly cached in RAM anyway.

      Having said that, I totally agree with your sentiment that you need to be cognizant of other database users. We definitely don't want anyone being an inconsiderate database user, frivolously consuming resources with no regard for database performance. Large-scale comparisons, for example, shouldn't be a frequent operation anyway; properly structuring the applications can normally keep mass comparisons to a minimum, such as running them only for periodic maintenance.

      ...roboticus

Re^2: Out of memory!
by santhosh.yamsani (Initiate) on Aug 12, 2010 at 05:45 UTC

    Hi all!

    Thanks for your replies.

    Firstly, I did this task in SQL only, but it took 40 to 45 minutes to compare just 1000 URLs, so you can imagine how long it would take for lakhs (hundreds of thousands) of URLs. That is why I am trying to do this task in Perl. Moreover, I have to do this task weekly, so I need a permanent solution. Can anyone help with this?

      santhosh.yamsani:

      Have you tried to optimize your SQL? 40-45 minutes is quite a long time to compare 1000 URLs in *any* language, unless you're doing some amazingly complex comparisons. Do you know which statements are taking up your time?
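
      For example, you could time each statement from the perl side to see where the time goes. A rough sketch (the connection details are placeholders, assuming a Sybase/MS SQL Server connection via DBD::Sybase, and the queries are the comparisons from above):

          #!/usr/bin/perl
          use strict;
          use warnings;
          use DBI;
          use Time::HiRes qw(time);

          # Placeholder connection details -- substitute your own server and login.
          my $dbh = DBI->connect('dbi:Sybase:server=MYSERVER;database=mydb',
                                 'user', 'password', { RaiseError => 1 });

          my %queries = (
              missing_from_t2 => q{select URL from table_1
                                    where URL not in (select URL from table_2)},
              desc_mismatch   => q{select T1.URL from table_1 T1
                                     join table_2 T2 on T1.URL = T2.URL
                                    where T1.Description != T2.Description},
          );

          # Run each comparison once and report its row count and elapsed time.
          for my $name (sort keys %queries) {
              my $start = time;
              my $rows  = $dbh->selectall_arrayref($queries{$name});
              printf "%-16s %6d rows, %.2f seconds\n",
                     $name, scalar @$rows, time - $start;
          }
          $dbh->disconnect;

      Whichever statement dominates the runtime is the one worth looking at first (often a missing index on the URL columns).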

      If you give me a few details about your table structure, indexes, and the SQL that's performing poorly, I can offer some suggestions on how to make it faster.

      ...roboticus