Re: Out of memory!
by roboticus (Chancellor) on Aug 11, 2010 at 14:18 UTC
santhosh.yamsani:
Why go to the trouble of reading the tables in perl? It sounds like a problem the database is suited to. For example:
--Get list of URLs in table 1 not present in table 2
select * from table_1 where URL not in (select URL from table_2)
--Get list of URLs in both tables
select T1.URL
from table_1 T1
join table_2 T2 on T1.URL = T2.URL
--Same, but where "description" doesn't match
select T1.URL
from table_1 T1
join table_2 T2 on T1.URL = T2.URL
where T1.Description != T2.Description
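From Perl, that work can still stay in the database. Here is a rough DBI sketch of the first query above, only selecting the URL column; the table names are the placeholders from the SQL, and the ODBC DSN and credentials are hypothetical:
use strict;
use warnings;
use DBI;

# Hypothetical DSN/credentials; table names are the placeholders from the SQL above.
my $dbh = DBI->connect('dbi:ODBC:MSSQL', undef, undef, { RaiseError => 1 });

my $sth = $dbh->prepare(
    'select URL from table_1 where URL not in (select URL from table_2)'
);
$sth->execute;

# Only the rows that differ ever leave the server.
while (my ($url) = $sth->fetchrow_array) {
    print "$url\n";
}
$dbh->disconnect;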
Moving a ton of data from the database server to perl to do simple comparisons is generally the wrong way to go. Learn SQL enough to do your work in the database when possible.
Of course, there are times when it's appropriate to use perl to whack entire tables. But in that case, you probably want to use a bulk copy utility (BCP on Sybase and MS SQL Server) to dump the tables to a flat file. It's faster and you won't run out of memory.
...roboticus
"Moving a ton of data from the database server to perl to do simple comparisons is generally the wrong way to go. Learn SQL enough to do your work in the database when possible."
That, especially the second sentence, is only true up to a point. It doesn't scale. Certainly, if you have just one process accessing the database, let the database do the work. But if you have 100 servers accessing the database, it may actually pay to do as little work as possible in the database.
JavaFan:
While I can imagine cases where it's better to do the job on a different machine in perl, I'll stand by my statement in this case. If I have 100 servers accessing the database, I certainly don't want one task sucking up all the network bandwidth by exporting and importing entire tables instead of just the rows it wants. That's why I put the weasel words "simple comparisons" in there. Index scans are normally cheap, especially on frequently-used tables where the index is mostly cached in RAM anyway.
Having said that, I totally agree with your sentiment that you need to be cognizant of other database users. We definitely don't want anyone to be an inconsiderate database user, frivolously consuming resources with no regard to database performance. Large-scale comparisons, for example, shouldn't be a frequent operation anyway. Properly structuring the applications can normally keep mass comparisons to a minimum, for example by limiting them to periodic maintenance.
...roboticus
Hi all!
Thanks for your replies.
I did this task in SQL first, but it took 40 to 45 minutes to compare just 1000 URLs, so you can imagine how long it would take for lakhs (hundreds of thousands) of URLs. That is why I am trying to do this task in Perl. Moreover, I have to do this task weekly, so I need a permanent solution. Can anyone help with this?
santhosh.yamsani:
Have you tried to optimize your SQL? 40-45 minutes is quite a long time to compare 1000 URLs in *any* language, unless you're doing some amazingly complex comparisons. Do you know which statements are taking up your time?
If you give me a few details about your table structure, indexes and the SQL that's performing poorly, I can offer you some suggestions on how to make it faster.
...roboticus
Re: Out of memory!
by dasgar (Priest) on Aug 11, 2010 at 12:44 UTC
If I'm reading your code correctly, you're doing the following:
- Querying database 1
- Querying database 2
- Writing to file 1 using query 1
- Writing to file 2 using query 2
Also, it looks like your two queries and associated output files are completely independent of each other. In that case, I'd recommend doing the following:
- Connect and query database 1
- Write to file 1
- Use undef on the variables related to database 1
- Repeat above steps for database 2 and file 2
This will limit the amount of data in memory and free up memory once you're done with that data.
Dasgar('dbi:ODBC:MSSQL', 'Url.txt');
Dasgar('dbi:ODBC:MSSQL', 'Url_Ex.txt');

sub Dasgar {
    my ($dbstring, $filename) = @_;
    my $dbh = DBI->connect($dbstring) or die $DBI::errstr;
    open my $fh, '>', $filename or die "$filename: $!";
    ...   # run the query and write the rows to $fh; the handles are freed when the sub returns
}
Oh no! I've been reduced to a subroutine! :D
Although I almost suggested a subroutine solution, I opted for a conceptual answer rather than going the code route. However, since he's doing slightly different SQL queries from the databases, I would recommend that the SQL query be another parameter to be passed into the subroutine.
Otherwise, it looks like you've managed to read my mind. Now if only I could get Perl to do that, I wouldn't have to worry about making mistakes in my code.
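A rough sketch of that idea (the subroutine name is hypothetical; the queries and the DI_URL/DI_URL_EXCLUDE names are borrowed from the code posted further down the thread):
use strict;
use warnings;
use DBI;

sub dump_query {
    my ($dbstring, $sql, $filename) = @_;
    my $dbh = DBI->connect($dbstring, undef, undef, { RaiseError => 1 });
    my $sth = $dbh->prepare($sql);
    $sth->execute;
    open my $fh, '>', $filename or die "$filename: $!";
    while (my ($value) = $sth->fetchrow_array) {   # stream one row at a time
        print {$fh} "$value\n";
    }
    close $fh;
    $dbh->disconnect;   # the lexical handles also go out of scope here, freeing their memory
}

dump_query('dbi:ODBC:MSSQL', 'SELECT URLName FROM DI_URL',         'Url.txt');
dump_query('dbi:ODBC:MSSQL', 'SELECT [RULE] FROM DI_URL_EXCLUDE',  'Url_ex.txt');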
Thanks dasgar.
The out-of-memory issue is resolved for the first step (copying the URLs to two different txt files). The next step is comparing each URL in the first file with each URL exclusion in the second file and copying the matched URL, along with its exclusion ID, into a new file (matchURL.txt).
After the second step is complete, we can go to the third step, i.e. updating ID with the ExclusionID from matchURL.txt.
The OUT OF MEMORY error is resolved in the first step, but I am getting that error in the second step.
The code for the second and third steps is given below:
open (URLFHR, 'Url.txt');
open (URLEXFHR, 'Url_ex.txt');
while ($ee = <URLFHR>)
{
    while ($ee4 = <URLEXFHR>)
    {
        #print "$ee - $ee4 \n";
        $abc4 = $ee4;
        my $sthID = $dbhID->prepare("select MAX(ID) from DI_URL_EXCLUDE where [RULE] ='$ee4' ");
        $sthID->execute();
        $ID = $sthID->fetchrow_array;
        #print "$ID \n";
        undef( $dbhID );
        undef( $sthID );
        if ($ee4 =~ /^%/)
        {
            $abc4 = $ee4;
            $abc4 =~ s/^%//;   ## first letter
            #print "$abc4 \n";
        }
        if ($ee4 =~ /%$/)
        {
            $abc4 = $ee4;
            $abc4 =~ s/%$//;   ## last letter
            #print "$abc4 \n";
        }
        $ee   = quotemeta( $ee );   # To avoid error (Unmatched ) in regex; marked by <-- HERE); escapes the special characters
        $abc4 = quotemeta( $abc4 );
        if ( $ee =~ /$abc4/ )
        {
            #print "In comparision of $ee and $ee4,$ID \n";
            open (SIMILARURLFHW, '>>Similar_Url.txt');
            print SIMILARURLFHW "$ee\{\|\|\}$ID \n";
            close(SIMILARURLFHW);
            print "\n3";
        }
    }
}
print "\n4";
my $a;
while ($a = <sampleFile>)
{
    my $UrlName = substr $a, 0, index($a, '{||}');
    my $EXID    = substr $a, index($a, '{||}') + 4;
    my $sthUPEXID = $dbhUPEXID->prepare("UPDATE DI_URL SET EXCLUSIONID =$EXID where URLName = '$UrlName' "); # Updating EXID in emp with ID in emp4 where emp_fname matches in both
    $sthUPEXID->execute();
}
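Purely as a sketch against the code above: the MAX(ID) lookup can be prepared once with a bind placeholder and reused inside the loop, instead of being re-prepared for every pair of lines (and $dbhID/$sthID should not be undef'd while they are still needed). The file handling here is only illustrative; $dbhID is the connection from the code above:
# Sketch only: read the exclusion rules once, prepare the lookup once,
# then reuse the statement handle with a bind placeholder.
open my $exfh, '<', 'Url_ex.txt' or die "Url_ex.txt: $!";
chomp(my @rules = <$exfh>);
close $exfh;

my $sthID = $dbhID->prepare('select MAX(ID) from DI_URL_EXCLUDE where [RULE] = ?');
for my $rule (@rules) {
    $sthID->execute($rule);
    my ($ID) = $sthID->fetchrow_array;
    # ... compare $rule against each URL and record "$url{||}$ID" as before ...
}
$sthID->finish;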
I'm not sure I understood what you want; here is my piece of code, maybe it helps. If the URL exclusion list is too big, you can just split it into smaller batches and collect the "match id" results (maybe some of them multiple times).
use re 'eval';

my @url_list = qw( www.gmaisds.% www.gmai% www.go% www.gir% www.girs% );
my @list;
my $retval;
my $id = 0;
for my $url (@url_list) {
    $url =~ s/(?=[\/\.\?])/\\/g;   # escape regex metacharacters in the pattern
    $url =~ s/%/.*/g;              # turn the SQL '%' wildcard into a regex '.*'
    $url .= '(?{ use re \'eval\'; $retval = ' . $id++ . '})';   # record which pattern matched
    push @list, $url;
}
my $str = '^(?:' . join(')|(?:', @list) . ')$';
my $re  = qr($str);

# Usage: feed the URL file on STDIN or as an argument, e.g. perl match.pl Url.txt
while (<>) {
    chomp;
    if ($_ =~ $re) {
        print "$_ excluded; id: $retval\n";
    }
}
Re: Out of memory!
by mje (Curate) on Aug 12, 2010 at 07:45 UTC
Ignoring the other answers you've had, some of which contain some good advice, I'm still interested as to how you get out of memory with only 100 rows in each select. What is the type of the URLName column and how long is an average field? What versions of DBI and DBD::ODBC are you using?
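(A quick way to check those versions, assuming both modules load:)
use DBI;
use DBD::ODBC;
print "DBI $DBI::VERSION, DBD::ODBC $DBD::ODBC::VERSION\n";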
Suppose the URLs in the URL table are like:
1) www.google.com
2) www.gmail.com
3) www.gangster.com
4) www.gorilla.com
5) www.giraffe.com
6) www.girstapache.com
and the URLs in the URL_Exclusion table are like:
1) www.g%
2) www.gmaisds.%
3) www.gmai%
4) www.go%
5) www.gir%
6) www.girs%
I have lakhs (hundreds of thousands) of URLs like the above. I need to find the matching pattern for each URL; for example, www.google.com matches both
www.g% and www.go%
and the exact match is www.go%.
In this way I have to update the ID column in the URL table with the ExclusionId from the URL_Exclusion table.
For example, as explained above, the Id of www.google.com should be updated with the ExclusionId of www.go%.
Why is it taking so long? Because the comparison involves the LIKE operator, which compares each URL against all the other URLs in the URL_Exclusion table.
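For what it's worth, since the DI_URL/DI_URL_EXCLUDE names and the MAX(ID) choice already appear in the code posted above, here is a rough sketch of letting the server do the whole update in one set-based statement from Perl. Assumptions: SQL Server via the same ODBC DSN, and that MAX(ID) is an acceptable tie-breaker when several patterns match (which is not necessarily the same as the most specific pattern):
use strict;
use warnings;
use DBI;

# Hypothetical DSN/credentials; table and column names are the ones
# that appear in the code earlier in the thread.
my $dbh = DBI->connect('dbi:ODBC:MSSQL', undef, undef, { RaiseError => 1 });

my $sql = q{
    UPDATE U
    SET    EXCLUSIONID = (SELECT MAX(E.ID)
                          FROM   DI_URL_EXCLUDE E
                          WHERE  U.URLName LIKE E.[RULE])
    FROM   DI_URL U
    WHERE  EXISTS (SELECT 1
                   FROM   DI_URL_EXCLUDE E
                   WHERE  U.URLName LIKE E.[RULE])
};

my $rows = $dbh->do($sql);   # one round trip; no data is pulled into Perl
print "Updated $rows row(s)\n";
$dbh->disconnect;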
I heard Perl is a powerful tool which can handle this type of scenario, so I am approaching it this way: copying the URLs into files, comparing the URLs, and updating the database using Perl.
But I am unable to move a step forward because I am getting the OUT OF MEMORY! error.
Thanks for your previous responses.