in reply to Re^2: Efficient way to handle huge number of records?
in thread Efficient way to handle huge number of records?

For a single user and a single process, I figure SQLite has it covered, and I recommend it for those situations. It is completely open source.

I think that your point about PostgreSQL is a good one! But I don't think the OP needs PostgreSQL or MySQL.

From what has been described, the SELECTs will only be on the NAME column; that is easy, and SQLite will do the job.
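
For example, here is a minimal sketch of that kind of lookup with DBI and DBD::SQLite; the "records" table and its "name"/"value" columns are made up for illustration:

    use strict;
    use warnings;
    use DBI;

    # A local SQLite file; no server process needed.
    my $dbh = DBI->connect("dbi:SQLite:dbname=records.db", "", "",
        { RaiseError => 1, AutoCommit => 1 });

    # An index on the name column keeps these lookups fast
    # even with a huge number of rows.
    $dbh->do("CREATE INDEX IF NOT EXISTS idx_name ON records (name)");

    my $sth = $dbh->prepare("SELECT name, value FROM records WHERE name = ?");
    $sth->execute("some_name");
    while (my ($name, $value) = $sth->fetchrow_array) {
        print "$name => $value\n";
    }
    $dbh->disconnect;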


Re^4: Efficient way to handle huge number of records?
by cavac (Prior) on Dec 11, 2011 at 13:54 UTC

    But I don't think the OP needs PostgreSQL or MySQL.

    That's true, of course, as long as the project stays small.

    The reason I usually try to "push" people in the direction of big, scalable databases is this: many projects have a way of growing without you noticing at first.

    You start small with a one-off tool, reuse it sometime later, maybe add a new feature or two. Then a coworker comes along, sees that useful little tool, and wants it too. Since (s)he needs another small feature or two, you add that as well. That even more useful tool spreads to more coworkers, maybe even replacing that old standard tool you've used since the dark ages...

    A few years later, that thing has grown into a big mess of code that works around the original design issues, and you have to constantly battle performance problems and race conditions, since the software was never meant to handle this. And you can't easily rewrite the software, because all the logic is in the old (undocumented) code...

    With one of the "big" databases and a clean, hierarchical data layout, you can do a lot of the basic data integrity checks (and business logic) in the database itself. So writing additional, smaller tools (instead of adding to the big mess) is much easier, since the base system enforces data integrity instead of every tool having to code it up on its own.
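
    For instance, here is a rough sketch of what "integrity in the database" can look like with PostgreSQL through DBI and DBD::Pg; the tables, columns, and connection details are invented for illustration:

        use strict;
        use warnings;
        use DBI;

        # Connection details are placeholders.
        my $dbh = DBI->connect("dbi:Pg:dbname=projectdb", "someuser", "secret",
            { RaiseError => 1, AutoCommit => 1 });

        # Constraints declared once here are enforced for every tool that
        # writes to these tables, so no client has to re-implement the checks.
        $dbh->do(q{
            CREATE TABLE departments (
                dept_id serial PRIMARY KEY,
                name    text   NOT NULL UNIQUE
            )
        });

        $dbh->do(q{
            CREATE TABLE employees (
                emp_id  serial  PRIMARY KEY,
                name    text    NOT NULL,
                salary  numeric CHECK (salary >= 0),
                dept_id integer NOT NULL REFERENCES departments (dept_id)
            )
        });

        $dbh->disconnect;

    Any INSERT that violates the CHECK or the foreign key is rejected by the server itself, no matter which tool issued it.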

    Also, the big irons scale much better. Who knows, maybe the data layout stays the same, but the amount of data grows by a factor of a thousand by the end of the project? Or, just maybe, the size of each dataset stays the same, but you get thousands of them, because someone realized that your tool does a nice job rerunning whateveritdoes on archived data sets or comparing changes over time?

    OK, since I don't really know the original poster's goals, I can only speculate. But I've gotten a bloody nose more than once by assuming a project wouldn't grow or would never become business critical. So I'm very careful about suggesting the small option when you could just as easily use one of the free "big irons" ;-)

    Don't use '#ff0000':
    use Acme::AutoColor; my $redcolor = RED();
    All colors subject to change without notice.