Dear Monks,

As some of you know, I've been working on a pure-Perl database engine to replace my organization's dependence on Oracle's Berkeley DB. At first, I was going to have the same API, but after some thought, I decided to start fresh.

That's not exactly what I wanted to discuss, though. Since I needed some test cases to verify that the engine was working, I developed some that compare Berkeley DB to my DB (I haven't decided on a name yet).

Since I'm using only core Perl functions, I was asked which versions of Perl would be supported. So I found an older AIX POWER3 server running AIX 5.2, with a single processor, 2GB of RAM and SCSI drives, that had perl5.6.1 and perl5.12.2 on it. I checked (with perl -V) that both Perls were compiled with the gcc compiler and had similar config parameters. I ran the test cases and, as I expected, the results showed a marked improvement over Perl 5.6.1. I ran them 3 times, and the timings were consistent across all runs.

What you see below is the 3rd run of the tests, using both Perl versions. The scripts use the same algorithms and the files are in the exact same directories (cleared before each run). The only difference I know of is Perl.

Pyr:# perl5.6.1 flexbase.plx
## Start: VSZ-1980_KB RSS-3020_KB BLOCK: 2048 ( 150000 )
Write: 575 260/sec Total: 150000 Hits: 109196 Misses: 4718|2062
ReadNext: 36 4166/sec Total: 150000
## End: VSZ-22212_KB RSS-23256_KB Diff:20232|20236_KB-0 BLOCK: 2048|4096

Pyr:# perl5.12.2 flexbase.plx
## Start: VSZ-2124_KB-0 RSS-2200_KB-0 BLOCK: 2048 ( 150000 )
Write: 503 298/sec Total: 150000 Hits: 109196 Misses: 4718|2062
ReadNext: 34 4411/sec Total: 150000
## End: VSZ-20792_KB-0 RSS-20868_KB-0 Diff:18668|18668_KB-0 BLOCK: 2048|4096

The newer Perl both writes and reads faster, and it uses less memory to get the job done. I use a lot of hashes in the engine, and maybe that is one of the areas that has improved.

Perl just gets better and better!

Thank you

"Well done is better than well said." - Benjamin Franklin

Re: Perl performance just gets better and better!
by vkon (Curate) on Dec 23, 2011 at 12:03 UTC
    Rewriting a well-tested C library in pure Perl for the sake of being pure Perl is just a wrong idea, a waste of resources, etc.

    But - even worse - discussing performance right at this moment.
    Mind showing us numbers that compare the speed of the original DB against your newly created version? Same question for the memory consumption?

    I have no doubt that moving away from BerkeleyDB is the correct decision.
    But reinventing another DB is very, very questionable, to say the least.

    Excuse me for being straight,
    but the base idea just doesn't feel right to me.

        Excuse me for being straight, but the base idea just doesn't feel right to me.
      Because we have a business case!

      Since BerkeleyDB is used by all of our products, each year since Oracle bought Sleepycat Software, Inc., we have had an outside counsel review the dual licenses for BerkeleyDB. This year we used a new firm to review, and the outcome was not nice.

      Two points (from memory):

      • redistribution of software, even if you don't charge for it...
      • commercial use of BerkeleyDB...

      But I think you should get your own legal advice! For us those statements and others were alarming!

        Mind showing us numbers that compare the speed of the original DB against your newly created version?

      When I get back to the office after the holidays, I'll put up some real numbers. But in general, our pure-Perl DB performs better than expected (note: my original goal was to get 20% of the BerkeleyDB performance, and that would meet our company's needs). On fixed-length records that are multiples of 8 bytes, our pure-Perl DB is within +/- 10% of BerkeleyDB. But on random variable-length key/value pairs (20 to 64 byte keys, and 64 to 768 byte values), we're beating BerkeleyDB by as much as 300%.

      Why? Because I wrote it to Perl's strengths and not to be a copy of a C program. For example, my first pass used only arrays. When I replaced the arrays with hashes, there was an immediate performance boost. Perl's lookups are much faster than mine. After profiling, 'sort' showed up below 'pack/unpack' on the list of routines. Not worth trying to improve!
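The array-versus-hash point can be illustrated with a small benchmark. This is only a sketch: the key format, the 64-byte record spacing, and the helper names `find_in_array` and `find_in_hash` are made up for the example, and the linear scan is an assumption about what a hand-written array lookup might look like, not the engine's actual code.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical index: 10_000 keys, each mapped to a record offset.
my @keys  = map { sprintf 'key%05d', $_ } 1 .. 10_000;
my @pairs = map { [ $keys[$_], $_ * 64 ] } 0 .. $#keys;    # array of [key, offset]
my %index = map { $keys[$_] => $_ * 64 } 0 .. $#keys;      # same data as a hash

# Hand-written lookup: scan the array until the key matches.
sub find_in_array {
    my ($key) = @_;
    for my $pair (@pairs) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return;
}

# Built-in lookup: let Perl's hash implementation do the work.
sub find_in_hash {
    my ($key) = @_;
    return $index{$key};
}

# Run each lookup for about one CPU second and print a comparison table.
cmpthese( -1, {
    array => sub { find_in_array('key09000') },
    hash  => sub { find_in_hash('key09000') },
} );
```

The absolute rates depend on the machine and the Perl version, so the table is only meaningful relative to itself.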

      And, thanks to some questions from BrowserUk, we can support databases as large as 1024TBytes on 32/64 bit, big/little endian machines.
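One way to address offsets up to 1024 TBytes (2**50 bytes) portably is to store them as two big-endian 32-bit words. The sketch below is an assumption about how this could be done, not a description of the engine: `pack_offset` and `unpack_offset` are hypothetical names. It relies on the `N` template of pack/unpack, which is byte-order independent, and on Perl keeping integers up to 2**53 exact in floating point even when native integers are 32 bits.

```perl
use strict;
use warnings;

# Split an offset (up to 2**50) into high and low 32-bit halves and
# pack them as two big-endian unsigned 32-bit words (8 bytes total).
sub pack_offset {
    my ($offset) = @_;
    my $high = int( $offset / 4294967296 );        # 4294967296 == 2**32
    my $low  = $offset - $high * 4294967296;
    return pack( 'NN', $high, $low );
}

# Reverse of pack_offset: rebuild the offset from the two halves.
sub unpack_offset {
    my ($bytes) = @_;
    my ( $high, $low ) = unpack( 'NN', $bytes );
    return $high * 4294967296 + $low;
}

my $offset = 1125899906842623;                     # 2**50 - 1
my $packed = pack_offset($offset);
printf "%d bytes, round trip %s\n", length $packed,
    unpack_offset($packed) == $offset ? 'ok' : 'FAILED';
```

Because the on-disk form is fixed at 8 big-endian bytes, a file written on one machine should read back the same on another, whatever its word size or byte order.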

      Thank you

      "Well done is better than well said." - Benjamin Franklin

        But I think you should get your own legal advice!

        I agree.

        Regarding the license, see Re^12: Change in Berkeley DB in Perl 5.12?. Basically, the FAQ says: No. The Berkeley DB license requires that software that uses Berkeley DB be freely redistributable. In the case of Perl, that software is Perl, and not your scripts.

        When SleepyCat changed berkeley-db to be dual-licensed, they kept perl users in mind. Oracle did not change the license.

        Maybe you're right.

        It is hard to believe that you beat Berkeley DB by 300%.
        But assume that you have good and honest numbers, and you have indeed beaten BerkeleyDB.
        It could be that you implemented only the subset of a DB that suits you, and therefore you win by not implementing some of the "difficult" parts that Berkeley DB has to implement. I do not know.

        But generally, you do not gain speed by re-implementing a robust C library in pure Perl.

        Another point: an external C library is safer to use because it is better tested, by a larger user base, etc.

        OK, your mileage varies in the sense that you win in your case,
        but generally this approach just does not win.

        Regards,
        Vadim.

      I have no doubt that moving away from BerkeleyDB is the correct decision.

      Why?

        In a sense - if they had some problems with BerkeleyDB, whatever these are (without going into details), then their decision to move away from BerkeleyDB can be defended.

        However, I have no strong opinion here. I thought BerkeleyDB was a bit outdated - but maybe I am wrong, in which case I stand corrected!

Re: Perl performance just gets better and better!
by Anonymous Monk on Dec 22, 2011 at 20:42 UTC
    With nine years of development between 5.6.1 and 5.12.1, I'd be disgusted if there weren't major increases in speed and efficiency.
      Actually, there are people using 5.6 because for them, it's the fastest Perl. Perl has the tendency to get slower on new releases; new features, after all, have a price. But it isn't that everything gets slower on a new release -- it's usually some things getting slower, others getting faster, and most things taking about the same time.

      For flexvault, things got better. But all we see is the (wallclock?) time of two runs of a single program, with a single dataset, on a single OS, a single machine, and a single set of compilation settings.

      Always do your own benchmarking, running programs that you'd run in production as well.

      Don't assume that because the version number has increased, the performance has as well.
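A minimal sketch of such a benchmark, which can be run unchanged under each installed interpreter (for example `perl5.6.1 bench.pl` and `perl5.12.2 bench.pl`; the file name is arbitrary). The `workload` sub is a hypothetical stand-in, to be replaced with code you actually run in production. It reports CPU time from the built-in `times` alongside wallclock time, using only builtins so that it also runs on 5.6.

```perl
use strict;
use warnings;

# Hypothetical workload: fill a hash with 50_000 small records.
# Replace this with something representative of production code.
sub workload {
    my %h;
    $h{"key$_"} = $_ x 4 for 1 .. 50_000;
    return scalar keys %h;
}

my $wall_start = time;
my ( $user_start, $sys_start ) = times;

my $count = workload();

my ( $user_end, $sys_end ) = times;
my $wall = time - $wall_start;

# $] holds the version of the running interpreter.
printf "perl %s: %d keys, user %.2fs, sys %.2fs, wall %ds\n",
    $], $count, $user_end - $user_start, $sys_end - $sys_start, $wall;
```

Running it several times per interpreter, and with more than one dataset, gives a better picture than a single pair of runs.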

Re: Perl performance just gets better and better!
by locked_user sundialsvc4 (Abbot) on Dec 29, 2011 at 15:51 UTC

    /me nods...

    To my way of thinking, there’s one thing that has blown BerkeleyDB completely out of the water, and that one thing is: SQLite. (Note that I am not saying that it is a drop-in replacement, because very obviously it is not.)

    First of all, there are no legal encumbrances: SQLite is public domain. Secondly, it is a complete SQL implementation ... not merely an ISAM-file ... which nevertheless lives in a single operating-system file with no server. And, provided that you are aware of SQLite’s behavior with regard to verifying every write that does not occur within a transaction (therefore: “always use transactions”), it is fast.

    I find that there is great business value in being able to query something, using a tool that I did not write. I am not criticizing a venerable and rugged tool for not being more than it is, but merely pointing out that there exists a tool free of legal encumbrances that is “more than it [BerkeleyDB] is,” and that I have been very satisfied with it [SQLite]. I intend this response as an aside: I’m not offering any opinion at all about the business/legal case issues in the main thread because IANAL™ and happy so to be.