Re^2: Moving from SQL to flat-files
by jZed (Prior) on May 09, 2006 at 17:21 UTC
I've long wanted to have the time to make a DBD::DbmDeep. It ain't gonna happen. If anyone wants to do this, it would provide DBI/SQL access to DBM::Deep's excellent backend. As mentioned elsewhere in this thread, DBD::DBM would be a good place to start. In fact, it's possible it already works with DBM::Deep (yeah, yeah, it's my code but I've forgotten what it does). If you (dragonchild) or anyone else wants to pursue this, I'd be glad to help.
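For anyone tempted to pick this up, a sketch of what the starting point might look like -- completely untested speculation, since whether DBD::DBM's dbm_type machinery will accept DBM::Deep's tie interface is exactly the open question above (table and column names here are made up):

use strict;
use warnings;
use DBI;

# Untested speculation: DBD::DBM selects the underlying DBM module
# via its dbm_type attribute, so if DBM::Deep's tie() interface is
# close enough, something like this might already work.
my $dbh = DBI->connect( 'dbi:DBM:dbm_type=DBM::Deep',
    undef, undef, { RaiseError => 1 } );

# DBD::DBM tables are simple two-column key/value pairs.
$dbh->do('CREATE TABLE pages (page_name TEXT, page_body TEXT)');
$dbh->do( 'INSERT INTO pages VALUES (?, ?)',
    undef, 'MainFile', 'Blah blah blah.' );

my ($body) = $dbh->selectrow_array(
    'SELECT page_body FROM pages WHERE page_name = ?',
    undef, 'MainFile' );
print "$body\n";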
I'm actually going in the other direction and writing Presto, which is an OODBMS built on top of DBM::Deep.
My criteria for good software:
- Does it work?
- Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Very cool. Still, it's not mutually exclusive - why not have OODBMS *and* RDBMS frontends to DBM::Deep?
Re^2: Moving from SQL to flat-files
by punkish (Priest) on May 09, 2006 at 16:21 UTC
I did look at DBM::Deep. It is a very impressive and simple piece of work. If I were to implement a DBM-based datastore, I would definitely choose DBM::Deep over DB_File and its ilk.
There is something seductive about flat-files, however. I can open them in a text editor and edit them without requiring a Perl or web interface to get to them.
To that end, my question becomes: how can I keep the metadata in the same file? Right now I have the MainFile and its shadow .MainFile. If I want to change something about the MainFile, I have to open one or the other or both. It would be nice to do it all in one place. I tinkered with something like this (contents of a supersized MainFile below) --
MainFile
This is the MainFile. Blah blah blah.
__METADATA__
key:value
key:value
key:value
But I couldn't figure out an efficient and elegant way of reading that in and separating the data from the metadata, returning the metadata in a hash.
--
when small people start casting long shadows, it is time to go to bed
Before you come up with yet another metadata syntax, you might want to look at LDIF, which your syntax looks similar to.
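For comparison, a (contrived) LDIF record looks like this -- the same colon-separated key:value lines, with a leading dn: naming the record. The attribute values below are just illustrative:

dn: cn=MainFile,ou=pages
cn: MainFile
description: This is the MainFile
modifyTimestamp: 20060509162100Z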
But as for reading the data / metadata, it's not too hard:
my @data = ();
while ( my $line = <IN> ) {
    last if $line =~ /^__METADATA__$/;
    push @data, $line;
}
# assuming no multivalue keys; note split takes the pattern first
my %metadata = map { chomp $_; split( /:/, $_, 2 ) } (<IN>);
I'd personally put the metadata before the data -- there's no chance of the marker appearing in the metadata section, but there is in the data. I don't know what your usage patterns are, though; skipping past the metadata whenever it isn't needed might add overhead:
while ( my $line = <IN> ) {
    last if $line =~ /^__DATA__$/;
}
my @data = <IN>;
I'd also avoid hashing on titles / names if the data is going to grow significantly and the title isn't the only index. The English language just doesn't have good distribution.
Thanks. I checked out LDIF, and while it is similar to what I am implementing, what else could it be? ;-). Mine is a very simple key:value scheme, and since it replicates one denormalized table, there are no multivalue keys (as of now!).
The reason I want to keep the metadata at the end of the file is that when I open the file in a text editor, the first thing I see is the narrative page content, which I can edit easily. If that is the only thing I am changing, I don't even have to scroll down to the metadata section. The other relevant metadata value, namely the mtime, will get changed by the operating system anyway.
One disadvantage of keeping the metadata in the same file is that I can't tie the metadata to a hash, nor change just the metadata without touching the text as well. So, if I have to change any of the metadata, I have to rewrite the entire file even if I am not changing any of the file text. An alternative would be to store the metadata in a DBM-type hash... but that is more than the "simplest thing that could work."
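To make that cost concrete, a minimal sketch of the whole-file rewrite described above -- read everything, tweak one key, write it all back. The status key is hypothetical:

use strict;
use warnings;

my $file = 'MainFile';

open my $in, '<', $file or die "Can't read $file: $!";
my ( @data, %meta, $in_meta );
while ( my $line = <$in> ) {
    if ( $line =~ /^__METADATA__$/ ) { $in_meta = 1; next; }
    if ($in_meta) {
        chomp $line;
        my ( $k, $v ) = split /:/, $line, 2;
        $meta{$k} = $v;
    }
    else {
        push @data, $line;
    }
}
close $in;

$meta{status} = 'published';    # the one value we wanted to change

# even though only the metadata changed, the data gets written too
open my $out, '>', $file or die "Can't write $file: $!";
print {$out} @data, "__METADATA__\n",
    map { "$_:$meta{$_}\n" } sort keys %meta;
close $out;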
There is the issue of speed, but it seems to me this could scale very well. Even though my personal blog/wiki will likely not exceed a few thousand (or a few tens of thousands of) pages in a lifetime, I can't see this slowing down. As long as I have the name of a page, I have its location, and I go (or rather, have the operating system go) directly to it. That should work for any number of files.
Thoughts?
--
when small people start casting long shadows, it is time to go to bed
The key feature of DBM::Deep that your flatfiles will not be able to match is that DBM::Deep is low-memory: everything happens on disk. Presto (an OODBMS built upon DBM::Deep) will take that even further, providing you with ORM-like syntax without the object-relational impedance mismatch.
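For context, DBM::Deep's basic usage looks like this (file name illustrative) -- you address nested Perl structures as usual, but they live in the file rather than in RAM:

use strict;
use warnings;
use DBM::Deep;

# All of this data lives in pages.db; only the slices actually
# touched get read into memory.
my $db = DBM::Deep->new('pages.db');

$db->{pages}{MainFile}{body} = 'This is the MainFile. Blah blah blah.';
$db->{pages}{MainFile}{tags} = [ 'blog', 'wiki' ];

print $db->{pages}{MainFile}{tags}[0], "\n";    # prints "blog"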
My criteria for good software:
- Does it work?
- Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Just tried out DBM::Deep. It was easy to install, create a database, and start working. But there are a few issues, which might just be down to my own inexperience.
Issue 1: Size -- I converted a two-column, 230k-row table into a DBM::Deep and a DB_File database respectively. The 14 MB SQLite file became a 25 MB DBM::Deep file, but shrank to a 5 MB DB_File file.
Issue 2: Speed -- A simplistic benchmark of counting the number of records in the table gave the following:
Benchmark: timing 1 iterations of DB_File, DBM::Deep, SQLite 3...
  DB_File:    2 wallclock secs ( 1.90 usr + 0.15 sys =  2.05 CPU) @  0.49/s
  DBM::Deep: 93 wallclock secs (79.24 usr + 9.42 sys = 88.67 CPU) @  0.01/s
  SQLite 3:   0 wallclock secs ( 0.04 usr + 0.01 sys =  0.05 CPU) @ 19.61/s
My code was simply "SELECT COUNT(*) FROM sqlitedb" for the SQLite db, and "return scalar keys(%$db)" for the other two databases. Is this expected (in particular, is the slowness of DBM::Deep expected), or is there a better way to do this query?
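For reference, the benchmark was along these lines (the file names are made up; the count calls are as quoted above):

use strict;
use warnings;
use Benchmark qw(timethese);
use DBI;
use DBM::Deep;
use DB_File;
use Fcntl qw(O_RDONLY);

my $dbh = DBI->connect( 'dbi:SQLite:dbname=table.sqlite',
    '', '', { RaiseError => 1 } );

my $deep = DBM::Deep->new('table.deep');

tie my %dbf, 'DB_File', 'table.dbm', O_RDONLY, 0644, $DB_HASH
    or die "Can't tie table.dbm: $!";

timethese( 1, {
    'SQLite 3'  => sub {
        my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM sqlitedb');
    },
    'DBM::Deep' => sub { my $count = scalar keys %$deep },
    'DB_File'   => sub { my $count = scalar keys %dbf },
} );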
--
when small people start casting long shadows, it is time to go to bed