in reply to Re^2: Moving from SQL to flat-files
in thread Moving from SQL to flat-files

Before you come up with yet another metadata syntax, you might want to look at LDIF, which your syntax resembles.

But as for reading the data / metadata, it's not too hard:

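my @data = ();
while ( my $line = <IN> ) {
    last if ( $line =~ /^__METADATA__$/ );
    push @data, $line;
}

# assuming no multivalue keys; lines without a colon are skipped
# note: split takes the pattern first, then the string, then the limit
my %metadata = map { chomp; split( /:/, $_, 2 ) } grep { /:/ } (<IN>);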

I'd personally put the metadata before the data, since there's no chance of the marker appearing in the metadata section, but there is in the data. I don't know what your usage patterns are, though, so skipping the metadata whenever it's not needed might be an additional overhead:

while ( my $line = <IN> ) {
    last if ( $line =~ /^__DATA__$/ );
}
my @data = <IN>;

I'd also avoid hashing on titles / names if the data is going to grow significantly and the title isn't the only index. The English language just doesn't have a good distribution.

Re^4: Moving from SQL to flat-files
by punkish (Priest) on May 09, 2006 at 17:01 UTC
    Thanks. I checked out LDIF, and while it is similar to what I am implementing, what else could it be? ;-). Mine is a very simple key:value scheme, and since it replicates one denormalized table, there are no multivalue keys (as of now!).

    The reason I want to keep the metadata at the end of the file is that when I open the file in a text editor, the first thing I see is the narrative page content, and I can edit that easily. If that is the only thing I am changing, I don't even have to scroll down to the metadata section. The other relevant metadata value, namely the mtime, will be updated by the operating system.

    One disadvantage of keeping the metadata in the same file is that I can't tie the metadata to a hash, and I can't change just the metadata without rewriting the text as well. So, if I have to change any of the metadata, I have to rewrite the entire file even if I am not changing any of the file text. An alternative would be to store the metadata in a DBM-type hash... but that is more than the "simplest thing that could work."
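
    For what it's worth, the "rewrite the entire file" step can stay fairly small. Below is a minimal sketch, assuming the `__METADATA__` marker and the key:value lines from the parent post; `rewrite_metadata` is a hypothetical helper name, and writing to a temporary file followed by a rename is just one way to avoid leaving a half-written page behind.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: replace the metadata trailer of $path with %$meta,
# leaving the page text above the __METADATA__ marker unchanged.
sub rewrite_metadata {
    my ($path, $meta) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my @text;
    while (my $line = <$in>) {
        last if $line =~ /^__METADATA__$/;    # stop at the marker line
        push @text, $line;
    }
    close $in;

    # Make sure the marker starts on a line of its own.
    $text[-1] .= "\n" if @text && $text[-1] !~ /\n\z/;

    # Write the new copy next to the original so rename() stays on one filesystem.
    my ($out, $tmp) = tempfile(DIR => dirname($path));
    print {$out} @text, "__METADATA__\n";
    print {$out} "$_:$meta->{$_}\n" for sort keys %$meta;
    close $out or die "Cannot write $tmp: $!";

    rename $tmp, $path or die "Cannot replace $path: $!";
    return;
}
```

    Note that File::Temp creates the temporary file with restrictive permissions, so a real version would probably want to copy the original file's mode across before the rename.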

    There is the issue of speed, but it seems to me that this could scale very well. Even though my personal blog/wiki will likely not exceed a few thousand (or a few tens of thousands of) pages in a lifetime, I can't see this slowing down. As long as I have the name of a page, I have its location, and I go to it directly (or rather, have the operating system go to it). Should work for any number of files.
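
    To make the "name gives me the location" idea concrete, here is a rough sketch. The layout (one `.txt` file per page under a single root directory) and the names `page_path` and `read_page` are assumptions for illustration, not something settled in this thread.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical layout: one file per page under $root, named after the page.
sub page_path {
    my ($root, $name) = @_;
    # Accept only conservative names so a page name cannot escape $root.
    die "Bad page name: $name" unless $name =~ /\A[A-Za-z0-9_-]+\z/;
    return File::Spec->catfile($root, "$name.txt");
}

# Return a page's text (everything before the marker) without consulting any index.
sub read_page {
    my ($root, $name) = @_;
    my $path = page_path($root, $name);
    open my $in, '<', $path or die "Cannot read $path: $!";
    my @text;
    while (my $line = <$in>) {
        last if $line =~ /^__METADATA__$/;    # the metadata trailer starts here
        push @text, $line;
    }
    close $in;
    return join '', @text;
}
```

    Something like `read_page('/path/to/wiki', 'HomePage')` would then open exactly one file. Whether a single directory stays quick at tens of thousands of entries depends on the filesystem, so that part is worth measuring rather than assuming.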

    Thoughts?

    --

    when small people start casting long shadows, it is time to go to bed