Key to use for files in db!

Ace128 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Key to use for files in db! by rashley (Scribe) on Dec 29, 2005 at 14:00 UTC
First of all, by "key", I assume you mean primary key. I'm also assuming that there's no guarantee of uniqueness in your filenames, or even directory names. If you were confident that the directory2/file.ext combination would be unique for each file, I'd parse those out into two fields, filename and directory, and use the combination as primary key. That way you wouldn't have to use TEXT, which is an understandable desire. If not, using the full path is probably your best bet, unless you wanted to just use an index, but that's usually cumbersome.	[reply]
Re^2:Key to use for files in db! by Ace128 (Hermit) on Dec 30, 2005 at 14:31 UTC
Ah, yes, primary key :)	[reply]
Re: Key to use for files in db! by turo (Friar) on Dec 29, 2005 at 14:03 UTC
Obfuscated code embedded on a database table entry? ... sounds good :-P If i've undestood the problem, i think you can put this: `perl -MDumpvalue -e 'my $dumper=new Dumpvalue; %hash=eval qq(return (k +ey1,value1,key2,value2)); $dumper->dumpValue(\%hash)'` [download] In that case, you must store the string 'return (key1,value1,key2,value2)' and then make the eval; in fact something like this: `use Dumpvalue; use DBI; my $dumper = new Dumpvalue; my $mysql = new DBI(...); my $mysql->conect(); my $hash_string = $mysql->do('select .. from table'); %hash=eval $hash_string; $dumper->dumpValue(\%hash)` [download] Good Luck! turo `perl -Te 'print map { chr((ord)-((10,20,2,7)[$i++])) } split //,"turo"'`	[reply] [d/l] [select]
Re: Key to use for files in db! by TedPride (Priest) on Dec 29, 2005 at 16:48 UTC
The problem is that searching large numbers of files becomes very inefficient if you have to search a large string or strings. The best way to do this I think is to create a short hash - no more than 3 hex characters - from the full path, then index that and use it as the first part of your search. You might possibly get 2 or 3 files that have the same 3-character key, but these will require very little processing time for checking the full path. `SELECT FROM mytable WHERE hashkey = '4ea' && path = '/my/path/to/file/ +isthis.dat';` [download] With hashkey indexed. `use Digest::MD5 qw(md5 md5_hex md5_base64); my $path = '/my/path/to/file/isthis.dat'; my $hashkey = substr(md5_hex($path), 0, 3);` [download] EDIT: I should also note here that variable-width fields can't be indexed in their entirety. You have to define a width for your key, which means either leaving out part of some variable-width fields or padding most of them (which defeats the whole purpose of a variable-width field). It's much better to use a CHAR(3) key (see above).	[reply] [d/l] [select]
Re^2: Key to use for files in db! by Ace128 (Hermit) on Dec 30, 2005 at 14:48 UTC
Alrigthty. I actually had something like this in mind, but put black on white like this makes it seem even better. Thanks! As it is now I got just that $path as VARCHAR(255) and primary key. With your little addon, I suppose that using the $hashkey aswell as CHAR(3), will make the lookup really fast, since the calculation if the hash is faster (?)...	[reply]
Re^2: Key to use for files in db! by pajout (Curate) on Jan 02, 2006 at 13:05 UTC
Imho, your idea is good, but, why code it into perl when database engine is able to support it technically? If you use all features of the specific implementation of indexes, you can force the same effect and your sql queries will stay clear... I know that this theme is off topic and I don't want flame, I only advice to observe and to use features of sql engine before modifying perl code.	[reply]
Re^3: Key to use for files in db! by Ace128 (Hermit) on Jan 05, 2006 at 22:53 UTC
Em, how do you mean?	[reply]
Re^4: Key to use for files in db! by pajout (Curate) on Jan 06, 2006 at 11:12 UTC
Re^5: Key to use for files in db! by Ace128 (Hermit) on Jan 17, 2006 at 19:17 UTC
Some notes below your chosen depth have not been shown here
Re: Key to use for files in db! by pajout (Curate) on Dec 29, 2005 at 15:30 UTC
If you want to access the rows via full path (you need this column in the table), and need not to reference this table from another one, you can use the full path as primary key. The effectivity of the index and proper column type depends on its implementation. Which database did you mean? PostgreSQL?	[reply]
Re^2: Key to use for files in db! by Ace128 (Hermit) on Dec 30, 2005 at 14:41 UTC
Well, mySQL, SQLite, PostgreSQL.. you name it :)	[reply]
Re^3: Key to use for files in db! by pajout (Curate) on Jan 02, 2006 at 12:48 UTC
No, I named PostgreSQL only :) Imho, in PostgreSQL, you should try to create the table with fullpath (columntype TEXT) as the primary key (which forces creating the most common type of index), fill it with real data (minimally 1000 rows), analyze the table and observe the optimizer compliancy to use it when typical queries happens. But, I personally have more conservative opinion, I typically create the integer column as primary key and the fullpath text column with NOT NULL UNIQUE constraint. And, after that, if you will not be satisfied, you can obtain more informations about performance tunning at http://www.postgresql.org/docs/ or http://archives.postgresql.org/pgsql-performance/ Good luck on this non-perl specific journey ;)	[reply]
Re: Key to use for files in db! by SamCG (Hermit) on Jan 18, 2006 at 20:02 UTC
Hmmm, looking at your problem, are you sure you want to use the path in the primary key at all? What happens if you move a set of files? Multiple updates on primary keys can be a pain... I'd probably go with a unique number, something that I don't need to worry about updating, and then leave the path in a separate column(s).	[reply]
Re^2: Key to use for files in db! by Ace128 (Hermit) on Jan 20, 2006 at 21:27 UTC
Yea, I have it like that now. The unique number is for other tables to use as a foreign key.	[reply]