Re: A story of a Perl Monk and Problem
by Hero Zzyzzx (Curate) on May 19, 2001 at 18:12 UTC
Well, this would be easy to do with an RDBMS, and given the sheer number of files, this may be the best way to do it.
Read the directory in and put it into a table, in the method of your choice. In mySQL, an autoincrement column can give you an "index" field that lets you order and manage the files independently of the filenames, which may or may not be sequential. Then, using the "limit" clause, you can select and create links to the next "x" files, like so:
select id,filename from files where id >= $startid order by id limit 10
It's then a simple thing to create a script that would allow you to page through these files. This would be very fast also, given the RDBMS backend, and you could have users choose how many to see per page.
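A minimal sketch of that paging scheme in DBI. An in-memory SQLite database stands in for mySQL here so the example runs standalone; with mySQL only the connect line would change. The table and column names follow the query above, and the sample filenames are made up for the demo:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# In-memory SQLite stands in for mySQL so this runs anywhere.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });

$dbh->do('CREATE TABLE files (id INTEGER PRIMARY KEY AUTOINCREMENT,
                              filename TEXT)');

# Pretend these filenames came from reading the directory.
my $ins = $dbh->prepare('INSERT INTO files (filename) VALUES (?)');
$ins->execute(sprintf 'img%04d.gif', $_) for 1 .. 25;

# Fetch one page of ten filenames, starting at a given id.
my $startid = 11;
my $rows = $dbh->selectall_arrayref(
    'SELECT id, filename FROM files WHERE id >= ? ORDER BY id LIMIT 10',
    undef, $startid);

print "$_->[0]: $_->[1]\n" for @$rows;
```

Each page link then just carries the id of its first row as `$startid`, so no page ever loads more than ten rows.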
If the files change frequently, you can set up a cron job that would regularly update the table at the interval of your choice.
There are advantages and disadvantages to this system, of course, but I've done something similar. The script I wrote manages a directory with about 1600 image files in it, and it works excellently.
Re: A story of a Perl Monk and Problem
by chipmunk (Parson) on May 19, 2001 at 19:32 UTC
When using IO::Dir, the value you pass to $dir->seek(POS) should be a value returned from $dir->tell().
These methods are wrappers around Perl's builtins, seekdir and telldir.
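A tiny demonstration of that tell/seek round trip on IO::Dir; the throwaway directory is just so the example runs anywhere:

```perl
use strict;
use warnings;
use IO::Dir;
use File::Temp qw(tempdir);

# Build a small throwaway directory to read.
my $path = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt c.txt)) {
    open my $fh, '>', "$path/$name" or die "create $name: $!";
    close $fh;
}

my $dir = IO::Dir->new($path) or die "open $path: $!";

$dir->read;                # consume one entry
my $pos  = $dir->tell;     # remember this position
my $next = $dir->read;     # the entry after it

$dir->seek($pos);          # rewind to the remembered position
my $again = $dir->read;    # re-reads that same entry

print $again eq $next ? "same entry\n" : "different entry\n";
```

The value from `tell` is opaque: it is only meaningful when passed back to `seek` on the same open directory handle, which is exactly chipmunk's point.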
Will the POS change if data in the directory changes?
What are the security implications of passing a seek POS through the web?
Brought to you by that crazy but lovable guy... lindex
Re: A story of a Perl Monk and Problem
by Brovnik (Hermit) on May 19, 2001 at 20:07 UTC
Seek isn't really for skipping forwards into unknown territory like this, unless you know enough about the format to know exactly where you want to go. In particular, there is no "skip the next 100 files" command.
However, if you do a
push(@tells, $dir->tell());
at the start of each page, you can later use seek to jump back to any of those saved points via the values stored in the @tells array. E.g.
# We have now read through all the files once and stored every
# Nth position in @tells.
my $dirpos = int(@tells / 2);    # start in the middle
my $browsing = 1;
while ($browsing)
{
    my $action = "";
    $dir->seek($tells[$dirpos]); # jump to the saved position
    # Code to go here to read the next N files and display the
    # results to the user.
    # Come back here when we have a submit from the user and
    # $action set to the result.
    if ($action eq "pageforwards")
    {
        $dirpos++ if $dirpos < $#tells;   # don't run off the end
    }
    elsif ($action eq "pagebackwards")
    {
        $dirpos-- if $dirpos > 0;         # don't run off the start
    }
    else
    {
        # do other actions
        $browsing = 0;
    }
}
This way, you only have to store a value for every Nth file,
which is a big reduction in storage.
-- Brovnik.
|
But wouldn't it still need to go through the directory on every request from the web? I still think an RDBMS would be better: you'd only loop through the files when you create the table, and the memory requirements would be minimal beyond the mySQL daemon running. Once you have your table of filenames, you only select the few you need to build each index page. The list of files is already prepared and stored in the table, so there's minimal extra work involved in giving a user a page.
Yes, it would. This falls into the "If I were trying to get there, I wouldn't start from here" category, but I was answering the specific point about "how do I use seek(POS)" rather than the broader "how do I present 90,000 files to the user".
I agree with thpfft: trying to present them all to the user isn't the way, and a search would be much better.
Unless the filenames are descriptive (and that's difficult if they're in 8.3 notation), the search needs to cover some content or keywords related to each file as well, so you really should have some sort of persistent database interface to the directory.
-- Brovnik.
Edit: chipmunk 2001-05-19
Re: A story of a Perl Monk and Problem
by jepri (Parson) on May 19, 2001 at 18:05 UTC
Re: A story of a Perl Monk and Problem
by thpfft (Chaplain) on May 19, 2001 at 21:09 UTC
Seems to me that no amount of seeking will let someone find the right file from 90,000. Perhaps you need to offer a different interface to the classical file manager. Let people search by date range or title, for example, or find some regularity in the data which will let you break the collection into chunks. But then you'd probably need a database for that too.
Anyway, perhaps the simplest paging mechanism would be just to pass the name of the file at the end of the previous list, rather than trying to carry numbers? Then you can use something like:
# $filename comes from input; skip entries up to the marker
my $marker = '';
$marker = readdir(DIR) while defined($marker) && ($filename cmp $marker) > 0;
# then print the next ten entries, stopping early if the directory runs out
for my $i (1 .. 10) {
    last unless defined(my $name = readdir DIR);
    print "$i: $name\n";
}
Which assumes an alphabetical list but should survive deletion of the marker file, at least.
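A self-contained version of that marker idea, runnable as-is. The throwaway directory, the numbered filenames, and the page size of ten are assumptions for the demo; the names are sorted first, since raw readdir order isn't alphabetical:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway directory of numbered files to page through.
my $path = tempdir(CLEANUP => 1);
for my $n (1 .. 30) {
    open my $fh, '>', sprintf('%s/file%03d.txt', $path, $n) or die $!;
    close $fh;
}

# readdir order isn't alphabetical, so sort the names first.
opendir my $dh, $path or die "opendir: $!";
my @names = sort grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;

# $filename is the marker: the last name shown on the previous page.
my $filename = 'file010.txt';

# Skip everything up to and including the marker (or, if it was
# deleted, up to where it would have sorted).
my $i = 0;
$i++ while $i < @names && ($names[$i] cmp $filename) <= 0;

# Show the next page of up to ten names.
my $last = $i + 9 > $#names ? $#names : $i + 9;
print "$_\n" for @names[$i .. $last];
```

Because the marker is compared by sort order rather than looked up by exact match, deleting the marker file between requests still lands the user on the right page.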
updated to remove stupid mistake before anyone notices.
(dws)Re: A story of a Perl Monk and Problem
by dws (Chancellor) on May 20, 2001 at 02:54 UTC
... if he wrote this perl script to load the contents of said directory into an array, that the script would have a huge memory footprint.
Have you determined whether an occasional huge memory footprint is actually significant in the system you're building? Reading the directory into an array is simple to implement and test. If the CGI is going to be invoked relatively infrequently (e.g., a few times a minute) on a machine with adequate memory, the impact of the footprint might be insignificant in the grand scheme of things. Smaller-footprint alternatives are more difficult to implement, and might be more compute-intensive.
Good Question, ++
Well, on the particular machine in question, I am developing this tool as a mod_perl application, so any memory used by Perl stays resident in the Apache processes.
Thus if Perl holds an array of about 2.3MB in memory, then so does Apache; that is not acceptable for me.
The machine also does web serving and other file processing, so good CPU and memory numbers are a must.
Brought to you by that crazy but lovable guy... lindex