PerlMonks  

letting a browser client select a file to download by inode

by leocharre (Priest)
on Dec 27, 2005 at 01:07 UTC ( [id://519228]=perlquestion )

leocharre has asked for the wisdom of the Perl Monks concerning the following question:

I'm developing a file sharing application with a web interface. Files are shared on a per-user basis: a given file may be accessible to usera but not to userb.

I keep a table of file information: filepath, inode, creation date, file description, etc.

Here's the kicker: I am using the inode as the unique row identifier, instead of an auto-increment id.

Why?

  • inode is already unique by the filesystem.
  • people can rename the files via the filesystem, and I have a checkup cron job that makes sure the file names match the inodes and updates the db as needed, with new file names for existing entries in the table
  • a file can be queried for its inode, and that lets us know what record to look up in the table

My questions:

  • I understand we must never return sensitive data to the browser. Should I therefore be returning a file id number or reference code instead of an inode number? An inode sounds sensitive, right?
  • users select files to download, etc. from a list, so the browser client sends inodes back to the code. How sensitive is this? Am I doing something really dangerous?

background
The main purpose of this app is to let specific users download specific files. When a user requests a file via the browser by inode (instead of a reference id, etc.), the code checks that a specific permission for that user to that one file exists. A user without permission for file x is turned down and kicked out, and the error is logged.

Replies are listed 'Best First'.
Re: letting a browser client select a file to download by inode
by Fletch (Bishop) on Dec 27, 2005 at 02:11 UTC

    Erm, that just strikes me as a bad idea. It's the same reason you don't use a "product number" or similar external identifier as a primary key: it's subject to change outside the database, which is just begging for things to get out of sync. What happens if you move to a different machine (or even just a different filesystem on the same box)? What happens if the drive crashes and things get reloaded from a backup? You've just set yourself up to write more code to deal with these contingencies, which means more development time and more testing (and are you really sure you've covered all the cases?).

    Just seems like you're being overly clever to "save" yourself or your DB from doing a trivial bit of work. If you want something tied to the file itself an MD5 or the like would be a better choice than the inum.
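
    A minimal sketch of the digest idea, written in Python for brevity (in Perl, Digest::MD5 plays the same role). The helper names file_digest and demo_rename_keeps_digest and the sample file names are made up for illustration. The point it shows is that a content digest follows the data, so a rename through the filesystem does not change it, whereas editing the file does.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file's contents, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def demo_rename_keeps_digest():
    """Write a small file, rename it, and return the digest before and after."""
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "report.doc")    # hypothetical file name
        new = os.path.join(d, "renamed.doc")   # hypothetical file name
        with open(old, "wb") as fh:
            fh.write(b"hello")
        before = file_digest(old)
        os.rename(old, new)
        after = file_digest(new)
        return before, after
```

    Passing algorithm="sha256" swaps in a stronger digest without changing anything else.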

      Yes I am being clever.

      Sometimes being clever and using bits and pieces of what the community has offered (ext3 in this case) is what open source is all about. But then sometimes you *are* simply.. being clever for no good reason.

      Isn't the inode *the* way that ext3 and its db keep track of what files are in the system? If it's good enough for my computer, shouldn't it be good enough for everything else? This is why I thought maybe this was indeed the right way to go: the inode is the way that the machine keeps track of the files. And I kind of trust it more than me.

      The environment is one of file-sharing with specific people. People who know little about machines (windows users) will be creating these files to share with even less computer savvy people (more windows users) - on a per person basis.

      The people creating the files have the power to rename them through the filesystem! There has to be a way to keep track of the file. MD5 had some problems in 2004, some stuff about collisions.. dunno.. not my field.. but.. Is it still safe to check data on MD5 sum? - incredibly interesting suggestion!

        Sure, it's good enough for the file system. Applications, however, use filenames because those won't change even if you have to delete and recreate files, reload files from backup, install a new hard drive, etc. You could always store the location and the inode, then use your inode link to update names when someone changes the filename on the system. Or you could give them a web based way to change the name so that you can keep track of it that way.


        ___________
        Eric Hodges
        No, MD5 is not the right choice, but any stronger digest function may be.
        I suggest SHA512 (well, you could also use SHA256 - there is also SHA384, but it is essentially SHA512 computed with different initial values and with the extra output bits thrown away)
        Is it still safe to check data on MD5 sum?

        For this purpose absolutely. The collision attack that was discovered against MD5 means that some very smart people have managed to create two different bits of data which produce the same MD5 hash. The creation of a modified file the MD5 of which matches that of an existing, "real-world" file is as yet only theoretically possible. And the chance of this happening by accident on your machine is less than that of your server, all backups and your pants spontaneously combusting :-).


        A computer is a state machine. Threads are for people who can't program state machines. -- Alan Cox
Re: letting a browser client select a file to download by inode
by blazar (Canon) on Dec 27, 2005 at 15:29 UTC

    Others already expressed their perplexities with this idea. Personally, I've never had to do anything like this - but I've been considering the problem of "serving" (in a loose sense) files without "exposing" them. Now, an option that occurred to me is to create a temporary symlink to the actual file with File::Temp, having a separate process removing it asynchronously after a suitable timeout. I'd like to hear more experienced programmers' opinion about this scheme...

      blazar, I want to point out that this method only tells the server which file you want - that's all!

      Your mention of a symlink is very interesting. It's a thought I had chewed on, and I solved it another way thanks to the help of this posting: at job help

      What I had thought of was to create a temporary symlink to a file, and an at job would delete the symlink after x time.

      I was very lucky to get some incredibly useful thoughts on that link up there, and ended up streaming the thing, which is much better (the original doc resides outside of the http accessible realm).

      Take a look at the streamer code.

Re: letting a browser client select a file to download by inode
by esskar (Deacon) on Dec 27, 2005 at 02:10 UTC
    why do you think that publishing an i-node will be dangerous?

      It's more that passing out arbitrary files by inum could allow people to get access to files that they shouldn't (e.g. if your documents are served from the same filesystem as your Apache configuration, someone could figure out the likely inum for httpd.conf; worse if your document root is on the same filesystem as /etc).

        I want to underscore the following: submitting an inode num to the server is not the only requirement to download a file

        There is already a whole method of identifying the user by IP, auth, session time, etc., and it all happens over SSL. The validity of your 'pageview' is checked with every action. If someone tries to view or download a file they are not allowed to, they are pooped out.

        So, first an inode is submitted. Then the inode *must* be in my valid 'files' table, which does *not* record anything but "document" files such as .doc, .pdf, etc.

        Second, there is a "files to users" (normalisation) table that simply establishes a relationship between a user and a file. To download a file, you must have an entry in the files to users table. If not, the app freaks the hell out, ends your session, sends a notice for the admin to view the logs.
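
        The two checks described above can be sketched roughly as follows, in Python with an in-memory SQLite database. The table and column names (files, files_to_users, file_id, user_id, path), the sample row and the function names are assumptions for illustration, not the schema of the actual application. Whatever identifier the browser submits is only ever used as a lookup key, and a path is returned only when both the file row and the user-to-file row exist.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway database with one file shared with one user."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE files (
            file_id INTEGER PRIMARY KEY,
            path    TEXT NOT NULL
        );
        CREATE TABLE files_to_users (
            user_id TEXT NOT NULL,
            file_id INTEGER NOT NULL REFERENCES files(file_id),
            PRIMARY KEY (user_id, file_id)
        );
        INSERT INTO files VALUES (1, '/srv/docs/report.pdf');
        INSERT INTO files_to_users VALUES ('usera', 1);
    """)
    return conn


def path_for_download(conn, user_id, submitted_id):
    """Return the file path if this user may fetch this file, else None."""
    try:
        file_id = int(submitted_id)   # reject anything that is not a number
    except (TypeError, ValueError):
        return None
    row = conn.execute(
        """SELECT f.path
             FROM files f
             JOIN files_to_users fu ON fu.file_id = f.file_id
            WHERE fu.user_id = ? AND f.file_id = ?""",
        (user_id, file_id),           # bound parameters, never string-pasted
    ).fetchone()
    return row[0] if row else None
```

        A None result is where the real application would end the session and notify the admin.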

Re: letting a browser client select a file to download by inode
by superfrink (Curate) on Dec 28, 2005 at 00:00 UTC
    inode is already unique by the filesystem.

    This is not quite true. Under many unix filesystems the inode (Index NODE) number is unique only within a partition.

    This is important because multiple filesystems (each on its own partition) can be mounted by a unix system at the same time at different directories. This means your web server's /usr and /var can each have a file with inode number 4615 below its mount point.
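
    A short sketch of what that implies in practice, in Python: on a POSIX system a file is identified by the pair (device number, inode number), not by the inode number alone. The helper names and file names are hypothetical. The demo also shows that a rename within one directory leaves that pair unchanged.

```python
import os
import tempfile


def file_identity(path):
    """Return (device, inode), which identifies a file on one running system."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo_identity_survives_rename():
    """Create a file, rename it in place, and return its identity before and after."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")   # hypothetical file names
        b = os.path.join(d, "b.txt")
        with open(a, "w") as fh:
            fh.write("x")
        before = file_identity(a)
        os.rename(a, b)                # same directory, so same filesystem
        after = file_identity(b)
        return before, after
```

    Even the pair is only meaningful on the machine and mount where it was read; it does not survive a restore from backup or a move to another filesystem.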

    Some filesystems do not use inodes, e.g. FAT and FAT32. When Linux (used because I'm familiar with it) mounts a FAT32 filesystem it assigns inode numbers as the directories and files are accessed.

    This means if you unmount and remount the filesystem you are quite likely to get different inode numbers assigned to the same file. UPDATE: Just to be clear, this refers to filesystems that do not use inodes.

    Personally I would be tempted to use a database "sequence" since I would be storing account info, etc in a database anyway. In MySQL you can use an "auto increment" field.

    If you are not using a database you can still keep track of a sequence number in a file but you have to be sure your scripts lock the file so you never reuse a sequence number.
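
    A minimal sketch of such a file-backed sequence, in Python, assuming a unix system where fcntl.flock is available. The function name next_sequence and the counter file layout (a single integer as text) are made up for illustration. The exclusive lock is held across the read and the write, so two scripts running at once cannot hand out the same number.

```python
import fcntl
import os


def next_sequence(counter_path):
    """Atomically increment the number stored in counter_path and return it."""
    # O_CREAT so the first caller creates the file; an empty file counts as 0.
    fd = os.open(counter_path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)   # blocks until no one else holds it
        text = fh.read().strip()
        value = int(text) + 1 if text else 1
        fh.seek(0)
        fh.truncate()                    # drop the old value before rewriting
        fh.write(str(value))
        fh.flush()
        os.fsync(fh.fileno())            # push the new value to disk
        return value                     # closing the file releases the lock
```

    flock locks are advisory, so this only works if every script that touches the counter file goes through the same function.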

      Yes indeed. Mounting and unmounting.. great hell fun that was. A lot of the junk we serve will possibly be on.. guess what.. NTFS.. yeah.

      I want to support this kind of activity (renaming via the fs and keeping the db reliable), but I just see no way. With regular files I use md5sums, and that works like a charm. But with directories? Oh boy..

      If a mounted section of the data being served is unmounted, then changes happen, dirs get renamed, and it gets mounted again a day later.. I'm screwed!
