John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

I read that Perl 6 is going to avoid the filename syntax issue by using URIs (also known as URLs) to provide a platform-independent file naming convention.

I like that idea, and want to do something like that on the application level now.

So, what's the syntax for URI's for local files? I found the official specs, and it just says "file://" and that "/" is used as a part separator.

But I see more translation being done under Windows. I see three slashes and a vertical bar character in there. I recall from many years ago that they seemed different in different browsers, but hopefully that's not the case anymore. Anyone know what the "standard" is for Windows' file URI's? Or what base document is the Windows version of Perl6 using?

Off hand, I'm thinking that adding a function to File::Spec to translate URI and native naming syntax is the way to go. That's where all the system-specific naming stuff is, and it already loads the right submodule for the platform.

—John

Re: syntax for URI of files?
by hardburn (Abbot) on Mar 06, 2003 at 16:13 UTC

    URIs and URLs are not quite the same thing. A URL is a special kind of URI that specifies a physical server that a document exists on. There are some protocols (such as Freenet) where a piece of data exists "out there", but you can't pinpoint an exact location. In such a case, you can't use URLs, but URIs work just fine.

    Anyways, URI syntax for files is just 'file://' followed by the absolute path to the file. So an example of a Win32 file URI would be 'file://C:\windows'. On *nix, you get a horrid series of toothpicks: 'file:///usr/local' (though I've noticed that KDE assumes the beginning slash on the filename, thus allowing 'file://usr/local').

    Update: I forgot that ':' has special meaning in a URI. As another poster pointed out, a Win32 file URI should be 'file://C|\windows'.

    ----
    Reinvent a rounder wheel.

    Note: All code is untested, unless otherwise stated

      Last I read, URL is being dropped from technical documents. At http://www.w3.org/Addressing/ "URL Uniform Resource Locator. An informal term (no longer used in technical specifications) associated with popular URI schemes: http, ftp, mailto, etc." Also http://www.w3.org/Addressing/9710-uri-vs-url.html.

      I find what you mentioned in the 1998 spec, and I guess the "cool URLs don't change!" backlash drove the quest that things on the web be URI's, not something that happens to be the current location but might change.

Re: syntax for URI of files?
by jasonk (Parson) on Mar 06, 2003 at 16:08 UTC

    They generally contain three slashes, because you take a file path that starts with a slash (such as /etc/passwd) and then add file:// to the beginning, so you get file:///etc/passwd. The vertical bar is what Windows uses in place of :, since the : has a different meaning in a URI, so C:\foo.txt becomes file://C|/foo.txt.

      Mozilla is showing three slashes before the drive letter, meaning a leading slash before the drive letter. Hmm, and it's showing the colon as-is! IOW, file:///C:/foo.txt.
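The form Mozilla displays can be reproduced with a small sketch. It uses Python's standard `pathlib` module, which is an assumption of this example and not something the thread discusses; that library also puts a third slash before the drive letter and keeps the colon instead of substituting a vertical bar.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Absolute POSIX path: "file://" followed by "/etc/passwd" gives three slashes.
posix_uri = PurePosixPath("/etc/passwd").as_uri()

# Windows path: the drive letter follows a third slash and keeps its colon.
windows_uri = PureWindowsPath(r"C:\foo.txt").as_uri()

print(posix_uri)    # file:///etc/passwd
print(windows_uri)  # file:///C:/foo.txt
```

This only shows what one implementation emits; it does not settle which spelling is "standard".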

      So, if | is used for :, what's the meaning of : within a URI? I know it's used to separate the original service name from the rest, but don't recall any special meaning on the right side. Is that true for all colons in a file name (on systems where that is a legal part of a name, or has a special meaning to the file system)?

      And what do you do about |'s in the name that were there to begin with?

      —John

        So, if | is used for :, what's the meaning of : within a URI? I know it's used to separate the original service name from the rest, but don't recall any special meaning on the right side.

        I don't think it has a special meaning, but it's a reserved character for the entire URI. And yes, many browsers will let you get away with using ':' instead of '|'.

        And what do you do about |'s in the name that were there to begin with?

        You beat up the person who named a file with a '|' (:

        ----
        Reinvent a rounder wheel.

        Note: All code is untested, unless otherwise stated

        The reason there are three slashes is that there is an optional host name that comes between the second and third slash: e.g. file://novell01/departmentshare/me/myfile.txt
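The optional host slot described above can be seen by splitting a few of the URIs from this thread. This is a minimal sketch using Python's `urllib.parse.urlsplit`; the helper name `host_and_path` is made up for illustration.

```python
from urllib.parse import urlsplit

def host_and_path(uri):
    """Return (host, path); host is '' when nothing sits between the 2nd and 3rd slash."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path

print(host_and_path("file:///etc/passwd"))
# ('', '/etc/passwd')
print(host_and_path("file://novell01/departmentshare/me/myfile.txt"))
# ('novell01', '/departmentshare/me/myfile.txt')
print(host_and_path("file://C|/foo.txt"))
# ('C|', '/foo.txt')
```

Note the last case: with only two slashes, a generic parser treats the drive letter as the host name, which is one argument for writing the third slash before the drive.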
Re: syntax for URI of files?
by Elian (Parson) on Mar 06, 2003 at 22:19 UTC
    I read that Perl 6 is going to avoid the filename syntax issue by using URIs (also known as URLs) to provide a platform-independent file naming convention.
    Then someone took something out of context. We're not using URI syntax to open files. We may have open smart enough to know how to deal with URLs, at least http: ones, but you certainly won't be using them for plain file access.
      Yipes!

      I can't find it in the Apocalypse now. But I know I read something quite specific on that idea, that unifying file names across OS's by using URL syntax will eliminate some of the crud in writing portable scripts. Perhaps it didn't make it into the Apocalypse, or at least hasn't been covered yet.

      —John

        I think it would've made it into the apocalypses already. You may have remembered something from perl6-language or one of the RFC proposals, since I recall it being proposed, but it's definitely a non-starter. Optional, perhaps, but definitely not mandatory.
Re: syntax for URI of files?
by demerphq (Chancellor) on Mar 06, 2003 at 21:34 UTC

    I read that Perl 6 is going to avoid the filename syntax issue by using URIs (also known as URLs) to provide a platform-independent file naming convention.

    Really!? What a wonderful way to alienate the entire Windows world. The decision to reserve colons in URIs was a bad idea. It's the reason that Win32 folks don't like URI's. To embrace it in perl6 is an even worse idea.

    I know that this is a pretty negative POV, but I'm willing to bet that whoever thought up borrowing URI's for perl doesn't use a Win32 box much. file:///D|WTF!

    Oh well, I suppose I'll get used to it at some point. *sigh*


    ---
    demerphq


Syntax for Unicode chars in URI's?
by John M. Dlugosz (Monsignor) on Mar 06, 2003 at 16:32 UTC
    And what's the mechanism for non-ASCII characters in a URI? RFC 2396 mentions two-digit hex values for escape codes.
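A rough sketch of the two-digit hex escapes applied to a non-ASCII name follows. It assumes the characters are first encoded as UTF-8, which is the default of Python's `urllib.parse.quote`; RFC 2396 itself does not fix a character encoding, so treat that choice as an assumption. The file name is made up.

```python
from urllib.parse import quote, unquote

name = "résumé.txt"          # hypothetical file name

# Each byte of the UTF-8 encoding that is not an unreserved ASCII
# character is written as %HH.
escaped = quote(name)
print(escaped)               # r%C3%A9sum%C3%A9.txt

# Decoding reverses the process, again assuming UTF-8.
restored = unquote(escaped)
print(restored == name)      # True
```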
Re: syntax for URI of files?
by gmpassos (Priest) on Mar 07, 2003 at 05:12 UTC
    Well, I think that Perl6 really needs the URI. Everyone who has tried to use Perl5 on a Mac has seen the problem, and the differences between file systems don't stop there.

    As for using file://c:/foo instead of file://c|/foo, I think that both will work, since we have many programs that use them today.

    As for reserved characters in file names: we can't tell users of one OS not to use a symbol just because it doesn't work on another! In other words, in theory we need to be able to put "any" symbol in a file name. To do that, how about using %HH, which everybody knows from URIs? We already use that for spaces: %20.

    This is an important point, since when we talk about Perl6 we mean Parrot, and Parrot is not only for Perl; it is a VM for many languages. And I think that some upgrades to the rules of URIs can be made, since the URI was defined a "long time ago" and has been tested very well, so we now know which options are best to choose.

    Graciliano M. P.
    "The creativity is the expression of the liberty".

      In the URI::File module, it shows
        Unix                  URI
        ----------            ------------------
        foo/bar       <==>    foo/bar
        /foo/bar      <==>    file:/foo/bar
        /foo/bar      <==     file://localhost/foo/bar
        file:          ==>    ./file:
        <undef>       <==     file:/fo%00/bar
        /             <==>    file:/
      that is, one slash only if there is no host, but it doesn't show three slashes for that same case. I thought the leading // after the file: meant that the rest of the name uses the slash-separated hierarchy, and that otherwise it is opaque, so isn't one slash wrong? But that wouldn't explain relative URL syntax.
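One way to probe the one-slash question is to feed the different spellings to a generic URI parser. The sketch below uses Python's `urllib.parse.urlsplit` (not URI::File, so its behaviour is only suggestive); the variable name `parsed` is made up.

```python
from urllib.parse import urlsplit

refs = ["file:/foo/bar", "file://localhost/foo/bar", "file:///foo/bar", "foo/bar"]
parsed = {ref: urlsplit(ref) for ref in refs}

for ref, p in parsed.items():
    # scheme, host (authority), path
    print(f"{ref!r:32} {p.scheme!r:8} {p.netloc!r:12} {p.path!r}")
```

With this parser, all three `file:` spellings yield the path `/foo/bar`; the one-slash form simply has no authority part, and the hierarchy still comes from the slashes in the path. The bare `foo/bar` has no scheme at all and is treated as a relative reference.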

      If we have %HH syntax, why map : to | at all? It's not like anyone types filename urls in that syntax.
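A small sketch of that idea: escape both `:` and `|` with %HH so neither needs a special mapping and a literal `|` in a name survives the round trip. It uses Python's `urllib.parse.quote` with an empty `safe` set; the file name is hypothetical.

```python
from urllib.parse import quote, unquote

segment = "C:|odd.txt"              # hypothetical name containing both characters

# safe="" means no reserved character is left unescaped.
escaped = quote(segment, safe="")
print(escaped)                      # C%3A%7Codd.txt

# Decoding gives back the original name, colon and bar included.
round_trip = unquote(escaped)
print(round_trip == segment)        # True
```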