in reply to Re: URL string compression?
in thread URL string compression?

Currently using only letters, any case; numbers; dashes, underscores. That's it.

But other characters would be acceptable if that makes compression possible.





Forget that fear of gravity,
Get a little savagery in your life.

Re^3: URL string compression?
by ikegami (Patriarch) on Feb 13, 2006 at 23:42 UTC
    If you have a case-sensitive system, that's 64 safe characters. If you can compress (by packing numbers or otherwise) the data down to 33 bytes (floor((90/2) * (log2(64)/8))), you could use the following to convert to safe characters:
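    use MIME::Base64;

    sub encode {
        my ($compressed) = @_;
        # Pass '' as the line ending so no newlines end up in the result.
        my $encoded = encode_base64($compressed, '');
        $encoded =~ s{\+}{-}g;
        $encoded =~ s{\/}{_}g;
        $encoded =~ s{=+\z}{};    # '=' padding is not a safe character
        return $encoded;
    }

    sub decode {
        my ($encoded) = @_;
        $encoded =~ s{-}{+}g;
        $encoded =~ s{_}{/}g;
        $encoded .= '=' while length($encoded) % 4;    # restore the padding
        my $compressed = decode_base64($encoded);
        return $compressed;
    }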

    Update: On second thought, if people are going to save these files on their own PCs, you'll need to be case-insensitive. That leaves 38 safe characters. If you wrote Base32 based on Base64 (a simple task), you'd have to compress the data down to 28 bytes (floor((90/2) * (log2(32)/8))).
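For illustration, here is a minimal sketch of what such a Base32 pair might look like. It is not the code from this post: the names encode_base32 and decode_base32 are hypothetical, and the alphabet is assumed to be the RFC 4648 one (A-Z, 2-7), chosen because it stays within letters and digits and survives case folding. No '=' padding is emitted.

```perl
use strict;
use warnings;

# Assumed alphabet: RFC 4648 Base32 (A-Z, 2-7), 5 bits per character.
my @ALPHABET = ('A' .. 'Z', '2' .. '7');
my %VALUE_OF = map { ($ALPHABET[$_] => $_) } 0 .. $#ALPHABET;

# Hypothetical helper: bytes in, case-insensitive safe characters out.
sub encode_base32 {
    my ($bytes) = @_;
    my $bits = unpack 'B*', $bytes;                  # string of '0' and '1'
    $bits .= '0' x ((5 - length($bits) % 5) % 5);    # pad to a multiple of 5 bits
    return join '', map { $ALPHABET[ oct("0b$_") ] } $bits =~ /(.{5})/g;
}

# Hypothetical helper: accepts upper or lower case input.
sub decode_base32 {
    my ($text) = @_;
    my $bits = '';
    for my $char (split //, uc $text) {
        die "invalid Base32 character '$char'\n" unless exists $VALUE_OF{$char};
        $bits .= sprintf '%05b', $VALUE_OF{$char};
    }
    # Drop the zero bits that were added to fill the last 5-bit group.
    $bits = substr $bits, 0, 8 * int(length($bits) / 8);
    return pack 'B*', $bits;
}
```

With this sketch, 28 bytes of compressed data come out as 45 characters, and a name that has been lowercased by a file system still decodes.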

    Update: Fixed atrocious math.

      Original filename was 125 characters.

      "Compressed" filename is 175 characters.





        Sorry, I wasn't clear.

        You have two problems. The first is compression. The second is encoding the compressed result into safe characters. I was addressing the latter problem.

        If you use my suggested encoding method, you first need to compress your data down to 28 bytes. Base32 will convert your compressed data into 45 (ceil(28*(8/log2(32)))) safe characters.

        What information is contained in the original file name? Consistently compressing by 78% (1 - 28/125, rounded up) will be hard, and will only be possible with intimate knowledge of the data to compress.
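The arithmetic in this sub-thread can be checked with a short sketch. The helper names max_bytes and encoded_chars are made up for this example; the 45-character budget is simply the 90/2 used in the formulas above, and 125 is the original filename length quoted earlier.

```perl
use strict;
use warnings;

my $budget_chars = 90 / 2;    # character budget taken from the formulas above

# Largest number of bytes that fits in $chars characters
# when each character carries $bits_per_char bits.
sub max_bytes {
    my ($chars, $bits_per_char) = @_;
    return int($chars * $bits_per_char / 8);
}

# Number of characters needed to encode $bytes bytes (rounded up).
sub encoded_chars {
    my ($bytes, $bits_per_char) = @_;
    return int(($bytes * 8 + $bits_per_char - 1) / $bits_per_char);
}

my $base64_bytes = max_bytes($budget_chars, 6);        # 64 symbols = 6 bits each
my $base32_bytes = max_bytes($budget_chars, 5);        # 32 symbols = 5 bits each
my $base32_chars = encoded_chars($base32_bytes, 5);
my $reduction    = 1 - $base32_bytes / 125;            # versus the 125-character original

printf "Base64 budget: %d bytes\n", $base64_bytes;
printf "Base32 budget: %d bytes, encoded as %d characters\n", $base32_bytes, $base32_chars;
printf "Required reduction: %.1f%%\n", 100 * $reduction;
```

This gives a 33-byte budget for Base64, a 28-byte budget for Base32 (which encodes to 45 characters), and a required reduction of about 77.6% for the 125-character name.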

Re^3: URL string compression?
by BrowserUk (Patriarch) on Feb 14, 2006 at 00:32 UTC

    As you've seen, with 64 characters in the input alphabet, each character carries 6 bits, so 90 chars * 6 bits = 540 bits = 67.5 (mostly unacceptable) 8-bit chars is your best "simple transform" compression. That only shrinks the name to 3/4 of its original length, even if all the 8-bit chars were acceptable in a filename, which they aren't.

    Your best hope is if your filenames can be split into various fields that can each be represented by a number that is shorter than the field's text representation. For example: if one component of the name was one of 'North', 'NorthEast', 'East', 'SouthEast', 'South', 'SouthWest', 'West', 'NorthWest', that same field could be replaced by a digit 0-7, or maybe just 4 bits in conjunction with some other field with up to 3 bits.
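As a rough sketch of that idea, the compass field can be stored in 3 bits and shared with another field in a single 16-bit word. Everything beyond the eight direction names is an assumption made for illustration: the numeric id field, its 0-8191 range, and the helper names pack_fields and unpack_fields are hypothetical, since the real filename layout has not been posted.

```perl
use strict;
use warnings;

my @DIRECTIONS = qw(North NorthEast East SouthEast South SouthWest West NorthWest);
my %CODE_FOR   = map { ($DIRECTIONS[$_] => $_) } 0 .. $#DIRECTIONS;

# Hypothetical layout: 3 bits of direction, then 13 bits of numeric id,
# stored as one big-endian 16-bit word (2 bytes).
sub pack_fields {
    my ($direction, $id) = @_;
    die "unknown direction '$direction'\n" unless exists $CODE_FOR{$direction};
    die "id out of range\n" if $id < 0 || $id > 0x1FFF;
    return pack 'n', ($CODE_FOR{$direction} << 13) | $id;
}

sub unpack_fields {
    my ($packed) = @_;
    my $word = unpack 'n', $packed;
    return ($DIRECTIONS[ $word >> 13 ], $word & 0x1FFF);
}

# 'SouthWest' plus a four-digit id is 13 characters as text, 2 bytes packed.
my $packed = pack_fields('SouthWest', 1234);
my ($direction, $id) = unpack_fields($packed);
```

The packed bytes would still need to go through a safe-character encoding before being used in a filename.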

    Without seeing examples of the filenames, and the range of values the fields within represent, it's hard to be more helpful.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    Edit: g0n - reparented at author's request