punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:

Marigold Monks,

My download script dynamically builds a filename before downloading the requested file. The filename contains important information that another script will need later on. But users want shorter filenames to save on their PCs. For other reasons, the URL is the only place this information can be stored, not in the file itself.

So, I'm looking for a "string compress" / "string decompress" pair of functions that will take my approx. 90-character filenames and reduce them to half that size or less, then allow them to be re-inflated back to their original form.

Searches for "Perl compress strings" just send me to materials on zipping files.

Anything out there?

Thanks.




Forget that fear of gravity,
Get a little savagery in your life.

Replies are listed 'Best First'.
Re: URL string compression?
by saintmike (Vicar) on Feb 13, 2006 at 23:39 UTC
    Have you thought about using some kind of short ID instead that the script later looks up on the server via HTTP?
      Hadn't thought of that. Would like to avoid storing temp files on the server to hold this information. But if all else fails, thanks for the fall-back solution.





Re: URL string compression?
by chrism01 (Friar) on Feb 14, 2006 at 00:43 UTC
    Assuming you have a DB (& most places do), how about storing the "ID" in the DB and (optionally if the files aren't too big) storing the files in a BLOB col in DB? Or store the files externally & store filepath along with ID in DB.
Re: URL string compression?
by BrowserUk (Patriarch) on Feb 13, 2006 at 23:29 UTC

    What characters are valid in the uncompressed filename?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Currently using only letters, any case; numbers; dashes, underscores. That's it.

      But other characters would be acceptable if it makes compression possible.





        If you have a case-sensitive system, that's 64 safe characters. If you can compress (by packing numbers or otherwise) the data down to 33 bytes (floor((90/2) * (log2(64)/8))), you could use the following to convert it to safe characters:

        use MIME::Base64;

        # Map raw (compressed) bytes to filename-safe Base64:
        # '+' becomes '-' and '/' becomes '_'.
        sub encode {
            my ($compressed) = @_;
            # The empty second argument stops encode_base64 appending a newline.
            my $encoded = encode_base64($compressed, '');
            $encoded =~ s{\+}{-}g;
            $encoded =~ s{/}{_}g;
            return $encoded;
        }

        # Undo the character mapping, then decode back to the raw bytes.
        sub decode {
            my ($encoded) = @_;
            $encoded =~ s{-}{+}g;
            $encoded =~ s{_}{/}g;
            my $compressed = decode_base64($encoded);
            return $compressed;
        }

        Update: On second thought, if people are going to save these files on their own PCs, you'll need to be case-insensitive. That leaves 38 safe characters. If you wrote Base32 based on Base64 (a simple task), you'd have to compress the data down to 28 bytes (floor((90/2) * (log2(32)/8))).
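
        A minimal sketch of what such a Base32 pair might look like, assuming a lower-case alphabet of a-z and 2-7 (the RFC 4648 letters, lower-cased) and no padding characters. The names encode_base32 and decode_base32 are made up for illustration; this is not the code from the post above, and it does no validation of characters outside the alphabet.

```perl
use strict;
use warnings;

# Assumed 32-character alphabet: case-insensitive and filename-safe.
my @ALPHABET = ('a' .. 'z', '2' .. '7');
my %VALUE_OF;
@VALUE_OF{@ALPHABET} = (0 .. $#ALPHABET);

# Turn raw bytes into Base32 text, 5 bits per output character.
sub encode_base32 {
    my ($bytes) = @_;
    my $bits = unpack 'B*', $bytes;
    # Pad with zero bits up to a multiple of 5.
    $bits .= '0' x (-length($bits) % 5);
    return join '', map { $ALPHABET[ oct "0b$_" ] } $bits =~ /(.{5})/g;
}

# Turn Base32 text (either case) back into the raw bytes.
sub decode_base32 {
    my ($text) = @_;
    my $bits = join '', map { sprintf '%05b', $VALUE_OF{$_} } split //, lc $text;
    # Drop the zero bits that were added as padding by the encoder.
    $bits = substr $bits, 0, length($bits) - length($bits) % 8;
    return pack 'B*', $bits;
}
```

        With this scheme 28 bytes come out as 45 characters (224 bits padded to 225, at 5 bits per character), which is where the 28-byte target above comes from.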

        Update: Fixed atrocious math.

        As you've seen, with 64 characters in the input, 90 * 6 bits = 540 bits = 67.5 (mostly unacceptable) 8-bit chars is your best "simple transform" compression. That's a bare reduction to 3/4 of the original length, even if all the 8-bit chars were acceptable in a filename, which they aren't.

        Your best hope is if your filenames can be split into various fields that can each be represented by a number that is shorter than the field's text representation. For example: if one component of the name was one of 'North', 'NorthEast', 'East', 'SouthEast', 'South', 'SouthWest', 'West', 'NorthWest', that same field could be replaced by a digit 0-7, or maybe just 4-bits in conjunction with some other field with up to 3-bits.
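
        As a rough sketch of that idea: eight directions need only 3 bits, so the example below shares one byte between the direction and a second field. The second field (a "zone" number from 0 to 31) and the function names are purely hypothetical, since the real filename layout was never posted.

```perl
use strict;
use warnings;

my @DIRECTIONS = qw(North NorthEast East SouthEast South SouthWest West NorthWest);
my %CODE_FOR;
@CODE_FOR{@DIRECTIONS} = (0 .. $#DIRECTIONS);

# Pack a direction (top 3 bits) and a hypothetical zone 0..31 (low 5 bits)
# into a single byte.
sub pack_fields {
    my ($direction, $zone) = @_;
    die "unknown direction '$direction'" unless exists $CODE_FOR{$direction};
    die "zone out of range" if $zone < 0 || $zone > 31;
    return pack 'C', ($CODE_FOR{$direction} << 5) | $zone;
}

# Recover the direction name and zone number from the packed byte.
sub unpack_fields {
    my ($byte) = @_;
    my $value = unpack 'C', $byte;
    return ($DIRECTIONS[ $value >> 5 ], $value & 0x1F);
}
```

        Here a field such as 'SouthWest' plus a two-digit number, 11 characters of text, shrinks to one byte, which could then go through a filename-safe encoding.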

        Without seeing examples of the filenames, and the range of values the fields within represent, it's hard to be more helpful.



        Edit: g0n - reparented at author's request

Re: URL string compression?
by tweetiepooh (Hermit) on Feb 14, 2006 at 11:42 UTC
    Why not create the filename, copy the file to a shortname version then dish that over to the user?
Re: URL string compression?
by punch_card_don (Curate) on Feb 14, 2006 at 20:20 UTC
    Update to close the thread:

    I've decided to go with the suggestion of giving the file a unique identifier and storing the details in a file on the server. This will provide the added opportunity for tracking how people are using the system, so, two birds with one stone makes it worth it.
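
    For anyone finding this later, a minimal sketch of that kind of lookup, assuming a plain text file with one record per line where the ID is simply the line number. The function names and the file layout are illustrative assumptions, not the code actually deployed.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Append one record and return its ID (the 1-based line number).
sub store_details {
    my ($lookup_file, $details) = @_;
    die "details must be a single line" if $details =~ /\n/;
    open my $fh, '+>>', $lookup_file or die "Cannot open $lookup_file: $!";
    flock $fh, LOCK_EX or die "Cannot lock $lookup_file: $!";
    seek $fh, 0, SEEK_SET;
    my $id = 1;
    $id++ while <$fh>;    # count the records already stored
    seek $fh, 0, SEEK_END;
    print {$fh} "$details\n";
    close $fh or die "Cannot close $lookup_file: $!";
    return $id;
}

# Return the record stored under $id, or nothing if there is none.
sub fetch_details {
    my ($lookup_file, $id) = @_;
    open my $fh, '<', $lookup_file or die "Cannot open $lookup_file: $!";
    while (my $line = <$fh>) {
        next unless $. == $id;
        chomp $line;
        return $line;
    }
    return;
}
```

    The download URL then only needs to carry the short ID, and the second script calls fetch_details to get the full details back.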

    Thanks for the replies and the code - will put that in my back pocket.



