Oh yeah, putting things like ampersands and quotes into file names is one of the "features" of wget that tends to put that tool on my "do not use" list. I'd rather spend a little more time probing a web site myself and use a Perl script with the LWP module to focus on the sets of URLs I really want. As I fetch each page, I assign it a sensible local file name (one with no shell-magic characters) and save it under that name.
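A naming function along those lines might look like the following. This is a minimal sketch: `safe_name` is a hypothetical helper, and the character whitelist (letters, digits, `.`, `_`, `-`) is an assumption about what counts as "sensible". LWP is only shown in a comment, via `LWP::UserAgent`'s `mirror` method.

```perl
use strict;
use warnings;

# Hypothetical helper: map a URL to a local file name built only from
# letters, digits, ".", "_" and "-", so the shell never sees &, ?, quotes, etc.
sub safe_name {
    my ($url) = @_;
    my $name = $url;
    $name =~ s{^[A-Za-z][A-Za-z0-9+.-]*://}{};   # drop "http://" and similar
    $name =~ s{[^A-Za-z0-9._-]+}{_}g;           # collapse any other run of chars to "_"
    $name =~ s{^[._-]+}{};                      # no leading dot (hidden) or hyphen (option-like)
    return length $name ? $name : 'index';      # fall back if nothing is left
}

# With LWP you would then save each page under that name, e.g.
#   my $ua = LWP::UserAgent->new;
#   $ua->mirror($url, safe_name($url));
```

For example, `safe_name(q{http://example.com/cgi-bin/show.pl?id=3&type="x"})` returns `example.com_cgi-bin_show.pl_id_3_type_x_`. Note that two different URLs can reduce to the same name, so a real script should also keep a hash of names already used.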
But maintaining the links (the hrefs) inside each file is a bit more challenging. jeffa's reply has the basic approach: first convert all the wget-assigned file names to sensible names (making sure to avoid collisions), rename the files, and keep the old-to-new mapping in a hash; then, for each file in the harvest, replace every occurrence of a wget-style (CGI-based) file name with the corresponding sensible name. Tedious, but not so difficult.
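The rename-then-relink steps can be sketched roughly like this. It is a minimal sketch under several assumptions: the harvest is one flat directory (wget usually builds a tree, which you would walk with File::Find), every file is rewritten as text, and old names appear in the HTML literally. The HTML may spell them differently (URL-escaped, or with `&amp;` for `&`), in which case you would need extra map entries for those spellings. `clean_name`, `build_name_map`, `relink` and `fix_harvest` are hypothetical names. Names that are already clean keep their name, and clashes get `_1`, `_2`, ... before the extension.

```perl
use strict;
use warnings;

# Replace every run of characters outside [A-Za-z0-9._-] with "_".
sub clean_name {
    my ($name) = @_;
    $name =~ s/[^A-Za-z0-9._-]+/_/g;
    return $name;
}

# Build the old => new hash, avoiding collisions.
sub build_name_map {
    my @old = @_;
    my (%map, %taken);
    # Pass 1: names that are already clean keep their name and reserve it.
    for my $old (grep { clean_name($_) eq $_ } @old) {
        $map{$old}   = $old;
        $taken{$old} = 1;
    }
    # Pass 2: everything else gets a cleaned name, plus _1, _2, ... on a clash.
    for my $old (sort { $a cmp $b } grep { clean_name($_) ne $_ } @old) {
        my $base = clean_name($old);
        my ($stem, $ext) = $base =~ /^(.*?)(\.[A-Za-z0-9]+)?$/;
        $ext //= '';
        my $new = $base;
        for (my $n = 1; $taken{$new}; $n++) {
            $new = "${stem}_$n$ext";
        }
        $map{$old}   = $new;
        $taken{$new} = 1;
    }
    return \%map;
}

# Plain substring replacement of every changed old name with its new name.
sub relink {
    my ($html, $map) = @_;
    my @keys = grep { $map->{$_} ne $_ } keys %$map;
    return $html unless @keys;
    # Longest names first, so a name that is a prefix of another cannot win.
    my $alt = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } @keys;
    $html =~ s/($alt)/$map->{$1}/g;
    return $html;
}

# Rename the files in one directory, then rewrite the links in each of them.
sub fix_harvest {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @old = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;

    my $map = build_name_map(@old);
    for my $old (@old) {
        my $new = $map->{$old};
        next if $new eq $old;
        die "refusing to overwrite $dir/$new" if -e "$dir/$new";
        rename "$dir/$old", "$dir/$new" or die "rename $old: $!";
    }
    for my $new (values %$map) {
        my $path = "$dir/$new";
        open my $in, '<:raw', $path or die "open $path: $!";
        my $html = do { local $/; <$in> };
        close $in;
        open my $out, '>:raw', $path or die "write $path: $!";
        print {$out} relink($html, $map);
        close $out or die "close $path: $!";
    }
    return $map;
}
```

Doing the substitution as one alternation in a single pass means a freshly inserted new name is never re-scanned. The trade-off is that an old name is also replaced where it merely occurs as part of other text, which is usually acceptable for wget's long CGI-style names.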