Hi guys, as you can see from my 2 points of experience, I am a rookie in your world, even though I must have been working with Perl for about 3 years now. Anyway, down to business.

I'm currently writing a content management system. That part is not a problem, as JavaScript and HTMLPad are easy. The problem is that I would like people to enter a URL, have the software pull that page (easy), and then sort out all the links for images, background tags, etc. I have already handled the style sheet by parsing the page for it, collecting it with the UA and inserting it into the page contents with the correct tags.

My problems start when someone enters a URL like http://www.xxxxxx.co.uk/newsletter.htm. For a start, the page contains links like src="http://www.xxxxxx.com", which is really the same site under a different domain name. Or the page doesn't contain http://www.xxxxx.co.uk at all, just ../../blah.gif or /blah/blah/funny.gif.

As I know your wisdom with regexps is better than mine, I'm hoping you can help or point me in the direction of a Perl module. As I'm a rookie, I'm sure you will find loads wrong with the code from the word go, but here is a snippet of what I have already:
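For the ../../blah.gif and /blah/blah/funny.gif cases, the URI module (which LWP itself depends on, so it should already be installed) can resolve each link against the URL the page was fetched from, using URI->new_abs. Below is a minimal sketch; the absolutize helper and the example.co.uk page URL are hypothetical stand-ins. Links such as src="http://www.xxxxxx.com/..." are already absolute, so they still load from the other domain name and can simply be left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Resolve one link from a fetched page against the URL the page came from.
# Relative links (../../blah.gif, /blah/blah/funny.gif, pic.jpg) become
# absolute; links that are already absolute keep pointing where they point.
sub absolutize {
    my ($link, $page_url) = @_;
    return URI->new_abs($link, $page_url)->as_string;
}

# Hypothetical page URL, standing in for whatever the user types in
my $page = 'http://www.example.co.uk/news/2004/newsletter.htm';
for my $link ('../../blah.gif', '/blah/blah/funny.gif', 'pic.jpg',
              'http://www.example.com/logo.gif') {
    printf "%-32s => %s\n", $link, absolutize($link, $page);
}
```

With that page URL, ../../blah.gif climbs out of /news/2004/ and resolves to http://www.example.co.uk/blah.gif, while pic.jpg stays in the page's own directory.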
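# Pull the page and sort it out, then display the edit window
use LWP::UserAgent;

my $ua = LWP::UserAgent->new();
$ua->agent('');
my $response = $ua->get($fields{'url'});
die "Could not fetch $fields{'url'}: ", $response->status_line, "\n"
    unless $response->is_success;
my $content = $response->content();

# Reduce the URL to its host name, e.g. www.xxxxxx.co.uk
(my $host = $fields{'url'}) =~ s{^https?://([^/]+).*}{$1}i;

# Prefix relative src attributes with the host; already-absolute URLs are
# left alone. Paths like ../../blah.gif are still not resolved correctly.
$content =~ s{src="(?!https?://)/?}{src="http://$host/}gi;

# Handle styles: fetch the first linked stylesheet and inline it in <head>
my $styles = '';
if ($content =~ m{<link href="(.*?)"}i) {
    (my $styleurl = $1) =~ s{^/}{};
    my $style_response = $ua->get("http://$host/$styleurl");
    $styles = $style_response->content() if $style_response->is_success;
}
$styles = '<style type="text/css"><!--' . $styles . '--></style>';
$content =~ s{</head>}{$styles</head>}i;

$tpl_inner = gettpl($skindir, 'pointblank_templateadd2.htm');
$tpl_inner =~ s/<!-- Content -->/$content/gi;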
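Instead of prefixing every src with the host, each attribute value can be resolved against the page URL as it is rewritten. The sketch below does that with a regex and URI->new_abs; the helper names absolutize_html and absolutize_css and the example.co.uk URLs are hypothetical, and it only handles quoted attribute values. A stylesheet's url(...) values are relative to the stylesheet's own URL rather than the page's, so they need resolving before the stylesheet is inlined. For messier pages, an HTML parser such as HTML::LinkExtor or HTML::TokeParser would be sturdier than regexes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Rewrite quoted src/href/background attribute values so that every link is
# absolute, resolved against $base (the URL the page was fetched from).
# In-page anchors (#top), javascript: and mailto: links are left alone.
sub absolutize_html {
    my ($html, $base) = @_;
    $html =~ s{
        \b(src|href|background)\s*=\s*   # attribute name and equals sign
        (["'])                           # opening quote
        (.*?)                            # the link itself
        \2                               # matching closing quote
    }{
        my ($attr, $q, $link) = ($1, $2, $3);
        $link =~ /^(?:\#|javascript:|mailto:)/i
            ? "$attr=$q$link$q"
            : "$attr=$q" . URI->new_abs($link, $base) . $q;
    }gsiex;
    return $html;
}

# Rewrite url(...) values in a stylesheet. Relative URLs in CSS are resolved
# against the stylesheet's own URL, not the page URL.
sub absolutize_css {
    my ($css, $css_url) = @_;
    $css =~ s{url\(\s*(["']?)(.*?)\1\s*\)}{
        my ($q, $link) = ($1, $2);
        'url(' . $q . URI->new_abs($link, $css_url) . $q . ')';
    }gie;
    return $css;
}

# Hypothetical page and stylesheet URLs
my $page    = 'http://www.example.co.uk/news/newsletter.htm';
my $css_url = 'http://www.example.co.uk/css/site.css';

print absolutize_html(
    q{<img src="../img/a.gif"><td background='bg.gif'><a href="#top">Top</a>},
    $page), "\n";
print absolutize_css('body { background: url(../img/bg.gif) }', $css_url), "\n";
```

Because <link href="..."> is rewritten too, the stylesheet URL comes back absolute, so it can be passed straight to $ua->get and its contents run through absolutize_css with that same URL.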

In reply to Pulling a Page with LWP::UserAgent and fixing URLs? by MrForsythExeter