Re: Web Scraping with Find / Replace (Mojo::DOM)

Hi,

#TODO Replace all of the links with fully qualified url's
#TODO Save the master_content to a file with the same file name

Here you go

use Path::Tiny qw/ path /;
path( $newFileName )->spew_utf8( qq{<base href="$insert_str">}, $content );

You might need to HTML-escape $insert_str before interpolating it into the tag; Mojo can do that part for you:

$ perl -Mojo -e " $dom = x(q{<base>}); $dom->at(q{base})->attr(qw{href http://example.com/?&}); print $dom "
<base href="http://example.com/?&amp;">

See Path::Tiny, https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base, https://metacpan.org/pod/ojo#x


Re^2: Web Scraping with Find / Replace (Mojo::DOM) by sjfranzen (Initiate) on Dec 02, 2016 at 16:25 UTC
Thank you for your response. Unfortunately, I do not understand your approach or how to include it in my script.
Re^3: Web Scraping with Find / Replace (Mojo::DOM) by beech (Parson) on Dec 04, 2016 at 20:55 UTC
Well, if you add a base tag to the HTML content, then there is no need to rewrite relative links into absolute links; it's a shortcut provided by HTML. The spew part of the code does that, using a helper module (Path::Tiny) to create the file. The second part shows creating/modifying a base tag with Mojo, which will HTML-escape the URL.
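
Putting the two parts together, a rough sketch might look like the following. The values assigned to $insert_str, $content and $newFileName are made up for illustration, since I don't know what your script actually holds in them; only the Mojo::DOM and Path::Tiny calls come from the snippets above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;
use Path::Tiny qw/ path /;

# Hypothetical stand-ins for the variables in your script.
my $insert_str  = 'http://example.com/?&';                  # base URL for relative links
my $content     = '<p><a href="page2.html">next</a></p>';   # scraped HTML with relative links
my $newFileName = 'master_content.html';                    # output file name

# Build the <base> tag with Mojo::DOM so the URL is HTML-escaped on output
# (same idea as the one-liner, where x() is shorthand for Mojo::DOM->new).
my $base = Mojo::DOM->new('<base>');
$base->at('base')->attr( href => $insert_str );

# Write the base tag first, then the content unchanged.
# "$base" stringifies the DOM back to HTML.
path($newFileName)->spew_utf8( "$base", "\n", $content );

# Show what was written.
print path($newFileName)->slurp_utf8, "\n";
```

The relative link page2.html is left as-is in the file; a browser resolves it against the href of the base tag, which is why no find/replace over the links is needed with this approach.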