PerlJam has asked for the wisdom of the Perl Monks concerning the following question:

In my program I have a relative URL stored in a variable. Sometimes it's in the format dir/page.html, other times it's in the format /dir/page.html. I need the variable to be in the first format. I have tried using a simple substitution to remove the leading slash, i.e.
$link =~ s/^\///;

This works in my test scripts but not in my actual program. Even simply trying to match the leading / fails.
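The substitution quoted above is anchored with ^, while the script further down uses the unanchored s/\///. As a minimal sketch (the helper names strip_anchored and strip_unanchored are hypothetical), the difference shows up on input that is already in the dir/page.html form:

```perl
use strict;
use warnings;

# Anchored: removes a slash only when it is the very first character.
sub strip_anchored {
    my ($path) = @_;
    (my $copy = $path) =~ s{^/}{};
    return $copy;
}

# Unanchored: removes the first slash found anywhere in the string.
sub strip_unanchored {
    my ($path) = @_;
    (my $copy = $path) =~ s{/}{};
    return $copy;
}

for my $path ('/dir/page.html', 'dir/page.html') {
    printf "%-16s anchored: %-15s unanchored: %s\n",
        $path, strip_anchored($path), strip_unanchored($path);
}
```

For /dir/page.html both versions return dir/page.html. For dir/page.html the anchored version leaves the string alone, while the unanchored one deletes the slash between the directory and the file name and returns dirpage.html.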

My question is: could there be hidden characters at the beginning of the string? If so, how do I check for them? I assume discarding them would work the same way as discarding any other characters. I've included my script below in case anyone wishes to look it over. Notice the prints before and after the substitution: both print the same value. (I guess that's obvious since I'm asking this question {:-)
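One way to test the hidden-character theory is to print the variable with every control character escaped. The sketch below assumes the core Data::Dumper module with Useqq turned on; the sample string and the helper names reveal and strip_leading_slash are hypothetical. It also shows a substitution that tolerates leading whitespace before the slash.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq  = 1;  # show control characters as escapes (\n, \t, \r)
$Data::Dumper::Terse  = 1;  # omit the leading '$VAR1 = '
$Data::Dumper::Indent = 0;  # keep the dump on one line

# Return an escaped, double-quoted version of a string so that otherwise
# invisible characters (spaces, tabs, newlines) become visible.
sub reveal {
    my ($s) = @_;
    return Dumper($s);
}

# Remove any leading whitespace, then a single leading slash.
sub strip_leading_slash {
    my ($path) = @_;
    (my $copy = $path) =~ s{^\s*/}{};
    return $copy;
}

my $link = " \t/dir/page.html\n";   # hypothetical input with hidden characters
print "before: ", reveal($link), "\n";
print "after:  ", reveal(strip_leading_slash($link)), "\n";
```

If the "before" line shows anything ahead of the slash (a space, \t, \r), an anchored s/^\/// cannot match until that is removed. The trailing \n is untouched here; chomp deals with that separately.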


#!/usr/bin/perl -w
use strict;
use HTML::SimpleLinkExtor;
use LWP::Simple;
use Image::Grab;

#Set $save_dir to the directory path in which you want the files to be saved.
#Note: the directory must exist prior to program execution
print "\n\nEnter the full path to the directory where you want your images saved\n";
my $save_dir = <>;
chomp($save_dir);
my $exclude_pattern = "%";

print "Is the URL a directory path or a path to a webpage.
1) Directory path = \thttp://www.website.com/directory/
2) Webpage path = \thttp://www.website.com/directory/webpage.htm
Select 1 or 2: ";
my $urlType = <>;
print "\nEnter the full URL of the web page or directory that links to the wanted images\n";
my $page_dir = <>;
$page_dir = "http://" . $page_dir if $page_dir !~ /^http:\/\//;

my $link;
my $image;
my $counter = 0;
my $pic = Image::Grab->new();
my $html = get($page_dir);
my $extor = HTML::SimpleLinkExtor->new(search_url=>'$page_dir');
$extor->parse($html);
my @page_links = $extor->links;
$page_dir = parse_url($page_dir) if ($urlType == 2);
chomp($page_dir);

foreach $link (@page_links) {
    if ($link =~ /jpg$/) {
        next if $link =~ /$exclude_pattern/;
        $counter++;
        my @temp = split(/\//, $link);  #Ensure that only filename and not directory path
        my $filename = pop(@temp);      #is placed in $filename
        @temp = "";
        if ($link =~ /^http/) {  #If returned link is hard-link use returned link
            $pic->url($link);
            $pic->grab;
        }
        else {  #If returned link is a relative link, construct full path.
            print $link . "\n";
            $link =~ s/\///;
            print $link . "\n";
            $pic->url("${page_dir}${link}");
            $pic->grab;
        }
        @temp = check_existance($filename, $save_dir);
        $filename = shift @temp;
        my $save_dir = shift @temp;
        print "$page_dir$link\n";
        open(IMAGE, ">${save_dir}${filename}") || die "$!";
        print IMAGE $pic->image;
        close IMAGE;
        print "${counter}: Saved ${save_dir}${filename}\n\n";
    }
}
print "Finished!\n";

sub parse_url {
    my $page = shift;
    my @temp = split(/\//, $page);
    pop(@temp);
    my $page_dir = join('/', @temp) . "/";
    @temp = "";
    print "$page_dir\n";
    return $page_dir;
}

sub check_existance {
    my $filename = shift;
    my $save_dir = shift;
    if (-e "${save_dir}${filename}") {
        print "${save_dir}${filename} exists. Renaming file!\n";
        $filename = rand(999) % 999 . $filename;
    }
    return ($filename, $save_dir)
}

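The else branch builds the full URL by pasting ${page_dir} and ${link} together. A minimal sketch of a hypothetical join_url helper that guarantees exactly one slash between the two pieces, and drops a trailing newline from the base, might look like this. Like the script, it treats every relative link as relative to $page_dir; strictly speaking, a link beginning with / is relative to the site root, which the URI module's new_abs method resolves correctly.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): join a directory
# URL and a relative link with exactly one slash between them.
sub join_url {
    my ($base, $link) = @_;
    chomp $base;                       # drop a newline left over from <>
    (my $dir = $base) =~ s{/+$}{};     # remove trailing slash(es) from the base
    (my $rel = $link) =~ s{^/+}{};     # remove leading slash(es) from the link
    return "$dir/$rel";
}

print join_url('http://www.website.com/directory/', '/dir/page.jpg'), "\n";
print join_url("http://www.website.com/directory\n", 'dir/page.jpg'), "\n";
```

Normalising both sides means it no longer matters which of the two link formats the extractor returns.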
----
Nothing in the world can take the place of persistence. Talent will not; nothing is more common than unsuccessful men with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated failures. Persistence and determination alone are omnipotent. --Calvin Coolidge (1872-1933)

Re: Failed match/substitution
by Anonymous Monk on May 02, 2001 at 03:59 UTC
    Looks like $page_dir has a trailing newline, and you need to chomp it.
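To see why an unchomped value causes trouble, here is a small sketch that uses an in-memory filehandle (an assumption standing in for the user typing at the <> prompt) and compares the URL built before and after chomp.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for the user typing a URL at the prompt.
my $typed = "http://www.website.com/directory/\n";
open my $fh, '<', \$typed or die "Cannot open in-memory handle: $!";

my $page_dir = <$fh>;                 # the line still ends in "\n"
my $raw_url  = $page_dir . "dir/page.jpg";

chomp $page_dir;                      # strip the trailing newline right after reading
my $clean_url = $page_dir . "dir/page.jpg";
close $fh;

print "raw URL has an embedded newline\n" if $raw_url =~ /\n/;
print "clean URL: $clean_url\n";
```

Chomping immediately after each <> read keeps the newline from leaking into anything built from that value later.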

    As a stylistic point, you have major LTS (leaning toothpick syndrome). Your code'll be much nicer to read if you use something like m#http://# instead of /http:\/\//.
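A short sketch of the same patterns written with alternative delimiters; ensure_scheme and strip_leading_slash are hypothetical helpers modelled on lines in the script above, not part of it.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the script's
#   $page_dir = "http://" . $page_dir if $page_dir !~ /^http:\/\//;
# but written with braces as the delimiter so the slashes need no escaping.
sub ensure_scheme {
    my ($url) = @_;
    return $url =~ m{^http://} ? $url : "http://$url";
}

# Same idea for substitution: strip one leading slash.
sub strip_leading_slash {
    my ($path) = @_;
    (my $copy = $path) =~ s#^/##;   # '#' works as a delimiter too
    return $copy;
}

print ensure_scheme('www.website.com/directory/'), "\n";
print ensure_scheme('http://www.website.com/directory/'), "\n";
print strip_leading_slash('/dir/page.html'), "\n";
```

Any non-alphanumeric, non-whitespace character can serve as the delimiter; paired brackets such as {} have the advantage of nesting cleanly.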

      Thanks for your response. I'll try chomping it. Also thanks for the suggestion on using alternate separators. That'll read much better.