Re: Stripping part of URL

If it's true that the thing you really want is always "pageN", then you should just match that:

$current_url =~ s{.*?/(page\d+).*}{$1};

Note the use of "?" to invoke a "non-greedy match", so that the initial ".*" will stop matching as soon as there's a slash followed by "page\d+".
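
To make the difference visible, here is a small sketch comparing the non-greedy and greedy forms. The URL is a made-up example (not from the original question) that deliberately contains two "pageN" segments; with only one such segment, both forms give the same result.

```perl
use strict;
use warnings;

# Hypothetical URL with two "pageN" segments, chosen only to show the difference.
my $url = 'http://www.domain.com/page1/sub/page22/';

# Non-greedy: the leading ".*?" stops at the first slash followed by "page\d+".
(my $lazy = $url) =~ s{.*?/(page\d+).*}{$1};

# Greedy: the leading ".*" runs as far as it can, so the last "pageN" wins.
(my $greedy = $url) =~ s{.*/(page\d+).*}{$1};

print "non-greedy: $lazy\n";    # prints "non-greedy: page1"
print "greedy:     $greedy\n";  # prints "greedy:     page22"
```

The `(my $copy = $url) =~ s{...}{...}` idiom copies the string first, so the original `$url` is left untouched.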

Re^2: Stripping part of URL by htmanning (Friar) on Mar 24, 2015 at 03:35 UTC
This worked! Thank you.
Re^3: Stripping part of URL by mark4 (Acolyte) on Mar 24, 2015 at 14:57 UTC
Hi, this is just my two cents. I tend to write very verbose code (sorry in advance), and most of what I point out here may be obvious.

$current_url = "http://www.domain.com/sub/name/anothername/page4/";
print "1. $current_url\n";
$save = $current_url;

# Adding "|| die" will alert you to a parse problem.
$current_url =~ s/.*?\/(page\d+).*/$1/ || die "Can't parse $current_url\n";
print "$current_url\n";

# I personally like to do this. It's a lot more code, but it allows
# you to recover from a parse error (or ignore URLs that don't match
# your expected format).
$current_url = $save;
print "2. $current_url\n";
if ($current_url =~ /^http[s]{0,1}:\/\/.+\/page(\d+).*$/i) {
    $current_url = $1;
    print "Decimal page number is: $current_url\n";
}
else {
    print "parse error on $current_url\n";
    # Do what you may with this issue, but you
    # know it happened...
}
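
The save-and-restore step can be avoided by matching instead of substituting, since a match never modifies the URL. The following is only a sketch of that idea: `extract_page` is a hypothetical helper name, and the second URL is an invented example of input that does not fit the expected format.

```perl
use strict;
use warnings;

# extract_page is a hypothetical helper (not from the thread).
# It returns the "pageN" segment, or undef when the URL has none.
sub extract_page {
    my ($url) = @_;

    # In list context a successful match returns its captures; a failed
    # match returns an empty list, which leaves $page undefined.
    my ($page) = $url =~ m{/(page\d+)};
    return $page;
}

for my $url (
    'http://www.domain.com/sub/name/anothername/page4/',
    'http://www.domain.com/sub/name/',    # assumed non-matching input
) {
    my $page = extract_page($url);
    print defined $page
        ? "$url -> $page\n"
        : "no page segment in $url\n";
}
```

Because the caller gets `undef` on failure, it can decide whether to die, warn, or skip the URL, which is the same recovery the if/else version provides.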