Re: How do remove trailing data?

In my opinion, a regex is overkill for a job like this. The developers may well have considered such cases and made the regex engine handle them efficiently, but it still seems wrong.

Efficient and simple: split on the # and take the first component:

my $url = ( split '#', $urlplusplus )[0];

Or use index() to find where the # is, and substr() to fetch the prefix:

my $url = substr $urlplusplus, 0, index( $urlplusplus, '#' );
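
One caveat with the index() version: when the string contains no '#', index() returns -1, and substr() with a length of -1 drops the last character instead of returning the whole string. A minimal sketch of a guarded variant follows; `strip_fragment` is a hypothetical helper name and the example.com URLs are made-up sample input, neither is from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything before the first '#',
# or the whole string unchanged when there is no '#'.
sub strip_fragment {
    my ($urlplusplus) = @_;
    my $pos = index( $urlplusplus, '#' );
    return $pos < 0 ? $urlplusplus : substr( $urlplusplus, 0, $pos );
}

print strip_fragment('http://example.com/page#2'), "\n";
print strip_fragment('http://example.com/page'),   "\n";
```

Whether the guard is worth the extra line depends on whether input without a '#' can actually occur.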

Of course, a benchmark might provide some surprises.
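
For anyone who wants to check, here is a rough sketch using the core Benchmark module. The sample URL is an assumption, the regex is the `s/#[0-9]+$//` form discussed in this thread, and the index variant includes a guard for a missing '#'; results will vary with input and Perl version, so treat this as a starting point rather than a verdict.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample input is an assumption; substitute data representative of your own.
my $urlplusplus = 'http://example.com/some/page#42';

my %strip = (
    regex => sub { ( my $url = $urlplusplus ) =~ s/#[0-9]+$//; $url },
    split => sub { ( split /#/, $urlplusplus )[0] },
    index => sub {
        my $pos = index( $urlplusplus, '#' );
        $pos < 0 ? $urlplusplus : substr( $urlplusplus, 0, $pos );
    },
);

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese( -1, \%strip );
```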

As Occam said: Entia non sunt multiplicanda praeter necessitatem.

Replies are listed 'Best First'.
Re^2: How do remove trailing data? by JavaFan (Canon) on Dec 24, 2010 at 14:23 UTC
The split option fails if there's more than one '#' in the string, or if the '#' isn't followed by digits. Note that the regexp engine is optimized for patterns like `/#[0-9]+$/`:

perl -Mre=debug -e '$_ = "foobar#2"; s/#[0-9]+$//'
Compiling REx "#[0-9]+$"
Final program:
   1: EXACT <#> (3)
   3: PLUS (15)
   4:   ANYOF[0-9][] (0)
  15: EOL (16)
  16: END (0)
anchored "#" at 0 floating ""$ at 2..2147483647 (checking anchored) minlen 2
Guessing start of match in sv for REx "#[0-9]+$" against "foobar#2"
Found anchored substr "#" at offset 6...
Found floating substr ""$ at offset 8...
Starting position does not contradict /^/m...
Guessed: match at offset 6
Matching REx "#[0-9]+$" against "#2"
   6 <foobar> <#2>    |  1:EXACT <#>(3)
   7 <foobar#> <2>    |  3:PLUS(15)
                           ANYOF[0-9][] can match 1 times out of 2147483647...
   8 <foobar#2> <>    | 15: EOL(16)
   8 <foobar#2> <>    | 16: END(0)
Match successful!
Freeing REx: "#[0-9]+$"

So, I'd go with the `s///` solution.
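
To make the difference between the two approaches concrete, here is a small sketch that runs both on a few strings. The sample strings and the helper names `via_split` and `via_regex` are assumptions for illustration, not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two approaches under discussion.
sub via_split { my ($s) = @_; return ( split /#/, $s )[0] }
sub via_regex { my ($s) = @_; $s =~ s/#[0-9]+$//; return $s }

# Sample strings chosen to show where the approaches diverge:
# one plain case, one with two '#', one where '#' is not followed by digits.
for my $urlplusplus ( 'foobar#2', 'foo#bar#2', 'foobar#baz' ) {
    printf "%-12s split: %-8s s///: %s\n",
        $urlplusplus, via_split($urlplusplus), via_regex($urlplusplus);
}
```

The two agree on 'foobar#2'. For 'foo#bar#2' the split version keeps only 'foo' while the regex keeps 'foo#bar', and for 'foobar#baz' the split version cuts at the '#' while the regex leaves the string untouched. Which behaviour counts as correct depends on what the OP's data really looks like.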
Re^2: How do remove trailing data? by ww (Archbishop) on Dec 24, 2010 at 13:53 UTC
Preferring `split` to a regex makes little sense to me (obviously, YMDV) unless benchmarking supports that choice (and in OP's case, any difference in execution time would presumably be unnoticeable). Using split still requires basic familiarity with a simple regex such as the one ahmad offered; it is longer; and it may require more than 'just a glance' to understand at some future date. ...but I agree that index, despite its length and comparative complexity, may be a valuable alternative in the case stated by OP.