
Thanks for the suggestion. I do prefer to avoid having to use $1 for just trimming some characters. My regex worked for what I needed to do and since I'm only running this on 15 to 20 strings I wasn't worried about optimization. I find that other people's regexes are usually hard to read at first. I'm curious why you say mine is inefficient. Did you do some benchmarking?

In your regex ( $module =~ s/\-[^-]*$//; ) the backslash isn't needed for a '-'.

Rule of thumb: Try to make all your patterns start with a character and not a wildcard.

I was just reviewing the regex documentation and I didn't see that one listed. There are a lot of examples, however, where they don't start with a character. Are you counting character classes as characters?

Re^3: find module name from the module archive
by shawnhcorey (Friar) on Dec 21, 2016 at 19:51 UTC

    A rule of thumb would not be in the official documentation. Yes, character classes are better than wildcards.

    The regex you have starts with .*, which will match the entire string. The engine then looks for the next part of the pattern, a minus sign. Since it is at the end of the string, that fails, so it backs up one character and looks for the minus sign there. Not finding it, it backs up one more, and so on. Not very efficient.
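    To make the step count concrete, here is a toy model, written by hand rather than taken from Perl's regex engine, of how a naive backtracking matcher would handle a pattern like /^(.*)-/. The pattern and the archive name Data-Dumper-2.154.tar.gz are hypothetical stand-ins, since the original regex and inputs are not shown here. Perl's real engine has optimizations that can skip some of these steps, so treat the count as an illustration of the naive behaviour, not a measurement.

```perl
use strict;
use warnings;

# Toy model of a naive backtracking match of the hypothetical /^(.*)-/.
# Returns the captured prefix and how many lengths of .* were tried.
sub greedy_then_minus {
    my ($s) = @_;
    my $attempts = 0;
    # .* first grabs the whole string, then gives back one character at a time.
    for my $len (reverse 0 .. length $s) {
        $attempts++;
        # With $len characters consumed, the next character must be a minus.
        return (substr($s, 0, $len), $attempts)
            if $len < length($s) && substr($s, $len, 1) eq '-';
    }
    return (undef, $attempts);    # no minus sign anywhere
}

my ($name, $attempts) = greedy_then_minus('Data-Dumper-2.154.tar.gz');
print "$name after $attempts attempts\n";    # Data-Dumper after 14 attempts
```

    Almost all of those attempts are spent walking back over "2.154.tar.gz", the part that ends up being thrown away.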

    The backslash before the minus is not needed, but I tend to program defensively. In a Perl regex, a backslash before any ASCII punctuation character makes it literal, so the escape stays safe even if a new metacharacter is added in the future.
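    A quick check of the escaping rule, using the hypothetical string Data-Dumper: outside a character class the minus sign has no special meaning, so the plain and escaped patterns behave the same. Perl's built-in quotemeta applies the same defensive habit wholesale, backslashing every ASCII character that is not a word character.

```perl
use strict;
use warnings;

my $s = 'Data-Dumper';    # hypothetical sample string

# Outside a character class '-' is literal, so both of these match.
print "plain:   ", ($s =~ /a-D/  ? "match" : "no match"), "\n";
print "escaped: ", ($s =~ /a\-D/ ? "match" : "no match"), "\n";

# quotemeta escapes every ASCII non-word character (anything but [A-Za-z0-9_]).
print quotemeta('2.154-tar+gz'), "\n";    # 2\.154\-tar\+gz
```

    Inside a character class the minus sign can form a range, which is why [^-] puts it first, where it is taken literally.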

    You may be right that the regex I gave is not much more efficient, since it also backtracks. A more efficient way would be:

    $module =~ s/\-[^-]*+$//;

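    The extra plus sign makes the quantifier possessive (this needs Perl 5.10 or later): once [^-]*+ has matched a run of non-minus characters, it never gives any of them back. The regex scans to a minus sign, matches the non-minus characters after it, and checks for the end of the string. If it is not at the end, that attempt fails at once and the engine moves on to the next starting position; since the pattern must begin with a minus sign, those positions are rejected quickly until the next minus sign. So the string is scanned in roughly one pass, with no backtracking inside [^-]*+.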
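    Since benchmarking came up earlier in the thread, here is a sketch using the core Benchmark module to compare the two trimming patterns. The archive names and the trim_backtracking/trim_possessive helpers are hypothetical; the timings you get will depend on your Perl and your data, and with only 15 to 20 short strings the difference may well be negligible.

```perl
use v5.10;                       # possessive quantifiers need Perl 5.10+
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample inputs; the thread does not list the real strings.
my @archives = qw(
    Data-Dumper-2.154.tar.gz
    Moose-2.2004.tar.gz
    libwww-perl-6.15.tar.gz
);

sub trim_backtracking { my $m = shift; $m =~ s/-[^-]*$//;  return $m }
sub trim_possessive   { my $m = shift; $m =~ s/-[^-]*+$//; return $m }

# Both versions should strip the same trailing "-version" part.
for my $arc (@archives) {
    die "mismatch for $arc"
        unless trim_backtracking($arc) eq trim_possessive($arc);
}
print join(', ', map { trim_possessive($_) } @archives), "\n";
# Data-Dumper, Moose, libwww-perl

# Run each for at least 1 CPU second and print a rate comparison table.
cmpthese(-1, {
    backtracking => sub { trim_backtracking($_) for @archives },
    possessive   => sub { trim_possessive($_)   for @archives },
});
```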