in reply to regex behaves differently in split vs substitute?
-In general, use split when you know what to throw away and that "throw away" separator is an easy-to-identify sequence in the input.
-Use regex when you know what you want to keep and you can either (a) write one regex that describes all the "hunks" that you want or (b) you can enumerate the patterns easily.
-Sometimes the techniques are best combined, and that leads to more complicated regex patterns in the split. As a performance note, in many of my benchmarks a regex match (or global match) is faster than using a split. A complex regex in a split burdens the "slower but simple" split with something complicated.
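To make the first two rules of thumb concrete, here is a minimal sketch. The helper names `fields_by_split` and `numbers_by_match` and the sample strings are made up for illustration; they are not from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Case 1: the separator is easy to describe (a comma with optional
# whitespace around it), so describe what to throw away and use split.
sub fields_by_split {
    my ($line) = @_;
    return split /\s*,\s*/, $line;
}

# Case 2: the wanted hunks are easy to describe (integers or decimals),
# so describe what to keep and use a global match in list context.
sub numbers_by_match {
    my ($line) = @_;
    return $line =~ /(\d+(?:\.\d+)?)/g;
}

print join('|', fields_by_split('alpha, beta ,gamma')), "\n";
print join('|', numbers_by_match('width=10 height=2.5 depth=7')), "\n";
```

In the second case a split would have to describe everything that is not a number, which is much harder to write than the number pattern itself.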
It looks to me like you want to "split" at the first "-" that comes before a number, and that really means that a regex match solution is in order rather than a split.
There are other regex solutions - I don't claim that this is the best, but I do recommend trying to formulate a single forward pass regex (no look ahead or look behind) wherever possible because it will typically be the fastest.
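For comparison, one of the "other" solutions could be a split that uses a look-ahead and a limit of 2. This is only a sketch of the combined technique mentioned earlier, not a recommendation; the helper name `split_pkg` is hypothetical, and I have not benchmarked this version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split at the first "-" that is immediately
# followed by a digit. The limit of 2 keeps later hyphens (as in
# "2.10.2-r1") inside the version part.
sub split_pkg {
    my ($line) = @_;
    chomp $line;    # split does not discard the trailing newline for us
    my ($package, $ver) = split /-(?=\d)/, $line, 2;
    return ($package, $ver);
}

for my $line ("mono-basic-2.10\n", "mono-2.10.2-r1\n") {
    my ($package, $ver) = split_pkg($line);
    printf "%-15s %s\n", $package, $ver;
}
```

Note that this version needs both a look-ahead and an explicit chomp, which is the kind of extra machinery a single forward-pass match avoids.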
Update: if you want to know whether the regex succeeded, just check whether $ver is defined. If $ver is defined, then $package will be too. There is no need to chomp(), because the \s*$ will match and throw away the trailing \n character(s). Also, the regex substitution operation is very slow relative to just "match and capture", because the data has to be copied to "make room" for the new characters - a "substitute and then split" strategy will be slow.

```perl
#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
    next if /^\s*$/;    # skip blank lines
    # greedy package name (letters and hyphens), then "-", then the version
    my ($package, $ver) = /^\s*([a-zA-Z-]+)-(.+)\s*$/;
    printf "%-15s %s\n", $package, $ver;
}

=prints
mono-basic      2.10
mono            2.10.2-r1
mono            2.10.5
=cut

__DATA__
mono-basic-2.10
mono-2.10.2-r1
mono-2.10.5
```
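The "check if $ver is defined" advice could look something like the sketch below. The wrapper `parse_pkg` and the second test line are assumptions added for illustration; the regex is the same one used in the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper: returns (package, version) on success and an
# empty list when the regex does not match.
sub parse_pkg {
    my ($line) = @_;
    my ($package, $ver) = $line =~ /^\s*([a-zA-Z-]+)-(.+)\s*$/;
    return defined $ver ? ($package, $ver) : ();
}

for my $line ("mono-basic-2.10\n", "not a package line\n") {
    # A list assignment in boolean context is false when nothing came back.
    if (my ($package, $ver) = parse_pkg($line)) {
        printf "%-15s %s\n", $package, $ver;
    }
    else {
        chomp(my $shown = $line);
        print "no match: $shown\n";
    }
}
```

Returning an empty list on failure lets the caller test the assignment directly instead of checking each variable separately.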