Simple Regex Question

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Simple Regex Question by Fastolfe (Vicar) on Jan 30, 2001 at 00:36 UTC
Update: I didn't realize that splitting on a single space behaves like `/\s+/`, so ignore that bit below. That seems counter-intuitive to me, but whatever. That, and I'm used to specifying "real" patterns to split, not this awk-compatibility bit.

You are splitting on a single space, which messes up if you have multiple spaces between your "fields" (update: this is the incorrect bit). You might use `/\s+/` as your split delimiter instead. A regex to get the first set of non-numerics out of your 3rd field could be `/([a-z]+)/` or `/(\D+)/`. Avoid the use of `.*` as you're doing, since it involves a bit of backtracking and is generally less efficient than explicitly mapping out what you do want.
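A minimal sketch of the `/(\D+)/` suggestion, assuming a line shaped like the sample row quoted elsewhere in this thread. The sub name `leading_non_digits` is invented for illustration, and whether stopping at the first digit is right depends on the real data:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first run of non-digit characters
# found in the third whitespace-separated field of a line.
sub leading_non_digits {
    my ($line) = @_;
    my $field = (split ' ', $line)[2];   # third field, if there is one
    return undef unless defined $field;
    return $field =~ /(\D+)/ ? $1 : undef;
}

# The third field here is "nj1-dpta1", so the match stops at the "1".
print leading_non_digits("newjersey-ab1.net agg nj1-dpta1 net 10"), "\n";   # prints "nj"
```

Because the pattern only has to look inside one short field, there is no `.*` that needs to back off across the whole line.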
Re: Simple Regex Question by arturo (Vicar) on Jan 30, 2001 at 00:39 UTC
"Doesn't quite do it" isn't a lot of information to go on, I'm afraid. Next time you post (and I do encourage you to post again), please explain what the problem is (what results did you get, what else have you tried, etc.). Are you sure the fields in each line are separated by a space? Maybe it's tabs? You know that what you want is in the third field, however the whole line should be split, and you can further narrow that down to what's on the left of the minus sign. So you can use `split` all the way through. Here's a snippet which makes a few assumptions, which I've tried to document. `# assuming it's tabs; change "\t" to "\s" or "\s+" as appropriate my ($router, $cache, $tmp, $as, $sample) = split ("\t", $line); my $host = (split "-", $tmp)[0]; # grab LHS of $tmp $host =~ tr/0-9//d; # strip any digits -- whether this is right # REALLY depends on your data` [download] HTH Philosophy can be made out of anything. Or less -- Jerry A. Fodor	[reply] [d/l] [select]
Re: Simple Regex Question by KM (Priest) on Jan 30, 2001 at 00:51 UTC
Please read the perlre man page, and pick up a copy of Mastering Regular Expressions. This should do what you want:

    while (my $line = <CONFIG>) {
        chomp $line;
        if ((split /\s+/, $line)[2] =~ /(\w+)-/i) {
            print $1;
        }
    }

Cheers,
KM
Re: Simple Regex Question by lemming (Priest) on Jan 30, 2001 at 00:39 UTC
I'm hoping it's just a spelling problem, but shouldn't you be looking for "dpt" instead of "dta"? By the way, the `' '` in split is the same as saying `/\s+/`, except that `/\s+/` would produce a null first field if there is leading whitespace. Follow the rest of their advice though.
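The difference is easy to see by counting fields. A small sketch, using a made-up line with leading whitespace and a run of spaces; the helper name `split_counts` is hypothetical:

```perl
use strict;
use warnings;

# Sample line with leading whitespace and a run of spaces (made up for illustration).
my $line = "  router1   cache2 nyc-dpt";

# Hypothetical helper: split the text three ways and return the field counts.
sub split_counts {
    my ($text) = @_;
    my @awk    = split ' ',   $text;   # special case: leading whitespace is skipped
    my @regex  = split /\s+/, $text;   # leading whitespace gives an empty first field
    my @single = split / /,   $text;   # each individual space is a separator
    return (scalar @awk, scalar @regex, scalar @single);
}

my ($awk, $regex, $single) = split_counts($line);
print "' ' => $awk fields, /\\s+/ => $regex fields, / / => $single fields\n";
# prints: ' ' => 3 fields, /\s+/ => 4 fields, / / => 7 fields
```

Note that only the string `' '` triggers the awk-style behaviour; the pattern `/ /` splits on every single space.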
Re: Simple Regex Question by sierrathedog04 (Hermit) on Jan 30, 2001 at 01:26 UTC
Inside your if statement I would say:

    my $area = $tmp;
    # anchor the pattern at the start of the line using ^
    # then look for the third clump of characters and pick up everything
    # up to the hyphen.
    # the ? turns off greedy matching, so you do not get messed up by
    # duplicate occurrences of -dpt on the same line.
    $area =~ s/^.*\s+.*\s+(.*?)-dpt/\1/;
    return $area;
Re: Re: Simple Regex Question by KM (Priest) on Jan 30, 2001 at 01:35 UTC
`$area =~ s/^.*\s+.*\s+(.*?)-dpt/\1/;`

Not very efficient. The RE engine will have to work more than you think to match that pattern. From my test of it, it will turn a line like this:

    asdasd egg nyc-dpt net 10

into

    nyc net 10

Did you test this before posting, or look at the other answers? :)

Cheers,
KM
Re: Re: Re: Simple Regex Question by sierrathedog04 (Hermit) on Jan 30, 2001 at 02:52 UTC
Your point is well taken. From now on I will test my answers first. My proposed solution is:

    use strict;
    my $row1 = "newjersey-ab1.net agg nj1-dpta1 net 10";
    $row1 =~ s/^.+\s+.+\s+(.+)-dpt.*$/\1/;
    print $row1;

As far as looking at the other answers, yes, I look at them. If my approach differs from the other answers then I like to throw it out there and see what people say.

I agree with you that my answer may be inefficient; I really only started doing Perl seriously last year. My question to anyone who cares to answer is, why is it inefficient? And is this inefficiency lost in the noise of overall execution times, or would it be a problem in real life?
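One way to find out whether the difference matters is to time both approaches. A rough sketch using the core Benchmark module: the sub names `via_subst` and `via_split` are invented for illustration, the substitution uses `$1` in place of `\1`, and the split version assumes the wanted text is everything before the first hyphen in the third field. No timings are claimed here, since they depend on the machine and the data:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $row = "newjersey-ab1.net agg nj1-dpta1 net 10";

# Whole-line substitution; the greedy .+ pieces have to back off
# until the rest of the pattern can match.
sub via_subst {
    my ($line) = @_;
    (my $copy = $line) =~ s/^.+\s+.+\s+(.+)-dpt.*$/$1/;
    return $copy;
}

# Split first, then match only inside the third field.
sub via_split {
    my ($line) = @_;
    my $field = (split ' ', $line)[2];
    return defined $field && $field =~ /^([^-]+)-/ ? $1 : undef;
}

print via_subst($row), " ", via_split($row), "\n";   # both give "nj1" for this row

# Run each sub for about one CPU second and print a comparison table.
cmpthese(-1, {
    'subst' => sub { via_subst($row) },
    'split' => sub { via_split($row) },
});
```

For a config file of a few hundred lines either version is likely to finish quickly; the comparison becomes more interesting on long lines or large files.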
Re: Re: Re: Re: Simple Regex Question by KM (Priest) on Jan 30, 2001 at 03:03 UTC
Re: Re: Re: Re: Re: Simple Regex Question by sierrathedog04 (Hermit) on Jan 30, 2001 at 07:35 UTC