There has to be an easier way...

bxjoh has asked for the wisdom of the Perl Monks concerning the following question:

Re: There has to be an easier way... by btrott (Parson) on Aug 16, 2000 at 01:00 UTC
Did you try split? `my($clliA, $clliZ) = (split /\|/)[14,15];` You may need to adjust those indices a bit, because I didn't know whether you meant columns 15 and 16 (starting from 1), or what.
RE: Re: There has to be an easier way... by bxjoh (Initiate) on Aug 16, 2000 at 01:12 UTC
I know that split would work, but I split the line later. At this point in the program I am just trying to pull those 2 fields from the line so that I can pass those fields and the line into my subroutine. I am trying to do it without having to split the line a bunch of times. Thx though
RE: RE: Re: There has to be an easier way... by Adam (Vicar) on Aug 16, 2000 at 01:28 UTC
Not only does split not damage your data, it is much more efficient than a regex. And if you tell it exactly which elements you are looking for, it gets even more efficient. Trust the monks, re-read the man page for split and use it.
RE: RE: Re: There has to be an easier way... by bxjoh (Initiate) on Aug 16, 2000 at 01:28 UTC
oops well your way is quite a bit faster..... Thx man
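The concern about splitting the line "a bunch of times" can be avoided by splitting once and reusing the result. The following is only a sketch under assumptions: `process_record` is a hypothetical stand-in for the poster's subroutine (which the thread never shows), the sample line is made up, and indices 14 and 15 are taken from the replies above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's subroutine, which is not shown in the thread.
sub process_record {
    my ($clliA, $clliZ, $fields) = @_;
    return sprintf "%s -> %s (%d fields)", $clliA, $clliZ, scalar @$fields;
}

# Made-up sample line: 20 pipe-delimited fields named f1 .. f20.
my $line = join '|', map { "f$_" } 1 .. 20;

# Split once. The limit of -1 keeps trailing empty fields instead of dropping them.
my @fields = split /\|/, $line, -1;

# Take the two fields of interest with an array slice (indices are an assumption).
my ($clliA, $clliZ) = @fields[14, 15];

# Pass the fields and a reference to the already-split array; no second split needed.
my $summary = process_record($clliA, $clliZ, \@fields);
print "$summary\n";    # prints "f15 -> f16 (20 fields)"
```

Passing a reference to `@fields` rather than the raw line means the subroutine never has to parse the string again.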
Re: There has to be an easier way... by Shendal (Hermit) on Aug 16, 2000 at 01:01 UTC
How about something like this:
`#!/usr/bin/perl -w`
`use strict;`
`my($str) = 'one|two|three|four|five|six|seven|eight|nine|ten';`
`my($this,$that,$other) = (split /\|/,$str)[2,7,8];`
`print "this:$this\nthat:$that\nother:$other\n";`
Hope that helps, Shendal
Re: There has to be an easier way... by BlaisePascal (Monk) on Aug 16, 2000 at 01:14 UTC
Your data is delimited by "|"? How about: `($clliA,$clliB) = (split /\|/,$source,17)[14,15];` This splits $source into fields that were separated by |, but only the first 16 of them (the limit of 17 leaves any unsplit remainder in a 17th element, so the 16th field comes out clean), and puts them into a list, of which you take the 15th and 16th elements.
Update: I guess everyone thought of split...and are faster typers!
Your use of `.*` is particularly inefficient. Going with a simpler 3-field case, `/^.*A.*A.*$/`, it would match "shAzAm" by:
`match "shAzAm" with .*, can't find A, backtrack`
`match "shAzA" with .*, can't find A, backtrack`
`match "shAz" with .*`
`match "shAzA" with .*A`
`match "shAzAm" with .*A.*, can't find A, backtrack`
`match "shAzA" with .*A.*, can't find A, backtrack`
`match "shA" with .*, can't find A, backtrack`
`match "sh" with .*`
`match "shA" with .*A`
`match "shAzAm" with .*A.*, can't find A, backtrack`
and so forth. Imagine that with 16 `.*A` combinations, like you had.
Re: There has to be an easier way... by turnstep (Parson) on Aug 16, 2000 at 01:21 UTC
Also, assuming that the fields in between the | characters do not themselves have |'s in them, you could keep it as a regex by writing: `if (m#([^|]+)\|([^|]+)$#) { $a=$1; $b=$2; } else { chomp; die "Bad line: $_\n"; }` Note that this is better than just saying `($a,$b,$c) = m/(your)(regex)(here)/;` because in that form a line that does not fit your idea of what should be there (in other words, if the regex fails) will silently put nothing into the variables on the left. I'd use split myself, but this shows another way to do it, and it checks the data a little, too. Note, though, that it does not grab the 14th and 15th columns, but merely the last two. This could be a bug or a feature: your data, your call. :)
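The point about a failed match leaving the left-hand variables empty can be seen by testing what the match returns in list context. This is a minimal sketch, not the poster's code: `last_two_fields` is a hypothetical helper name, and the sample lines are made up.

```perl
use strict;
use warnings;

# Returns the last two pipe-delimited fields, or an empty list if the line does not fit.
sub last_two_fields {
    my ($line) = @_;
    chomp $line;
    # In list context a successful match returns its captures;
    # a failed match returns an empty list.
    return $line =~ /([^|]+)\|([^|]+)$/;
}

my @ok  = last_two_fields("a|b|c|d\n");          # captures 'c' and 'd'
my @bad = last_two_fields("no delimiter here");  # no pipe, so the match fails

print "matched: @ok\n";
print "second line did not match\n" unless @bad;
```

Checking whether the returned list is empty gives the same safety as the `if`/`else` form while keeping the list assignment.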
Re: There has to be an easier way... by ar0n (Priest) on Aug 16, 2000 at 01:02 UTC
`my @fields = split /\|/, $_; my($clliA,$clliB) = @fields[14,15];` update: sorry, didn't mean to be redundant. my fellow monks are so fast! please ignore me. -- ar0n || Just Another Perl Joe
Re: There has to be an easier way... by ferrency (Deacon) on Aug 16, 2000 at 01:06 UTC
You probably want to be using `split`. This takes a string and splits it into an array of strings, breaking it up on a specified delimiter. For example:
`my $data = "f1|f2|f3|f4|f5";`
`my @data = split /\|/, $data;`
`print $data[2], "\n"; # prints f3, of course: arrays start at 0.`
Or, using array slices:
`my $data = "f1|f2|f3|f4|f5";`
`my ($clliA, $clliZ) = (split /\|/, $data)[1,3];`
`# $clliA = "f2"`
`# $clliZ = "f4"`
Fun fun fun... Alan
update: Yikes, everyone beat me to it... oops.
RE (tilly) 1: There has to be an easier way... by tilly (Archbishop) on Aug 17, 2000 at 03:47 UTC
Everyone else has said what the preferred solution is. But nobody has explained why what you did was so slow. Full details are in Mastering Regular Expressions. However the basic theory is that Perl does a recursive search for ways to try to match your pattern to the string. The match goes from left to right in the pattern and the string. So it first tries to match the first `(.*)` to the end of the string. Well then it fails to get the pipe. So it backs off and tries again. And it turns out that you are doing a scenario where there are a lot of wrong partial matches you have to try first. If you change all of the `(.*)`s to `(.*?)`s then the RE would be faster. It would be safer still to change them to `([^|]*)`s. Split is even faster, but as you learn REs keep in mind the principle that ambiguity in the RE can result in unexpected slowdowns...
Cheers, Ben
PS Style point. Split your data into data structures early and then access the data structures directly rather than using formatted strings. In the long run I have found that to be faster, safer, and simpler.
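For anyone who does want to stay with a regex, the negated character class suggested above might look like the sketch below. The sample line is made up, and treating the wanted columns as the 15th and 16th is an assumption carried over from the other replies.

```perl
use strict;
use warnings;

# Made-up sample line: 20 pipe-delimited fields named f1 .. f20.
my $line = join '|', map { "f$_" } 1 .. 20;

# Skip 14 fields, each "non-pipes followed by a pipe", then capture fields 15 and 16.
# [^|]* can never run past a delimiter, so each field can only match one way.
my ($clliA, $clliZ) = $line =~ /^(?:[^|]*\|){14}([^|]*)\|([^|]*)/;

# A failed match leaves both variables undefined, so check before using them.
defined $clliA or die "Bad line: $line\n";
print "$clliA $clliZ\n";    # prints "f15 f16"
```

Because `[^|]*` stops at the first pipe, the engine has no alternative ways to divide the line among the groups, which is the ambiguity that made the `.*` version slow.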