One line assignment statement with regex match

ketema has asked for the wisdom of the Perl Monks concerning the following question:

Re: One line assignment statement with regex match by rev_1318 (Chaplain) on Jun 22, 2005 at 21:41 UTC
If the words in `@terms` are literals (no regex metacharacters), you could try: `($match) = grep { $lineFromSomeFile =~ $_ } @terms;` Paul
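A minimal sketch of the same `grep` idea inside a line-reading loop. It assumes the terms should always be matched literally, so each term is wrapped in `\Q...\E` (an addition to the one-liner above); the in-memory filehandle and sample data are hypothetical stand-ins for the real file.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real terms and file.
my @terms = qw(foo bar b.az);
my $data  = "no terms here\nsomething foo\nliteral b.az only\nbaz\n";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @results;
while (my $lineFromSomeFile = <$fh>) {
    chomp $lineFromSomeFile;
    # List assignment keeps the first term (in @terms order) that matches;
    # $match stays undef when nothing matches. \Q...\E makes "b.az" literal.
    my ($match) = grep { $lineFromSomeFile =~ /\Q$_\E/ } @terms;
    push @results, $match;
}
close $fh;

print defined $_ ? "$_\n" : "(none)\n" for @results;
```

Without `\Q...\E`, the term `b.az` would also match the line `baz`, because `.` matches any character.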
Re^2: One line assignment statement with regex match by tlm (Prior) on Jun 23, 2005 at 00:31 UTC
When looking for short literals, `index` is more efficient than a regex: `my @matches = grep index( $lineFromSomeFile, $_ ) > -1, @terms;`

Update: Added the link and the qualifier "short" in response to kaif's comment++. How short is short? When I tested random (but constant) strings and substrings of lengths 80 and 8, respectively, which are "typical" lengths for a line and a word, `index` was about 20% faster than the corresponding regex. I imagine that it is this sort of analysis that is responsible for the widespread reputation of `index` as being superior to regexes. Clearly, as kaif shows, the ratio of speeds is sensitive to the sizes of the string and the substring being searched, but I have not done a detailed analysis beyond this and what is posted in the node linked above. the lowliest monk
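If only the first matching term is wanted, `first` from the core List::Util module stops scanning at the first hit, whereas `grep` always walks the whole list. A minimal sketch combining it with `index`; the sample line and terms are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical sample data.
my @terms = qw(alpha beta gamma);
my $lineFromSomeFile = "the beta release follows alpha";

# first returns the first element for which the block is true, or undef.
# index returns -1 when the substring is absent, so >= 0 means "found".
my $match = first { index($lineFromSomeFile, $_) >= 0 } @terms;

# A line containing none of the terms yields undef.
my $none = first { index("nothing relevant", $_) >= 0 } @terms;

print defined $match ? "matched: $match\n" : "no match\n";
```

Note that the result is the first term in `@terms` order (`alpha`), not the term that occurs leftmost in the line (`beta`).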
Re^3: One line assignment statement with regex match by kaif (Friar) on Jun 23, 2005 at 01:59 UTC
So, a lot of people like to say that. And indeed, sometimes `index` is ten times faster. But sometimes it's more than three times slower!

```perl
use Benchmark qw(:all);

# Eight lines of 80 'a' characters each.
my $line    = "a" x 80 . "\n";
my $text    = $line x 8;
my $pattern = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb";   # 32 'b's, never found
my $count   = -3;   # negative count: run each sub for at least 3 CPU seconds

cmpthese($count, {
    'regex' => sub { $text =~ $pattern },
    'index' => sub { index $text, $pattern },
});

__DATA__
           Rate index regex
index  630601/s    --  -67%
regex 1914815/s  204%    --
```

Moreover, by increasing the lengths of the text and pattern, I can make the regex 40 times faster. See Re^8: "advanced" Perl functions and maintainability for reasons why people use regexes instead of `index`. Personally, I still don't understand why there is a difference in speed at all: shouldn't the regex engine be optimized to notice that this is a search for a constant string and then call the same function as `index`?

Update: No, I'm not kidding. The output follows. Moreover, for this example, adding a single `study $text` makes it another 10 times faster, completely obliterating `index`:

```
         Rate  index regex study
index   178/s     --  -98% -100%
regex  7538/s  4124%    --  -92%
study 98871/s 55311% 1212%    --
```

Update: I'm running perl v5.8.5 built for i686-linux.
Re^2: One line assignment statement with regex match by ketema (Scribe) on Jun 22, 2005 at 22:04 UTC
This statement works fine. Thank you.
Re: One line assignment statement with regex match by Codon (Friar) on Jun 22, 2005 at 21:25 UTC
Do you mean something like this?

```perl
my @terms = qw(one two three);
# Keep each line that contains at least one of the terms.
my @matches = grep { my $line = $_; grep { $line =~ /$_/ } @terms } @lines_from_file;
```

Ivan Heffner Sr. Software Engineer, DAS Lead WhitePages.com, Inc.
Re^2: One line assignment statement with regex match by jZed (Prior) on Jun 22, 2005 at 21:49 UTC
That way will nicely return a list of lines that have at least one of the terms in them. A variant will return a list of terms that are found in at least one line: `my @matches = grep { my $term = $_; grep { $_ =~ /$term/ } @lines } @terms;` It's unclear from the description which of these (or something else) the OP wants.
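To make the difference between the two variants concrete, here is a small run on hypothetical data. The terms are quoted with `\Q...\E`, which is an addition to the original snippets so that they are matched literally.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @terms = qw(cat dog fish);
my @lines = ("the cat sat", "a bird sang", "dog meets cat");

# Codon's variant: lines that contain at least one term.
my @matching_lines = grep { my $line = $_; grep { $line =~ /\Q$_\E/ } @terms } @lines;

# jZed's variant: terms that occur in at least one line.
my @found_terms = grep { my $term = $_; grep { /\Q$term\E/ } @lines } @terms;

print "lines: ", join(' | ', @matching_lines), "\n";
print "terms: @found_terms\n";
```

The first list drops "a bird sang"; the second drops `fish`.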
Re: One line assignment statement with regex match by mugwumpjism (Hermit) on Jun 23, 2005 at 00:44 UTC
I tend to use something like this:

```perl
my $re = qr/(?:${\( join "|", map { qr/\Q$_\E/ } @terms )})/;
...
my ($match) = ($line =~ m/($re)/);
```

This is especially important if your `@terms` contain metacharacters such as parentheses, asterisks, periods, etc. Also, it is usually faster because you're only doing one regular expression match per line, not many.

`` $h=$ENV{HOME};my@q=split/\n\n/,`cat $h/.quotes`;$s="$h/." ."signature";$t=`cat $s`;print$t,"\n",$q[rand($#q)],"\n"; ``
Re^2: One line assignment statement with regex match by ketema (Scribe) on Jun 23, 2005 at 15:37 UTC
I get "search pattern not terminated" when I try this: `my $re = qr/(?:${\( join "|", map { qr/\Q$_\E/ } @terms )})/;`
Re^3: One line assignment statement with regex match by mugwumpjism (Hermit) on Jul 06, 2005 at 22:07 UTC
Terribly sorry, I didn't test it. Yes, the double use of "/" to delimit regular expressions is tripping up the parser: the outer `qr/.../` ends at the first "/" of the inner `qr/\Q$_\E/`. Much better to use braces, which nest:

```perl
@terms = qw(foo bar b.az);
my $re = qr{(?:${\( join "|", map { qr{\Q$_\E} } @terms )})};
print "$re\n";
```

The above will print `(?-xism:(?:(?-xism:foo)|(?-xism:bar)|(?-xism:b\.az)))`
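A variant of the same one-regex idea, sketched with `quotemeta` and a plain `join`. One detail worth knowing: Perl tries the alternatives of `|` from left to right, so when one term is a prefix of another (`foo` and `foobar`), putting longer terms first makes the longer one win. The length sort is an addition, not part of the post above, and the data is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical terms: "foo" is a prefix of "foobar", "b.az" has a metacharacter.
my @terms = qw(foo foobar b.az);

# Escape metacharacters and put longer terms first so that, at a given
# position, the longest alternative is tried before its prefixes.
my $alternation = join '|', map { quotemeta($_) } sort { length $b <=> length $a } @terms;
my $re = qr/($alternation)/;

my ($m1) = "see foobar here" =~ $re;   # longest alternative wins
my ($m2) = "plain baz" =~ $re;         # "b.az" is literal, so no match
my ($m3) = "x b.az y" =~ $re;

print defined $m1 ? "$m1\n" : "(none)\n";
```

With the terms left in their original order (`foo|foobar|b\.az`), the first match would return `foo` instead of `foobar`.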
Re: One line assignment statement with regex match by GrandFather (Saint) on Jun 22, 2005 at 21:42 UTC
something like:

```perl
@terms = ("Word1", "Word2", "Word3");
# Capture only the term, not the surrounding spaces.
$match = $1 if join(" ", @terms) =~ /(?:^| )(\Q$lineFromSomeTextFile\E)(?: |$)/;
```

Perl is Huffman encoded by design.
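GrandFather's regex asks whether the whole line equals one of the terms. If that is the intent, a hash lookup is a common alternative (a swapped-in technique, not the one in the post) that avoids interpolating the line into a pattern. A minimal sketch with hypothetical data; it assumes the trailing newline has been removed with `chomp`.

```perl
use strict;
use warnings;

my @terms = ("Word1", "Word2", "Word3");
# Build the lookup table once: each term maps to a true value.
my %is_term = map { $_ => 1 } @terms;

my $lineFromSomeTextFile = "Word2\n";
chomp $lineFromSomeTextFile;

# One-statement assignment: the line itself if it is a term, else undef.
my $match = $is_term{$lineFromSomeTextFile} ? $lineFromSomeTextFile : undef;

my $other    = "Word4";
my $no_match = $is_term{$other} ? $other : undef;

print defined $match ? "match: $match\n" : "no match\n";
```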
Re: One line assignment statement with regex match by TedPride (Priest) on Jun 22, 2005 at 21:43 UTC
Looking at your code, it's hard to tell which of several different things you're trying to do. Can you explain what you're trying to do and why in more detail?	[reply]
Re^2: One line assignment statement with regex match by ketema (Scribe) on Jun 22, 2005 at 21:48 UTC
I have an array of literal strings (words). As I read through a file, I want to check the current line against my array of terms. If there is a match, I want to assign that match to a variable. I was wondering if I could do that in one statement.
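Taking the description above literally (literal words, one assignment per line), one possible sketch uses a single precompiled alternation with word boundaries, so that `cat` does not match inside `concatenate`. The `\b` anchors are an assumption about what counts as a match and only behave as expected when each term starts and ends with a word character; drop them to allow substring matches. The data is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical list of literal words.
my @terms = qw(cat dog);
my $alternation = join '|', map { quotemeta($_) } @terms;
# \b anchors require whole-word matches; the capture group returns the word.
my $word_re = qr/\b($alternation)\b/;

my @lines = ("concatenate strings", "the dog barked", "a cat and a dog");
my @matches;
for my $line (@lines) {
    # One statement: $match gets the leftmost whole-word term, or undef.
    my ($match) = $line =~ $word_re;
    push @matches, $match;
}

print defined $_ ? "$_\n" : "(none)\n" for @matches;
```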
Re: One line assignment statement with regex match by broquaint (Abbot) on Jun 23, 2005 at 09:27 UTC
This looks like a good case for `Regex::PreSuf`, i.e.:

```perl
use Regex::PreSuf;

my $lineFromSomeTextFile = "your line right here\n";
my @terms = qw/ something matching a line /;

my ($word) = $lineFromSomeTextFile =~ /(${\presuf(@terms)})/;

# or without PreSuf
($word) = $lineFromSomeTextFile =~ do { local $" = '|'; "(@terms)" };

print "found '$word' in: $lineFromSomeTextFile";
```

Output: `found 'line' in: your line right here`

HTH, broquaint
Re^2: One line assignment statement with regex match by planetscape (Chancellor) on Jun 23, 2005 at 18:05 UTC
Just because I really like grinder's Regexp::Assemble, and think such a cool module needs more exposure:

```perl
#!/usr/bin/perl -w
use Regexp::Assemble;

my $lineFromSomeTextFile = "your line right here\n";
my $ra = Regexp::Assemble->new->add( "something", "matching", "a", "line" );

print $ra->re . "\n";   # because the output is cool
                        # and sometimes educational

my ($word) = $lineFromSomeTextFile =~ /($ra)/;
print "found '$word' in: $lineFromSomeTextFile";
```

(For more on Regexp::Assemble see: Why machine-generated solutions will never cease to amaze me.)

HTH, planetscape

