in reply to Need RegExp help - doing an AND match

This works okay, though I'm not sure how it fares performance-wise compared with other methods.

#! perl -slw
use strict;

# Build one lookahead per word; all of them must succeed for a match.
sub reAnd{
    my $re = '';
    $re .= '(?=^.*\b' . quotemeta() . '\b)' for @_;
    return qr[$re];
}

my @words = qw[ an of and ];
my $re1 = reAnd( @words );
#print $re1;

my $re2 = reAnd( qw[ a great sweet mother by the wellfed voice beside him ] );
#print $re2;

while( <DATA> ) {
    m[$re1]i and print "1:$_";
    m[$re2] and print "2:$_";
}

__DATA__
Stephen, an elbow rested on the jagged granite, leaned his palm against
his brow and gazed at the fraying edge of his shiny black coat-sleeve.
Pain, that was not yet the pain of love, fretted his heart. Silently,
in a dream she had come to him after her death, her wasted body within
its loose brown graveclothes giving off an odour of wax and rosewood,
her breath, that had bent upon him, mute, reproachful, a faint odour of
wetted ashes. Across the threadbare cuffedge he saw the sea hailed as
a great sweet mother by the wellfed voice beside him. The ring of bay
and skyline held a dull green mass of liquid. A bowl of white china had
stood beside her deathbed holding the green sluggish bile which she had
torn up from her rotting liver by fits of loud groaning vomiting.

Prints

C:\test>624296.pl
1:its loose brown graveclothes giving off an odour of wax and rosewood,
2:a great sweet mother by the wellfed voice beside him. The ring of bay

The basic mechanism is to use a regex of the form (?=^.*\bword\b). That is, a positive lookahead assertion that reads: starting at the beginning of the line, skip as much of anything as needed to try and locate the word 'word', delimited by word/nonword transitions (\b).

As these are zero length assertions, they do not advance the matchpoint, so adding a second one again starts from the beginning of the string. This gives the ability to match any number of words in any order. If they all match, the regex succeeds and the AND operation is achieved.
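To make the "any order" point concrete, here is a minimal sketch with two hand-written lookaheads. The sample strings are made up for illustration (only the first is taken from the data above), and the pattern is written out by hand rather than generated by reAnd().

```perl
use strict;
use warnings;

# Two stacked lookaheads. Each one rescans from the start of the string
# because of the ^ anchor, so the order of the words does not matter.
my $re = qr/(?=^.*\bwax\b)(?=^.*\bodour\b)/;

my @lines = (
    'giving off an odour of wax and rosewood,',   # both words, "odour" first
    'a faint odour of wetted ashes.',             # only "odour"
    'waxed floors have their own odour',          # "waxed" is not the word "wax"
);

for my $line (@lines) {
    printf "%-45s %s\n", $line, ( $line =~ $re ? 'match' : 'no match' );
}
```

Only the first string should match: the second lacks 'wax', and in the third the \b after 'wax' fails because the next character is a word character.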

By generating the regex in a sub, the 'horrors' of the 'bunch of regex' can be hidden from the squeamish.

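If you need case-independent matching, build the flag into the generated regex itself (for example, return qr[$re]i from the sub). A qr// object carries its own flags, so adding /i at the point of use, as in m[$re1]i above, does not reach inside it.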

If you omit the ^, the lookaheads scan forward from the current match position instead, so you can append the AND operation to a longer regex. However, continuing to match after the AND has succeeded is more involved, because the lookaheads consume nothing.
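A rough sketch of the un-anchored variant, assuming a hypothetical helper called reAndHere() (not part of the code above) and a made-up prefix pattern:

```perl
use strict;
use warnings;

# Hypothetical helper: same idea as reAnd(), but without the ^ anchor,
# so each lookahead scans forward from wherever the match currently is.
sub reAndHere {
    my $re = '';
    $re .= '(?=.*\b' . quotemeta($_) . '\b)' for @_;
    return qr/$re/;
}

my $and = reAndHere(qw[ wax rosewood ]);

# The prefix "odour of" must match first; then both words must appear
# somewhere after that point on the same line.
my $re = qr/\bodour of\b$and/;

for my $line (
    'an odour of wax and rosewood',        # both words follow the prefix
    'wax and rosewood have an odour of',   # both words precede the prefix
) {
    print "$line: ", ( $line =~ $re ? 'match' : 'no match' ), "\n";
}
```

The second string fails because nothing follows the prefix, which shows that the words are now only looked for from the current position onward.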


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: Need RegExp help - doing an AND match
by Anonymous Monk on Jul 01, 2007 at 19:56 UTC
    Ok, thanks everyone for the help!

    The RegExp approach was actually the fastest overall in the end, though that obviously depends on the input.

    Unfortunately I ran into a problem - I don't know how to handle regular expressions actually containing the character '.'

    For example, I may wish to search for a file extension.

      See perlre.

      /./ will match the character '.', but will also match many other characters. What you likely want is to quote the dot, so it loses its special meaning:

      "." =~ m!\.!

      Update: GrandFather spotted a missing ! at the end of the regular expression

        Yes, I know what you mean but I'm not matching explicit text - here is an example:
        @words = ("filename", "file.txt");
        $line  = "my file is file.txt";
        I want to search this line and save the matching word in a variable for later. These work assuming no lines contain both words:
        $file = (grep { $line =~ /$_/i } @words)[0];
        or
        foreach (@words) {
            next if ($line !~ /$_/i);
            $file = $_;
        }
        So both work ($file is nonempty), but neither is good.
        The problem is that I need to pass a variable into my RegExp for the search, and it may or may not contain a "."

        Apologies all. I've re-read all posts here and List::Util's first() does the trick. Thanks to all who replied!
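For anyone landing here with the same question, a minimal sketch of how first() and quoting fit together. The $line value is changed from the example above (an assumption for illustration) so that the effect of the unquoted '.' is visible.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @words = ( 'filename', 'file.txt' );
my $line  = 'my file is fileXtxt';    # deliberately has no literal dot

# Unquoted, the "." in "file.txt" is a wildcard and matches the "X".
my $loose = first { $line =~ /$_/i } @words;

# \Q...\E (or quotemeta) escapes metacharacters in the interpolated value,
# so the "." only matches a literal dot.
my $strict = first { $line =~ /\Q$_\E/i } @words;

print 'unquoted: ', ( defined $loose  ? $loose  : '(none)' ), "\n";
print 'quoted:   ', ( defined $strict ? $strict : '(none)' ), "\n";
```

The reAnd() sub in the parent post already runs each word through quotemeta, so the same protection applies there.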
Re^2: Need RegExp help - doing an AND match
by john_oshea (Priest) on Jul 02, 2007 at 12:01 UTC

    You have to appreciate a site where James Joyce writes your sample data... ;-)

    BrowserUK++