in reply to Need a Regular Expression that tests for words in different order and captures the values found.

open my $file_fh, '<', $file or die "unable to open $file for read : $!";
my $pattern = qr/ (?= .* fred \s+ (\w+) ) (?= .* barney \s+ (\w+) ) (?= .* joe \s+ (\w+) ) /ix;
while ( my $line = <$file_fh> ) {
    if ( $line =~ /$pattern/ ) {
        $company = join '_', $1, $2, $3, 'inc';
    }
}

Re^2: Need a Regular Expression that tests for words in different order and captures the values found.
by ikegami (Patriarch) on Jan 15, 2010 at 08:06 UTC
    You need to add \b or something in front of fred, barney and joe. You're not supposed to be matching alfred.
      Thanks,
      What would I do in case of hyphens? For example, I had a line that contained
      "pseudo-fred flintstone" and I wanted to skip it, because this wasn't the real fred that I was keying on.
      $line =~ /(?=.*\bfred\s+(\w+)/ ;
      # would get "fred" and anything "-fred"
      # how would I avoid that?
        What would I do in case of hyphens?

        You need to figure out a boundary condition, then define it as a regex.

        For instance, in the case of  (?=.*\bfred\s+(\w+)) (note the closing parenthesis, missing in your reply), the boundary condition is the  \b word boundary assertion.
        It might be defined
            my $boundary = qr{ \b }xms;
        and used (without the //x modifier) (untested)
            (?=.*${boundary}fred\s+(\w+))

        Now just change the definition of  $boundary to fit your needs. For instance, if you wanted first names to follow anything that was not a word character and also not a hyphen, you might define (untested)
            my $boundary = qr{ (?<! [\w-]) }xms;
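A minimal sketch of how that boundary definition might behave, assuming the hyphen-excluding variant above. The sample lines are made up for illustration, and like the snippets above this is not tested against the original poster's data.

```perl
use strict;
use warnings;

# Boundary: the name must not be preceded by a word character or a hyphen.
my $boundary = qr{ (?<! [\w-] ) }xms;

# The outer pattern has no /x, so it is written without extra whitespace.
# The compiled $boundary keeps its own /x flag when interpolated.
my $pattern = qr/(?=.*${boundary}fred\s+(\w+))/i;

# Hypothetical sample lines, invented for this example.
my @lines = (
    'fred flintstone',
    'alfred hitchcock',
    'pseudo-fred flintstone',
    'barney rubble and fred flintstone',
);

# Collect the captured word for each line, or undef when the line is skipped.
my @surnames;
for my $line (@lines) {
    push @surnames, $line =~ $pattern ? $1 : undef;
}

for my $i ( 0 .. $#lines ) {
    printf "%-36s => %s\n", $lines[$i], $surnames[$i] // '(skipped)';
}
```

With this definition, only the first and last lines should yield a capture; "alfred" and "pseudo-fred" are rejected by the lookbehind.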

Re^2: Need a Regular Expression that tests for words in different order and captures the values found.
by greatwazzoo (Novice) on Jan 15, 2010 at 14:36 UTC
    Great thanks, I was able to get it to work(sortof) with this line:
    $line =~ /^(?=.*fred\s+(\w+))(?=.*barney\s+(\w+))(?=.*joe\s+(\w+))/ ;
    $company = join '_', $1, $2, $3, 'inc';
      ... I was able to get it to work(sortof) ...

      This implies the solution does not entirely fulfill your needs. In what way does it fall short?

        It goes above and beyond in filling my needs.
        The "sortof" meant that the script I'm bashing out isn't
        actually looking for Flintstones or Rubbles, but parsing settings
        from a config file, where the names of the settings aren't in a
        consistent order.
        E.g. I want "one two three"; most of the time all three are there,
        just in a random order.
        There was an article in The Perl Journal (I have the O'Reilly book)
        by Jeffrey Friedl that mentioned the
        /^(?=.*one)(?=.*two)/
        solution, but I didn't know how to capture (or that you could) inside
        non-consuming parentheses.
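The config-file case described above could be sketched roughly as follows. The key=value syntax and the sample lines are assumptions, since the actual config format is not shown in the thread.

```perl
use strict;
use warnings;

# All three lookaheads are anchored at the start of the line, so each one
# scans the whole line independently and the settings may come in any order.
# Captures are numbered by the position of their parentheses in the pattern,
# not by where the text appears in the line.
my $settings = qr/^(?=.*\bone=(\w+))(?=.*\btwo=(\w+))(?=.*\bthree=(\w+))/;

# Hypothetical config lines; the format is assumed for illustration.
my @config = (
    'one=a two=b three=c',
    'three=z one=x two=y',
    'two=q one=p',          # "three" is missing, so this line does not match
);

my @parsed;
for my $line (@config) {
    if ( $line =~ $settings ) {
        push @parsed, join '_', $1, $2, $3;
    }
    else {
        push @parsed, undef;
    }
}

print defined $_ ? "$_\n" : "(no match)\n" for @parsed;
```

Because the capture groups are numbered left to right in the pattern, $1 is always the value of "one" regardless of where it sits in the line.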

      Heh, I just replied with this very solution (see below). Put a question mark after each closing parenthesis and it should work.

      $line =~ /^(?=.*fred\s+(\w+))?(?=.*barney\s+(\w+))?(?=.*joe\s+(\w+))?/;
      $company = join '_', $1, $2, $3, 'inc';
      --marmot
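When some names may be absent, the corresponding capture variables are undefined, and joining them directly warns under "use warnings". The sketch below is a variation rather than the exact pattern above: it moves the optionality inside each lookahead (a swapped-in technique) and drops undefined captures before joining. The company_name helper and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# Each lookahead wraps an optional group, so the lookahead itself always
# succeeds and the capture is simply left undefined when the name is absent.
my $pattern = qr/^(?=(?:.*\bfred\s+(\w+))?)(?=(?:.*\bbarney\s+(\w+))?)(?=(?:.*\bjoe\s+(\w+))?)/i;

# Hypothetical helper: builds the name from whichever captures were found.
sub company_name {
    my ($line) = @_;
    return unless $line =~ $pattern;
    my @found = grep { defined } $1, $2, $3;
    return join '_', @found, 'inc';
}

print company_name('joe cool and fred flintstone'), "\n";
print company_name('barney rubble'), "\n";
```

Note that a line containing none of the names still matches and yields just "inc", so a caller may want to check that at least one capture was found.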