in reply to Matching multiple patterns with regex

Athanasius has given the best answer so far, and that's the route you should take, but let's pretend for a moment that there wasn't a suitable module...

The regex approach needs to look for the "From:" and "Subject:" (and other field labels) only when they appear at the beginning of a line, and it can stop reading input as soon as all the desired fields are found.

my @fields = qw/From: Subject:/;  # you can add more if/when you want
my $field_regex = join( "|", @fields );
my @field_lines;
while (<$fh>) {
    push( @field_lines, $_ ) if ( /^(?:$field_regex) / );
    last if @field_lines == @fields;
}
push @field_lines, "";
print join( "\n", sort @field_lines );

Note the regex uses the initial anchor character (^) and non-capturing grouping parens. (You could just as well use the simpler capturing parens, without the "?:" -- this would add a slight bit of extra processing, but not enough to worry about here.)
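
For anyone who wants to try this without a mail file at hand, here is a minimal self-contained sketch of the same loop. The sample message and the in-memory filehandle are made up for illustration. This variant also chomps each line before storing it, since lines read from the handle still carry their newline, and it assumes each wanted field appears only once in the header.

```perl
use strict;
use warnings;

# Made-up sample message, read through an in-memory filehandle
my $message = join "\n",
    "Received: from mail.example.com",
    "From: alice\@example.com",
    "To: bob\@example.com",
    "Subject: regex question",
    "",
    "From: this body line is never reached",
    "";
open( my $fh, '<', \$message ) or die "Cannot open in-memory file: $!";

my @fields      = qw/From: Subject:/;
my $field_regex = join( "|", @fields );

my @field_lines;
while ( my $line = <$fh> ) {
    chomp $line;
    push( @field_lines, $line ) if ( $line =~ /^(?:$field_regex) / );
    last if @field_lines == @fields;    # stop once every field is found
}
close $fh;

print "$_\n" for sort @field_lines;
```

Because of the "last", the loop never reads the body line that happens to start with "From:".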

(updated to include parens around the args for the first "push" call, just because I like using parens around function args)

Re^2: Matching multiple patterns with regex
by afoken (Chancellor) on Oct 31, 2015 at 07:45 UTC
    my @fields = qw/From: Subject:/;  # you can add more if/when you want
    my $field_regex = join( "|", @fields );

    I would change the second line to use quotemeta:

    my @fields = qw/From: Subject:/;  # you can add more if/when you want
    my $field_regex = join( "|", map { quotemeta($_) } @fields );

    It may be unneeded for From: and Subject:, but code may change over time, and adding quotemeta now prevents future bugs.
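
    To illustrate the kind of future bug meant here, a small sketch follows. The label "X-Spam(Score):" is hypothetical, chosen only because it contains regex metacharacters.

    ```perl
    use strict;
    use warnings;

    # "X-Spam(Score):" is a made-up label that happens to contain parens
    my @fields = ( 'From:', 'X-Spam(Score):' );

    my $raw    = join( "|", @fields );
    my $quoted = join( "|", map { quotemeta($_) } @fields );

    my $line = "X-Spam(Score): 5.1";

    # Unquoted, the parens form a capture group, so the literal "(" never matches
    print "raw:    ", ( $line =~ /^(?:$raw) /    ? "match" : "no match" ), "\n";
    # Quoted, every metacharacter is matched literally
    print "quoted: ", ( $line =~ /^(?:$quoted) / ? "match" : "no match" ), "\n";
    ```

    Only the quoted version finds the line.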

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Actually, I wouldn't recommend quotemeta in tasks like this. It's seldom or never the case that regex-magic characters will be needed as literals in the patterns being conjoined, but even if this comes up, it can still be better to escape them explicitly as needed (when assigning to the array), and allow some strings to use regex-magic where appropriate:
      my @fields = ('From: ', 'Subject: ', 'Thread-\w+: ', 'What-if-there\'s +-a-qmark\?'); # 3rd element matches "Thread-Topic: ", "Thread-Index: ", etc.
      Obviously, in a context where strings are coming from a potentially tainted source (i.e. not from the source code itself), one must weigh the relative risk/benefit and coding-effort/ease-of-use trade-offs of prohibiting vs. allowing (and taint-checking) regex metacharacters in applications like this.
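
      One possible middle ground, sketched under assumptions: patterns written in the source code keep their regex magic, while strings from anywhere else are passed through quotemeta. The helper name build_field_regex and the sample labels are hypothetical, and this does no actual taint checking; it only illustrates the split.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: trusted patterns are used as-is, everything
      # from an outside source is treated as a literal string.
      sub build_field_regex {
          my ( $trusted, $untrusted ) = @_;
          my @alts = ( @$trusted, map { quotemeta($_) } @$untrusted );
          my $alternation = join( "|", @alts );
          return qr/^(?:$alternation)/;
      }

      my $re = build_field_regex(
          [ 'From: ', 'Thread-\w+: ' ],    # written in the source, regex allowed
          [ 'X-Note.*: ' ],                # e.g. read from a config file, taken literally
      );

      for my $line ( "Thread-Topic: hello", "X-Note.*: literal", "X-Notebook: nope" ) {
          printf "%-20s %s\n", $line, ( $line =~ $re ? "match" : "no match" );
      }
      ```

      The first two lines match; the third does not, because ".*" in the outside string is no longer a wildcard.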