in reply to Re^2: pattern matching once
in thread pattern matching once

You probably put the \b within the capture group.
    use strict;
    use warnings;

    my @lines = (
        "https://www.sec.gov/Archives/edgar/data/831001/000095010323011632/0000950103-23-011632.txt\n",
        "<FILENAME>dp198076_424b2-us2342673.htmSomeCrap\n",
        "<FILENAME>dp198076_424b2-us2342673.htm\n",
        "<FILENAME>dp198076_exfilingfees.htm\n",
    );

    foreach my $line (@lines) {
        if (my ($doc_title) = $line =~ m/<FILENAME>(.*\.htm)\b/) {
            print "Filename is $doc_title\n";
            last;    ############### this stops
        }
    }
    __END__
    Filename is dp198076_424b2-us2342673.htm

Re^4: pattern matching once
by justin423 (Scribe) on Aug 11, 2023 at 16:58 UTC

    You were not that far off...

    The file is actually this:

    "<FILENAME>dp198076_424b2-us2342673.htm \n", "<FILENAME>dp198076_exfilingfees.htm\n",

    with a space after the .htm in some cases but not others, so the \b didn't work all the time.

      You will have to show some runnable code where the \b fails. Both of your example lines work fine in my example code.

      \b means approximately "word boundary": it matches between a word character (\w) and a non-word character, or at the start or end of the string next to a word character. Any whitespace character (space, \n, \t and so on) following the "m" of ".htm" satisfies that boundary condition. End of the string also satisfies it (i.e. having no character following ".htm").
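
As a quick check of the boundary cases described above, here is a minimal sketch. The helper name extract_filename and the third and fourth sample lines are made up for illustration; the pattern is the one from the parent post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the captured
# filename, or undef when the pattern does not match.
sub extract_filename {
    my ($line) = @_;
    my ($doc_title) = $line =~ m/<FILENAME>(.*\.htm)\b/;
    return $doc_title;
}

my @samples = (
    "<FILENAME>dp198076_424b2-us2342673.htm \n",      # space after .htm
    "<FILENAME>dp198076_exfilingfees.htm\n",          # newline after .htm
    "<FILENAME>dp198076_exfilingfees.htm",            # end of string after .htm
    "<FILENAME>dp198076_exfilingfees.htmSomeCrap\n",  # word character after .htm
);

for my $line (@samples) {
    my $name = extract_filename($line);
    print defined $name ? "matched: $name\n" : "no match\n";
}
```

The first three samples should match, since a space, a newline and the end of the string are all non-word positions after the "m"; only the last one should fail, because "S" is a word character and there is no boundary between "m" and "S".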

      What do you mean by "so the \b didn't work all the time"?

      Look carefully and make sure that there is no space before the \b in:
      if (my ($doc_title) = $line=~ m/<FILENAME>(.*\.htm)\b/) {
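
To see why a stray space matters, here is a small sketch comparing the pattern above with a hypothetical variant that has a literal space before the \b. The names $good, $bad and matches are invented for this illustration; without the /x modifier, a space in a pattern is matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $line matches the compiled pattern, else 0.
sub matches {
    my ($re, $line) = @_;
    return $line =~ $re ? 1 : 0;
}

my $good = qr/<FILENAME>(.*\.htm)\b/;    # pattern as posted
my $bad  = qr/<FILENAME>(.*\.htm) \b/;   # assumed typo: literal space before \b

my @samples = (
    "<FILENAME>dp198076_424b2-us2342673.htm \n",
    "<FILENAME>dp198076_exfilingfees.htm\n",
);

for my $line (@samples) {
    (my $shown = $line) =~ s/\n/\\n/;    # make the newline visible in the output
    printf "%-45s good=%d bad=%d\n", $shown, matches($good, $line), matches($bad, $line);
}
```

With the stray space, both sample lines should fail: the second has no space to match, and in the first the space is followed by \n, so there is no word boundary after it.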

        An HTML non-breaking space (&nbsp;).