
Ok, this is surprising.

Either it stopped matching completely (when I put chomp before it or added the \b), or, when it did match, a \ was added to the end of the result.

For example, a link came out like this:

https://www.sec.gov/Archives/edgar/data/831001/000095010323011811/dp198116_424b2-us2343462.htm\

Re^3: pattern matching once
by eyepopslikeamosquito (Archbishop) on Aug 11, 2023 at 14:24 UTC

    To help us communicate without going around in circles, please read and try to follow:

    Here is my test program t1.pl:

    use strict;
    use warnings;

    # Small standalone test program derived from [id://11153804]
    # @lines contains some test lines derived from:
    # https://www.sec.gov/Archives/edgar/data/831001/000095010323011632/0000950103-23-011632.txt
    my @lines = (
        '<FILENAME>dp198076_424b2-us2342673.htm',
        '<FILENAME>dp198076_exfilingfees.htm',
        '<FILENAME>dp198076_oopswrongextension.htmfred',
    );

    foreach my $line (@lines) {
        print "line:$line:\n";
        if ($line =~ m/<FILENAME>(.*\.htm)/) {
            print "  matched line: filename='$1'\n";
        }
        else {
            print "  did not match line\n";
        }
    }

    When I run this program, this is what I see:

    line:<FILENAME>dp198076_424b2-us2342673.htm:
      matched line: filename='dp198076_424b2-us2342673.htm'
    line:<FILENAME>dp198076_exfilingfees.htm:
      matched line: filename='dp198076_exfilingfees.htm'
    line:<FILENAME>dp198076_oopswrongextension.htmfred:
      matched line: filename='dp198076_oopswrongextension.htm'

    When I change this line above from:

    if ($line =~ m/<FILENAME>(.*\.htm)/) {
    to:
    if ($line =~ m/<FILENAME>(.*\.htm)\b/) {

    and run the program again, I see instead:

    line:<FILENAME>dp198076_424b2-us2342673.htm:
      matched line: filename='dp198076_424b2-us2342673.htm'
    line:<FILENAME>dp198076_exfilingfees.htm:
      matched line: filename='dp198076_exfilingfees.htm'
    line:<FILENAME>dp198076_oopswrongextension.htmfred:
      did not match line

    If that is not what you were asking about, please clarify your question by posting a Short, Self-Contained, Correct Example that we can run.

Re^3: pattern matching once
by Marshall (Canon) on Aug 11, 2023 at 16:50 UTC
    You probably put the \b inside the capture group.
    use strict;
    use warnings;

    my @lines = (
        "https://www.sec.gov/Archives/edgar/data/831001/000095010323011632/0000950103-23-011632.txt\n",
        "<FILENAME>dp198076_424b2-us2342673.htmSomeCrap\n",
        "<FILENAME>dp198076_424b2-us2342673.htm\n",
        "<FILENAME>dp198076_exfilingfees.htm\n",
    );

    foreach my $line (@lines) {
        if (my ($doc_title) = $line =~ m/<FILENAME>(.*\.htm)\b/) {
            print "Filename is $doc_title\n";
            last;    ############### this stops
        }
    }
    __END__
    Filename is dp198076_424b2-us2342673.htm

      You were not that far off...

      The file is actually this:

      "<FILENAME>dp198076_424b2-us2342673.htm \n",
      "<FILENAME>dp198076_exfilingfees.htm\n",

      with a space after the .htm in some cases but not in others, so the /b didn't work all the time.

        You will have to show some runnable code where the \b fails. Both of your example lines work fine in my example code.
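        As a quick check of that claim, here is a small sketch that runs both of the quoted lines through the same pattern as above (\b outside the capture group). The doc_title helper is a hypothetical name introduced only for this illustration.

```perl
use strict;
use warnings;

# The two lines quoted above: one with a space after ".htm", one without.
my @lines = (
    "<FILENAME>dp198076_424b2-us2342673.htm \n",
    "<FILENAME>dp198076_exfilingfees.htm\n",
);

# Hypothetical helper: same pattern as above, \b outside the capture group.
# Returns the captured filename, or undef when the line does not match.
sub doc_title {
    my ($line) = @_;
    my ($doc_title) = $line =~ m/<FILENAME>(.*\.htm)\b/;
    return $doc_title;
}

for my $line (@lines) {
    my $title = doc_title($line);
    print defined $title ? "Filename is $title\n" : "no match\n";
}
```

        Both lines yield the bare filename: the trailing space and the "\n" each sit after the word character "m", so \b is satisfied either way.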

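        \b means, approximately, "word boundary": the position between a word character (\w, i.e. a letter, digit or underscore) and a non-word character, or between a word character and the start or end of the string. Since ".htm" ends in the word character "m", any whitespace character after it (space, \n, \t and the like) satisfies that boundary condition. The end of the string also satisfies it (i.e. no character at all following ".htm").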
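        A minimal sketch of how \b behaves right after ".htm", assuming the same \.htm\b pattern; htm_at_boundary is a hypothetical helper for illustration. The last two cases show why "approximately" matters: \b only asks whether the next character is a word character, so "_" blocks the match while "." does not.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if /\.htm\b/ matches the text, 0 otherwise.
# \b is zero-width; after the word character "m" it succeeds only if the
# next character is NOT a \w character, or if there is no next character.
sub htm_at_boundary {
    my ($text) = @_;
    return $text =~ /\.htm\b/ ? 1 : 0;
}

my @cases = (
    "a.htm ",      # space follows:     boundary
    "a.htm\n",     # newline follows:   boundary
    "a.htm\t",     # tab follows:       boundary
    "a.htm",       # end of string:     boundary
    "a.htmfred",   # letter follows:    no boundary
    "a.htm_2",     # "_" is a \w char:  no boundary
    "a.htm.bak",   # "." is not a \w:   boundary
);

for my $case (@cases) {
    # Make the invisible whitespace visible in the report.
    (my $shown = $case) =~ s/\n/\\n/;
    $shown =~ s/\t/\\t/;
    printf "%-14s %s\n", "'$shown'",
        htm_at_boundary($case) ? 'matches' : 'no match';
}
```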

        What do you mean by " so the /b didn't work all the time"?

        Look carefully and make sure that there is no space before the \b in:
        if (my ($doc_title) = $line=~ m/<FILENAME>(.*\.htm)\b/) {
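        To illustrate that pitfall: without the /x modifier, a space inside a regex is a literal space, so (.*\.htm) \b demands a space after ".htm" and then a word boundary after that space. A space followed by "\n" or by the end of the string is not a boundary, so neither of the quoted lines matches. This is a sketch; try_pattern, $good and $typo are hypothetical names used only here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured filename, or undef on no match.
sub try_pattern {
    my ($line, $re) = @_;
    my ($title) = $line =~ $re;
    return $title;
}

my $good = qr/<FILENAME>(.*\.htm)\b/;    # \b directly after ".htm"
my $typo = qr/<FILENAME>(.*\.htm) \b/;   # stray literal space before \b

for my $line ("<FILENAME>dp198076_424b2-us2342673.htm \n",
              "<FILENAME>dp198076_exfilingfees.htm\n") {
    for my $pair (['good', $good], ['typo', $typo]) {
        my ($name, $re) = @$pair;
        my $title = try_pattern($line, $re);
        printf "%-4s: %s\n", $name, defined $title ? $title : 'no match';
    }
}
```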