Re^4: pattern matching once


You were not that far off...

The file is actually this:

"<FILENAME>dp198076_424b2-us2342673.htm \n", "<FILENAME>dp198076_exfilingfees.htm\n",

with a space after the .htm in some cases but not others, so the /b didn't work all the time.


Re^5: pattern matching once by Marshall (Canon) on Aug 11, 2023 at 19:45 UTC
You will have to show some runnable code where the \b fails. Both of your example lines work fine in my example code. \b means approximately "word boundary". After a word character such as the "m" in ".htm", any whitespace character (space, \n, \t, and so on) satisfies that boundary condition. The end of the string also satisfies it (i.e. no character follows ".htm"). What do you mean by " so the /b didn't work all the time"? Look carefully and make sure that there is no space before the \b in: `if (my ($doc_title) = $line =~ m/<FILENAME>(.*\.htm)\b/) {`
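For reference, here is a minimal sketch of the kind of test being asked for. It assumes the two sample lines quoted in the parent post are the whole input; the `@lines` and `@titles` arrays are illustrative names, not taken from the original script.

```perl
use strict;
use warnings;

# The two sample lines from the parent post; the first has a space after ".htm".
my @lines = (
    "<FILENAME>dp198076_424b2-us2342673.htm \n",
    "<FILENAME>dp198076_exfilingfees.htm\n",
);

my @titles;    # hypothetical collector, used only for this demonstration
for my $line (@lines) {
    # \b matches between the "m" (a word character) and the non-word
    # character that follows it: a space in the first line, "\n" in the second.
    if (my ($doc_title) = $line =~ m/<FILENAME>(.*\.htm)\b/) {
        push @titles, $doc_title;
        print "matched: '$doc_title'\n";
    }
}
```

Both lines should match here, with the trailing space left out of the capture, which is why a failing case would need to be shown with the actual input.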
Re^6: pattern matching once by justin423 (Scribe) on Aug 12, 2023 at 03:08 UTC
An HTML space: `&nbsp;`
Re^7: pattern matching once by Marshall (Canon) on Aug 12, 2023 at 07:23 UTC
That still works: `"<FILENAME>dp198076_424b2-us2342673.htm&nbsp\n",` That is because \b matches at the boundary between a word character and a non-word character, and & is not a word character. Word characters are the ones that you can use in a Perl variable name: `[a-zA-Z0-9_]`. So we are back at the same problem: you say that there is a problem, but refuse to show any actual code. If you are actually parsing an HTML document, you should be using one of the HTML decoder modules before trying to use a regex. I believe that haukex has posted some links on that subject. I think you are well advised to read his post in detail.
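A short sketch of the word-character point, for anyone who wants to check it. The first sample is the line quoted above. The second, with a `_old` suffix, is a made-up input added only to show a case where \b does not match; `@samples` and `@results` are illustrative names.

```perl
use strict;
use warnings;

my @samples = (
    "<FILENAME>dp198076_424b2-us2342673.htm&nbsp\n",   # from the post: "&" is not a word character
    "<FILENAME>dp198076_424b2-us2342673.htm_old\n",    # hypothetical: "_" IS a word character
);

my @results;    # captured title, or undef when the pattern does not match
for my $line (@samples) {
    # In list context a failed match returns an empty list, so $doc_title stays undef.
    my ($doc_title) = $line =~ m/<FILENAME>(.*\.htm)\b/;
    push @results, $doc_title;
    print defined $doc_title ? "matched: '$doc_title'\n" : "no match\n";
}
```

The first sample matches because "m" followed by "&" is a word to non-word transition. The second does not, because "m" followed by "_" is two word characters in a row, so there is no boundary after ".htm".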

In Section Seekers of Perl Wisdom