You want to capture the catalog number, but your pattern only matches the anchor text and never looks at what comes after it.

Try this.

/Catalog Number:\s+(\w+)/

Update: (The first part of this node was posted from a smartphone, and pecking out markup and other symbols was unpleasant enough that I avoided my usual verbosity, which will now follow):

That anchors on the literal text "Catalog Number:" followed by one or more whitespace characters (\s+), and then captures all contiguous "word" characters that follow (\w+), which means letters, digits, and underscore. After a successful match, $1 holds the catalog number.
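
Here is a minimal sketch of that pattern in use. The sample string and the helper name extract_catalog_number are my own assumptions for illustration, since the original HTML isn't reproduced in this node:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original thread): returns the
# catalog number, or undef if the pattern does not match.
sub extract_catalog_number {
    my ($text) = @_;
    return $text =~ /Catalog Number:\s+(\w+)/ ? $1 : undef;
}

# Made-up sample input standing in for a line of the real HTML.
my $sample = '<td>Catalog Number:   AB_1234</td>';
my $number = extract_catalog_number($sample);
print defined $number ? "Found: $number\n" : "No match\n";
```

Note that \w does not match a hyphen, so an input like "Catalog Number: AB-1234" would capture only "AB". If your catalog numbers contain other characters, the capture would need a wider character class.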

Anyone who mentioned that you ought to parse HTML with a proper parsing module is correct, though. Regexp solutions are fragile: a small change in the markup, such as a tag or a line break landing between the label and the value, is enough to break the match. It's strange that when we take our car to the mechanic we never say, "I want you to fix it using only a 12mm socket wrench." But people think nothing of coming for advice on parsing HTML and, in the same breath, suggesting that we ought to adapt our solutions to use only regular expressions, avoiding the vast array of other tools, many of which are more suitable for the task.


Dave


In reply to Re: parsing hmtl file with regex by davido
in thread parsing hmtl file with regex by PanchoAguirre
