You'll do well to study the references Kcott provided above. The problems reflect, in large measure, a need to clarify your understanding of what constitutes a matchable pattern and, in part, some confusion about escaping characters: escaping is only necessary when a character is special to Perl's regex engine (or is the character you're using as the match delimiter).
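To make the escaping point concrete, here is a small sketch built on fragments of the sample data (the test strings and variable names are illustrative only). '<' and '>' have no special meaning in a Perl regex, but '.' does (it matches any character except a newline), so it must be escaped, or the text wrapped in \Q...\E, to be matched literally.

```perl
use strict;
use warnings;

my $text = "80MG/0.8ML SUSPENSION";

# '<' and '>' are not regex metacharacters, so no backslashes are needed for them.
my $tag_ok = "<h3>NAME</h3>" =~ /<h3>NAME<\/h3>/ ? 1 : 0;

# An unescaped '.' matches ANY character, so "0x8ML" matches too (a false positive).
my $loose = "0x8ML" =~ /0.8ML/ ? 1 : 0;

# Escaping the dot makes it a literal '.', so "0x8ML" no longer matches.
my $strict = "0x8ML" =~ /0\.8ML/ ? 1 : 0;

# \Q...\E (quotemeta) escapes every metacharacter in an interpolated string.
my $literal = "0.8ML";
my $quoted  = $text =~ /\Q$literal\E/ ? 1 : 0;

print "tag_ok=$tag_ok loose=$loose strict=$strict quoted=$quoted\n";
# prints: tag_ok=1 loose=1 strict=0 quoted=1
```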

<aside> RichardK's point is well taken... but you've asked how to do this with a regex. Revising this to utilize $1, $2, $3 is left as an exercise (and not necessarily worth the time: it's often less error-prone to write multiple regexen rather than one which is monolithic.)</aside>

So, assuming that you've worked out how to get your data into the form you've shown, here is an executable version (executable code is strongly recommended in the Monastery's guidance on asking a question). It's offered NOT so that you can cargo-cult what follows, but rather to address what seem to be some coding issues.

```perl
#!/usr/bin/perl
use 5.014;      # enables 'strict' (any 'use 5.012' or later does) and 'say'
use warnings;   # ...but NOT warnings, so turn them on explicitly
# 958840

my @array = <DATA>;
for my $line (@array) {
    if ( $line =~ /<h3>(.*)<\/h3>/ ) {                        # Note 1
        say "\n\t NAME is: $1";
    }
    if ( $line =~ /<p class='cd1'>(.*)(?=<\/p><p>mfg)/ ) {    # Note 2
        say "\t Instructions 2 are: $1";
    }
    if ( $line =~ /\d+<\/p><p>(.*)(?=<\/p>$)/ ) {             # Note 3
        say "\t GENERIC NAME IS: $1.\n\t" . "-" x15;         # Note 4
    }
}

__DATA__
<h3>FENTANYL 25MCG/HR PATCH, TRANSDERMAL 72 HOURS</h3> <p class='cd1'>Restricted to NDC labeler code 50458 (Janssen) and to a maximum of ten (10) transdermal patches per dispensing and a maximum of three (3) dispensings of any strength in a 75-day period only. </p><p>mfg codes:50458</p><p>DURAGESIC</p>
<h3>ACETAMINOPHEN 80MG/0.8ML SUSPENSION, DROPS(FINAL DOSAGE FORM)(ML)</h3> <p class='cd1'>Restricted to individuals younger than 21 years of age for the liquid and drops only. </p><p>mfg codes:68016, 63868, 63162, 49348, 46122, 36800, 00904, 00536, 00472, 00113, 00067</p><p>TRIAMINIC FEVER REDUCER | PAIN & FEVER | NORTEMP | MAPAP INFANT | INFANTS' NON-ASPIRIN | INFANT'S PAIN RELIEVER | INFANT'S PAIN RELIEF | INFANT'S NON-ASPIRIN | ACETAMINOPHEN</p>
```

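Note 1: The "m" and the escapes of '>' are unnecessary, but when you use a (greedy) .* you need to tell the match operator where the capture ends, namely at "</h3>". The slash in that closing tag has to be escaped (as you did at your line 10) only because / is also serving as the match delimiter; with a different delimiter, such as m{...}, no backslash is needed. Also bear in mind that a greedy .* runs to the last "</h3>" on the line: harmless here, since each line holds only one, but a trap with messier input. The non-greedy .*? stops at the first.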
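A short sketch of both points in Note 1, using a made-up line with two headings (the sample data has only one per line): with m{...} as the delimiter, the slash in </h3> needs no backslash, and the greedy and non-greedy captures differ once a second closing tag appears.

```perl
use strict;
use warnings;

# Hypothetical input with TWO headings on one line.
my $line = "<h3>FIRST</h3><h3>SECOND</h3>";

# With m{...} as the delimiter, '/' is an ordinary character: no backslash needed.
# Greedy .* runs as far as it can, then backs off to the LAST "</h3>".
my ($greedy) = $line =~ m{<h3>(.*)</h3>};

# Non-greedy .*? stops at the FIRST "</h3>" it reaches.
my ($lazy) = $line =~ m{<h3>(.*?)</h3>};

print "greedy: $greedy\n";   # greedy: FIRST</h3><h3>SECOND
print "lazy:   $lazy\n";     # lazy:   FIRST
```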

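Note 2: The (?=...) is a look-ahead (q.v. in perlre). It requires that the text at that point match its pattern, but it neither consumes nor captures that text. Again, you need to specify what data marks the end of what you're trying to capture: here, the point where the engine sees the </p><p>mfg code ....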
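One way to see that the look-ahead does not consume anything is to use it in a substitution: only the text matched outside the (?=...) is replaced, while the text it merely checked stays in the string. A minimal sketch on a shortened, hypothetical line in the same shape as the sample data:

```perl
use strict;
use warnings;

# Hypothetical, shortened line in the same shape as the sample data.
my $line = "<p class='cd1'>Liquid and drops only.</p><p>mfg codes:68016</p>";

# Capture up to, but not including, "</p><p>mfg".
my ($instructions) = $line =~ /<p class='cd1'>(.*)(?=<\/p><p>mfg)/;

# In a substitution only the consumed text ("only.") is replaced;
# the text the look-ahead merely checked ("</p><p>mfg") is left untouched.
(my $edited = $line) =~ s/only\.(?=<\/p><p>mfg)/ONLY!/;

print "$instructions\n";   # Liquid and drops only.
print "$edited\n";         # <p class='cd1'>Liquid and drops ONLY!</p><p>mfg codes:68016</p>
```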

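Note 3: The $ anchors the look-ahead, so the closing paragraph tag it's looking for must be the last thing on the line. In Perl, $ matches at the end of the string or just before a string-final newline, so the lines read from DATA don't need to be chomped first.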
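A quick check of what the $ in Note 3 accepts, using made-up strings shaped like the end of a sample line: the look-ahead succeeds when </p> is followed by the end of the string or by a final newline, and fails when anything else follows.

```perl
use strict;
use warnings;

# The same pattern as in the script, precompiled with qr//.
my $pattern = qr/\d+<\/p><p>(.*)(?=<\/p>$)/;

my @cases = (
    "codes:50458</p><p>DURAGESIC</p>\n",        # trailing newline: $ still matches
    "codes:50458</p><p>DURAGESIC</p>",          # end of string: matches
    "codes:50458</p><p>DURAGESIC</p> extra\n",  # text after </p>: no match
);

my @results;
for my $s (@cases) {
    my ($name) = $s =~ $pattern;    # list context: $name is undef if no match
    push @results, defined $name ? $name : '(no match)';
}
print "$_\n" for @results;
# DURAGESIC
# DURAGESIC
# (no match)
```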

Note 4: The tabs, newlines and repeated "-" simply make the output more readable... for me, that is; YMMV.
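Note 4 relies on Perl's repetition operator: x repeats the string on its left, and because it binds more tightly than the . concatenation operator, the "-" x15 in the script appends fifteen dashes to the label rather than repeating the whole string. A tiny illustration with made-up strings:

```perl
use strict;
use warnings;

# The x operator repeats its left-hand string the given number of times.
my $rule = "-" x 15;              # fifteen dashes

# 'x' binds more tightly than '.', so only the "-" is repeated here,
# just as in the script's  "...\n\t" . "-" x15
my $label = "NAME" . "-" x 3;     # "NAME---"

print "$rule\n$label\n";
```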

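Note a potential hazard: once captured, the contents of $1 are retained until the next successful match replaces them. A failed match does not reset $1, so it can still hold a value left over from an earlier line.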

Thus, unless you're Really, REALLY SURE that your data is absotively, posilutely regular and well-formed, it's better practice to test that you have a fresh capture before blindly using whatever happens to be there. That discussion is well beyond the scope of this thread.
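A minimal sketch of the stale-$1 hazard, with made-up input: after a failed match, $1 still holds the capture from the previous successful match. The script above already uses one guard, reading $1 only inside the if that tested the match; another common idiom is to capture in list context, which yields undef rather than a leftover value when the match fails.

```perl
use strict;
use warnings;

my $first  = "<h3>FENTANYL</h3>";
my $second = "<p>no heading here</p>";

# Unsafe: the second match fails ($ok2 is false), but nothing checks that,
# so $1 still holds the capture from the first (successful) match.
my $ok1 = $first  =~ /<h3>(.*)<\/h3>/;
my $ok2 = $second =~ /<h3>(.*)<\/h3>/;
my $stale = $1;                              # "FENTANYL", not undef

# Safer: a list-context match returns an empty list when it fails,
# so $fresh is undef instead of a leftover value.
my ($fresh) = $second =~ /<h3>(.*)<\/h3>/;

print "stale: $stale\n";                                          # stale: FENTANYL
print "fresh: ", (defined $fresh ? $fresh : 'undef (no match)'), "\n";  # fresh: undef (no match)
```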

PS: you'd do well to include a sample of the desired output (or, alternatively, an explanation in your narrative) for those of us who may not know which part of your data contains the proprietary name and which is the "GENERIC_NAME". A question that leaves the Monks wondering about the meaning of terms used by a Seeker of Perl Wisdom is apt to get answers that run wide of the intended mark.


In reply to Re: parsing a line with $1, $2, $3 by ww
in thread parsing a line with $1, $2, $3 by kevyt
