As I understood it, the original question was: "Can I parse this HTML with a single regex?" And the answer is yes! One solution is shown below. The code is a bit tedious, but it is straightforward and can be understood with some methodical thinking.

However, there are a lot of pitfalls with this approach, not the least of which is that the layout of these HTML pages can change from one day to the next. Some of the HTML parser modules are more robust in that they can handle input that "didn't quite look like it did before", and there are a zillion ways that can happen. "One-off" solutions like the one below tend to be very single-purpose rather than general-purpose. So there are trade-offs involved that we haven't even begun to discuss here.

Anyway, I think you have a number of excellent approaches in this thread, and one of them, or a derivative of it, will work fine for you.

#!/usr/bin/perl -w
use strict;

my $doc =<<FORM;
<div><label>Emp ID:</label> AASDFG
<br><label>Mobile Num:</label> 9999999999
<br><label>location:</label> India
<br><label>Inservice:</label>Yes
</div>
FORM

my @pairs = ($doc =~ m~<label>\s*(.*?)\s*</label>\s*(.*?)\s*<~g);

while (@pairs)
{
    my ($field, $value) = splice(@pairs,0,2);
    printf "%-15s %s\n", $field, $value;
}
__END__
Emp ID:         AASDFG
Mobile Num:     9999999999
location:       India
Inservice:      Yes
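
To make the "layout can change" pitfall concrete, here is a minimal sketch using the same pattern. The sub names extract_pairs and extract_pairs_loose and the class="fld" attribute are made up for illustration; they are not from the original page. The looser pattern is only one possible patch, assuming the only change is added attributes on the label tag, and it is still not a real HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as in the one-off solution: expects a bare <label> tag.
sub extract_pairs {
    my ($html) = @_;
    return $html =~ m~<label>\s*(.*?)\s*</label>\s*(.*?)\s*<~g;
}

# Hypothetical looser variant: tolerates attributes inside the label tag.
sub extract_pairs_loose {
    my ($html) = @_;
    return $html =~ m~<label\b[^>]*>\s*(.*?)\s*</label>\s*(.*?)\s*<~g;
}

# Markup as in the original question.
my $old = '<div><label>Emp ID:</label> AASDFG <br></div>';

# Assumed redesign: the label gains a class attribute.
my $new = '<div><label class="fld">Emp ID:</label> AASDFG <br></div>';

for my $case ([old => $old], [new => $new]) {
    my ($name, $html) = @$case;
    my @strict = extract_pairs($html);
    my @loose  = extract_pairs_loose($html);
    printf "%s layout: strict found %d field(s), loose found %d\n",
        $name, @strict / 2, @loose / 2;
}
```

The strict pattern silently returns nothing for the changed markup, with no error or warning, which is exactly the kind of failure a parser module is more likely to survive.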

In reply to Re: Split/Match Question by Marshall
in thread Split/Match Question by esmadmin
