As multiple monks smarter than I have stated, you should really be using one of the many widely and freely available modules for parsing XML. Really. No, really. That said, assuming there is some good reason to do this that escapes me and my brethren, the following regex will capture the first warning child element of a para0 element. I've included all of the child tags you listed, as well as those you mentioned elsewhere in your post.

/<para0[^>]*?>
  (?: \s*
      (?: <title .*? <\/title>
        | <para .*? <\/para>
        | <applic .*? <\/applic>
        | <capgrp .*? <\/capgrp>
        | <subpara1 .*? <\/subpara1>
        | <caution .*? <\/caution>
      )
  )*?
  \s* (<warning .*? <\/warning>)
  (?: \s*
      (?: <title .*? <\/title>
        | <para .*? <\/para>
        | <applic .*? <\/applic>
        | <capgrp .*? <\/capgrp>
        | <subpara1 .*? <\/subpara1>
        | <caution .*? <\/caution>
        | <warning .*? <\/warning>
      )
  )*?
  \s* <\/para0>
/sx

The code works as follows:

  1. Match the opening <para0> tag, which may have attributes
  2. Match, without capturing, any number of title, para, applic, capgrp, subpara1, or caution elements
  3. Match and capture your warning element
  4. Match, without capturing, any number of title, para, applic, capgrp, subpara1, caution, or warning elements
  5. Finish the match with the closing </para0> tag
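To check the steps above against some input, here is a minimal self-contained sketch. The sample XML and the first_warning() helper are hypothetical, made up for illustration; the pattern itself is the one from this post, unchanged apart from layout.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first <warning> child of the first
# matching <para0>, or undef if nothing matches. Assumes well-formed input
# and only the child tags listed in the thread.
sub first_warning {
    my ($xml) = @_;
    return $xml =~ /<para0[^>]*?>
        (?: \s* (?: <title .*? <\/title>
                  | <para .*? <\/para>
                  | <applic .*? <\/applic>
                  | <capgrp .*? <\/capgrp>
                  | <subpara1 .*? <\/subpara1>
                  | <caution .*? <\/caution> ) )*?
        \s* (<warning .*? <\/warning>)
        (?: \s* (?: <title .*? <\/title>
                  | <para .*? <\/para>
                  | <applic .*? <\/applic>
                  | <capgrp .*? <\/capgrp>
                  | <subpara1 .*? <\/subpara1>
                  | <caution .*? <\/caution>
                  | <warning .*? <\/warning> ) )*?
        \s* <\/para0>
    /sx ? $1 : undef;
}

# Hypothetical sample: one para0 with two warnings; we want the first.
my $xml = <<'XML';
<para0 id="p1">
  <title>Setup</title>
  <para>Read this first.</para>
  <warning><para>Hot surface.</para></warning>
  <caution>Wear gloves.</caution>
  <warning><para>Second warning.</para></warning>
</para0>
XML

my $w = first_warning($xml);
print defined $w ? "First warning: $w\n" : "No warning found\n";
```

Run as-is, this should print the first warning, <warning><para>Hot surface.</para></warning>. A para0 with no warning at all simply fails to match, so the helper returns undef.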

Note that your entire XML must be in a single string (not an array of lines), and the regex must be run with the /s modifier so that .*? can match across newlines.
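If the XML comes from a file, one common idiom is to localize $/ (the input record separator) to undef so that a single readline returns the whole file as one string instead of a list of lines. The sketch below reads from an in-memory filehandle so it runs without a real file; slurp_fh() is a hypothetical helper name. It also shows why /s matters.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything left on a filehandle into one string.
sub slurp_fh {
    my ($fh) = @_;
    local $/;              # undef $/ for this scope: readline returns it all
    my $content = <$fh>;
    return $content;
}

# In-memory filehandle standing in for a real XML file.
my $text = "<para0>\n  <warning>W</warning>\n</para0>\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $xml = slurp_fh($fh);
close $fh;

# Without /s, '.' does not match "\n", so .*? cannot cross the line breaks.
my $without_s = $xml =~ /<para0>.*?<\/para0>/  ? 1 : 0;
my $with_s    = $xml =~ /<para0>.*?<\/para0>/s ? 1 : 0;
print "without /s: $without_s, with /s: $with_s\n";
```

With a real file you would open it by name instead of passing \$text; reading with my @lines = <$fh> would give you the array of lines the regex cannot work with.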

Update: Some additional notes. This assumes that your XML is well-formed and that your para0 element contains at least one warning element. Most importantly, if there are first-generation (direct child) tags that are not accounted for, .*? can jump to unexpected locations, and you will not get what you expected. Compare that to ikegami's solution, which will just work.
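To make the "unexpected locations" caveat concrete, here is a small demonstration using the same pattern. The <note> element is a made-up example of a direct child the regex does not list. Because .*? is allowed to run past the first </title> when nothing else fits, the match can slide into the next para0 and capture its warning instead.

```perl
use strict;
use warnings;

# The regex from this post, wrapped in a hypothetical helper; returns $1 or undef.
sub first_warning {
    my ($xml) = @_;
    return $xml =~ /<para0[^>]*?>
        (?: \s* (?: <title .*? <\/title> | <para .*? <\/para>
                  | <applic .*? <\/applic> | <capgrp .*? <\/capgrp>
                  | <subpara1 .*? <\/subpara1> | <caution .*? <\/caution> ) )*?
        \s* (<warning .*? <\/warning>)
        (?: \s* (?: <title .*? <\/title> | <para .*? <\/para>
                  | <applic .*? <\/applic> | <capgrp .*? <\/capgrp>
                  | <subpara1 .*? <\/subpara1> | <caution .*? <\/caution>
                  | <warning .*? <\/warning> ) )*?
        \s* <\/para0>
    /sx ? $1 : undef;
}

# Hypothetical input: the first para0 has a <note> child the regex does not know.
my $xml = "<para0><title>A</title><note>N</note><warning>W1</warning></para0>\n"
        . "<para0><title>B</title><warning>W2</warning></para0>\n";

# The <note> blocks every listed alternative, so the engine backtracks and lets
# the title's .*? extend to the SECOND </title>, skipping the first para0.
my $w = first_warning($xml);
print defined $w ? "Captured: $w\n" : "No match\n";
```

Tracing the backtracking by hand, the capture should be <warning>W2</warning> from the second para0, not W1 from the first. A real parser would not make that mistake.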

Update 2: At shmem's suggestion, I added an /x modifier and reformatted the regex to make it (maybe) easier to follow.
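One side effect of /x worth knowing: unescaped whitespace in the pattern is ignored, so a fragment such as <title .*? behaves like <title.*? and no longer requires anything after the tag name (which is also why <para can match the start of <para0). The short sketch below shows this, plus two ways to spell out a boundary explicitly; the test strings are made up for illustration.

```perl
use strict;
use warnings;

# Without \/x the space is literal; with \/x it is ignored.
my $plain = "<titlefoo>" =~ /<title .*?>/  ? 1 : 0;
my $x     = "<titlefoo>" =~ /<title .*?>/x ? 1 : 0;

# Under \/x, an escaped space stays literal, and \b requires a word boundary.
my $x_escaped      = "<titlefoo>" =~ /<title\ .*?>/x  ? 1 : 0;
my $x_boundary_foo = "<titlefoo>" =~ /<title\b .*?>/x ? 1 : 0;
my $x_boundary     = "<title>"    =~ /<title\b .*?>/x ? 1 : 0;

print "plain: $plain, x: $x, escaped: $x_escaped, ",
      "boundary on <titlefoo>: $x_boundary_foo, boundary on <title>: $x_boundary\n";
```

Only the /x version without an explicit boundary accepts <titlefoo>; adding \b after each tag name is one way to keep the readable layout without that looseness.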


In reply to Re: regex - need first child of parent by kennethk
in thread regex - need first child of parent by kdolan
