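Others have noted that this is an inherently fragile data format. (An example, I think, of the Semipredicate problem: the ')' and ',' that mark the end of a record, followed by the next tag, can also occur inside a record's data, so they cannot reliably signal where a record ends.) See what happens when records in the test data below are swapped, or when 'AE(foo)' in record AD is changed to '(fubar),AE(foo)'. That said, here is one possible way: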

>perl -wMstrict -le
"my $s =
    'AA(Acme Widgets. 123 Coyote St. AZ(Ariz.),USA) ,'
  . 'AB(Your Name. 99 Some St. HI(Hawaii), USA), '
  . 'AC(Dep deAstro. Uni de Val. C/Dr. M 50, 461 Bur (Val), Sp),'
  . 'AD(AE(foo), approaching breaking point AD(bar)) , '
  . 'AE(optional trailing comma, spaces on last record)'
  ;
;;
my $tag  = 'AA';
my $stop = 'ZZ';
;;
EXTRACT:
for (++(my $after = $tag); $tag le $stop; ++$tag, ++$after) {
  my $pre  = qr{ \G $tag [(] }xms;
  my $post = qr{ [)] (?: \s* , \s* (?= $after) | \s* ,? \s* \z) }xms;
  ;;
  last EXTRACT unless $s =~ m{ $pre (.*?) $post }xmsg;
  my $extract = $1;
  print qq{'$tag': [[$extract]]};
}
"
'AA': [[Acme Widgets. 123 Coyote St. AZ(Ariz.),USA]]
'AB': [[Your Name. 99 Some St. HI(Hawaii), USA]]
'AC': [[Dep deAstro. Uni de Val. C/Dr. M 50, 461 Bur (Val), Sp]]
'AD': [[AE(foo), approaching breaking point AD(bar)]]
'AE': [[optional trailing comma, spaces on last record]]
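To see the fragility described at the top in action, here is a minimal sketch that wraps the loop from the one-liner in a sub (extract_records is a hypothetical name of my own, not from the thread) and runs it on the original test data, on the '(fubar),AE(foo)' variant, and on the same data with records AB and AC swapped.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not from the thread) around the extraction loop
# of the one-liner, so the same regexes can be run on several strings.
# Returns a list of [tag, extracted-text] pairs.
sub extract_records {
    my ($s, $tag, $stop) = @_;
    my @found;
    EXTRACT:
    for (++(my $after = $tag); $tag le $stop; ++$tag, ++$after) {
        my $pre  = qr{ \G $tag [(] }xms;
        my $post = qr{ [)] (?: \s* , \s* (?= $after) | \s* ,? \s* \z) }xms;
        last EXTRACT unless $s =~ m{ $pre (.*?) $post }xmsg;
        push @found, [ $tag, $1 ];
    }
    return @found;
}

# The records from the one-liner...
my @records = (
    'AA(Acme Widgets. 123 Coyote St. AZ(Ariz.),USA) ,',
    'AB(Your Name. 99 Some St. HI(Hawaii), USA), ',
    'AC(Dep deAstro. Uni de Val. C/Dr. M 50, 461 Bur (Val), Sp),',
    'AD(AE(foo), approaching breaking point AD(bar)) , ',
    'AE(optional trailing comma, spaces on last record)',
);
my $original = join '', @records;

# ...with 'AE(foo)' in record AD changed to '(fubar),AE(foo)'...
(my $fubar = $original) =~ s{AD\(AE\(foo\)}{AD((fubar),AE(foo)};

# ...and with records AB and AC swapped.
my $swapped = join '', @records[0, 2, 1, 3, 4];

for my $case ([ original => $original ], [ fubar => $fubar ], [ swapped => $swapped ]) {
    my ($name, $data) = @$case;
    print "--- $name\n";
    print "'$_->[0]': [[$_->[1]]]\n" for extract_records($data, 'AA', 'ZZ');
}
```

With '(fubar),' added, AD comes out as just '(fubar' and AE swallows everything after it; with AB and AC swapped, only AA and AB are returned, and each contains text belonging to other records.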

Update: Enhanced the discussion, improved the 'robustness' of the extraction (for some definition of robust), and added stress-test records to the example data.
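If one is willing to assume that every record body contains only balanced parentheses, and that tags are always two uppercase letters (both are my assumptions, not guarantees from the thread), Perl 5.10's recursive subpatterns allow a different approach that does not depend on the tag sequence at all. A minimal sketch, with a hypothetical extract_balanced helper:

```perl
use strict;
use warnings;
use 5.010;    # for (?&NAME) recursion, possessive ++ and the // operator

# ASSUMPTIONS (mine, not the thread's): tags are exactly two uppercase
# letters, and every record body contains only balanced parentheses.
my $record = qr{
    \s* ( [A-Z]{2} )                        # capture 1: the tag
    [(] ( (?: [^()]++ | (?&BAL) )* ) [)]    # capture 2: the body, nesting allowed
    \s* ,? \s*                              # optional separator
    (?(DEFINE) (?<BAL> [(] (?: [^()]++ | (?&BAL) )* [)] ) )
}xms;

# Hypothetical helper: returns [tag, body] pairs in input order, or an
# empty list if any part of the string was not consumed as a record.
sub extract_balanced {
    my ($s) = @_;
    my @found;
    while ($s =~ m{ \G $record }xmsgc) {
        push @found, [ $1, $2 ];
    }
    return ((pos($s) // 0) == length($s)) ? @found : ();
}

# The post's records with AB and AC swapped and '(fubar),' added to AD,
# i.e. both of the changes that defeat the tag-sequence approach.
my $data = 'AA(Acme Widgets. 123 Coyote St. AZ(Ariz.),USA) ,'
         . 'AC(Dep deAstro. Uni de Val. C/Dr. M 50, 461 Bur (Val), Sp),'
         . 'AB(Your Name. 99 Some St. HI(Hawaii), USA), '
         . 'AD((fubar),AE(foo), approaching breaking point AD(bar)) , '
         . 'AE(optional trailing comma, spaces on last record)';

print "'$_->[0]': [[$_->[1]]]\n" for extract_balanced($data);
```

This copes with both swapped records and '(fubar),AE(foo)', but a body containing an unmatched '(' or ')' would still defeat it, so the format stays fragile; the helper only goes as far as returning an empty list when some of the string is left unparsed.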


In reply to Re: Character Text Delimiters by AnomalousMonk
in thread Character Text Delimiters by Ninthwave
