in reply to Character Text Delimiters

Others have noted this is an inherently fragile data format. (An example, I think, of the Semipredicate problem.) See what happens when records in the test data below are swapped, or if 'AE(foo)' in record AD is changed to '(fubar),AE(foo)'. However, one possible way:

>perl -wMstrict -le "my $s = 'AA(Acme Widgets. 123 Coyote St. AZ(Ariz.),USA) ,' . 'AB(Your Name. 99 Some St. HI(Hawaii), USA), ' . 'AC(Dep deAstro. Uni de Val. C/Dr. M 50, 461 Bur (Val), Sp),' . 'AD(AE(foo), approaching breaking point AD(bar)) , ' . 'AE(optional trailing comma, spaces on last record)' ; ;; my $tag = 'AA'; my $stop = 'ZZ'; ;; EXTRACT: for (++(my $after = $tag); $tag le $stop; ++$tag, ++$after) { my $pre = qr{ \G $tag [(] }xms; my $post = qr{ [)] (?: \s* , \s* (?= $after) | \s* ,? \s* \z) }xms; ;; last EXTRACT unless $s =~ m{ $pre (.*?) $post }xmsg; my $extract = $1; print qq{'$tag': [[$extract]]}; } " 'AA': [[Acme Widgets. 123 Coyote St. AZ(Ariz.),USA]] 'AB': [[Your Name. 99 Some St. HI(Hawaii), USA]] 'AC': [[Dep deAstro. Uni de Val. C/Dr. M 50, 461 Bur (Val), Sp]] 'AD': [[AE(foo), approaching breaking point AD(bar)]] 'AE': [[optional trailing comma, spaces on last record]]

Update: Enhanced discussion, improved 'robustness' of extraction (for some definition of robust), added stress-test data records to example data.