I want to extract a list of items from a long text string with a given format. The format is pretty simple, but the number of items in each list is variable, and so is the number of lists in the same string. Of course, there are many other things in the string that must be discarded.

I tried a single regular expression to capture the items into an array, but I can get only the first or the last element of each identified list...

This is my test code:

    #!perl
    use strict;
    use warnings;

    while (<DATA>) {
        chomp;
        s!\s+! !g;
        my $txt = $_;
        print "$_\n";
        my @items = ();
        print "FOUND: @items\n" if (@items = ($txt =~ m!\btest \w+(?:(?: is)? \w+)?(?: ?, ?(\w+)(?:(?: is)? \w+)?)+!ig));
    }

    __DATA__
    this line has nothing, nothing, nothing... 1 , 2, 3, 4 is four, 5, 6
    test 00,11 is one,22, 33 is three,44,55 is the best, and this is not a test 111, 222, 333 as random words to finish
    this should be a test, but nothing must be returned 4444, 7777, 9999 is garbage

In this example, each list starts with the string "test", the elements are delimited by commas, each element may be followed by an optional "is" and another word (which must be discarded), and the first element of the list is not important and must be ignored. The given data has 3 lines; only the 2nd one contains lists (two of them), while the 1st and 3rd have none. The expected result is:

FOUND: 11 22 33 44 55 222 333

What I got is:

FOUND: 55 333

If I remove the last plus sign, I get:

FOUND: 11 222

If I remove the "g" modifier, I get only one list (with one item):

FOUND: 55

What am I missing?

Thanks!!!
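
A possible explanation, offered as a sketch rather than a definitive answer: a capture group inside a quantifier, such as (\w+) inside (?: ... )+, keeps only the text from its last iteration, so each match of the whole pattern can return just one item per list. One way around this is to work in two steps: first match each list and capture everything after its first element, then pull the items out of that captured tail with a second /g match in list context. The helper name extract_items and the simplified patterns below are assumptions made for illustration, not part of the original test code.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns every item of
# every "test ..." list in the string, skipping the first item of each list.
sub extract_items {
    my ($txt) = @_;
    $txt =~ s/\s+/ /g;    # normalize whitespace, as the test code does
    my @items;

    # Step 1: find each list and capture the whole tail after its first
    # element. The tail is one or more ", item" chunks, each optionally
    # followed by " is word".
    while ($txt =~ /\btest \w+(?: is \w+)?((?: ?, ?\w+(?: is \w+)?)+)/ig) {
        my $tail = $1;    # copy $1 before the inner match overwrites it

        # Step 2: in list context, /g returns one capture per item.
        push @items, $tail =~ /, ?(\w+)/g;
    }
    return @items;
}

my $line = 'test 00,11 is one,22, 33 is three,44,55 is the best, '
         . 'and this is not a test 111, 222, 333 as random words to finish';
my @items = extract_items($line);
print "FOUND: @items\n" if @items;
```

Run on the second data line, this prints "FOUND: 11 22 33 44 55 222 333". The inner match operates on $tail rather than $txt, so it does not disturb the position the outer /g loop uses to resume its search.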


In reply to Parse for a list in a long string by vitoco
