vitoco has asked for the wisdom of the Perl Monks concerning the following question:
I want to extract lists of items from a long text string with a given format. The format is pretty simple, but the number of items in each list is variable, and so is the number of lists in the same string. Of course, the string also contains many other things that must be discarded.
I tried a single regular expression to capture the items into an array, but I can only get the first or the last element of each identified list...
Here is some test code:
    #!perl
    use strict;
    use warnings;

    while (<DATA>) {
        chomp;
        s!\s+! !g;
        my $txt = $_;
        print "$_\n";
        my @items = ();
        print "FOUND: @items\n"
            if (@items = ($txt =~ m!\btest \w+(?:(?: is)? \w+)?(?: ?, ?(\w+)(?:(?: is)? \w+)?)+!ig));
    }

    __DATA__
    this line has nothing, nothing, nothing... 1 , 2, 3, 4 is four, 5, 6
    test 00,11 is one,22, 33 is three,44,55 is the best, and this is not a test 111, 222, 333 as random words to finish
    this should be a test, but nothing must be returned 4444, 7777, 9999 is garbage
In this example, each list starts with the string "test" and its elements are delimited by commas. Each element may be followed by an optional "is" and another word (which must be discarded), and the first element of each list is not important and must be ignored. The given data has 3 lines: only the 2nd one contains lists (two of them), while the 1st and 3rd contain none. The expected result is:
FOUND: 11 22 33 44 55 222 333
What I got is:
FOUND: 55 333
If I remove the last plus sign, I get:
FOUND: 11 222
If I remove the "g" modifier, I get only one list (with one item):
FOUND: 55
What am I missing?
Thanks!!!
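For what it's worth, the symptom above is what Perl does with a capture group inside a quantified group: the group is overwritten on every repetition, so each overall match hands back only its last capture (hence one item per list with `/g`). A minimal sketch of one possible workaround, not necessarily the best one, is to match each whole list first and then pull the items out of the matched tail in a second pass. The helper name `extract_items` and the `@lines` array are made up for this illustration, and the pattern assumes the optional "is word" suffix is written as `(?:(?: is)? \w+)?` for the first element as well as for the later ones.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns every item of
# every "test ..." list found in $txt, skipping the first element of each list.
sub extract_items {
    my ($txt) = @_;
    my @items;

    # Step 1: one whole list per /g iteration. Only the tail of the list,
    # i.e. everything after the first (ignored) element, is captured.
    while ($txt =~ m!\btest \w+(?:(?: is)? \w+)?((?: ?, ?\w+(?:(?: is)? \w+)?)+)!ig) {
        my $tail = $1;

        # Step 2: inside the tail, an item is the word right after a comma.
        # The optional "is word" suffix has no comma before it, so it is skipped.
        push @items, $tail =~ m!, ?(\w+)!g;
    }
    return @items;
}

# Same sample data as in the question, kept in an array for this sketch.
my @lines = (
    'this line has nothing, nothing, nothing... 1 , 2, 3, 4 is four, 5, 6',
    'test 00,11 is one,22, 33 is three,44,55 is the best, and this is not a test 111, 222, 333 as random words to finish',
    'this should be a test, but nothing must be returned 4444, 7777, 9999 is garbage',
);

for my $line (@lines) {
    (my $txt = $line) =~ s!\s+! !g;    # normalize whitespace as in the question
    my @items = extract_items($txt);
    print "FOUND: @items\n" if @items;
}
```

With the sample data this prints `FOUND: 11 22 33 44 55 222 333`. The two-pass shape keeps the list-level pattern responsible only for finding where a list begins and ends, while the simpler inner pattern does the per-item capturing.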
Re: Parse for a list in a long string
by choroba (Cardinal) on Jun 02, 2015 at 17:17 UTC
Re: Parse for a list in a long string
by AnomalousMonk (Archbishop) on Jun 02, 2015 at 17:52 UTC
Re: Parse for a list in a long string
by Anonymous Monk on Jun 02, 2015 at 19:35 UTC
Re: Parse for a list in a long string
by vitoco (Hermit) on Jun 02, 2015 at 20:39 UTC