Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Dear Perl-Monks,
a month ago I didn't even know that there was a language called Perl (except for some dubious notes from fellow people who shared obfuscated code snippets for the enjoyment of us from Java-island).
Anyway... right now I'm in the middle of it, slowly finding my way around. But right now, there is a riddle I can't solve.
First: I have heard of the module I should use to parse HTML files, and I might use it in the future, but right now - as the title suggests - I have a regex that snatches stuff out of an HTML file. And for reasons I can't determine, it only finds the very last possible match.
So... the HTML file looks like this:
<TR>
<TD></TD>
<TD CLASS='statusOdd'><TABLE BORDER=0 WIDTH='100%' CELLSPACING=0 CELLPADDING=0><TR><TD ALIGN=LEFT><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
<TR>
<TD ALIGN=LEFT valign=center CLASS='statusOdd'><A HREF='extinfo.cgi?requestWithPrivateInformations'>Description</A></TD></TR>
</TABLE>
</TD>
<TD ALIGN=RIGHT CLASS='statusOdd'>
<TABLE BORDER=0 cellspacing=0 cellpadding=0>
<TR>
</TR>
</TABLE>
</TD>
</TR></TABLE></TD>
<TD CLASS='statusOK'>OK</TD>
<TD CLASS='statusOdd' nowrap>2015-05-17 01:59:48</TD>
<TD CLASS='statusOdd' nowrap>145d 19h 53m 11s</TD>
<TD CLASS='statusOdd'>1/4</TD>
<TD CLASS='statusOdd' valign='center'>something that shoudn't be published in the Internet;</TD>
</TR>
Be aware that the table that shines through this snippet contains a lot of similar tag constructs (about 50 table rows, all of which I want in the array you see in the next code segment). Some readers might have heard of Nagios (well, actually I think most might use it too), and yes, that's a status page. I obscured the information to protect the affected customers' data, so don't ponder the nonsense you see there.
And that's the regex I use to get pieces of information out of it:
my @superContainer;
while($longline =~ /<TD ALIGN=LEFT.+extinfo.cgi.+'>(.+)<\/A.+'status.+>(.+)<\/TD.+nowrap>(.+)<\/TD.+nowrap>(.+s)<\/TD>.+'>(\d\/\d)<\/TD>.+'>(.+)<\/TD>*?/g){
   my @subContainer = ($1, $2, $3, $4, $5, $6);
   push @superContainer, \@subContainer
}
Ah... no line-recognition? That's okay, because (you don't see this) I placed the whole .html file in a single line string to avoid... happenings that might happen if you have line-terminations in your source. The whole html-file - one $longline.
Used with the whole file that Nagios sends over, I get exactly ONE match, which is the very last occurrence. I tried this with something that looked like "aaaa bbbb ccc aa bbbb cc dd aaa cc dd" and used a similar regex to snatch all... well, in short, this: /(a+).+(a+)/ And it did what I expected: providing $1 ... $n with the As inside, even for multiple possible matches.
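To make the symptom easier to reproduce, here is a minimal sketch on a made-up two-row string. It is not the real Nagios page: the sample data and the array names `@greedy` and `@lazy` are invented for illustration. It compares a greedy `.+`, as used in my pattern, with the non-greedy `.+?`, on the assumption that greediness is what is going on here:

```perl
use strict;
use warnings;

# Hypothetical two-row sample, NOT the real Nagios output.
my $longline = "<TR><TD CLASS='x'>first</TD></TR><TR><TD CLASS='x'>second</TD></TR>";

# Greedy: the .+ after <TR> runs to the end of the string and backtracks
# only as far as the LAST '> it can use, so the loop body runs once.
my @greedy;
while ($longline =~ /<TR>.+'>(.+)<\/TD>/g) {
    push @greedy, $1;
}

# Non-greedy: .+? stops at the first place where the rest of the pattern fits,
# so each row produces its own match.
my @lazy;
while ($longline =~ /<TR>.+?'>(.+?)<\/TD>/g) {
    push @lazy, $1;
}

print "greedy: @greedy\n";   # prints "greedy: second"
print "lazy:   @lazy\n";     # prints "lazy:   first second"
```

If the same thing happens with the real file, the greedy loop runs once and captures only from the last row, while the non-greedy loop yields one capture per row.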
Hm... one thing - I added /regex/gc to my construct, but that didn't change anything. In fact, most of the answers the internet had (*? at the end or the beginning, some letters behind //, more and MORE brackets) didn't change my result: only the very last match is recognized.
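As far as I understand the documentation, the `c` in `/gc` only changes what happens to `pos()` after a FAILED match, which would explain why it made no difference to how many matches the loop finds. A small sketch of that behaviour, with a made-up string and variable names of my own:

```perl
use strict;
use warnings;

my $text = "aXbX";                 # hypothetical sample string

$text =~ /X/g;                     # matches at offset 1, pos() is now 2
my $after_hit = pos($text);

$text =~ /Z/g;                     # fails; without /c, pos() is reset
my $after_miss_g = pos($text);     # undef

$text =~ /X/g;                     # starts from the beginning again, pos() is 2
$text =~ /Z/gc;                    # fails; /c keeps pos() where it was
my $after_miss_gc = pos($text);    # still 2

print "after hit:      $after_hit\n";
print "after miss /g:  ", (defined $after_miss_g ? $after_miss_g : 'undef'), "\n";
print "after miss /gc: $after_miss_gc\n";
```

So `/c` matters when several patterns take turns on the same string, not when a single pattern matches only once.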
In the debugger, I get something like this at the end when I ask for x @superContainer:
0  ARRAY(0x1e2b628)
   0  'Description'
   1  'OK'
   2  '2015-05-17 01:59:48'
   3  '145d 19h 53m 11s'
   4  '1/4'
   5  'something that shoudn't be published in the Internet;'
but where are the other 49 matches? *cry*
Can someone please provide this humble person with an explanation of how to tell the regex to find ALL of the matches that are in this file?
Greetings, someone who might register in the near future