in reply to A line of code matches the question

Hello *2, and welcome to the Monastery!

First, please note that the /g modifier on the first regex (the one in the if statement) has no visible effect, because the match is evaluated only once, in scalar context. If there were two or more <div id="724"> elements, only the first would be printed. You can fix this easily by changing the if into a while loop:

while ($t =~ /<div id="724">(.*?)<\/div>/sg) {
    print "$_\n" for $1 =~ /<p>(.+?)<\/p>/g;
}
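To see the difference between the two forms in isolation, here is a minimal sketch. The sample string in $t and the sub names paragraphs_if and paragraphs_while are my own inventions for illustration, not taken from your data:

```perl
use strict;
use warnings;

# Hypothetical sample input (not the original poster's data):
# two <div id="724"> elements with an unrelated div in between.
my $t = <<'EOF';
<div id="724"><p>one</p><p>two</p></div>
<div id="999"><p>skip</p></div>
<div id="724"><p>three</p></div>
EOF

# 'if' evaluates the match once, so only the first div is seen;
# the /g modifier makes no visible difference here.
sub paragraphs_if {
    my ($text) = @_;
    my @found;
    if ($text =~ /<div id="724">(.*?)<\/div>/sg) {
        my $inner = $1;    # copy the capture before matching again
        push @found, $inner =~ /<p>(.+?)<\/p>/g;
    }
    return @found;
}

# 'while' with /g resumes after the previous match on each iteration,
# so every matching div is visited.
sub paragraphs_while {
    my ($text) = @_;
    my @found;
    while ($text =~ /<div id="724">(.*?)<\/div>/sg) {
        my $inner = $1;    # copy the capture before matching again
        push @found, $inner =~ /<p>(.+?)<\/p>/g;
    }
    return @found;
}

print "if:    @{[ paragraphs_if($t) ]}\n";       # if:    one two
print "while: @{[ paragraphs_while($t) ]}\n";    # while: one two three
```

Copying $1 into a lexical before the inner match is only a precaution, so the nested match cannot disturb the capture you are reading from.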

However, as SuicideJunkie says, you’ll be much better off using a dedicated XML parser. But note that your XML is not well-formed, because the <meta charset="UTF-8"> tag has no corresponding closing tag. When this is fixed, parsing is straightforward:

use strict;
use warnings;
use XML::LibXML;

my $t = <<'EOF';
...
<meta charset="UTF-8" />
...
EOF

my $dom = XML::LibXML->load_xml(string => $t);
print $_->to_literal . "\n" for $dom->findnodes('//div[@id="724"]/p');

Output:

1:59 >perl 1798_SoPW.pl
aaa22 22 22 aaa22 aaa22 aafsdfsdfa22
1:59 >

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: A line of code matches the question
by *2 (Novice) on Aug 10, 2017 at 17:16 UTC
    I have just done some more testing and found that XML::LibXML is very strict about whether the HTML is well-formed; it does not seem to tolerate any mistakes in the markup. I found it not quite suitable for this task, so maybe a regular expression is a better fit for my current job. :)
Re^2: A line of code matches the question
by *2 (Novice) on Aug 10, 2017 at 16:55 UTC
    Wow, XML::LibXML is really powerful! It solved my problem, and the care you took over the other points in your answer is something I can learn from!