A line of code matches the question

*2 has asked for the wisdom of the Perl Monks concerning the following question:

Re: A line of code matches the question
by SuicideJunkie (Vicar) on Aug 10, 2017 at 15:07 UTC

Presumably you mean a "regular expression"?

The most basic thing is that you probably want multiline mode, with the /m modifier.
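
For reference, here is a minimal sketch of what the line-related modifiers do: /m changes where ^ and $ match, while /s is the modifier that lets "." match a newline. The sample text and variable names are made up for illustration and are not the original poster's data.

```perl
use strict;
use warnings;

# Hypothetical sample text, not the original poster's data.
my $text = "<div>\n<p>one</p>\n<p>two</p>\n</div>\n";

# Without /s, "." does not match a newline, so a pattern spanning lines fails.
my $plain = $text =~ /<div>(.*?)<\/div>/ ? 1 : 0;

# With /s, "." also matches newlines, so the whole block is captured.
my $dotall = $text =~ /<div>(.*?)<\/div>/s ? 1 : 0;

# With /m, "^" and "$" match at the start and end of every line.
my @line_starts = $text =~ /^(<p>)/mg;

print "plain=$plain dotall=$dotall line_starts=", scalar(@line_starts), "\n";
# prints: plain=0 dotall=1 line_starts=2
```

So for a pattern that has to span several lines, /s is usually the modifier that matters; /m only helps when the pattern is anchored with ^ or $.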

And odds are you're going to want an XML parser to make your life easier when someone changes minor details in the file. I recommend XML::Tiny, which is great for most use cases and has no dependencies.

Re^2: A line of code matches the question

by *2 (Novice) on Aug 10, 2017 at 15:48 UTC

<div id = "724">

Re^3: A line of code matches the question

by Anonymous Monk on Aug 10, 2017 at 16:25 UTC

You should NOT USE REGULAR EXPRESSIONS to parse HTML. Just don't do it. If you have a good-enough solution that uses two regular expressions, you should NOT try to combine it into one regular expression. Regexes are bad at parsing structured data like HTML, and rapidly become incomprehensible and unmaintainable when you try.
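
One way to see the problem is a small sketch with nested elements. The markup below is hypothetical (it is not the original poster's file), and the pattern is a typical non-greedy one rather than anyone's actual code:

```perl
use strict;
use warnings;

# Hypothetical markup: the target <div> contains a nested <div>.
my $html = '<div id="724"><p>one</p><div><p>two</p></div><p>three</p></div>';

# The non-greedy match stops at the first </div>, which closes the inner element.
my ($captured) = $html =~ /<div id="724">(.*?)<\/div>/s;
my @paragraphs = $captured =~ /<p>(.+?)<\/p>/g;

print "captured: $captured\n";
print "found:    @paragraphs\n";
# prints:
# captured: <p>one</p><div><p>two</p>
# found:    one two
```

The third paragraph is silently lost because the regex has no notion of nesting. A parser tracks the element tree, so it does not have this failure mode.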

Re^4: A line of code matches the question

by *2 (Novice) on Aug 10, 2017 at 16:48 UTC

Re: A line of code matches the question
by Athanasius (Archbishop) on Aug 10, 2017 at 16:02 UTC

Hello *2, and welcome to the Monastery!

First, please note that the /g modifier on the first regex (the one in the if statement) does nothing, because the regex is called only once, in scalar context. If there were two or more <div id="724"> elements, only the first would be printed. You can fix this easily by changing the if into a while loop:

while ($t =~ /<div id="724">(.*?)<\/div>/sg)
{
    print "$_\n" for $1 =~ /<p>(.+?)<\/p>/g;
}
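
The difference between the two forms can be sketched with a small, made-up input that contains two matching elements. The data and the array names are assumptions for illustration only:

```perl
use strict;
use warnings;

# Hypothetical input with two <div id="724"> elements.
my $t = <<'EOF';
<div id="724"><p>a</p><p>b</p></div>
<div id="1"><p>x</p></div>
<div id="724"><p>c</p></div>
EOF

my (@with_if, @with_while);

# "if" evaluates the match once, so only the first element is processed.
if ($t =~ /<div id="724">(.*?)<\/div>/s) {
    my $inner = $1;    # copy $1 before running another match
    push @with_if, $inner =~ /<p>(.+?)<\/p>/g;
}

# "while" with /g resumes after the previous match until none remain.
while ($t =~ /<div id="724">(.*?)<\/div>/sg) {
    my $inner = $1;
    push @with_while, $inner =~ /<p>(.+?)<\/p>/g;
}

print "if:    @with_if\n";
print "while: @with_while\n";
# prints:
# if:    a b
# while: a b c
```

Copying $1 into a lexical before the inner match is a defensive habit, since every successful match overwrites the capture variables.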

However, as SuicideJunkie says, you’ll be much better off using a dedicated XML parser. But note that your XML is not well-formed, because the <meta charset="UTF-8"> tag has no corresponding closing tag. When this is fixed, parsing is straightforward:

use strict;
use warnings;
use XML::LibXML;

my $t = <<'EOF';
...
<meta charset="UTF-8" />
...
EOF

my $dom = XML::LibXML->load_xml(string => $t);

print $_->to_literal . "\n" for $dom->findnodes('//div[@id="724"]/p');

Output:

 1:59 >perl 1798_SoPW.pl
aaa22
22
22
aaa22
aaa22
aafsdfsdfa22

 1:59 >

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: A line of code matches the question

by *2 (Novice) on Aug 10, 2017 at 17:16 UTC

I have just done some testing and found that XML::LibXML is very strict about the HTML being well-formed; it does not seem to tolerate any mistakes in the markup. It does not look well suited to this task, so maybe a regular expression is a better fit for my current job. :)

Re^2: A line of code matches the question

by *2 (Novice) on Aug 10, 2017 at 16:55 UTC

Wow, XML::LibXML is really powerful! It solved my problem, and your careful attention to the other details is something I should learn from!

Re: A line of code matches the question
by hippo (Archbishop) on Aug 10, 2017 at 16:01 UTC

When I run your code as it stands I get this:

$ perl 1197154.pl 
aaa22
22
22
aaa22
aaa22
aafsdfsdfa22

How does this output differ from what you expect/want?

If you want the same output but by some other means, then you would need to be a lot more specific about what those other means might be.

Re^2: A line of code matches the question

by *2 (Novice) on Aug 10, 2017 at 17:01 UTC

That output is fine, I'm sure. If you have other methods, I hope you will reply again. I like seeing different ways to solve the same problem! Thanks again to the monks!

Re: A line of code matches the question
by *2 (Novice) on Aug 10, 2017 at 17:49 UTC

Athanasius

use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

my $t = <<'EOF';
<!DOCTYPE html>
<html>
<head lang="en">
    <meta charset="UTF-8">
    <title></title>
    <script src="CssScriptLoader.js"></script>
    <script src="XZClass.js"></script>
</head>
<div id="224">
<p>aaa</p>
<p>aaa</p>
<p>axxxdsfosdaa</p>
<p>aaa</p>
</div>
<div id="724">
    <p>aaa22</p>
<p>22</p>
<p>22</p>
<p>aaa22</p>
<p>aaa22</p>
<p>aafsdfsdfa22</p>
</div>
<div id="284">
    <p>aaa33</p>
<p>aaa33</p>
<p>aaa33sdfsdfaom</p>
<p>aaa33</p>
<p>aaa33</p>
<p>aaa33</p>
</div>
</html>
EOF

my $dom = XML::LibXML->load_html(
    string          => \$t,
    recover         => 1,
    suppress_errors => 1,
);

my $xpath = '//div[@id="724"]/p';
print "$_\n" foreach $dom->findnodes($xpath)->to_literal_list;

Re: A line of code matches the question
by Anonymous Monk on Aug 10, 2017 at 14:24 UTC

How can I get this result with a single line of regular expression code?

Re^2: A line of code matches the question

by *2 (Novice) on Aug 10, 2017 at 15:49 UTC

Sorry, English is not my mother tongue, so it is hard for me to describe the problem. I will try to make it clearer.
