PerlMonks  

Re: Parsing HTML/XML with Regular Expressions

by Grimy (Pilgrim)
on Oct 17, 2017 at 16:32 UTC ( [id://1201514]=note )

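    #!/usr/bin/perl -p0
    # -p loops over the input and prints $_; -0 makes NUL the record separator, so a text file arrives as one string.
    # Drop <!...> declarations; (?1) recurses into group 1 so nested <...> pairs are consumed too.
    s/<!([^<>]|<(?1)*>)*>//gs;
    ...
    # For each tag carrying an id attribute, keep only "id=text, ", where text runs up to the next "<".
    s/.*?<(?:[^'"]|(['"]).*?\1)*?\bid\s*=\s*(['"])(.*?)\2.*?>([^<]*)/$3=$4, /gs;
    # Replace numeric character references such as &#65; with the character itself.
    s/&#(\w+);/chr $1/ge;
    # Final cleanup: delete anything that is not a word character, "=", "," or space, a trailing ", ", and any character repeated three times in a row.
    s/[^\w=, ]|, $|(.)\1\1//g;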
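
For anyone who wants to poke at the pieces in isolation, here is a minimal sketch that applies two of the substitutions to a string instead of running under -p0. The sample input and the variable names ($html, $stripped, $decoded) are made up for illustration and are not from the original post. The entity step here uses \d+ rather than the post's \w+, which is my own narrowing so that only decimal references are touched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the original thread.
my $html = '<!DOCTYPE html [<!ENTITY x "y">]><p id="a">&#72;i</p>';

# Strip <!...> declarations. (?1) recurses into capture group 1,
# so the nested <!ENTITY ...> inside the DOCTYPE is consumed as well.
(my $stripped = $html) =~ s/<!([^<>]|<(?1)*>)*>//gs;

# Decode decimal character references only (assumption: \d+ instead of \w+).
(my $decoded = $stripped) =~ s/&#(\d+);/chr $1/ge;

print "$stripped\n";
print "$decoded\n";
```

The recursive group is what lets the first pattern cope with the internal subset of a DOCTYPE; a flat [^>]* would stop at the first ">" and leave "]>" behind.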
    
