one line regular expression

Thalamus has asked for the wisdom of the Perl Monks concerning the following question:

Hi all ! First time posting ... so - please be gentle :)

I want to delete a section of text from a *.html file that contains the text below. The section spans several lines, so I assume I have to treat it as a multi-line match (or whatever it is called).



<ADDRESS>
someone@some.domain.co.uk
</ADDRESS>
</BODY>
</HTML>

I feel I've tried everything ... but obviously not, since I haven't figured it out yet. I want to remove the whole section from the opening <ADDRESS> tag to the closing </ADDRESS> tag. If I try to remove only the <ADDRESS> tag it works, and the regular expression for the email address also works on its own, but once I try to do both at the same time I fail miserably.

perl -i.bak -ne 'if(s!<ADDRESS>.(\w[-._\w]*\w@\w[-._\w]*\w\.\w{2,3})!!mgis) {next;} print;' index.html


Replies are listed 'Best First'.
Re: one line regular expression - help needed by Corion (Patriarch) on Jul 29, 2010 at 08:18 UTC
Hello and welcome! I think your "problem" is that `<ADDRESS>` and `</ADDRESS>` are on different lines, and `-n` goes through your file line by line, so a single substitution never sees both tags at once. Conveniently, Perl can work with line-oriented input quite well: `perl -i.bak -ne "print unless /<ADDRESS>/ .. m!</ADDRESS>!"` This approach only works if there is only one `<ADDRESS>` sequence and you want to remove that. As for your regular expression, it does not seem to allow for (much) whitespace between `<ADDRESS>` and the start of the email address. If you change the dot after `<ADDRESS>` to `\s*`, you will have more success in matching. For that to work you still need to slurp the whole input at once; `-0777` activates slurp mode.
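To make the two suggestions concrete, here is a minimal sketch that runs both against a scratch copy of the sample from the question. The temporary directory, the extra `<P>` line, and the loose `[^\s<]+` stand-in for the email pattern are assumptions for illustration, not part of the original post; `-i.bak` is left out so the output can be inspected instead of rewriting a file.

```shell
# Scratch directory and sample file; the names are illustrative only.
tmp=$(mktemp -d)
cat > "$tmp/index.html" <<'EOF'
<P>Some content</P>
<ADDRESS>
someone@some.domain.co.uk
</ADDRESS>
</BODY>
</HTML>
EOF

# Line by line (-n): the range operator is true from the line matching
# <ADDRESS> through the line matching </ADDRESS>, so those lines are skipped.
range_out=$(perl -ne 'print unless /<ADDRESS>/ .. m!</ADDRESS>!' "$tmp/index.html")

# Slurp mode: -0777 reads the whole file as one string, so one substitution
# can span the line breaks. \s* absorbs the newlines around the address;
# [^\s<]+ is a deliberately loose placeholder for the email regex.
slurp_out=$(perl -0777 -pe 's!<ADDRESS>\s*[^\s<]+\s*</ADDRESS>\n?!!i' "$tmp/index.html")

printf '%s\n' "$range_out"
rm -rf "$tmp"
```

Note that the range version drops whole lines, so anything else sitting on the same line as either tag would be removed as well.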
Re^2: one line regular expression - help needed by morgon (Priest) on Jul 29, 2010 at 18:03 UTC
In case there are several ADDRESS sections and you want to get rid of them all in one go, you can do it like this: `perl -i.bak -0777 -pe 's|<ADDRESS>.*?</ADDRESS>||gs' <your file>`
Re^2: one line regular expression - help needed by Thalamus (Acolyte) on Jul 29, 2010 at 08:46 UTC
Thanks for the response guys. You saved my day.
Re: one line regular expression - help needed by marto (Cardinal) on Jul 29, 2010 at 08:27 UTC
Welcome to the Monastery. I'd advise against using a regex to manipulate HTML/XML; use one of the available parser modules instead. For example, read the HTML::TokeParser Tutorial.
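As a rough idea of what the parser route could look like, here is a minimal sketch using HTML::TokeParser. It assumes the module is installed (it ships with the HTML-Parser distribution on CPAN, not with core Perl), and the sample input and variable names are made up for illustration. It copies every token through unchanged except those inside an `<ADDRESS>` element.

```shell
# Hypothetical sample input; requires HTML::TokeParser to be installed.
html='<P>keep</P>
<ADDRESS>
someone@some.domain.co.uk
</ADDRESS>
</BODY>'

parsed=$(printf '%s\n' "$html" | perl -MHTML::TokeParser -0777 -ne '
    my $p = HTML::TokeParser->new(\$_) or die "cannot parse input";
    my $inside = 0;    # nesting depth inside <ADDRESS> elements
    while (my $t = $p->get_token) {
        my ($type, $tag) = @$t;    # tag names arrive lower-cased
        if    ($type eq "S" && $tag eq "address") { $inside++; next }
        elsif ($type eq "E" && $tag eq "address") { $inside-- if $inside; next }
        next if $inside;
        # The original source text sits at a different index per token type.
        print $type eq "S"  ? $t->[4]
            : $type eq "E"  ? $t->[2]
            : $type eq "PI" ? $t->[2]
            :                 $t->[1];
    }
')

printf '%s\n' "$parsed"
```

Unlike the regex one-liners, this does not care how the tags are capitalised or whether they carry attributes, which is the main argument for a parser.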
Re^2: one line regular expression - help needed by Thalamus (Acolyte) on Jul 29, 2010 at 08:47 UTC
Thanks ... will have a look at it.