Erika has asked for the wisdom of the Perl Monks concerning the following question:
I've looked everywhere for an answer I can understand and haven't found one, so I'm hoping for something clear and concise--something I, as a total newbie, can comprehend.
I have inherited 1300 HTML files full of garbage. I want to strip out a lot of code and several divs.
I have been using perl -pi.bak -e "s|oldstuff|newstuff|g" *.html to remove small parts and lines of code with limited success--I keep running into ugly character strings that are difficult to delete and require extensive backslashing.
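One way to cut down on the backslashing, sketched here under assumptions: Perl's `\Q...\E` escape treats an interpolated string as literal text, so punctuation inside it does not need to be escaped by hand. The helper name `remove_literal` and the sample string in the usage note are hypothetical and do not come from the files in question.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every literal occurrence of $literal from $text.
# \Q...\E quotes regex metacharacters in the interpolated variable, so
# characters such as | [ ] * $ . are matched as themselves.
sub remove_literal {
    my ($text, $literal) = @_;
    $text =~ s/\Q$literal\E//g;
    return $text;
}
```

For example, `remove_literal($html, '<td width="100%">[*]|$1</td>')` would remove that exact string wherever it appears, with no manual escaping inside the single-quoted literal.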
I can deal with the painstaking removal method above but have run into a situation where it won't work. I want to delete a complete div, but the div ends non-uniquely with just </div> on a line by itself. I don't want to remove all the closing div tags from the files, so I'm not sure what to do at this point.
The lines appear on the same line number in each file, but I'm not sure whether this will be the case later on with other divs that may need to be removed--the data in the files is similar, but not exactly the same.
Each of the unwanted divs at this point starts with <div class="topsearchbar"> and ends with </div>. They are on lines 16-25 of the file.
Any pointers in the right direction would be appreciated.
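A minimal sketch of one possible approach to the div described above, using Perl's range (flip-flop) operator to skip every line from the opening tag through the next closing tag. It assumes the unwanted block contains no nested divs, so the first `</div>` after the opening tag is the one that closes it. The function name `strip_topsearchbar` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return $html without the topsearchbar block.
# Assumption: no nested <div> inside the block, so the first </div>
# after the opening tag ends it.
sub strip_topsearchbar {
    my ($html) = @_;
    my @kept;
    for my $line (split /^/, $html) {    # split into lines, keeping newlines
        # The range operator is true from the line matching the left
        # pattern through the line matching the right pattern, inclusive.
        next if $line =~ /<div class="topsearchbar">/ .. $line =~ m{</div>};
        push @kept, $line;
    }
    return join '', @kept;
}
```

The same idea can be written as an in-place one-liner in the style already used in the question, for example `perl -ni.bak -e "print unless /<div class=.topsearchbar.>/ .. m{</div>}" *.html`, where the `.` stands in for the double quotes to sidestep shell quoting; the exact quoting depends on the shell. The range operator keeps its state from line to line, so a block that is never closed would swallow the rest of the file and carry into the next one. Checking the `.bak` copies before deleting them seems prudent.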
Replies are listed 'Best First'.
Re: Delete multiple lines of text from a file?
by repellent (Priest) on Feb 20, 2010 at 06:31 UTC
Re: Delete multiple lines of text from a file?
by ww (Archbishop) on Feb 20, 2010 at 05:46 UTC