in reply to Re: file parsing
in thread file parsing

Thank you for the replies, monks! Here is a cut-down example in response to wfsp's post.
The code to remove begins here:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">

And it continues with a bunch of code I would like to remove. My end marker for removal would be here:
</b> </div> </td> </tr> </table>
This bunch of code repeats numerous times in the file.
After this code-to-remove, I have the code I would like to keep, tags and all, untouched by any parsing or modification.

After this code-to-keep, the cycle of code-to-remove begins again, as above:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">

and so on...
I have been reading up on this and trying some code found on the internet and in this post (thank you, Martin, for your post), but I can't seem to keep the code-to-keep untouched by the parser.
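One way to leave the code-to-keep untouched is to avoid an HTML parser altogether and treat the file as plain lines, dropping everything from a start marker through an end marker. The following is only a minimal sketch under assumptions: the `<?xml` declaration starts a line that opens each block, the `</table>` line closes it, and the two never share a line with content to keep. The name `strip_blocks` and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that fall outside every
# start..end block. The marker lines themselves are dropped as well.
sub strip_blocks {
    my ($start_re, $end_re, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        # The range (flip-flop) operator becomes true on a line matching
        # $start_re and stays true through the next line matching $end_re.
        next if $line =~ $start_re .. $line =~ $end_re;
        push @kept, $line;    # kept lines are passed through unmodified
    }
    return @kept;
}

# Made-up sample data imitating the remove/keep cycle described above.
my @sample = (
    qq{<?xml version="1.0" encoding="UTF-8"?>\n},
    qq{<head>\n},
    qq{</b> </div> </td> </tr> </table>\n},
    qq{<p>keep me, <b>tags</b> and all</p>\n},
    qq{<?xml version="1.0" encoding="UTF-8"?>\n},
    qq{</b> </div> </td> </tr> </table>\n},
    qq{<p>keep me too</p>\n},
);

print strip_blocks(qr/^<\?xml\b/, qr{</table>}, @sample);
```

Because nothing is parsed, the surviving lines come out byte for byte as they went in. If a block is never closed, everything after its start marker is dropped, so the end marker has to be reliable.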

Thank you

Re^3: file parsing
by wfsp (Abbot) on Jul 12, 2008 at 06:37 UTC
    Could still do with some more info. :-)

    Is there a pattern in what you want to keep? It would be easier that way round. Also, in your "end marker" there is a closing div and a closing table. What do the opening tags look like? Are there any identifiable attributes?

      Thanks. I was able to solve this by marking the code I wanted to keep and combining code from here and the internet in general.

      # Collect the files in the directory whose names match the pattern.
      opendir(DIR, "/your/directory/here/") or die "$!";
      my @files = grep { /.extension to search/ } readdir DIR;
      closedir DIR;    # directory handles are closed with closedir

      foreach my $file (@files) {
          open(FH, "/your/directory/here/$file") or die "$!";
          while (<FH>) {
              # Print each line from the start marker through the end marker.
              print $_ if m{text to keep, start} .. m{text to keep, end};
          }
          close(FH);
      }
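The range operator in the loop above prints the marker lines along with the text between them. If the markers were added by hand and should not appear in the output, the operator's return value can be used to skip them. This is a hedged sketch, not part of the original solution: `between_markers` and the `<!-- KEEP START -->` / `<!-- KEEP END -->` comments are hypothetical stand-ins for whatever markers were actually used.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines strictly between the markers.
sub between_markers {
    my ($start_re, $end_re, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        # In scalar context the range operator returns a sequence number:
        # 1 on the start line, then 2, 3, ... and the value on the end
        # line has the string "E0" appended. Outside a range it is false.
        my $seq = ($line =~ $start_re .. $line =~ $end_re);
        next unless $seq;          # outside any marked region
        next if $seq == 1;         # the start-marker line itself
        next if $seq =~ /E0\z/;    # the end-marker line itself
        push @kept, $line;
    }
    return @kept;
}

# Made-up sample input with hand-placed markers.
my @sample = (
    "<head>\n",
    "<!-- KEEP START -->\n",
    "<p>first kept line</p>\n",
    "<p>second kept line</p>\n",
    "<!-- KEEP END -->\n",
    "</table>\n",
);

print between_markers(qr/<!-- KEEP START -->/, qr/<!-- KEEP END -->/, @sample);
```

The same test on `$seq` can be dropped into the `while (<FH>)` loop in place of the plain `print ... if` line.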