Dear Monks,
I am hoping you can help with code that would let me parse a text file.
Basically, I would like to delete all text in a file that lies between two specific points. These points occur numerous times. Specifically:
1. Starting at text that reads (without the quotes): "<?xml version="1.0" encoding="UTF-8"?>"
2. Ending at the HTML tag that closes a table: </table>. This is the first </table> after my start point, and it is always preceded by this HTML (each tag written here without its leading <):
5 tabs /b
4 tabs /div>
3 tabs /td>
2 tabs /tr>
2 tabs /table>
The file then continues with text I would like to keep, including tables that I do not want deleted.

The file basically looks like this: (start point text, <?xml version="1.0" encoding="UTF-8"?>) - text to delete - (end point text, </table>) - text to keep - and then the pattern repeats from the start point text.

I was hoping to run this from the command line (#!/usr/bin/perl) and specify the input and output files.
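
For illustration, here is a minimal sketch of one way the task described above could be done. It is not a definitive solution. It assumes the start marker is exactly the XML declaration quoted above, that the end marker is the literal text </table>, and that the file is small enough to read into memory at once. The function name strip_blocks and the script name in the usage line are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove every span that starts at the XML declaration and ends at the
# first </table> that follows it (hypothetical helper name).
# .*? is non-greedy, so each match stops at the nearest end marker;
# /s lets . match newlines, and /g repeats for every occurrence.
sub strip_blocks {
    my ($text) = @_;
    my $start = quotemeta '<?xml version="1.0" encoding="UTF-8"?>';
    my $end   = quotemeta '</table>';
    $text =~ s/$start.*?$end//gs;
    return $text;
}

# Usage (assumed script name): perl strip_blocks.pl input.html output.html
# With no arguments the script does nothing, so strip_blocks can also be
# loaded and exercised on its own.
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;

    open my $in, '<', $in_file or die "Cannot read $in_file: $!\n";
    my $text = do { local $/; <$in> };    # slurp the whole file
    close $in;

    open my $out, '>', $out_file or die "Cannot write $out_file: $!\n";
    print {$out} strip_blocks($text);
    close $out or die "Cannot close $out_file: $!\n";
}
elsif (@ARGV) {
    die "Usage: $0 INPUT OUTPUT\n";
}
```

Tables outside a start/end pair are untouched because the pattern only matches text that begins with the XML declaration. A start marker with no later </table> is left in place, since the substitution needs both markers to match.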

FYI, some background: I would describe myself as a computer enthusiast (good with Linux, HTML, PHP, MySQL...), but I have only recently begun dabbling in Perl. I have been scouring the internet for code snippets to cobble together, and perlmonks.org for some knowledge for this project, but have been unsuccessful.

Any assistance is greatly appreciated.

In reply to file parsing by catch22
