You haven't said so explicitly, but let's assume that the trade messages are not nested and that the file is not corrupted (no unmatched tags); otherwise things get more complicated.
You say your file is huge, so you should read it line by line instead of loading it into memory in one go.
Based on the assumptions above, the algorithm below should do the trick:
- if both tags (<MyId> and </MyId>) are in your line, move the text between them to your output and continue processing the rest of the line
- if only the startTag (<MyId>) is in your line, copy all text from that tag to the end of the line into a temporary variable and set a marker; then process the next line
- if neither tag is in your line (and the marker is set), append that line to your temporary variable and process the next line
- if only the endTag (</MyId>) is in your line, append everything from the start of the line up to the tag to your temporary variable, write the temporary variable to your output, then clear it and reset the marker; then process the next line
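The steps above can be sketched in Perl roughly as follows. This is only a minimal sketch under the same assumptions (no nesting, no unmatched tags). The sub name extract_between and the sample data are made up for illustration, and the tags themselves are not copied to the output. Instead of four separate checks it keeps a position within the current line and loops, so several messages on one line are handled too.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every piece of text found between $start and
# $end, reading $fh line by line. Assumes matched, non-nested tags.
sub extract_between {
    my ($fh, $start, $end) = @_;
    my @found;
    my $buffer;    # defined only while the "marker" is set (inside a message)

    while (my $line = <$fh>) {
        my $pos = 0;
        while ($pos < length $line) {
            if (defined $buffer) {
                # marker set: look for the endTag in the rest of the line
                my $e = index($line, $end, $pos);
                if ($e < 0) {
                    # no endTag: keep the rest of the line, go to the next line
                    $buffer .= substr($line, $pos);
                    last;
                }
                $buffer .= substr($line, $pos, $e - $pos);
                push @found, $buffer;
                undef $buffer;                 # reset the marker
                $pos = $e + length $end;
            }
            else {
                # marker not set: look for the next startTag
                my $s = index($line, $start, $pos);
                last if $s < 0;
                $buffer = '';                  # set the marker
                $pos = $s + length $start;
            }
        }
    }
    return @found;
}

# Usage with an in-memory filehandle standing in for the huge file
my $data = "<MyId>1</MyId> junk <MyId>2\n3</MyId>\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print "[$_]\n" for extract_between($fh, '<MyId>', '</MyId>');
close $fh;
```

For a really huge file you would probably print each message as soon as its endTag is found instead of collecting everything in @found.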
note1: you will surely want to enhance the first check to cover the situation where the endTag occurs before the startTag (for example, when one message ends and the next one starts on the same line)
note2: have a look at index and substr; there is no need for regular expressions here
note3: if things get more complicated (e.g. you need to evaluate sub-tags, or the input is really XML and contains comments like <!-- ... -->), you will want to use one of the fine XML modules on CPAN instead (don't forget to check the Tutorials section on that topic ;-)
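To illustrate note2, here is a small sketch of pulling the text between two tags out of a single line with index and substr only. The sub name between is hypothetical. Searching for the endTag only from the position after the startTag means an endTag that appears earlier in the line is simply skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: text between the first $start and the next $end in
# $line, or nothing (undef in scalar context) if either tag is missing.
sub between {
    my ($line, $start, $end) = @_;
    my $s = index($line, $start);
    return if $s < 0;
    $s += length $start;                 # first character after the startTag
    my $e = index($line, $end, $s);      # search only behind the startTag
    return if $e < 0;
    return substr($line, $s, $e - $s);
}

print between('x <MyId>42</MyId> y', '<MyId>', '</MyId>'), "\n";   # prints 42
```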
HTH, Rata