Re^3: Avoid memory error while extracting text block

If the intent is to split this compound file into separate files, I think I'd do something like this:

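#! perl -sw
use strict;

# Assuming the name of the compound file is supplied on the command line.
# Note: no -l switch. Lines read via <> keep their newlines, so print must not add another.

until( eof() ) {                                       ## eof() with empty parens opens the first file in @ARGV if needed
    my @buffer = scalar( <> );                         ## Put the <DOCUMENT> tag into a clean buffer (scalar: read one line, not the whole file)
    push @buffer, scalar( <> ), scalar( <> );          ## Ditto <TYPE> & <SEQUENCE>
    my( $filename ) = ( my $line = <> ) =~ m[<FILENAME>(\S+)];
    defined $filename or die "No <FILENAME> tag found";
    push @buffer, $line;

    open OUT, '>', $filename or die $!;
    print OUT for @buffer;
    print OUT until ( $_ = <> ) =~ m[</DOCUMENT>];     ## Copy lines up to, but not including, the closing tag
    close OUT;
}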

Note: That is untested code and will probably need tweaks. E.g., as written it will not print the closing </DOCUMENT> tag to the output files; but then maybe you'd want to strip those anyway.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Avoid memory error while extracting text block by wrkrbeee (Scribe) on Apr 14, 2016 at 12:59 UTC
Thank you!!