PerlMonks  

Re: header footer

by oiskuu (Hermit)
on Mar 04, 2014 at 23:03 UTC ( [id://1076998] )


in reply to header footer

I'd suggest dividing the problem into parts. A: reading of a single record; B: writing the modified record; ...

Most of the difficulty is in the reading part, and we cannot really offer you much advice without learning all the details about the file format. Is the record of arbitrary length? Can a single record span megabytes? Is the record size encoded in the header?

Update: May a record contain "HDR" or "FTR" in its body (as a substring)?

Replies are listed 'Best First'.
Re^2: header footer
by gupr1980 (Acolyte) on Mar 05, 2014 at 00:04 UTC
    No. It will only be there in the header and footer. You won't find a HDR or FTR in the body.
      Hmm, I just realized some of the responses are not shown in full, just the header :) Sorry, I am reading through those now. Thanks.
        So this is the one that I tried, and it works. Thanks to kenosis and others for taking the time.
        #!/usr/bin/perl
        use strict;
        use warnings;

        open FILE1, "input.txt";
        open FILE2, ">>output.txt";
        foreach my $line ( <FILE1> ) {
            $_ = $line;
            s/^HDR.{47}//;
            s/\KFTR.{27}//;
            print FILE2 $_;
        }
        close FILE1;
        close FILE2;
        This was the clearest one to me. I am assuming that the entire file doesn't get stored in $_ in this approach (correct me if I am wrong) and that the file is only read one line at a time. I want to make sure of this so that it performs well when I actually test with the huge file.
        I tried testing this with the big file.
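One caveat about the script above: `foreach my $line ( <FILE1> )` evaluates the read in list context, so Perl reads every line of the file into a list before the loop starts. A `while` loop reads one line per iteration instead. The following is only a minimal sketch of that variant; the sub name `strip_lines`, the lexical filehandles, and the in-memory sample data are illustrative assumptions, while the two substitutions are taken from the post as written.

```perl
use strict;
use warnings;

# Copy $in to $out one line at a time, removing header and footer text.
# Only the current line is held in memory.
sub strip_lines {
    my ($in, $out) = @_;
    while ( my $line = <$in> ) {      # scalar context: one line per iteration
        $line =~ s/^HDR.{47}//;       # "HDR" plus the next 47 characters
        $line =~ s/FTR.{27}//;        # "FTR" plus the next 27 characters
        print {$out} $line;
    }
    return;
}

# Hypothetical sample record, read through an in-memory filehandle.
my $sample = "HDR" . ( "h" x 47 ) . "first body line\n"
           . "last body line" . "FTR" . ( "f" x 27 ) . "\n";
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
strip_lines( $in, \*STDOUT );
close $in;
```

For real files, the same sub could be called with handles opened via the three-argument form, for example `open my $in, '<', 'input.txt' or die "input.txt: $!";`.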
        while ( <$FILE1> ) { s/^HDR.{47}//;
        is not stripping off the HDR and the following 47 characters. It looks like the 47th character is a newline, and hence it is not working? If I do s/^HDR.{10}//; as a test, it strips off the HDR fine. I also tried s/^HDR.{47}//g; and it doesn't make a difference.
        Never mind, found it. s/^HDR.{47}//s did it. Thanks.
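The reason `/s` helps is that `.` does not match a newline by default, so `.{47}` cannot reach across the line ending; with `/s`, `.` matches any character. A small sketch of the difference follows. The header line here is made up for illustration ("HDR", 46 filler characters, then a newline as the 47th), and `strip_header` is a hypothetical helper.

```perl
use strict;
use warnings;

# Strip "HDR" plus 47 characters, with or without the /s modifier.
sub strip_header {
    my ( $line, $dot_matches_newline ) = @_;
    if ($dot_matches_newline) {
        $line =~ s/^HDR.{47}//s;    # /s: "." also matches "\n"
    }
    else {
        $line =~ s/^HDR.{47}//;     # default: "." stops at "\n"
    }
    return $line;
}

# Assumed layout: 3 + 46 + 1 = 50 characters, the last one a newline.
my $header_line = "HDR" . ( "x" x 46 ) . "\n";

print "without /s: ",
    ( strip_header( $header_line, 0 ) eq $header_line ? "unchanged" : "stripped" ), "\n";
print "with /s:    ",
    ( strip_header( $header_line, 1 ) eq "" ? "stripped" : "unchanged" ), "\n";
```

This also explains why `s/^HDR.{10}//` worked as a test: ten characters fit before the newline. The `/g` modifier only repeats the match and has no effect on what `.` can match.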
Re^2: header footer
by gupr1980 (Acolyte) on Mar 04, 2014 at 23:08 UTC
    The record is arbitrary in length, yes. But like I said, the header and footer lengths are always 50 and 30. An individual record is only about 10 KB, but there are way too many records. The record size is not encoded in the header.
      So am I missing something in thinking this: read line by line; check if the pattern HDR exists, and if so, take everything from the end of the header to the end of the line and write it to the outfile; write the following lines to the outfile; keep going until the pattern FTR is found, then cut everything from FTR to the end of the line and write what remains to the outfile. Would this not work?

        No, I don't think you're missing a thing, and it may look like:

        use strict;
        use warnings;

        while (<>) {
            s/^HDR.{47}|\KFTR.+//;
            print;
        }

        In fact, it may be faster than substr on the 3GB file, but I am not sure. Pattern matching to remove the header seems just fine. However, if FTR exists anywhere else in the record, a substitution will mess up the record, which is something substr will not do.
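To illustrate the last point, here is a rough sketch comparing the two approaches on a single record held in a string. It assumes the fixed sizes mentioned in the thread (50-character header, 30-character footer); the sub names and the sample record are hypothetical, and the body deliberately contains "FTR", which gupr1980 says does not happen in the real data.

```perl
use strict;
use warnings;

# Position-based: drop the first 50 and the last 30 characters.
sub strip_by_position {
    my ($record) = @_;
    return '' if length($record) < 80;    # too short to hold header and footer
    return substr( $record, 50, length($record) - 80 );
}

# Pattern-based: drop the header, then everything from the first "FTR" on.
sub strip_by_pattern {
    my ($record) = @_;
    $record =~ s/^HDR.{47}//s;
    $record =~ s/FTR.*//s;
    return $record;
}

# Made-up record whose body contains the string "FTR".
my $body   = "data with FTR inside";
my $record = "HDR" . ( "h" x 47 ) . $body . "FTR" . ( "f" x 27 );

print "substr:  [", strip_by_position($record), "]\n";   # full body kept
print "pattern: [", strip_by_pattern($record),  "]\n";   # body cut at first FTR
```

The position-based version relies only on the lengths, so it needs the whole record in one string and does not care what the body contains.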

        … would this not work?

        Translate all that into perl code and see. If it doesn't work, post the perl code; if you don't know how to translate that, let us know where you're stuck.

        I think the code posted above by kenosis (about 10 minutes before you posted this question) should be a pretty good start, if not the full answer. It sets the "input record separator" ($/) to "HDR" instead of a newline.

        On the first read, it'll just get "HDR", and output nothing. On each subsequent read, it will get a whole record (including the next occurrence of the string "HDR"), skip the first 47 characters (the rest of the header string), trim off the "FTR" and following text, and output just the remaining record content (including whatever line breaks it contains).
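The behaviour described above can be sketched roughly as follows. This is a reconstruction from the description, not kenosis's actual code; the sub name `strip_records` and the in-memory sample are assumptions, and it presumes the file starts with "HDR" and that "HDR" and "FTR" never occur in a record body.

```perl
use strict;
use warnings;

# Read "HDR"-terminated chunks and print only the record bodies.
sub strip_records {
    my ( $in, $out ) = @_;
    local $/ = "HDR";                       # input record separator
    while ( my $chunk = <$in> ) {
        next if length($chunk) <= 47;       # first read is just "HDR"
        my $body = substr( $chunk, 47 );    # skip the rest of the header
        my $ftr  = index( $body, "FTR" );
        $body = substr( $body, 0, $ftr ) if $ftr >= 0;   # trim footer onward
        print {$out} $body;
    }
    return;
}

# Hypothetical two-record sample, read through an in-memory filehandle.
my $header = "HDR" . ( "h" x 47 );
my $footer = "FTR" . ( "f" x 27 );
my $sample = $header . "line1\nline2\n" . $footer . "\n"
           . $header . "rec2\n" . $footer . "\n";
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
strip_records( $in, \*STDOUT );
close $in;
```

Because each read returns one whole record, memory use is bounded by the record size (about 10 KB here) rather than by the file size.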
