in reply to File read and re-ordering

Right, looking at your updated post there are still a few imponderables.

Is the start of record tag the literal caret followed by uppercase L or are we seeing a representation of a <control-L>?

Because of line-wrap it is difficult to tell what lines your header data is on. It looks like name, address and country/ZIP are on lines 3, 4 and 5 but are you certain that the address will always fit this pattern? What characters are you likely to find and have to allow for in the data? It looks like you have a/c no. on lines 6 and 8; are the lines identical and what is the format of the number? Did you mean end-period-date rather than end-period-value? What is the date format?

Most importantly, is that all of the data you wish to extract from the header? What widths are you going to lay down for each field? By specifying fixed length records you imply that there will be no field separators in your header file; is this what you want or do you want separators to aid legibility?

As for the proposed output files, it looks like you intend to have a single header file containing a line of fixed-width data for each customer plus a file of variable length data for each customer. How do you intend to associate the header info with the relevant data file? I assume that a/c no. would be unique so that could form part of the data file name.

Answering these questions may help you towards a solution and help us to help you.

Cheers,

JohnGG

Re^2: File read and re-ordering
by KarmicGrief (Initiate) on Oct 23, 2006 at 14:08 UTC
    Sorry about the formatting, being newbish sucks. I will try to answer the questions here to give a better idea of what is happening. Each record is laid out as follows:

    line 1: a literal caret followed by L (the start of the record)
    line 2: always blank
    line 3: blank, or a name
    line 4: always a name
    line 5: always an address
    line 6: always blank
    line 7: city, state and ZIP (which would need splitting out into line 7, line 7a and line 7b)
    line 8: an a/c # (in the format #-#)
    lines 9 and 10: blank
    line 11: an a/c # (same format as on line 8)
    line 12: the two date fields, beginning period and ending period (for example 01 Oct 2006 31 Oct 2006), which I will need split into a line 12 and a line 12a
    line 13: blank
    lines 14 and 15: a message line each
    lines 16 and 17: blank
    line 18: the start of the account details
    line 19 onwards: a variable number of detail lines, ending with an (EOE)

    The next record then begins again with the ^L. That is all the data I need to pull for the header and the detail files. The widths in the header file vary by field; one could be a 12 character field, another a 40 character field. There is no need for field delimiters since there is a process already in place that reads the exact field positions. The a/c number is what would be used to associate the detail file with the header file, and yes, the header file needs to be one line for each customer. Does this help?
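The record layout described above could be picked apart roughly as follows. This is only a sketch under stated assumptions: the sub names `parse_report` and `trim` and all of the hash keys are invented for illustration, the `^L` marker is assumed to sit alone on line 1, and the `(EOE)` line itself is assumed not to be wanted in the detail output.

```perl
use strict;
use warnings;

# Strip leading and trailing whitespace; undef becomes an empty string.
sub trim {
    my $s = shift // '';
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Take the whole report as one string and return one hash ref per record.
sub parse_report {
    my ($text) = @_;
    my @records;

    # Assumption: every record starts with a line holding only the two
    # literal characters '^' and 'L'.
    for my $chunk ( split /^\^L[ \t]*\r?\n/m, $text ) {
        next unless $chunk =~ /\S/;    # skip the empty piece before the first ^L

        # Index 0 is a dummy and index 1 the marker, so $line[N] is "line N".
        my @line = ( undef, '^L', split /\r?\n/, $chunk, -1 );

        my ($account) = ( $line[8]  // '' ) =~ /(\d+-\d+)/;
        my @dates     = ( $line[12] // '' ) =~ /(\d{2} \w{3} \d{4})/g;

        # Detail runs from line 18 up to, but not including, the (EOE) line.
        my @detail;
        for my $i ( 18 .. $#line ) {
            last if $line[$i] =~ /\(EOE\)/;
            push @detail, $line[$i];
        }

        push @records, {
            name3     => trim( $line[3] ),    # may be empty
            name4     => trim( $line[4] ),
            address   => trim( $line[5] ),
            city_line => trim( $line[7] ),
            account   => $account // '',
            from      => $dates[0] // '',
            to        => $dates[1] // '',
            messages  => [ map { trim($_) } @line[ 14, 15 ] ],
            detail    => \@detail,
        };
    }
    return @records;
}
```

Each returned hash then holds everything needed for one header line, and its `detail` array holds the lines for that customer's detail file.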
      Re. formatting, have a read of the link shmem gave you and just have a play around to see what works, trying things out on your private scratchpad which you can find on your home node. <p> and <code> ... </code> tags are your friends.

      So, further questions.

      Do you want to capture the possible name on line 3 or will you always use the one on line 4?

      Are the a/c nos. on lines 8 and 11 the same and do you want to capture just one of them?

      Are the dates ddMMMyyyy or dd MMM yyyy? It looks like the latter.

      Is information on line 18 significant or is it just a marker with the meat starting on line 19 et seq.?

      How many output fields, what order, what widths and what pad character? What is your policy on truncating data that is too wide?

      I think that given answers to the above I can (without writing your whole application for you :-) make some suggestions and code pointers on how you can proceed.

      Cheers,

      JohnGG

        Advice and suggestions are greatly appreciated. I am a complete newb when it comes to Perl. I will need to capture line 3 if it appears, and line 4 always, since there will always be a name there. The a/c #'s are supposed to be the same number and have only been represented once in the past, so a capture of either is fine. The dates are in dd MMM yyyy format, yes. Line 18 is the start of the detail and needs to be the first line in the details file. The pad character needs to be a space. Truncation should not be needed, but if the data for a field is longer than the specified width it would certainly have to be truncated (this has not been an issue before).

        I am going to give the format for the header file now, so bear with me:

        character 0: field 1, the account number (from line 8 or 11)
        character 18: field 2 (40 characters), static text
        character 58: field 3 (40 characters), static text
        character 98: field 4 (40 characters), static text
        character 138: field 5 (40 characters), static text
        character 178: field 6 (40 characters), line 3 if it has data, otherwise line 4
        character 218: field 7 (40 characters), line 5
        character 258: field 8 (40 characters), line 7
        character 298: field 9 (40 characters), static text
        character 338: field 10 (40 characters), static text
        character 378: field 11 (7 characters), the first date on line 12, formatted as ddmmmyy
        character 385: field 12 (7 characters), the second date on line 12, formatted as ddmmmyy
        character 392: field 13 (10 characters), static text
        character 402: field 14 (10 characters), static text
        character 412: field 15 (40 characters), line 4 if line 3 has data, otherwise blanks
        character 452: (8 characters) the starting line count for the matching detail file
        character 460: (6 characters) the count of detail lines output to the detail file (used to compute the prior value, obviously)

        If this is unclear, do you have a way I could send you the layout file and whatnot, so you could see more clearly how it is laid out?
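One way the header layout above might be produced is with `pack`, whose `A` format pads with spaces and truncates to the given width. Treat this as a sketch under assumptions rather than a finished solution: the sub names `short_date` and `header_line`, the hash keys, and the contents of the static text array are all hypothetical; field 1 is assumed to be 18 characters wide because field 2 starts at character 18; the month is left in whatever case the input uses; and the two counts are assumed to be left-justified and space-padded.

```perl
use strict;
use warnings;

# Widths: 18, nine 40s (fields 2-10), 7, 7, 10, 10, 40, 8, 6 = 466 characters.
my $TEMPLATE = 'A18' . ' A40' x 9 . ' A7 A7 A10 A10 A40 A8 A6';

# Turn "01 Oct 2006" into "01Oct06"; return '' if the date does not match.
sub short_date {
    my ($date) = @_;
    my ( $dd, $mon, $yyyy ) = ( $date // '' ) =~ /^(\d{2}) (\w{3}) (\d{4})$/
        or return '';
    return $dd . $mon . substr( $yyyy, 2 );
}

# $rec    : hash ref with keys account, name3, name4, address, city_line, from, to
# $static : array ref of the eight static text values, in field order
# $start  : starting line count for the matching detail file
# $count  : number of detail lines written for this customer
sub header_line {
    my ( $rec, $static, $start, $count ) = @_;
    my $has_line3 = defined $rec->{name3} && $rec->{name3} =~ /\S/;

    my @fields = (
        $rec->{account},                                   # field 1
        @{$static}[ 0 .. 3 ],                              # fields 2-5
        ( $has_line3 ? $rec->{name3} : $rec->{name4} ),    # field 6
        $rec->{address},                                   # field 7
        $rec->{city_line},                                 # field 8
        @{$static}[ 4, 5 ],                                # fields 9-10
        short_date( $rec->{from} ),                        # field 11
        short_date( $rec->{to} ),                          # field 12
        @{$static}[ 6, 7 ],                                # fields 13-14
        ( $has_line3 ? $rec->{name4} : '' ),               # field 15
        $start,
        $count,
    );

    # Replace any missing value with an empty string so pack does not warn.
    return pack( $TEMPLATE, map { $_ // '' } @fields );
}
```

Printing the return value of `header_line` followed by a newline, once per customer, would give one fixed-width header line per record with no separators between fields.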