in reply to Re: Parsing .txt into arrays
in thread Parsing .txt into arrays

Thank you, Marshall, I really appreciate your response. My problem now is to extract all tables of the same type, for example the position log table shown here. All position log tables have the same format, and I want to know the start and end line numbers of each of them. How can I do that? Mind you, the text file is huge, on the order of gigabytes. I want to extract similar tables and export them so that I can write table-specific code for each type. (A "table" is not just the table itself: it also carries some details, here place and time, which I want to store as well.)

Approach: say every position log table has the extension Fp379 and every page starts with the year. I want to use these keywords (Fp379 for a position log page, 2017 for all pages) to separate out the required tables.

Likewise, similar pages start with the same extension.
2017 Position log :Fp379 place: cal time: 23:01:45
|  |      |Pos   |value  |   |bulk|lot| prev| newest|
|# |Locker|(dfg) |(no)   |nul|val |Id | val |val    |
-----------------------------------------------------------
| 0|     1|302832| -11.88|  1|   0|Pri|   16|      0|
| 1|     9|302836|  11.88|  9|   0|Pri|   10|      0|
| 2|     1|302832| -11.88|  5|   3|Pri|   14|      4|
| 3|     3|302833|  11.88|  1|   0|sec|   12|      0|
| 4|     6|302837| -11.88|  1|   0|Pri|   16|      3|

Re^3: Parsing .txt into arrays
by roboticus (Chancellor) on May 26, 2017 at 12:50 UTC

    Fshah:

    The $. variable holds the current line number of the filehandle that was read most recently. So you can simply save its value each time you start a new array and add that starting line number to your table of information. You can read more about the $. variable in perlvar.
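A minimal sketch of that idea, under some assumptions: every page begins with a line starting with the year, and the wanted pages carry the extension on that same line. The helper name find_page_ranges and the sample text are made up for illustration. The file is read one line at a time, so memory use stays small even for a multi-gigabyte input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one { start, end, header } record for each
# page whose first line matches both $page_re and $want_re.
sub find_page_ranges {
    my ($fh, $page_re, $want_re) = @_;
    my (@ranges, $current);
    while (my $line = <$fh>) {
        next unless $line =~ $page_re;
        # A new page begins here, so the previous page ended one line earlier.
        if ($current) {
            $current->{end} = $. - 1;
            push @ranges, $current;
        }
        if ($line =~ $want_re) {
            chomp(my $header = $line);
            $current = { start => $., header => $header };
        }
        else {
            $current = undef;    # a page we are not interested in
        }
    }
    # At end of file $. still holds the number of the last line read.
    if ($current) {
        $current->{end} = $.;
        push @ranges, $current;
    }
    return @ranges;
}

# Made-up sample input; the real data would come from open(my $fh, '<', $path).
my $text = <<'END';
2017 Position log :Fp379
place: cal
| 0| 1|
2017 Other log :Xy001
| 9| 9|
2017 Position log :Fp379
| 1| 9|
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @ranges = find_page_ranges($fh, qr/^2017\b/, qr/Fp379/);
close $fh;

printf "lines %d-%d: %s\n", @{$_}{qw(start end header)} for @ranges;
# Prints:
# lines 1-3: 2017 Position log :Fp379
# lines 6-7: 2017 Position log :Fp379
```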

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re^3: Parsing .txt into arrays
by Marshall (Canon) on May 26, 2017 at 19:28 UTC
    Hi Fshah,
    Ok, for these extra requirements, I modified the GET_NAME state to allow for multi-line names instead of just keeping the last non-blank line before the table starts. Keeping track of the line numbers from the original file sounds weird, but I added that info to the $name record using $., the current file handle's current line number.

    I would recommend just letting the code parse out each table that it encounters. In the finish_current_table() subroutine, decide whether or not you actually want to keep the current table. I just hard-coded a regex, /2017.*?Fp379/, but of course this could be more flexible. Note that to "keep" the table, I added it to a @results data structure, which I "dumped" right before the program ends. I would presume that in the "real code", instead of adding to the @results structure, some export() function would be called to put the table into a DB or to write a discrete file in some sort of CSV format. I did not generate strictly conformant CSV (multi-word strings should be quoted).
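The keep-or-discard decision could look roughly like the sketch below. It is not the code from this post: maybe_keep_table is a hypothetical stand-in for the check made once a table is complete, the record layout is assumed, and the /s modifier is added on the assumption that a multi-line name may contain newlines between the year and the extension.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @results;    # tables we decided to keep

# Hypothetical stand-in for the check made when a table is finished.
# $name is the header text collected before the table, $rows the parsed rows.
# Returns 1 if the table was kept, 0 otherwise.
sub maybe_keep_table {
    my ($name, $rows, $wanted_re) = @_;
    return 0 unless $name =~ $wanted_re;
    # In real code this is where an export() call could go instead.
    push @results, { name => $name, rows => $rows };
    return 1;
}

# /s lets .*? cross newlines inside a multi-line name (an assumption).
my $wanted = qr/2017.*?Fp379/s;

maybe_keep_table("2017 Position log :Fp379\nplace: cal", [ [0, 1, 302832] ], $wanted);
maybe_keep_table("2017 Other log :Xy001",                [ [9, 9, 9] ],      $wanted);

print scalar(@results), " table(s) kept\n";
# Prints:
# 1 table(s) kept
```

Passing the pattern in as an argument, rather than hard-coding it, makes it easy to select a different extension per run.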

    From the size of the input file you are describing, it sounds to me like putting these tables into a SQL DB is the right way to go. The Perl DBI is fantastic.

    Code:

      Thank you, Marshall, the code you sent was of great use to me. However, the table gets parsed line by line (row-wise), and I want arrays of columns so that it will be easy to compare similar columns. I also have a header for the table that I want to store. How can I make this possible? For example:

      1) In the table here I want an array "locker" that contains all the values in that column.

      2) As you can see, the given table has blanks, which mean the value is the same as the previous value in that column. Is it possible to repeat the previous value for the blanks? There is also a header which contains the time etc.

      3) As you can see, there are 11 rows here. I want an array that holds the time repeated 11 times (the number of rows), and similarly for sequence and range.

      4) I want to use the keyword 1349F.63 here to find similar tables (there are other tables headed "position log table" but with a different extension).

      5) From the first line I want to extract the 4th value, i.e. in this case 1349F.63.

      6) I see you are using the last line before the table starts. Say I want to look at the 13th line before the table to decide which particular table to store (and also store those 13 header lines in the format mentioned above).

      7) I don't want to print all the tables. I want to print only the tables that contain the keyword, say "1349F.63": in this case, all position log tables with that extension.

      Position log table 1349F.63 time 10:23:66 sequence = 39 range = 6678
      |  |      |Pos   |value  |   |bulk|lot| prev| newest|
      |# |Locker|(dfg) |(no)   |nul|val |Id | val |val    |
      -----------------------------------------------------------
      | 0|     1|302832| -11.88|  1|   0|Pri|   16|      0|
      | 5|      |      |       |   |    |   |     |       |
      | 6|      |      |       |   |    |   |     |       |
      | 7|      |      |       |   |    |   |     |       |
      | 1|     9|302836|  11.88|  9|   0|Pri|   10|      0|
      | 2|     1|302832| -11.88|  5|   3|Pri|   14|      4|
      | 5|      |      |       |   |    |   |     |       |
      | 6|      |      |       |   |    |   |     |       |
      | 7|      |      |       |   |    |   |     |       |
      | 3|     3|302833|  11.88|  1|   0|sec|   12|      0|
      | 4|     6|302837| -11.88|  1|   0|Pri|   16|      3|
      thanks for the help
        Hi Fshah,
        I think some clarification about PerlMonks is in order.
        This is a site where you can ask questions with the intent of learning about Perl. I am completely happy to help you learn at no charge. I am happy if you are learning. I am not happy if you are not learning.

        Right now it appears that you are expecting me to write your code for you - without demonstrating much effort on your part.

        I do have clients that pay me for solving their problems. Quite frankly these folks will get much higher priority than you. However I and others here are willing to help you learn. BUT, that means that you need to show some coding effort.

        Your points 4, 5, 6 and 7 tell me that you didn't run, much less understand, the code which I modified for you.

        1) Transposing a table, converting rows to columns is not that difficult if you think logically about it. I want to see a serious attempt by you. Use the 2-d table that my code generates.

        2) Setting the current field to what was before in the case that it is "blank" (whether row-wise or column-wise) is also something that you should be able to make an attempt at.
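For readers following along, one possible shape for point 2 is sketched here. It assumes the table is already an array of row references, as in the 2-d table mentioned in point 1, and that a blank cell is an empty or whitespace-only string. The name fill_down is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns copies of the rows in which every blank cell
# has been replaced by the last non-blank value seen in the same column.
sub fill_down {
    my (@rows) = @_;
    my @previous;    # last non-blank value per column
    my @filled;
    for my $row (@rows) {
        my @copy = @$row;    # leave the caller's rows untouched
        for my $i (0 .. $#copy) {
            if (defined $copy[$i] && $copy[$i] =~ /\S/) {
                $previous[$i] = $copy[$i];
            }
            elsif (defined $previous[$i]) {
                $copy[$i] = $previous[$i];
            }
        }
        push @filled, \@copy;
    }
    return @filled;
}

my @table = (
    [ 0, 1,   302832 ],
    [ 5, '',  ''     ],
    [ 6, ' ', ''     ],
    [ 1, 9,   302836 ],
);

print "@$_\n" for fill_down(@table);
# Prints:
# 0 1 302832
# 5 1 302832
# 6 1 302832
# 1 9 302836
```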

        The construction of a state machine to parse your various tables was beyond either of these tasks and I felt that it was necessary to get you "unstuck".

        Solving this problem will help you. Write code that generates @transposed using @array as input. I know it's hard, but give it a go...

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @array = (
            ['a', '1', 'L'],
            ['b', '2', 'M'],
            ['c', '3', 'N'],
            ['d', '4', 'O'],
        );

        my @transposed = (
            ['a', 'b', 'c', 'd'],
            ['1', '2', '3', '4'],
            ['L', 'M', 'N', 'O'],
        );

        foreach my $row_ref (@array) {
            print "@$row_ref\n";
        }
        # Prints:
        #a 1 L
        #b 2 M
        #c 3 N
        #d 4 O

        foreach my $row_ref (@transposed) {
            print "@$row_ref\n";
        }
        # Prints:
        #a b c d
        #1 2 3 4
        #L M N O
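One possible approach to the exercise, for comparison after making your own attempt, is sketched below. It is only one of several ways to do it. The sub name transpose is made up, and the sketch relies on Perl autovivification to create each output row the first time a cell is assigned to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: element [$r][$c] of the input ends up at [$c][$r]
# of the output, so input columns become output rows.
sub transpose {
    my (@rows) = @_;
    my @transposed;
    for my $r (0 .. $#rows) {
        for my $c (0 .. $#{ $rows[$r] }) {
            $transposed[$c][$r] = $rows[$r][$c];
        }
    }
    return @transposed;
}

my @array = (
    ['a', '1', 'L'],
    ['b', '2', 'M'],
    ['c', '3', 'N'],
    ['d', '4', 'O'],
);

print "@$_\n" for transpose(@array);
# Prints:
# a b c d
# 1 2 3 4
# L M N O
```

Once the table is column-wise, a single column such as "Locker" is just one element of the result, for example the array reference at index 1.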