Re^3: Parsing .txt into arrays

Hi Fshah,
OK, for these extra requirements, I modified the GET_TABLE_NAME state to allow multi-line names instead of keeping only the last non-blank line before the table starts. Keeping track of the line numbers from the original file sounds weird to me, but I added that info to the $name record using $., the current line number of the most recently read file handle.

I would recommend just letting the code parse each table that it encounters. Then, in the finish_current_table() subroutine, decide whether or not you actually want to keep the current table. I just hard-coded a regex, /2017.*?Fp379/, but of course this could be more flexible. Note that to "keep" the table, I add it to a @results data structure, which I "dump" right before the program ends. I would presume that in the "real code", instead of adding to @results, some export() function would be called to put the table into a DB or to write a discrete file in some sort of CSV format. I did not generate strictly conformant CSV (multi-word strings should be quoted).
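On the CSV point, here is a minimal sketch of how the quoting might be done by hand. The helper names csv_field() and csv_row() are made up for illustration and are not part of the program below. It quotes any field that contains white space, a comma, or a double quote, and doubles embedded double quotes. For real work, a CPAN module such as Text::CSV is probably the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# csv_field() and csv_row() are hypothetical helpers, shown only as a sketch.

# Return one field, quoted if it needs it.
sub csv_field
{
   my ($field) = @_;
   $field = "" unless defined $field;

   # Quote when the field contains white space, a comma, or a double quote.
   if ($field =~ /[",\s]/)
   {
      $field =~ s/"/""/g;          # embedded quotes are doubled
      return qq("$field");
   }
   return $field;
}

# Join a list of fields into one CSV line (without the line ending).
sub csv_row
{
   return join(",", map { csv_field($_) } @_);
}

print csv_row('#', 'Locker', 'Pos (dfg)', 'bulk val'), "\n";
# Prints:
# #,Locker,"Pos (dfg)","bulk val"
```

In the output loop of the program below, join(",",@$row0) and join(",",@$row) could then be replaced by csv_row(@$row0) and csv_row(@$row).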

From the size of the input file you are describing, it sounds to me like putting these tables into a SQL DB is the right way to go. The Perl DBI is fantastic.

Code:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @results=();  # this is Array of Array, 
                 # [$table_name, [@data]]
                 # row[0] of @data contains the column names

################

my $state;
my $name;
my $line_num_start_rec;
my $line_num_end_rec; #may not need this
my @data;
my @col_names;

sub start_new_table_entry 
{
   $state = 'GET_TABLE_NAME';
   $name = "";
   @data=();
   @col_names=();
   $line_num_start_rec = 0;
   $line_num_end_rec = 0;
}

sub finish_current_table
{  
   return unless ($state ne 'GET_TABLE_NAME');     ## Ver2
   
   $name .= "Record_Start: $line_num_start_rec\n"; ## Ver2 
   $name .= "Record_End: $.\n";                    ## Ver2
       
       
   #this is where data is "saved"
   #probably calls something to put data into a DB?    
   
   if ($name =~ /2017.*?Fp379/) #decide which tables to "keep"
   {
      unshift @data,[@col_names];
      push @results,[$name,[@data]];
   }   
       
   $state = 'GET_TABLE_NAME';
   return;
}
              
start_new_table_entry();

REDO_LINE: while (my $line = <DATA>)
{
   $line =~ s/^\s*//; # delete leading spaces
   $line =~ s/\s*$//; # delete trailing spaces
                      #  (this includes line endings)

   if ($state eq 'GET_TABLE_NAME') #### TABLE NAME ###
   {
      if ($line =~ /^\|/)  # premature start of column name state! Whoa!
      {
         # special case of malformed table without
         # a starting banner of --- or _ _ _
         # we are already in the column name state!
         
         $state = 'GET_COL_NAMES';
         $line_num_start_rec = $. if $name eq ""; ## Ver2 Table has no name
         redo REDO_LINE;
      }   
      elsif ($line !~ /^[-_]/)  #keep going - normal case
      {    
         $line_num_start_rec = $. if $name eq "";   ###  Ver 2 rec line numbers
         $name .= "$line\n" if $line =~ /\S/;       ###  Ver 2 multi-line name
      }
      else
      {  
         $state = 'GET_COL_NAMES';  
      }
   }
   elsif ($state eq 'GET_COL_NAMES') #### COLUMN NAMES ###
   {
      if ($line !~ /(^\|[-_])|(^[-])/ )  #keep going
      { 
         $line =~ s/^\|\s*//; 
         my @col_name_raw = split /\|/,$line;  
      
         my $col=0;
         foreach my $this_col (@col_name_raw)
         {
            $this_col =~ s/\s*$//;
            $this_col =~ s/^\s*//; 
            $col_names[$col]//= "";
            $this_col = " $this_col" if ($col_names[$col] ne "");
            $col_names[$col++] .= "$this_col";
         } 
      }
      else
      {
         $state = "GET_DATA";      
      }
   }
   elsif ($state eq 'GET_DATA')   #### DATA ROWS ###
   {
      if ( $line =~ /^\|/)       #keep going    
      {
         $line =~ s/^\|\s*//; 
         my @this_data = split /\|/,$line;
         @this_data = map {s/\s*$//;s/^\s*//;$_}@this_data;
         push @data,[@this_data];
      }
      else
      {
         finish_current_table();
         start_new_table_entry();
      }
   }
}

finish_current_table();  # in case of malformed end of table

# dump results in "pseudo" CSV format
# also consider looking at:
# print Dumper \@results;

foreach my $tableref (@results)
{
  my ($name,$dataref) = @$tableref;
  
  print "TABLE: $name";  ### Ver 2 changed for multi-line name
  my $row0 = shift @$dataref;
  print "COLUMNS: ",join(",",@$row0),"\n";
  
  foreach my $row (@$dataref)
  {
    print join(",",@$row),"\n";
  }
  print "\n";
}
=PRINTED OUTPUT

TABLE: 2017 Position log :Fp379
place: cal
time: 23:01:45
Record_Start: 31
Record_End: 44
COLUMNS: #,Locker,Pos (dfg),value (no),nul,bulk val,lot Id,prev val,newest val
0,1,302832,-11.88,1,0,Pri,16,0
1,9,302836,11.88,9,0,Pri,10,0
2,1,302832,-11.88,5,3,Pri,14,4
3,3,302833,11.88,1,0,sec,12,0
4,6,302837,-11.88,1,0,Pri,16,3

=cut



__DATA__


place and year data: 67

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
|no.|  name | age  | place | year |
|_ _|_ _ _ _|_ _ _ | _ _ _ |  _ _ |
|1  |  sue  |33    | NY    | 2015 |
|2  |  mark |28    | cal   | 2106 |


work and language :65
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
|no.| name | languages | proficiency | time taken|
|_ _| _ _ _| _ _ _ _ _ |_ _ _ _ _ _ _| _ _ _ _ _ |      
|1  | eliz | English   | good        | 24 hrs    |
|2  | susan| Spanish   | good        | 13 hrs    |
|3  | danny| Italian   | decent      | 21 hrs    |

Position log

   |   |      |Pos     |value   |   |bulk|lot|   prev| newest|
   |#  |Locker|(dfg)   |(no)    |nul|val |Id |   val |val    |
   -----------------------------------------------------------
   |  0|     1|  302832|  -11.88|  1|   0|Pri|     16|      0|
   |  1|     9|  302836|   11.88|  9|   0|Pri|     10|      0|
   |  2|     1|  302832|  -11.88|  5|   3|Pri|     14|      4|
   |  3|     3|  302833|   11.88|  1|   0|sec|     12|      0|
   |  4|     6|  302837|  -11.88|  1|   0|Pri|     16|      3|

2017 Position log :Fp379
place: cal
time: 23:01:45


   |   |      |Pos     |value   |   |bulk|lot|   prev| newest|
   |#  |Locker|(dfg)   |(no)    |nul|val |Id |   val |val   |
   -----------------------------------------------------------
   |  0|     1|  302832|  -11.88|  1|   0|Pri|     16|      0|
   |  1|     9|  302836|   11.88|  9|   0|Pri|     10|      0|
   |  2|     1|  302832|  -11.88|  5|   3|Pri|     14|      4|
   |  3|     3|  302833|   11.88|  1|   0|sec|     12|      0|
   |  4|     6|  302837|  -11.88|  1|   0|Pri|     16|      3|

Re^4: Parsing .txt into arrays by Fshah (Initiate) on Jun 02, 2017 at 09:39 UTC
Thank you, Marshall. The code you sent was of great use to me, but the table gets parsed line by line (row-wise), whereas I want arrays of columns so that it is easy to compare similar columns. I also have a header for the table that I want to store. How can I make that possible? For example:

1) In the table here, I want an array "locker" which contains all the values in that column.
2) As you can see in the given table, there are blanks, which mean the value is the same as the one previously present in the column. Is it possible to repeat the previous value for the blanks? There is also a header which contains the time etc.
3) As you can see, there are 11 rows here. I want an array which holds the time repeated 11 times (the number of rows), and similarly for sequence and range.
4) I want to use the keyword 1349F.63 here to find similar tables (there are other tables with the heading "position log table" but with a different extension).
5) From the first line, I want to extract the 4th value, i.e. 1349F.63 in this case.
6) I see you are using the last line before the table starts. Say I want to look at the 13th line before the table to decide which particular table I want to store (and also store those 13 header lines in the format mentioned above).
7) I don't want to print all the tables. I want to print only the tables which have the keyword, say "1349F.63"; in this case that prints all position log tables corresponding to that extension.

Position log table 1349F.63
time 10:23:66
sequence = 39
range = 6678

   |   |      |Pos     |value   |   |bulk|lot|   prev| newest|
   |#  |Locker|(dfg)   |(no)    |nul|val |Id |   val |val    |
   -----------------------------------------------------------
   |  0|     1|  302832|  -11.88|  1|   0|Pri|     16|      0|
   |  5|      |        |        |   |    |   |       |       |
   |  6|      |        |        |   |    |   |       |       |
   |  7|      |        |        |   |    |   |       |       |
   |  1|     9|  302836|   11.88|  9|   0|Pri|     10|      0|
   |  2|     1|  302832|  -11.88|  5|   3|Pri|     14|      4|
   |  5|      |        |        |   |    |   |       |       |
   |  6|      |        |        |   |    |   |       |       |
   |  7|      |        |        |   |    |   |       |       |
   |  3|     3|  302833|   11.88|  1|   0|sec|     12|      0|
   |  4|     6|  302837|  -11.88|  1|   0|Pri|     16|      3|

Thanks for the help.
Re^5: Parsing .txt into arrays by Marshall (Canon) on Jun 04, 2017 at 01:44 UTC
Hi Fshah,
I think some clarification about PerlMonks is in order. This is a site where you can ask questions with the intent of learning about Perl. I am completely happy to help you learn at no charge. I am happy if you are learning. I am not happy if you are not learning. Right now it appears that you are expecting me to write your code for you, without demonstrating much effort on your part.

I do have clients that pay me for solving their problems. Quite frankly, these folks will get much higher priority than you. However, I and others here are willing to help you learn. BUT, that means that you need to show some coding effort.

Your points 4, 5, 6 and 7 tell me that you didn't run, much less understand, the code which I modified for you.

1) Transposing a table, converting rows to columns, is not that difficult if you think logically about it. I want to see a serious attempt by you. Use the 2-d table that my code generates.
2) Setting the current field to what was before in the case that it is "blank" (whether row-wise or column-wise) is also something that you should be able to make an attempt at.

The construction of a state machine to parse your various tables was beyond either of these tasks, and I felt that it was necessary to get you "unstuck".

Solving this problem will help you. Write code that generates @transposed using @array as input. I know it's hard, but give it a go...

#!/usr/bin/perl
use strict;
use warnings;

my @array = ( ['a', '1', 'L'],
              ['b', '2', 'M'],
              ['c', '3', 'N'],
              ['d', '4', 'O'],);

my @transposed = ( ['a', 'b', 'c', 'd'],
                   ['1', '2', '3', '4'],
                   ['L', 'M', 'N', 'O'],);

foreach my $row_ref (@array)
{
   print "@$row_ref\n";
}
# Prints:
#a 1 L
#b 2 M
#c 3 N
#d 4 O

foreach my $row_ref (@transposed)
{
   print "@$row_ref\n";
}
# Prints:
#a b c d
#1 2 3 4
#L M N O
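
For readers following along, one possible way to approach the two exercises above (filling blanks from the previous row, then turning rows into columns) is sketched here. The names fill_down() and transpose() are made up for this sketch, and the sample rows are a cut-down, three-column version of the table in the question. It assumes blank cells arrive as empty strings or are missing entirely.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# fill_down() and transpose() are hypothetical helper names.

# Replace each blank (or missing) cell with the value from the row above it.
sub fill_down
{
   my @rows = map { [@$_] } @_;    # copy, so the caller's rows are untouched
   foreach my $r (1 .. $#rows)
   {
      foreach my $c (0 .. $#{ $rows[$r-1] })
      {
         my $cell = $rows[$r][$c];
         $rows[$r][$c] = $rows[$r-1][$c]
            if !defined $cell or $cell eq "";
      }
   }
   return @rows;
}

# Turn an array of rows into an array of columns.
sub transpose
{
   my @rows = @_;
   my @cols;
   foreach my $r (0 .. $#rows)
   {
      foreach my $c (0 .. $#{ $rows[$r] })
      {
         $cols[$c][$r] = $rows[$r][$c];
      }
   }
   return @cols;
}

my @data = ( ['0', '1', '302832'],
             ['5', '',  ''      ],   # blanks mean "same as the row above"
             ['1', '9', '302836'],);

my @columns = transpose(fill_down(@data));
print "@$_\n" foreach @columns;
# Prints:
#0 5 1
#1 1 9
#302832 302832 302836
```

With the column names kept in a separate array, as in the parser's @col_names, $columns[1] would be the "Locker" column.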
Re^6: Parsing .txt into arrays by Fshah (Initiate) on Jun 07, 2017 at 06:30 UTC
I'm sorry, I didn't realize you had already sent the solution in the earlier comment. I was working on something else based on your code, where I want to print lines only from the point where I find the keyword until the table starts. By modifying your previous code I was able to perform the operation I need on the output, but I'd like to optimize it by printing only the lines after I find the keyword (including the line with the keyword). Your code prints all the lines above the table, but I need only those from the keyword onward. Thank you.
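
One way this request might be handled, as a rough sketch only: since the parser above collects the name as a multi-line string with one "\n" per line, the lines before the keyword can be dropped just before the table is kept. The helper name name_from_keyword() is made up for illustration, and the sample text is invented apart from the header lines quoted in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# name_from_keyword() is a hypothetical helper name.
# Keep only the lines of a multi-line name starting at the first line
# that contains the keyword. Returns "" when the keyword never appears.
sub name_from_keyword
{
   my ($name, $keyword) = @_;
   my @lines = split /\n/, $name;

   # \Q...\E makes the keyword literal, so the "." in 1349F.63 is just a dot.
   shift @lines while @lines and $lines[0] !~ /\Q$keyword\E/;

   return join("", map { "$_\n" } @lines);
}

my $name = "some earlier note\nPosition log table 1349F.63\ntime 10:23:66\n";
print name_from_keyword($name, '1349F.63');
# Prints:
#Position log table 1349F.63
#time 10:23:66
```

In the parser above, this could be applied to $name inside finish_current_table(), before the Record_Start and Record_End lines are appended.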
Re^7: Parsing .txt into arrays by Marshall (Canon) on Jun 07, 2017 at 07:35 UTC
Re^8: Parsing .txt into arrays by Fshah (Initiate) on Jun 12, 2017 at 04:37 UTC