in reply to Re: Module for parsing tables from plain text document
in thread Module for parsing tables from plain text document

There are lots of ways to do it if I'm willing to count characters for each table I need to deal with. What I'd like is something that looks at a table and uses heuristics to figure out the column widths and names. In the first instance, all the tables I'm dealing with are machine generated, so the columns are unlikely to change within a table, but they do change between tables.

Optimising for fewest keystrokes only makes sense when transmitting to Pluto or beyond

Re^3: Module for parsing tables from plain text document
by NERDVANA (Priest) on Jan 09, 2023 at 03:58 UTC
    I wrote something similar for PDF once, and also wrote Data::TableReader, but I never got around to making PDF one of its decoders. For PDF, it made sense to look at the starting X coordinates of text segments, and to identify a column wherever roughly as many text fragments start at a given X as the estimated number of lines. Plain text has less granularity, so if I were going to write this for text, I would iterate over the lines, keeping a history of which character columns hold whitespace, and at EOF or the first blank line, see which runs of whitespace lasted from the first line to the last. Concatenate adjacent whitespace columns, and then report the spans in between as the data columns.
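    A minimal sketch of that whitespace-run heuristic, assuming the table block has already been read into memory (the function name and the sample table here are made up for illustration):

    ```perl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Return ([start, width], ...) for each detected data column: any
    # character column that is whitespace on every line is treated as
    # part of a separator; the runs in between are the data columns.
    sub detect_columns {
        my @lines = @_;
        my $max = 0;
        $max = length $_ > $max ? length $_ : $max for @lines;

        # A column stays "blank" until some line has a non-space in it.
        # Columns past a short line's end count as blank, as they should.
        my @blank = (1) x $max;
        for my $line (@lines) {
            my @chars = split //, $line;
            for my $i (0 .. $#chars) {
                $blank[$i] = 0 if $chars[$i] ne ' ';
            }
        }

        # Collect maximal runs of non-blank columns.
        my @cols;
        my $start;
        for my $i (0 .. $max) {    # one past the end flushes the last run
            if ($i < $max && !$blank[$i]) {
                $start //= $i;
            }
            elsif (defined $start) {
                push @cols, [ $start, $i - $start ];
                undef $start;
            }
        }
        return @cols;
    }

    my @table = (
        'Name      Lat  Long',
        'Auckland  -36   174',
        'Blenheim  -41   173',
    );
    for my $col (detect_columns(@table)) {
        printf "column at %d, width %d\n", @$col;
    }
    ```

    This only handles the simple case; nested or overlapping headers like the ones in the OP's table would still defeat it, which is why a real decoder would want more heuristics on top.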

    It would be really awesome if you wanted to contribute a Decoder for Data::TableReader :-)

      Could you please show an example of how to parse the OP's table?

      I find this example particularly challenging, since

      • it has nested columns
      • there are multiple subdivided header captions
      • "Longitude" in particular overlaps the "empty column" that delimits its data entries below

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
      Wikisyntax for the Monastery

        Let me throw in some assumptions here (having to deal with quite a few other text-based formats at work):

        • You can skip the headers, because they are standardized across files
        • Everything before the first number (or minus sign) is the location name
        • Data columns always contain a value
        • The only column that can contain spaces is the location name

        That means we can just collapse runs of spaces. We have to handle the location name specially, but after that we can use split to recover the columns:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Data::Dumper;
        use Carp;

        my @sites;

        open(my $ifh, '<', 'eclipse.txt') or croak($!);

        # Skip header
        for(1..5) {
            my $tmp = <$ifh>;
        }

        while((my $line = <$ifh>)) {
            chomp $line;
            next if($line eq ''); # Ignore empty lines

            my %entry;
            $line =~ s/\ +/ /g; # Collapse spaces
            if($line =~ /^(.*?)\s[-\d]/) {
                $entry{location} = $1;

                # Remove location name
                $line =~ s/^.*?\s([-\d])/$1/;

                # Split along spaces
                my @parts = split/\ /, $line;
                foreach my $name (qw[long1 long2 lat1 lat2 elevation h m s PA Alt]) {
                    $entry{$name} = shift @parts;
                }
                push @sites, \%entry;
            }
        }
        close $ifh;

        print Dumper(\@sites);

        That results in an array of hashes:

        $VAR1 = [
                  {
                    's' => '59',
                    'elevation' => '0',
                    'long2' => '45.',
                    'lat2' => '55.',
                    'lat1' => '-36',
                    'location' => 'Auckland',
                    'm' => '33',
                    'h' => '4',
                    'long1' => '174',
                    'PA' => '313',
                    'Alt' => '13'
                  },
                  {
                    'h' => '4',
                    'm' => '40',
                    'PA' => '326',
                    'Alt' => '11',
                    'long1' => '173',
                    'lat2' => '35.',
                    'long2' => '55.',
                    's' => '34',
                    'elevation' => '30',
                    'location' => 'Blenheim',
                    'lat1' => '-41'
                  },
                  {
                    'h' => '4',
                    'm' => '42',
                    'PA' => '327',
                    'Alt' => '9',
                    'long1' => '175',
                    'lat2' => '35.',
                    'long2' => '25.',
                    's' => '28',
                    'elevation' => '0',
                    'location' => 'Cape Palliser',
                    'lat1' => '-41'
                  },
                  ...

        PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP