G'day GrandFather,

[Sorry, a bit late to the party; I haven't logged in for a week and a half.]

I don't know what sort of variations may exist in your input data. Going purely on what's shown, here's how I might get the data into a canonical form suitable for subsequent processing (via split, Text::CSV, or something else).

parse_000.pl:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    my $data_file = 'data_000.txt';
    my $re = qr{^([^0-9-]+?)\s+([0-9-][ 0-9.-]+?)\s*$};

    my @data;

    open my $fh, '<', $data_file;

    while (<$fh>) {
        next unless /$re/;
        my ($site, $info) = ($1, $2);
        $info =~ y/ /\t/s;
        push @data, join "\t", $site, $info;
    }

    # For demo only
    use Data::Dump;
    dd \@data;
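To show which lines that pattern keeps and which it skips, here's a minimal sketch that applies the same regex to three lines copied from the input. The first capture takes everything up to the first digit or hyphen that follows whitespace; the second takes the rest, provided it contains only digits, spaces, dots and hyphens. One consequence, going purely on the pattern itself: a site name containing a digit or a hyphen would be skipped.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same pattern as in parse_000.pl.
my $re = qr{^([^0-9-]+?)\s+([0-9-][ 0-9.-]+?)\s*$};

# Three lines copied from data_000.txt: title, column header, data row.
my @lines = (
    "Annular-Total Eclipse of 2023 Apr 20 - multisite predictions",
    "Site Longitude Latitude Elvn U.T. PA Alt",
    "Great Barrier Is 175 25. -36 15. 0 4 34 15 312 13",
);

for my $line (@lines) {
    if ($line =~ $re) {
        # $1 is the site name, $2 the still space-separated numeric fields.
        print "match: site='$1' info='$2'\n";
    }
    else {
        print "skip:  $line\n";
    }
}
```

Only the data row should match; the title and header lines are skipped.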

Input:

    $ cat data_000.txt
    Annular-Total Eclipse of 2023 Apr 20 - multisite predictions
    1st Contact
    Site Longitude Latitude Elvn U.T. PA Alt
    o ' o ' m h m s o o
    Auckland 174 45. -36 55. 0 4 33 59 313 13
    Blenheim 173 55. -41 35. 30 4 40 34 326 11
    Cape Palliser 175 25. -41 35. 0 4 42 28 327 9
    Cape Reinga 172 45. -34 25. 50 4 30 11 307 17
    Carterton 175 35. -41 5. 0 4 40 35 324 10
    Dannevirke 176 5. -40 15. 200 4 39 9 321 10
    East Cape 178 35. -37 45. 0 4 37 58 315 10
    Featherston 175 25. -41 5. 40 4 40 36 325 10
    Gisborne 178 5. -38 45. 0 4 38 29 317 10
    Great Barrier Is 175 25. -36 15. 0 4 34 15 312 13

Output:

    $ ./parse_000.pl
    [
      "Auckland\t174\t45.\t-36\t55.\t0\t4\t33\t59\t313\t13",
      "Blenheim\t173\t55.\t-41\t35.\t30\t4\t40\t34\t326\t11",
      "Cape Palliser\t175\t25.\t-41\t35.\t0\t4\t42\t28\t327\t9",
      "Cape Reinga\t172\t45.\t-34\t25.\t50\t4\t30\t11\t307\t17",
      "Carterton\t175\t35.\t-41\t5.\t0\t4\t40\t35\t324\t10",
      "Dannevirke\t176\t5.\t-40\t15.\t200\t4\t39\t9\t321\t10",
      "East Cape\t178\t35.\t-37\t45.\t0\t4\t37\t58\t315\t10",
      "Featherston\t175\t25.\t-41\t5.\t40\t4\t40\t36\t325\t10",
      "Gisborne\t178\t5.\t-38\t45.\t0\t4\t38\t29\t317\t10",
      "Great Barrier Is\t175\t25.\t-36\t15.\t0\t4\t34\t15\t312\t13",
    ]
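As for the subsequent processing, here's a rough sketch of the split route, using two records copied from the output above. The field names are hypothetical: I've guessed them from the column headers (degrees and minutes for longitude and latitude, then elevation, U.T. as h/m/s, PA and Alt), so adjust them to whatever you actually need.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical field names, guessed from the column headers in the input.
my @fields = qw{site lon_deg lon_min lat_deg lat_min elvn ut_h ut_m ut_s pa alt};

# Two tab-separated records copied from the parse_000.pl output.
my @data = (
    "Auckland\t174\t45.\t-36\t55.\t0\t4\t33\t59\t313\t13",
    "Great Barrier Is\t175\t25.\t-36\t15.\t0\t4\t34\t15\t312\t13",
);

my @rows;
for my $record (@data) {
    my %row;
    # Hash slice: assign the split values to the field names in order.
    @row{@fields} = split /\t/, $record;
    push @rows, \%row;
}

# Print the site and latitude of each record as a quick check.
printf "%-16s  lat %s deg %s min\n", @$_{qw{site lat_deg lat_min}} for @rows;
```

Because the site name is already isolated in the first tab-separated field, multi-word names like "Great Barrier Is" come through split intact.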

— Ken


In reply to Re: Module for parsing tables from plain text document by kcott
in thread Module for parsing tables from plain text document by GrandFather
