Is there a "Here Table"?

by rje (Deacon)
on Apr 07, 2015 at 20:43 UTC ( [id://1122738] : perlquestion )

rje has asked for the wisdom of the Perl Monks concerning the following question:

I was thinking about this just a few minutes ago.

Is there a package which converts a text-format document into a hashtable? For example:

Name     UPP    Age Career     Terms
-------- ------ --- ---------- -----
Rejnaldi 765987 38  Citizen    6
Lisandra 6779AA 34  Noble      4
Kuran    899786 42  Marine     8

...Is slurped into

[
  { Name => 'Rejnaldi', UPP => '765987', Age => 38, Career => 'Citizen', Terms => 6 },
  { Name => 'Lisandra', UPP => '6779AA', Age => 34, Career => 'Noble',   Terms => 4 },
  { Name => 'Kuran',    UPP => '899786', Age => 42, Career => 'Marine',  Terms => 8 },
]

Replies are listed 'Best First'.
Re: Is there a "Here Table"?
by BrowserUk (Patriarch) on Apr 07, 2015 at 21:07 UTC

    It's hard to see the need for a module to do that, especially as the '----' line is very specific to your data format and so would need a special case or option.

    And because it is very simple to do:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my @keys = split ' ', scalar <DATA>;
    <DATA>;    ## discard -----

    my @data = map {
        my %hash;
        @hash{ @keys } = split ' ';
        \%hash;
    } <DATA>;

    pp \@data;

    __END__
    Name     UPP    Age Career     Terms
    -------- ------ --- ---------- -----
    Rejnaldi 765987 38  Citizen    6
    Lisandra 6779AA 34  Noble      4
    Kuran    899786 42  Marine     8

    Produces:

    C:\test>junk
    [
      { Age => 38, Career => "Citizen", Name => "Rejnaldi", Terms => 6, UPP => 765987 },
      { Age => 34, Career => "Noble", Name => "Lisandra", Terms => 4, UPP => "6779AA" },
      { Age => 42, Career => "Marine", Name => "Kuran", Terms => 8, UPP => 896786 },
    ]

    Of course, someone will complain that it doesn't handle names with spaces, and so you need to switch to fixed field record processing:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my @keys = unpack 'A8xA6xA3xA10xA5', scalar <DATA>;
    <DATA>;    ## discard

    my @data = map {
        my %hash;
        @hash{ @keys } = unpack 'A8xA6xA3xA10xA5', $_;
        \%hash;
    } <DATA>;

    pp \@data;

    __END__
    Name     UPP    Age Career     Terms
    -------- ------ --- ---------- -----
    Rejnaldi 765987 38  Citizen    6
    Lisandra 6779AA 34  Noble      4
    Kuran    899786 42  Marine     8

    Which produces the same output. But ... they'll say: what if you want to read lots of different files in the same format?

    Then you need to determine the field sizes from the data:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my @keys  = scalar( <DATA> ) =~ m[(\S+\s*)\s]g;
    my $templ = join 'x', map { 'A' . length() } @keys;
    @keys = map { $_ =~ s[\s+$][]; $_ } @keys;
    <DATA>;    ## discard

    my @data = map {
        my %hash;
        @hash{ @keys } = unpack $templ, $_;
        \%hash;
    } <DATA>;

    pp \@data;

    __END__

    Again, same output.

    But what if the keys can contain spaces? In that case you'll need a heuristic approach to locating the field boundaries.
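    No such heuristic is shown in the thread, but one common approach is to treat every character column that is blank in all lines as a field separator. A minimal sketch of that idea (the helper name and details are my own, not from this post):

```perl
use strict;
use warnings;

# Hypothetical helper: find [start, length] field specs by locating the
# character columns that are blank in every non-empty line.
sub guess_columns {
    my @lines = grep { /\S/ } @_;      # ignore blank lines
    my $width = 0;
    for my $line (@lines) {
        $width = length $line if length $line > $width;   # widest line
    }

    my @blank = (1) x $width;          # assume each column blank until seen otherwise
    for my $line (@lines) {
        my @chars = split //, sprintf "%-*s", $width, $line;   # pad to common width
        $blank[$_] &&= ( $chars[$_] eq ' ' ) for 0 .. $width - 1;
    }

    # Collapse each run of non-blank columns into a [start, length] pair.
    my ( @fields, $start );
    for my $i ( 0 .. $width ) {
        if ( $i < $width && !$blank[$i] ) {
            $start = $i unless defined $start;
        }
        elsif ( defined $start ) {
            push @fields, [ $start, $i - $start ];
            undef $start;
        }
    }
    return @fields;
}
```

    From the resulting field specs you can build an unpack template with absolute offsets, e.g. join ' ', map { '@' . $_->[0] . 'A' . $_->[1] } guess_columns(@lines); keys that contain spaces survive because only columns blank in every line count as boundaries.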


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      what if the keys can contain spaces?

      Nah, just cache the first line and parse it after you've read the second line.
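      That suggestion could look something like the following sketch (my own illustration, not code from the thread): hold the header line, derive an unpack template from the runs of dashes on line two, and only then split the cached header into keys.

```perl
use strict;
use warnings;

# Hypothetical sketch: cache line 1 (the header), build the unpack
# template from line 2 (the '----' ruler), then parse header and rows.
sub parse_table {
    my ($fh)   = @_;
    my $header = <$fh>;                    # cached, parsed after line 2
    my $ruler  = <$fh>;
    chomp $header;
    my $templ  = join 'x', map { 'A' . length } $ruler =~ /(-+)/g;
    my @keys   = unpack $templ, $header;   # 'A' strips trailing whitespace

    my @rows;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;         # skip blank lines
        my %row;
        @row{ @keys } = unpack $templ, $line;
        push @rows, \%row;
    }
    return \@rows;
}
```

      It can be run against an in-memory string via open my $fh, '<', \$table_text; for the sample data the ruler yields the same 'A8xA6xA3xA10xA5' template shown above.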

      BTW - the data shown is (obviously?) in the format produced by default by SQL SELECT statements, at least for some major RDBMSes... So you'd think that this would be a "solved problem" by now...

      I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.

        My quick whack at it:

        sub parse_HERE_table($) {
            my @lines   = split /[\r\n]+/, $_[0];
            my $pattern = splice @lines, 1, 1;
            my $len     = length $pattern;
            $pattern =~ y/-/A/;        # to use \b, we need \w chars, and '-' is not \w.
            $pattern =~ s/\bA/(A/g;    # too bad we can't do s/\</(/g and s/\>/)/g :-(
            $pattern =~ s/A\b/A)/g;
            $pattern =~ y/A/./;
            ( my $header, @lines ) =
                map  { [ map { s/\s+$//; $_ } /$pattern/ ] }    # parse; trim trailing whitespace from each value.
                map  { $_ . ( ' ' x ( $len - length($_) ) ) }   # pad with spaces to ensure it's long enough to match.
                grep { /\S/ }                                   # skip blank lines.
                @lines;
            [ map { my %r; @r{ @$header } = @$_; \%r } @lines ]
        }

        my $arrayref = parse_HERE_table <<EOF;
Name     UPP    Age Career     Terms
-------- ------ --- ---------- -----
Rejnaldi 765987 38  Citizen    6
Lisandra 6779AA 34  Noble      4
Kuran    899786 42  Marine     8
EOF

        This code assumes well-formed input. You could certainly add error checking and so on.

        BTW - the data shown is (obviously?) in the format produced by default by SQL SELECT statements, at least for some major RDBMSes... So you'd think that this would be a "solved problem" by now...

        Well, yes. There is Parse::SQLOutput ... but it won't work for the OP's data.

Re: Is there a "Here Table"?
by MidLifeXis (Monsignor) on Apr 07, 2015 at 20:58 UTC

    Not sure if there is an automatic one, but perhaps the DBD::CSV module could help with the translation. OTOH, fixed-width columns are not that tough to translate.

    --MidLifeXis

Re: Is there a "Here Table"?
by erix (Prior) on Apr 07, 2015 at 21:07 UTC

    If I remember correctly, DBD::AnyData can read that format.

    I'll give it a try tomorrow if no one else has.

    update: DBD::AnyData failed to install under either perl 5.21.11 or 5.20.2, so I'll put no more effort into this.

Re: Is there a "Here Table"?
by LanX (Saint) on Apr 08, 2015 at 10:38 UTC
    Let me take a more abstract approach to your question:

    First, it's not clear whether your question is solely about DB table dumps.

    Second, a module capable of parsing human-readable tables shouldn't be restricted to here-docs; it should be able to parse any string or filehandle.

    Additionally, the choice of resulting data structure is not obvious; it depends on the use case.

    Eg see Re: Building data structure from multi-row/column table

    There are plenty of small snippets in the monastery demonstrating how to parse such data.

    I wouldn't know how to design a generic module that could be customized to handle all cases without requiring coding. (Taking into consideration that CSV is an edge case of a table format.)

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!