cscu2007 has asked for the wisdom of the Perl Monks concerning the following question:

-------------------------------------------------------------------------------------------

Name       Company                Work                       Home                       Work

                  Name                     Address                   Phone                       Hours

                                                                                Number

-------------------------------------------------------------------------------------------


John       ABC                          MAIN_STREET         987.111.4567            8

Bob         CNN                         2nd_Street                875.444.3425            2


I can build a hash table for the contents, but trouble is with the header. I need to find out how to get the column number based on above header, to extract the corresponding values per column. The reason to do this, because the header is dynamic, including orders, but for the same column, the header is always formated as aligned (first letter across multiple rows, as above).


For example, I need to find out which column of "Work Address" from header, then I can grep the corresponding column from the table contents.


Thanks in advance for the help!

  • Comment on finding colunm header which crossing multiple rows

Replies are listed 'Best First'.
Re: finding colunm header which crossing multiple rows
by tybalt89 (Monsignor) on Nov 16, 2018 at 20:07 UTC
    #!/usr/bin/perl # https://perlmonks.org/?node_id=1225917 use strict; use warnings; $_ = <DATA>; my @titles = /\w+/g; my @positions; push @positions, $-[0] while /\w+/g; while( <DATA> ) { /\S/ or last; for my $i (0 .. $#positions) { /^.{$positions[$i]}(\w+)/ and $titles[$i] .= " $1"; } } use Data::Dump 'dd'; dd \@titles; __DATA__ Name Company Work Home + Work Name Address Phone + Hours Number

    Outputs:

    [ "Name", "Company Name", "Work Address", "Home Phone Number", "Work Hours", ]

      Or

      $_ = <DATA>; my @titles = split; my @lens = map length, /\S+\s*/g; $lens[-1] = '*'; my $pat = join "", map "A".$_, @lens; while (defined( $_ = <DATA> ) && /\S/) { my @more = unpack($pat, $_); for my $i (0..$#titles) { $titles[$i] .= ' '.$more[$i] if length($more[$i]); } }

      This can be reduced to the following:

      $_ = <DATA>; my @titles = split; my @lens = map length, /\S+\s*/g; $lens[-1] = '*'; my $pat = join "", map "A".$_, @lens; while (defined( $_ = <DATA> ) && /\S/) { my @more = unpack(" ".$pat, $_); $titles[$_] .= $more[$_] for 0..$#titles; }

        Thank you ikegami! I got the idea to match up string position and concatenate to the final column names

      This is Awesome. Thanks tybalt89!

Re: finding colunm header which crossing multiple rows
by LanX (Saint) on Nov 16, 2018 at 19:54 UTC
    In order to help you, you should post your input text inside code tags <c>...</c>.

    A general answer is to match a regex against the first headline and to count the length of the matches, like that you'll get the width of the columns.

    update

    DB<17> $_='Name Company Work + Home Work' DB<18> push @col,$1 while /(\w+\s*)/g DB<19> x @col 0 'Name ' 1 'Company ' 2 'Work ' 3 'Home ' 4 'Work' DB<20> print map {":".length $_} @col :11:23:27:27:4

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

      Thanks LanX.
Re: finding colunm header which crossing multiple rows
by BillKSmith (Monsignor) on Nov 17, 2018 at 16:49 UTC
    Now that you have solved the header problem, you still have to write your 'flat file database' program. I recommend that you at least look at that section in the book "Higher Order Perl" (ISBN 1558607013). I believe that a free download is still available.
    Bill
      "I believe that a free download is still available."

      There's a variety of downloading options linked from: Higher-Order Perl.

      — Ken