Gangabass has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Monks

I have some data with fixed-width columns:

<S> <C> <C> <C> <C> <C> <C> <C> <C> <C> <C> <C>
ACACIA RESH CORP ACACIA TCH COM 003881307 110,725 1,875,000 SHS SHARED-OTHER 1,2,3 1,875,000 0 0

I need to parse this data. I think the best way is to use unpack() for this (maybe I'm wrong?), so I need to know the position of each column in the line.

How can I find it?

Thanks!

Re: How to find start position of each column
by Corion (Patriarch) on May 28, 2008 at 12:09 UTC

    What's wrong with just counting the columns? Or do the column widths keep changing? If so, you will need to tell us how to tell one column from another.

    Anyway, BrowserUk wrote Re: Fixed Position Column Records, which also creates the unpack template for you and should solve your problem. Also see EvanCarrol's DataExtract::FixedWidth, which employs that code and wraps some more infrastructure around it.
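BrowserUk's node is not reproduced here. As a rough illustration of the underlying idea only, assuming you already know the 0-based start offset of every column and the data is plain single-byte text, the offsets can be turned into an unpack() template: each column gets an A field as wide as the gap to the next start, and the last column gets A* so it takes the rest of the line. The helper template_from_starts and the sample record are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn 0-based column start offsets into an unpack()
# template. Every column but the last gets "A<width>"; the last gets "A*"
# so it swallows the rest of the line.
sub template_from_starts {
    my @starts = @_;
    my @parts  = $starts[0] ? ("x$starts[0]") : ();   # skip any leading bytes
    for my $i (0 .. $#starts - 1) {
        push @parts, 'A' . ($starts[$i + 1] - $starts[$i]);
    }
    push @parts, 'A*';
    return join ' ', @parts;
}

# Example with made-up offsets and a made-up record.
my $template = template_from_starts(0, 6, 12);
my @fields   = unpack $template, "ACME  12.50 SHS";
print "$template\n";                 # prints "A6 A6 A*"
print join('|', @fields), "\n";      # prints "ACME|12.50|SHS"
```

Because A strips trailing spaces (and NULs) when unpacking, the fields come back without their right-hand padding; right-aligned numbers keep any leading spaces.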

Re: How to find start position of each column
by almut (Canon) on May 28, 2008 at 12:43 UTC
    So i need to know position (in the line) of each column.

    There are various ways to understand your question. For example, it could mean that there are column markers like "<C>" (literally) in the first/header row, and you want to find where they occur on that line. In that case, you could do something like this:

        my $firstrow = "<S> <C> <C> <C> <C> <C> <C> <C> <C> <C> <C> <C>";
        my @columnpos;
        while ($firstrow =~ /(<\w>)/g) {
            push @columnpos, pos($firstrow) - length($1);
        }
        print "positions: @columnpos\n";

        __END__
        positions: 0 30 47 57 68 78 82 88 101 111 121 128
      while ($firstrow =~ /(<\w>)/g) { push @columnpos, pos($firstrow) - length($1); }

      You could use a look-ahead to avoid doing the capture, length and subtraction.

      push @columnpos, pos $firstrow while $firstrow =~ m{(?=<\w>)}g;

      I hope this is of interest.

      Cheers,

      JohnGG
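A further variant, not taken from the replies above, avoids both the capture and pos(): after each successful match, Perl's @- array holds the match offsets, and $-[0] is where the whole match began. The helper name marker_positions and the short sample header are made up for illustration.

```perl
use strict;
use warnings;

# Sketch: collect the start offset of every "<X>" marker in a header row.
# After each successful match, $-[0] holds the offset where the match began.
sub marker_positions {
    my ($row) = @_;
    my @pos;
    push @pos, $-[0] while $row =~ /<\w>/g;
    return @pos;
}

print join(' ', marker_positions("<S> <C>  <C>")), "\n";   # prints "0 4 9"
```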

Re: How to find start position of each column
by moritz (Cardinal) on May 28, 2008 at 12:10 UTC
    You can fill a long string with spaces and binary-AND all the lines from your file with it. The places where whitespace remains are likely column separators.
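A minimal Perl sketch of this idea, assuming plain ASCII lines that use spaces (not tabs) between columns. One adjustment is needed: ANDing the raw text with a string of spaces does not isolate blanks, because digits, lowercase letters and most punctuation share the 0x20 bit with the space character (for example '0' is 0x30, and 0x30 & 0x20 is 0x20, a space). So each line is first turned into a mask in which a blank becomes \xFF and everything else \x00, and those masks are ANDed together. The helper name column_starts and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based start offset of each column,
# found by ANDing per-line "blank masks" together.
sub column_starts {
    my @lines = @_;
    my $width = 0;
    for (@lines) { $width = length($_) if length($_) > $width }

    # Start with "every position is blank" and AND each line's mask into it.
    my $acc = "\xff" x $width;
    for my $line (@lines) {
        my $padded = sprintf "%-*s", $width, $line;   # pad short lines with spaces
        (my $mask = $padded) =~ tr/ /\x00/c;           # non-blank -> \x00
        $mask =~ tr/ /\xff/;                           # blank     -> \xff
        $acc &= $mask;                                 # string bitwise AND
    }

    # A column starts wherever a non-blank position follows a blank one.
    my @starts;
    for my $i (0 .. $width - 1) {
        next if substr($acc, $i, 1) eq "\xff";         # blank in every line
        push @starts, $i if $i == 0 || substr($acc, $i - 1, 1) eq "\xff";
    }
    return @starts;
}

my @starts = column_starts(
    "ACME   12.50  SHS",
    "FOO     3.10  SHS",
);
print "starts: @starts\n";    # prints "starts: 0 7 14"
```

Right-aligned columns start where their widest value starts, which is why the second column is reported at 7 rather than 8.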
Re: How to find start position of each column
by apl (Monsignor) on May 28, 2008 at 12:09 UTC
    Edit the data file, and arrow-over to the start of each field. Count out loud while you do so.

      Bah, count? I thought that was why we had computers . . .

      perl -le 'print join ( "", map { " " x 9 . $_ } 1..8 ), "\n", "1234567890" x 8'

      Update: Fore (and putting it to use :) . . .

               1         2         3         4         5         6         7         8
      12345678901234567890123456789012345678901234567890123456789012345678901234567890
      perl -le'print join("",map{" "x9 .$_}1..8),"\n","1234567890"x8'
      ruby -e'puts (?1..?8).map{|x|" "*9<<x}.join+"\n"+"1234567890"*8'

      Season to taste if you want 0-based or a longer ruler.
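To season it to taste, the same ruler can be wrapped in a small function that takes the width and a 0-based flag. This is only a sketch; the name ruler and its arguments are made up for illustration. A tens digit is placed above every column number that is a multiple of 10, and only the last digit of the tens count is shown, so column 100 gets a 0.

```perl
use strict;
use warnings;

# Hypothetical helper: build a two-line column ruler of the given width.
# With $zero_based false, columns are numbered 1, 2, 3, ... like the one-liner.
sub ruler {
    my ($width, $zero_based) = @_;
    my ($tens, $units) = ('', '');
    for my $i (0 .. $width - 1) {
        my $n = $zero_based ? $i : $i + 1;   # column number shown at this offset
        $units .= $n % 10;
        $tens  .= ($n % 10 == 0 && $n > 0) ? int($n / 10) % 10 : ' ';
    }
    return "$tens\n$units";
}

print ruler(30, 0), "\n";   # 1-based, 30 columns
print ruler(30, 1), "\n";   # 0-based, 30 columns
```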

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

        Heh, it's been done before.

        another intruder with the mooring in the heart of the Perl

Re: How to find start position of each column
by monarch (Priest) on May 28, 2008 at 12:21 UTC
    Doing it the hard way:

    This produces the following output:

    "ACACIA RESH CORP ","ACACIA TCH COM ","003881307 "," 110,725 "," 1,875,000"," SHS"," "," SHARED-OTHER"," 1,2,3 "," 1,875,000"," 0 "," 0"
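For comparison, one hypothetical way to get this kind of quoted, comma-separated output (not necessarily monarch's method) is to cut each line with substr() at known start offsets, which keeps the padding spaces inside each field. The offsets, the helper name slice_line and the sample record are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: cut one fixed-width line at the given 0-based start
# offsets with substr(). Each field runs up to the next start; the last one
# runs to the end of the line. Padding spaces are kept.
sub slice_line {
    my ($line, @starts) = @_;
    my @fields;
    for my $i (0 .. $#starts) {
        my $from = $starts[$i];
        last if $from >= length($line);              # line ends early: stop
        my $len = $i < $#starts
                ? $starts[$i + 1] - $from
                : length($line) - $from;
        push @fields, substr($line, $from, $len);    # substr clips at end of line
    }
    return @fields;
}

my @fields = slice_line("ACME  12.50 SHS", 0, 6, 12);
print join(',', map { sprintf '"%s"', $_ } @fields), "\n";
# prints "ACME  ","12.50 ","SHS"
```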