Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks,

I have a very long tab separated list that looks something like this:

Apples__0.5__-10__Emma:17:15:14:18__Peter:2:7:4:1(Newline)
Pears__0.7__-12__Alex:101:144:110:111(Newline)
Oranges__0.8__-14__Shan:12:14:9:57__Heena:65:17:15:24__Rachel:1:5:18:54(Newline)

The double underscores represent tabs. Sorry if that comes out poorly here.

The point is that each line has a variable number of columns, because the data after the third column is variable.

I am trying to parse the list so that each column after the third gets its own line, while retaining the information from the first three columns. For example:

Apples__0.5__-10__Emma:17:15:14:18(Newline)
Apples__0.5__-10__Peter:2:7:4:1(Newline)
Pears__0.7__-12__Alex:101:144:110:111(Newline)
Oranges__0.8__-14__Shan:12:14:9:57(Newline)
Oranges__0.8__-14__Heena:65:17:15:24(Newline)
Oranges__0.8__-14__Rachel:1:5:18:54(Newline)

I hope I have explained this clearly.

I think this should be easy in Perl, but as a new Perl user I am finding it difficult. My experience is limited to fiddling with already existing scripts, not writing them from scratch. (I have started doing this manually in Excel, but it's extremely tedious and the file is huge.)

Any help will be much appreciated.
  • Comment on Parsing a table with variable columns on each line

Replies are listed 'Best First'.
Re: Parsing a table with variable columns on each line
by ikegami (Patriarch) on Oct 23, 2009 at 15:25 UTC
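    while (<>) {
        chomp;
        my @fields = split /\t/;
        # splice removes and returns everything after the third column,
        # leaving only the three common columns in @fields.
        for ( splice(@fields, 3) ) {
            print(join("\t", @fields, $_), "\n");
        }
    }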
    or
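    while (<>) {
        # $common captures the first three columns, trailing tab included,
        # so no extra tab is needed when printing.
        (my $common, $_) = /^((?:[^\t]*\t){3})(.*)/;
        for ( split /\t/ ) {
            print("$common$_\n");
        }
    }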
      Thank you Ikegami,
      The outputs aren't quite what I had in mind but I couldn't really represent my data accurately. I think now with a starting point I can fiddle about and get somewhere with this file.

      Thank you for your time and wisdom.
      Have a nice weekend.

        The outputs aren't quite what I had in mind

        I don't understand. The output is exactly what you requested.

        Apples 0.5 -10 Emma:17:15:14:18
        Apples 0.5 -10 Peter:2:7:4:1
        Pears 0.7 -12 Alex:101:144:110:111
        Oranges 0.8 -14 Shan:12:14:9:57
        Oranges 0.8 -14 Heena:65:17:15:24
        Oranges 0.8 -14 Rachel:1:5:18:54

        ...except for the second snippet. It had a bug that's now fixed.

Re: Parsing a table with variable columns on each line
by ww (Archbishop) on Oct 23, 2009 at 15:31 UTC

    "Sorry if that comes out poorly here."

    It does. You need to read Markup in the Monastery and Perl Monks Approved HTML tags, specifically about the use of <c>...</c>, which is also BOLDFACED in the cautionary note below the text input box where you create a node. Then you can make your data render reasonably (and make it downloadable for those with time to help on this).

    Hint only for now: read perldoc -f split (aka: split) re limits. In code tags, your data looks like this:

    Apples 0.5 -10 Emma:17:15:14:18 Peter:2:7:4:1
    Pears 0.7 -12 Alex:101:144:110:111
    Oranges 0.8 -14 Shan:12:14:9:57 Heena:65:17:15:24 Rachel:1:5:18:54
    and the desired outcome will look like this:
    Apples 0.5 -10 Emma:17:15:14:18
    Apples 0.5 -10 Peter:2:7:4:1
    Pears 0.7 -12 Alex:101:144:110:111
    Oranges 0.8 -14 Shan:12:14:9:57
    Oranges 0.8 -14 Heena:65:17:15:24
    Oranges 0.8 -14 Rachel:1:5:18:54
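
One possible reading of the hint about limits, offered as a sketch rather than as what ww necessarily had in mind: giving split a LIMIT of 4 keeps the first three columns separate and leaves the rest of the line, embedded tabs included, in a fourth chunk that can then be split again. The sub name expand_line is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one input line into one output line per
# trailing column, repeating the first three columns each time.
sub expand_line {
    my ($line) = @_;
    chomp $line;

    # LIMIT 4: at most four pieces, so $rest keeps its embedded tabs.
    my ($name, $val1, $val2, $rest) = split /\t/, $line, 4;
    return () unless defined $rest;    # fewer than four columns: nothing to expand

    # Second split, this time without a limit, on the variable part only.
    return map { join("\t", $name, $val1, $val2, $_) } split /\t/, $rest;
}

print "$_\n" for expand_line("Apples\t0.5\t-10\tEmma:17:15:14:18\tPeter:2:7:4:1\n");
```

To process a whole file, the same sub could be called from a while (<>) loop, printing each returned line.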
      Thanks ww,
      I did see the markup guidelines, just wasn't sure where the code for tab was. Oops. Still not.

      I'll read the perldoc. Thanks for the reply and accurate representation of my problem.

      Have a nice weekend.

        No special code required to have your tabs embedded; just cut'n paste from your tabified content.

        Code tags are sorta' like really smart <pre> tags, on steroids. (Don't use pre!)

        Although a casual scan of a few posts might mislead, markup here is NOT precisely .html; code is an instance.

Re: Parsing a table with variable columns on each line
by bichonfrise74 (Vicar) on Oct 23, 2009 at 19:28 UTC
    Here's another way to solve your problem.
    #!/usr/bin/perl
    use strict;

    while ( <DATA> ) {
        chomp;
        # The first three columns are fixed; @val3 soaks up the rest.
        my ($name, $val1, $val2, @val3) = split ( "\t" );
        print join( "\t", ($name, $val1, $val2, $val3[$_]) ) , "\n"
            for (0 .. $#val3);
    }

    # The fields below must be separated by real tab characters.
    __DATA__
    Apples	0.5	-10	Emma:17:15:14:18	Peter:2:7:4:1
    Pears	0.7	-12	Alex:101:144:110:111
    Oranges	0.8	-14	Shan:12:14:9:57	Heena:65:17:15:24	Rachel:1:5:18:54
      Now available with more punctuation variables:

      #!/usr/bin/perl -l
      use strict;

      while ( <> ) {
          chomp;
          local $, = "\t";
          my ($name, $val1, $val2, @val3) = split ( "\t" );
          print $name, $val1, $val2, $_ for @val3;
      }

      Update: Edited because I apparently didn't understand the original question. The point of setting $, instead of using join remains.
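
For readers new to the punctuation variables: $, is the output field separator that print places between its arguments, and $\ is the output record separator that print appends after them (the -l switch on the shebang line sets $\ to a newline). The sketch below, with a made-up sub name and an in-memory filehandle used only to capture the output, suggests that printing with both variables set gives the same line as join plus an explicit newline.

```perl
use strict;
use warnings;

# Hypothetical helper: format one row using $, and $\ instead of join.
sub row_with_ofs {
    my @fields = @_;
    local $, = "\t";    # printed between the arguments of print
    local $\ = "\n";    # printed after the last argument

    # Print into a string so the result can be compared afterwards.
    my $out = '';
    open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
    print {$fh} @fields;
    close $fh;
    return $out;
}

my @row  = ('Apples', '0.5', '-10', 'Emma:17:15:14:18');
my $same = row_with_ofs(@row) eq join("\t", @row) . "\n";
print $same ? "same\n" : "different\n";
```

Because both variables are set with local, the change is undone when the sub returns, so other print statements in the program are unaffected.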
        Thank you. Job is done now.