Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I have a file like this (values are tab-separated):
A A_
B B_
C C_
D D_
E E_
A A*
B B*
C C*
D D*
E E*
A A-
B B-
C C-
D D-
E E-
A AB
B BB
C CC
D DD
Knowing the line position/number of the key value 'A',
how can I transform it into a file so that my output looks like this:
A A_ A- AB
B B_ B- BB
C C_ C- CC
... so on
Many thanks,
Syed.

Replies are listed 'Best First'.
Re: how to place rows information into column?
by holli (Abbot) on Feb 01, 2005 at 17:11 UTC
    like so?
    use strict;
    use warnings;

    my %hash;
    while (<DATA>) {
        if ( /(\w) (.+)/ ) {
            $hash{$1} .= $2;
        }
    }

    for ( sort keys %hash ) {
        print "$_ $hash{$_}\n";
    }

    __DATA__
    A A_
    B B_
    C C_
    D D_
    E E_
    A A*
    B B*
    C C*
    D D*
    E E*
    A A-
    B B-
    C C-
    D D-
    E E-
    A AB
    B BB
    C CC
    D DD
    prints:
    A A_A*A-AB
    B B_B*B-BB
    C C_C*C-CC
    D D_D*D-DD
    E E_E*E-

    holli, regexed monk
Re: how to place rows information into column?
by TedYoung (Deacon) on Feb 01, 2005 at 17:06 UTC

    What happened to A*, B*, C*, etc?

    Perhaps you want something like this:

    use strict;
    use warnings;

    # Read the file and hash it
    my %data;
    while (<>) {
        chomp;                            # drop the trailing newline
        my ($k, $v) = split /\t/;         # split on tabs
        my $values = $data{$k} ||= [];    # ensure we have an array ref
        push @$values, $v;                # add the value to the end of the list for this key
    }

    # Print out the results
    for (sort keys %data) {               # sort is optional
        print join "\t", ($_, @{ $data{$_} });    # assuming you want them tab-separated
        print "\n";
    }

    Code is untested...

    Ted Young

    ($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)
Re: how to place rows information into column?
by jZed (Prior) on Feb 01, 2005 at 17:15 UTC
    Something like:
    #!perl -w
    use strict;

    my %hash;
    for my $line (<DATA>) {
        chomp($line);
        my ($key, $val) = split /\t/, $line;
        next unless defined $key and defined $val;
        $hash{$key} .= " $val";
    }
    print $hash{$_} . "\n" for keys %hash;

    __DATA__
    A A_
    B B_
    C C_
    D D_
    E E_
    A A*
    B B*
    C C*
    D D*
    E E*
    A A-
    B B-
    C C-
    D D-
    E E-
    A AB
    B BB
    C CC
    D DD
how to place rows information into column (2)
by riz (Initiate) on Feb 02, 2005 at 20:37 UTC
    Dear Perl Monks,
    My question is a continuation of my old post "how to place rows information into column", answered by Ted, Holli & jZed. Thanks a lot for your help.
    The sample file I posted earlier is different from the actual one. I just wanted to get a hint from you guys and learn things to apply on my own, but later, when I tried to implement it on my real file, I couldn't find a way.
    Actually, I am unable to create a matching pattern statement for my hash. Please help me here with the real file, which looks like:

    0.000000e+000 1.502947e+000
    1.162272e-012 1.508957e+001
    2.324544e-012 1.508948e+000
    .....
    0.000000e+000 1.502947e+001
    1.162272e-012 1.508941e+001
    2.324544e-012 1.508940e+000
    ....
    0.000000e+000 1.503947e+000
    1.162272e-012 1.504947e+000
    2.324544e-012 1.508900e+001
    ..... so on

    It's a 300MB file; the two values are tab-separated. The first-column value repeats after, let's say, 2000 lines. The paired values vary (they could be the same).

    I would appreciate precise comments, especially on the matching statement.

    Thanks in advance.
    Syed.
      If your data is "tab delimited" and you have 300MB of it, you can use Text::CSV_XS, the fastest method of parsing "delimited" data (which is really "separated" data). Just create the parser with sep_char => "\t" and it will handle your data fine.
      my $file = '0.000000e+000 1.502947e+000
      1.162272e-012 1.508957e+001
      2.324544e-012 1.508948e+000';

      my $regex = {
          'first'  => qr{^(\S+)},     # anything non-blank from the beginning '^'
          'second' => qr{\s+(\S*)}    # anything non-blank after the first blank '\s+'
      };

      for (split '\n', $file) {
          /$regex->{first}$regex->{second}/o    # 'o' compiles the regex only once
              and
          # and upon success we take the first column and/or the second one
          print "Key: $1 and Value: $2\n";
          # So hash structure can have $hash{$1} = $2
      }
        Hi sh1n,

        Are we assigning these lines
        '0.000000e+000 1.502947e+000
        1.162272e-012 1.508957e+001
        2.324544e-012 1.508948e+000';
        to the variable $file? The '.....' in the middle of my sample stands for some 2000 more pairs in sequence. Kindly explain the very first line of your code.
        many thanks,
        riz.

      I wouldn't try to pattern match the scientific float... you could, but I think it's overkill. If the data is tab delimited, then you can use that to distinguish the two values:

      use strict;

      while (<DATA>) {
          # Get rid of trailing \n on line.
          chomp;
          # divide the line into two items based on tab
          my ($x, $y) = split("\t");
          print "X:[$x] Y: [$y]\n";
      }

      ##OUTPUT:
      # X:[0.000000e+000] Y: [1.502947e+000]
      # X:[1.162272e-012] Y: [1.508957e+001]
      # X:[2.324544e-012] Y: [1.508948e+000]
      # X:[0.000000e+000] Y: [1.502947e+001]
      # X:[1.162272e-012] Y: [1.508941e+001]
      # X:[2.324544e-012] Y: [1.508940e+000]
      # X:[0.000000e+000] Y: [1.503947e+000]
      # X:[1.162272e-012] Y: [1.504947e+000]
      # X:[2.324544e-012] Y: [1.508900e+001]
      ##NOTE in data the two fields are tab delimited.
      __DATA__
      0.000000e+000 1.502947e+000
      1.162272e-012 1.508957e+001
      2.324544e-012 1.508948e+000
      0.000000e+000 1.502947e+001
      1.162272e-012 1.508941e+001
      2.324544e-012 1.508940e+000
      0.000000e+000 1.503947e+000
      1.162272e-012 1.504947e+000
      2.324544e-012 1.508900e+001
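
Splitting on the tab only separates the two fields; to get the row-to-column layout asked about at the top of the thread, each value still has to be collected under its first-column key. What follows is a minimal sketch of one way to do that, not a tested solution for the real file. The helper name `group_rows` and the in-memory sample are assumptions made for illustration, and the sketch keeps every value in memory, which may or may not be acceptable for a 300MB input.

```perl
use strict;
use warnings;

# Sample input held in a string so the sketch is self-contained;
# with the real file you would open it by name instead.
my $sample = join '',
    "0.000000e+000\t1.502947e+000\n",
    "1.162272e-012\t1.508957e+001\n",
    "2.324544e-012\t1.508948e+000\n",
    "0.000000e+000\t1.502947e+001\n",
    "1.162272e-012\t1.508941e+001\n",
    "2.324544e-012\t1.508940e+000\n";

# group_rows is a hypothetical helper: it reads tab-separated pairs from a
# filehandle and returns one tab-joined output line per distinct key.
sub group_rows {
    my ($fh) = @_;
    my (%values_for, @order);
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line;
        next unless defined $key and defined $value;          # skip blank or short lines
        push @order, $key unless exists $values_for{$key};    # remember first-seen order
        push @{ $values_for{$key} }, $value;                  # append value to this key's list
    }
    return map { join "\t", $_, @{ $values_for{$_} } } @order;
}

# Read from the string through an in-memory filehandle.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @rows = group_rows($fh);
close $fh;

print "$_\n" for @rows;
```

The keys are kept as strings, so `0.000000e+000` is never converted to a number, and the `@order` array preserves the order in which keys first appear instead of relying on `sort`.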

      "Look, Shiny Things!" is not a better business strategy than compatibility and reuse.

        wouldn't try to pattern match the scientific float... you could, but I think it's over-kill.

        Do you really have any idea what /^(\S+)\s+(\S*)/o does?

        /^(\S+)\s+(\S*)/o and print FH "K: ", $1, "V: ", $2, "\n" while <DATA>;


        where <DATA> contains over 22000 lines: less than a second
        on my old (less than 1800 MHz CPU) home machine.
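
Since the question asked for precise comments on the matching statement, here is the same pattern written with the /x modifier so that each piece can carry its own comment. This is only an illustrative sketch: the names `$pair_re` and `parse_pair` are made up for the example. The /o flag is left out because it only matters when a pattern interpolates variables.

```perl
use strict;
use warnings;

# The same pattern as /^(\S+)\s+(\S*)/, spread out with /x.
my $pair_re = qr{
    ^        # start of the line
    (\S+)    # capture 1: one or more non-whitespace characters, the first column
    \s+      # one or more whitespace characters, here the tab between columns
    (\S*)    # capture 2: zero or more non-whitespace characters, the second column
}x;

# parse_pair is a hypothetical helper: it returns (key, value) on a match
# and an empty list when the line does not match.
sub parse_pair {
    my ($line) = @_;
    return $line =~ $pair_re ? ($1, $2) : ();
}

my ($key, $value) = parse_pair("1.162272e-012\t1.508957e+001\n");
print "Key: $key and Value: $value\n";
```

Because the pattern only looks for runs of non-whitespace, it never has to understand the scientific notation itself; it behaves like a split on whitespace that also rejects lines starting with a blank.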
Re: how to place rows information into column?
by Anonymous Monk on Feb 02, 2005 at 13:56 UTC
    Ted, Holli & jZed,

    Thanks a lot for your help. I am bothering you one more time, sorry!
    The sample file I posted earlier is different from the actual one.
    I just wanted to get a hint from you guys and learn things to apply on my own, but later, when I tried to implement it on my real file, I couldn't find a way.

    Actually, I am unable to create a matching pattern statement
    for my hash. Please help me here with the real file, which
    looks like:

    0.000000e+000 1.502947e+000
    1.162272e-012 1.508957e+001
    2.324544e-012 1.508948e+000
    .....
    0.000000e+000 1.502947e+001
    1.162272e-012 1.508941e+001
    2.324544e-012 1.508940e+000
    ....
    0.000000e+000 1.503947e+000
    1.162272e-012 1.504947e+000
    2.324544e-012 1.508900e+001
    ..... so on

    It's a 300MB file; the two values are tab-separated. The first-column value repeats after, let's say, 2000 lines. The paired values vary (they could be the same).
    I would appreciate precise comments, especially on the matching statement.

    Thanks in advance.
    Syed.