
The next line is:

TF Name Unknown/

So far I've tried a do-until loop:

foreach (1..25) {
    my $command;
    do {
        STUFF TO EXECUTE
    } until ($command eq "TF");
}

Any pointers (in the literary sense) gratefully received! Cheers

Re^5: First foray into Perl
by LostWeekender (Novice) on Mar 25, 2014 at 17:23 UTC

    .. bit stuck. Here's what I have so far. Seems close but I can't quite get it right. Please help!

    use strict;
    use warnings;

    open(MOTIFS, "all_motifs.txt") or die("Unable to open file");

    # Read the first 7 lines of metadata.
    #
    # Assuming there are always 7 lines of metadata.
    foreach (1..121) {
        foreach (1..7) {
            # Read a line of data.
            my $header_data = <MOTIFS>;
            # Remove the end of line character.
            chomp $header_data;
            # Split the string into 2 parts, using white space as a separator.
            my ($lable, $string) = split /\s+/, $header_data, 2;
            # only pay attention to the "Motif" line.
            next if ($lable ne 'Motif');
            print "$string ";
        }

        # Process the next lines of data until line containing string "TF Unknown" is reached.
        foreach (<MOTIFS>) {
            # Remove the end of line character.
            chomp my $line;
            # Process lines until "TF Unknown"
            while ($line ne 'TF Unknown') {
                # Declare a variable to hold the data in the file.
                my %base_pairs;
                # Split the string into 5 parts, using whitespace as a separator.
                # Assuming the Position is always in the same order in the file.
                (undef, $base_pairs{A}, $base_pairs{C}, $base_pairs{G}, $base_pairs{T}) = split /\s+/, $line, 5;
                my @letters = keys %base_pairs;
                # Start with the first column value and make it the max. value.
                my $max = pop @letters;
                # Compare each value to the maximum.
                foreach my $letter (@letters) {
                    # What if two (or more) values are equal???
                    if ($base_pairs{$max} < $base_pairs{$letter}) {
                        # The current value was greater than the maximum, so make it the new maximum.
                        $max = $letter;
                    }
                }
                # Print the letter representing the maximum value.
                print $max;
            }
        }
    }
    # print an end of line character.
    print "\n";

    and this is what a few records look like:

    TF Unknown
    TF Name Unknown
    Gene ENSG00000113916
    Motif ENSG00000113916___1|1x3
    Family C2H2 ZF
    Species Homo_sapiens
    Pos A C G T
    1 0.664794 0.13099 0.0810125 0.123203
    2 0.675621 0.0396475 0.144967 0.139764
    3 0.0913393 0.0396819 0.847004 0.0219745
    4 0.850414 0.0522149 0.0519174 0.0454536
    5 0.89157 0.00962148 0.0845269 0.0142814
    6 0.122389 0.0875591 0.0734604 0.716591
    7 0.226696 0.00745549 0.745549 0.0202999
    8 0.156228 0.151994 0.128767 0.563011
    9 0.22083 0.561173 0.12007 0.0979266
    10 0.507656 0.0711684 0.0652815 0.355894
    TF Unknown
    TF Name Unknown
    Gene ENSG00000113916
    Motif ENSG00000113916___1|2x3
    Family C2H2 ZF
    Species Homo_sapiens
    Pos A C G T
    1 0.538498 0.157305 0.157633 0.146564
    2 0.0728444 0.00877167 0.877166 0.0412175
    3 0.959269 0.0131077 0.0159611 0.0116621
    4 0.852439 0.0238831 0.0168134 0.106864
    5 0.57332 0.0688014 0.181385 0.176494
    6 0.139513 0.0747988 0.737607 0.0480813
    7 0.735484 0.0912993 0.09091 0.0823067
    8 0.79932 0.0270417 0.137306 0.0363319
    9 0.16103 0.12536 0.109938 0.603672
    10 0.622356 0.06782 0.115463 0.194361
    TF Unknown
    TF Name Unknown
    Gene ENSG00000113916
    Motif ENSG00000113916___1|3x3
    Family C2H2 ZF
    Species Homo_sapiens
    Pos A C G T
    1 0.616484 0.0886488 0.24602 0.0488468
    2 0.0971289 0.591289 0.134781 0.176801
    3 0.0715039 0.0237142 0.0432674 0.861514
    4 0.73769 0.117011 0.059703 0.0855963
    5 0.0728444 0.00877167 0.877166 0.0412175
    6 0.959269 0.0131077 0.0159611 0.0116621
    7 0.852439 0.0238831 0.0168134 0.106864
    8 0.57332 0.0688014 0.181385 0.176494
    9 0.139513 0.0747988 0.737607 0.0480813
    10 0.615257 0.189034 0.125514 0.0701943
    TF Unknown
    TF Name Unknown
    Gene ENSG00000113916

    massive thanks!

      ... this is what a few records look like ...

      It would greatly help anyone who is trying to help you if you would also provide the exact output you expect from the example input. (You can just add an update – properly cited as such – to your post; no need for a separate node.)

      Note: The single-tab (\t) field separators used in the example data posted with your OP seem to have been entirely replaced with multiple-space (\x20) character separators in the example data included with your latest post. Was this intentional or inadvertent? I'm working on some code that (I think) should work with either field separation scheme, but it would be good to know just how record fields will be separated.
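      As a minimal sketch of what separator-agnostic parsing could look like for the numeric rows: split with a single-space string (rather than a regex) breaks a line on any run of whitespace, so tab-separated and space-separated rows parse the same way. The two sample rows below reuse values from the posted data, but their exact spacing is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample rows: one tab-separated, one separated by runs of spaces.
my @rows = (
    "1\t0.664794\t0.13099\t0.0810125\t0.123203",
    "2   0.675621  0.0396475   0.144967 0.139764",
);

my @parsed;
for my $row (@rows) {
    # split ' ' (a one-space string, not a regex) splits on any run of
    # whitespace and ignores leading whitespace, so tabs and spaces both work.
    my ($pos, @probs) = split ' ', $row;
    push @parsed, [ $pos, @probs ];
    print join('|', $pos, @probs), "\n";
}
```

      This only settles the numeric rows. A header line such as "Family C2H2 ZF" has a space inside its value, so once tabs are gone the label/value boundary cannot be recovered by whitespace splitting alone; the labels would have to be known in advance.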

      Another Note: I just noticed that the latest example data here end with the following three lines:

      TF Unknown
      TF Name Unknown
      Gene ENSG00000113916
      These look like the start of another record and screw up parsing. Is this the intended and necessary ending of a full set of data records? It's a problem if so. (The absence of an unambiguous multi-line record delimiter is another problem, but can be finessed if necessary.)
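      One way the missing delimiter might be finessed, as a rough sketch only: treat any line that starts with "TF" but is not the "TF Name" line as the start of a record, and throw away any record that never reaches its matrix rows. The code below assumes one field group per line, whitespace-separated, and assumes the wanted output is the motif name followed by the most probable base at each position (this is inferred from the posted attempt, not confirmed). The sample input is the posted data cut down to three positions per record, plus the truncated trailing record.

```perl
use strict;
use warnings;

# Shortened copy of the posted data (three positions per record), followed
# by the truncated record that ends the sample. Line layout is an assumption.
my @lines = split /\n/, <<'END';
TF Unknown
TF Name Unknown
Gene ENSG00000113916
Motif ENSG00000113916___1|1x3
Family C2H2 ZF
Species Homo_sapiens
Pos A C G T
1 0.664794 0.13099 0.0810125 0.123203
2 0.675621 0.0396475 0.144967 0.139764
3 0.0913393 0.0396819 0.847004 0.0219745
TF Unknown
TF Name Unknown
Gene ENSG00000113916
Motif ENSG00000113916___1|2x3
Family C2H2 ZF
Species Homo_sapiens
Pos A C G T
1 0.538498 0.157305 0.157633 0.146564
2 0.0728444 0.00877167 0.877166 0.0412175
3 0.959269 0.0131077 0.0159611 0.0116621
TF Unknown
TF Name Unknown
Gene ENSG00000113916
END

my @bases = qw(A C G T);   # column order taken from the "Pos A C G T" line
my (@records, $current);

for my $line (@lines) {
    # A "TF" line that is not the "TF Name" line opens a new record.
    if ($line =~ /^TF\s/ && $line !~ /^TF\s+Name\b/) {
        $current = { motif => undef, consensus => '' };
        push @records, $current;
        next;
    }
    next unless $current;

    if ($line =~ /^Motif\s+(\S+)/) {
        $current->{motif} = $1;
    }
    elsif ($line =~ /^\d+\s/) {
        # Matrix row: position, then one probability per base.
        my (undef, @p) = split ' ', $line;
        my $best = 0;
        for my $i (1 .. $#bases) {
            # Strict ">" means a tie keeps the earlier base in A, C, G, T order.
            $best = $i if $p[$i] > $p[$best];
        }
        $current->{consensus} .= $bases[$best];
    }
}

# Drop records that never reached their matrix rows (the truncated tail).
@records = grep { length $_->{consensus} } @records;

print "$_->{motif} $_->{consensus}\n" for @records;
```

      With the sample above this prints "ENSG00000113916___1|1x3 AAG" and "ENSG00000113916___1|2x3 AGA". The tie rule is an arbitrary choice; the open question in the posted code about equal values still needs an answer from whoever consumes the output.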

      Also: Any word yet on the tab-versus-spaces field delimiter question posted above?

        Hi, thank you very much for looking at this.

        The spacing change was inadvertent, not quite sure what happened there.

        The last three lines are the start of another record and can be ignored/removed. I do have another version of the record file with records delimited by a double space, thusly:

        TF Unknown
        TF Name Unknown
        Gene ENSG00000113916
        Motif ENSG00000113916___1|4x3
        Family C2H2 ZF
        Species Homo_sapiens
        Pos A C G T
        1 0.427379 0.0647991 0.288826 0.218996
        2 0.201974 0.139791 0.35254 0.305695
        3 0.11714 0.118042 0.143884 0.620934
        4 0.637331 0.0996546 0.228428 0.0345867
        5 0.0971289 0.591289 0.134781 0.176801
        6 0.0715039 0.0237142 0.0432674 0.861514
        7 0.73769 0.117011 0.059703 0.0855963
        8 0.0728444 0.00877167 0.877166 0.0412175
        9 0.959269 0.0131077 0.0159611 0.0116621
        10 0.612865 0.057845 0.0583267 0.270963

        TF Unknown
        TF Name Unknown
        Gene ENSG00000161940
        Motif ENSG00000161940___1|1x3
        Family C2H2 ZF
        Species Homo_sapiens
        Pos A C G T
        1 0.614704 0.122914 0.125116 0.137266
        2 0.0954267 0.010422 0.851317 0.0428343
        3 0.959146 0.00959146 0.0112618 0.0200008
        4 0.91149 0.0146678 0.0135794 0.0602625
        5 0.67464 0.0388388 0.13716 0.149361
        6 0.104655 0.0579394 0.804166 0.0332392
        7 0.789171 0.102902 0.0490883 0.0588389
        8 0.776513 0.0273768 0.144501 0.0516094
        9 0.130657 0.06051 0.0793659 0.729467
        10 0.626753 0.0648533 0.143976 0.164418

        Thanks again!
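        If "delimited by a double space" means there is an empty line between records (an assumption; the file itself is not shown in full), Perl's paragraph mode can hand back one whole record per read. A small sketch, using an in-memory filehandle and two heavily shortened records in place of the real file:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the blank-line-delimited file: two records,
# each cut down to a few lines for illustration.
my $data = <<'END';
TF Unknown
Motif ENSG00000113916___1|4x3
Pos A C G T
1 0.427379 0.0647991 0.288826 0.218996

TF Unknown
Motif ENSG00000161940___1|1x3
Pos A C G T
1 0.614704 0.122914 0.125116 0.137266
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @motifs;
{
    # Paragraph mode: each read returns everything up to the next blank line(s).
    local $/ = "";
    while (my $record = <$fh>) {
        chomp $record;   # in paragraph mode this strips all trailing newlines
        my @lines = split /\n/, $record;

        my $motif;
        for my $l (@lines) {
            $motif = $1 if $l =~ /^Motif\s+(\S+)/;
        }
        push @motifs, $motif;
        printf "%s: %d lines\n", $motif, scalar @lines;
    }
}
close $fh;
```

        Each record arrives as one string, so the fixed loop counts (121 records, 7 header lines) would no longer be needed. For the real file, the in-memory handle would be replaced by a normal open on the data file.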