Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

How do I use the map command for this?

by Anonymous Monk
on Jun 19, 2022 at 13:01 UTC ( [id://11144847]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!
I need to do the following:
In a document with the following format:
>id1|various annotation1 sequence of characters1 >id2|various annotation2 sequence of characters2 ... >idN|various annotationN sequence of charactersN
I want to group the lines that have the same various annotation and sequence but not the same id into one line, like this:
>id1-id15-id22|various annotation1 sequence of characters1 ...
When the various annotation is the same, then the sequence is also the same.
I am pretty sure I need to use the map command, but I am a bit unsure how I would handle this. So far I have used it when I have 2 things to compare, e.g. only the id that would change and the sequence that could be repeated, and I was grouping based on the sequence. Like this:
use strict; use warnings; my %res; while (<>) { chomp; my ( $name, $rest ) = split /\t/; push @{ $res{$name} }, $rest; } for ( sort keys %res ) { print "$_:", join( ",", @{ $res{$_} } ); print "\n"; }

But now, I have 3 things instead.

Replies are listed 'Best First'.
Re: How do I use the map command for this?
by haukex (Archbishop) on Jun 19, 2022 at 15:41 UTC

    It looks like you've got a FASTA format file. There are lots of modules on the CPAN to handle that file format, for example the BioPerl suite is frequently recommended. There are various nodes on this site that show the use of that module.

    Other than that, as has already been said, please see How do I post a question effectively? and Short, Self-Contained, Correct Example - in this case, your question is missing representative sample input and the expected output for that input, which means that anyone wishing to help you has to make some up, and that may not be representative of your actual input. Update: I'm going to assume that this node shows representative input, despite the fact that it doesn't compile.

Re: How do I use the map command for this?
by kcott (Archbishop) on Jun 20, 2022 at 08:32 UTC
    "How do I use the map command for this?"

    No idea. :-)

    "I am pretty sure I need to use the map command"

    Why?

    In the code below, I simplified the data so that it's easier to see what's happening. I've also jumbled it up to demonstrate the two sorts — you may not need or want either of these (biological data can be huge, so don't do any sorting unless it's absolutely necessary).

    #!/usr/bin/env perl use strict; use warnings; my %res; { local $/ = "\n>"; while (my $record = <DATA>) { chomp $record; $record = substr $record, 1 if $. == 1; $record =~ s/\s*\z//m; my ($id, $anno_seq) = split /\|/, $record, 2; push @{$res{$anno_seq}}, $id; } } print '>', join('|', (sort @{$res{$_}}), $_), "\n" for sort { $res{$a}[0] cmp $res{$b}[0] } keys %res; __DATA__ >id7|anno2 GHI >id1|anno1 ABC >id6|anno1 ABC >id4|anno2 GHI >id2|anno3 DEF >id3|anno1 ABC >id5|anno1 ABC

    Output:

    >id1|id3|id5|id6|anno1 ABC >id2|anno3 DEF >id4|id7|anno2 GHI

    — Ken

Re: How do I use the map command for this?
by tybalt89 (Monsignor) on Jun 19, 2022 at 17:13 UTC
    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11144847 use warnings; my %res; /(id\d+)(.*)/s and push @{ $res{$2} }, $1 for split />/, join '', <DAT +A>; for ( sort keys %res ) { local $" = '|'; print ">@{$res{$_}}$_"; } __DATA__ >id1|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE >id2|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE >id3|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE >id4|Q51487|P-474-4|86-98,113-126,297-310,322-335 alt CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE >id5|Q51487|P-474-4|86-98,113-126,297-310,322-335 alt CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE

    Outputs:

    >id1|id2|id3|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE >id4|id5|Q51487|P-474-4|86-98,113-126,297-310,322-335 alt CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALEc>
Re: How do I use the map command for this?
by LanX (Saint) on Jun 19, 2022 at 13:33 UTC
    Sorry, I have problems to understand your question ...

    Suggestions:

    • get an account here, so you can edit and fix your postings
    • provide us with an SSCCE with clear input and expected output

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Yes, probably it was not well-written. Imagine this script:
      use strict; use warnings; my %res; my $id=''; my $rest=''; my $seq=''; while (<DATA>) { chomp; if($_=~/^>(.*?)\|(.*)/) { $id=$1; $rest=$2; $seq=<DATA>; chomp $seq; } } _DATA_ >id1|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE >id2|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE >id3|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE

      and I want to go to:
      >id1|id2|id3|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE

      And I wonder if I can use some of Perl's data structures, or the map command?

        As usual, TIMTOWTDI. However, keeping your initial code structure for better or worse:

        use strict; use warnings; use Test::More tests => 1; my $want = <<EOT; >id1|id2|id3|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE EOT my %res; while (<DATA>) { if (/^>(.*?)\|(.*)/s) { my $id = $1; my $rest = $2; if (exists $res{$rest}) { $res{$rest}{keys} .= "|$id"; } else { my $seq = <DATA>; $res{$rest} = { keys => ">$id", seq => $seq, } } } } my ($k) = keys %res; # You would loop over all keys is your real prog my $have = "$res{$k}{keys}|$k$res{$k}{seq}"; is $have, $want; __DATA__ >id1|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE >id2|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE >id3|Q51487|P-474-4|86-98,113-126,297-310,322-335 CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE

        🦛

        > _DATA_

        I can tell immediately that this doesn't run! 👎

        You should check before posting, that's the "Correct" part in SSCCE!

        And it's not clear to me which part has to be unique...

        This

        |Q51487|P-474-4|86-98,113-126,297-310,322-335

        or this

        CSLIPDYQRPEAPVAAAYPQGQAYGQNTGAAAVPAADIGWREFFRDPQLQQLIGVALE

        or both.

        This has not much to do with map, a HoA = Hash of Arrays with unique keys (well which one?) is enough.

        Or probably a HoHoA depending on your perception of unique.

        And of course also depending on your desired output order.

        Please show more effort!

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        I had this older script that was doing something similar, but with two values separated by tab:
        use strict; use warnings; my %res; while (<>) { chomp; my ( $name, $rest ) = split /\t/; push @{ $res{$name} }, $rest; } for ( sort keys %res ) { print "$_:", join( ",", @{ $res{$_} } ); print "\n"; }

        but I do not know how to adapt it.
Re: How do I use the map command for this?
by Marshall (Canon) on Jun 20, 2022 at 23:15 UTC
    "How do I use the map command for this?"
    "I am pretty sure I need to use the map command"

    The map statement is just a short-hand way of writing a simple "for" loop.
    ALL map's can be expressed as "for" (foreach) loops.
    Perl generates the same code for a "map" as the equivalent "foreach" loop.

    map{} is not your issue.

    I think looking at Bio::Perl will lead to some suggestions for you. There are many routines optimized for the FASTA format.

      Perl generates the same code for a "map" as the equivalent "foreach" loop.

      wrong

        Would certainly enjoy being educated. I probably should have said "similar". A map is a looping statement.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11144847]
Approved by marto
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (7)
As of 2024-04-23 09:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found