aeqr has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I would like to know if there is a shorter way to do this:
my @seq1;
my @seq2;
load_seqs(\@seq1, \@seq2);

sub load_seqs {
    open(my $fh, "<", "q1.fa") or die "cannot open sequence file: $!";
    while (my $line = <$fh>) {
        if ($line =~ m/^(?!\>).*\n$/) { push(@{$_[0]}, $line) }
    }
    open(my $fh, "<", "q2.fa") or die "cannot open sequence file: $!";
    while (my $line = <$fh>) {
        if ($line =~ m/^(?!\>).*\n$/) { push(@{$_[1]}, $line) }
    }
}
I have tried the diamond operator, but I am not sure how to use it in this case. Thanks!

Replies are listed 'Best First'.
Re: use of diamond operator for multiple files
by davido (Cardinal) on Apr 23, 2014 at 17:59 UTC

    You are asking for the shortest amount of code needed to gather into two arrays from two files all lines that don't start with a > character, and that do end in a newline character? So if the last line in the file doesn't have a newline character at the end, should it be ignored?
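
    To make the question about the trailing newline concrete, here is a minimal sketch that runs the filter from the original post over a few made-up lines (the sample strings are assumptions, not data from q1.fa). A final line with no newline is skipped, while an empty line is kept:

    ```perl
    use strict;
    use warnings;

    # The filter from the original post: the line must not start with '>'
    # and must end with a newline.
    my $filter = qr/^(?!\>).*\n$/;

    # Hypothetical sample lines: a header, a sequence line, a last line
    # without a trailing newline, and an empty line.
    my @samples = (">seq1 header\n", "ACGT\n", "ACGT", "\n");

    my @kept = grep { $_ =~ $filter } @samples;

    for my $line (@samples) {
        (my $shown = $line) =~ s/\n/\\n/g;    # make newlines visible
        printf "%-18s %s\n", qq{"$shown"}, ($line =~ $filter ? 'kept' : 'skipped');
    }
    ```

    Only "ACGT\n" and the bare "\n" pass, so blank lines end up in the result and an unterminated last line does not.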

    Does the code you posted do what you want in the first place? Whether optimizing for computational time, code length, memory, or aesthetic beauty: first, the problem must be well defined; second, the algorithm must work; third, the goal of the optimization must be known. None of those criteria has been made clear to us.


    Dave

      Hey, thanks for the answer.

      Yes, the code works, but I wanted to see other solutions. I would say the goal is to optimize code length.

      Yes, that is exactly what I am asking for: the shortest amount of code needed to gather into two arrays from two files all lines that don't start with a > character, and that do end in a newline character.

      Thank you!

        This isn't what I would consider a golfy solution, but it is more brief:

        my @data;
        for my $file (qw(this.txt that.txt)) {
            open my $fh, '<', $file or die $!;
            push @data, [];
            while( <$fh> ) {
                push @{$data[-1]}, $_ if /\n\z/ && ! /^>/;
            }
        }

        If you want that in the form of a subroutine:

        my @data = read_files( qw(this.txt that.txt) );

        sub read_files {
            my @data;
            for my $file ( @_ ) {
                open my $fh, '<', $file or die $!;
                push @data, [];
                while( <$fh> ) {
                    push @{$data[-1]}, $_ if /\n\z/ && ! /^>/;
                }
            }
            return @data;
        }

        (Untested)
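
        For anyone who wants to try the array-of-arrays idea without real sequence files, here is a small self-contained sketch. The temporary files and their contents are made up for illustration, and read_files is written out with balanced braces; treat it as one possible reading of the approach rather than a definitive version:

        ```perl
        use strict;
        use warnings;
        use File::Temp qw(tempfile);

        # Hypothetical sample data standing in for the two input files.
        my @files;
        for my $content (">a\nACGT\nTTGA\n", ">b\nGGCC\n") {
            my ($tmp_fh, $name) = tempfile(UNLINK => 1);
            print {$tmp_fh} $content;
            close $tmp_fh or die "cannot close $name: $!";
            push @files, $name;
        }

        # Returns one array reference per file, in the order the files were given.
        sub read_files {
            my @data;
            for my $file (@_) {
                open my $fh, '<', $file or die "cannot open $file: $!";
                push @data, [];
                while (my $line = <$fh>) {
                    push @{ $data[-1] }, $line
                        if $line =~ /\n\z/ && $line !~ /^>/;
                }
            }
            return @data;
        }

        my ($seq1, $seq2) = read_files(@files);
        print "file 1: ", scalar @$seq1, " line(s)\n";
        print "file 2: ", scalar @$seq2, " line(s)\n";
        ```

        Unpacking the returned list into $seq1 and $seq2 gives array references, so the lines are reached with @$seq1 and @$seq2 rather than @seq1 and @seq2.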


        Dave

Re: use of diamond operator for multiple files
by Not_a_Number (Prior) on Apr 23, 2014 at 20:16 UTC

    If I understand correctly, you want a separate array to be created for each input file. The following code creates a hash of array references, with the key being the file name and the value a reference to an array containing all the lines that do not begin with '>':

    my %results;
    {
        local @ARGV = ( 'q1.fa', 'q2.fa' );
        /^>/ or push @{ $results{ $ARGV } }, $_ while <>;
    }

    To access (i.e. print) this HoA (hash of arrays), you could do, for example:

    for ( keys %results ) {
        print "$_:\n";
        print @{ $results{ $_ } };
        print "\n";
    }
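
    The technique relies on two things: <> takes its file names from @ARGV, and $ARGV holds the name of the file currently being read. A runnable sketch, assuming two throwaway temporary files in place of q1.fa and q2.fa (their contents are invented):

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Hypothetical stand-ins for q1.fa and q2.fa.
    my @files;
    for my $content (">a\nACGT\nTTGA\n", ">b\nGGCC\n") {
        my ($tmp_fh, $name) = tempfile(UNLINK => 1);
        print {$tmp_fh} $content;
        close $tmp_fh or die "cannot close $name: $!";
        push @files, $name;
    }

    my %results;
    {
        # local restores the caller's @ARGV when the block ends.
        local @ARGV = @files;
        while (my $line = <>) {
            next if $line =~ /^>/;
            # $ARGV is the name of the file <> is currently reading.
            push @{ $results{$ARGV} }, $line;
        }
    }

    # keys %results comes back in no particular order, so walk the
    # original file list to print in input order.
    for my $file (@files) {
        print "$file:\n", @{ $results{$file} || [] }, "\n";
    }
    ```

    Note that a file with no matching lines never gets a key in %results, hence the `|| []` when printing.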
      Thanks for the help, I think I see how to come up with a new solution.
Re: use of diamond operator for multiple files
by Laurent_R (Canon) on Apr 23, 2014 at 18:48 UTC
    If your code is correct, you could reduce it by trying this (untested):
    sub load_seqs {
        local @ARGV = qw/q1.fa q2.fa/;
        while (my $line = <>) {
            if ($line =~ m/^(?!\>).*\n$/) { push(@{$_[0]}, $line) }
        }
    }
    or even slightly more concise:
    sub load_seqs {
        local @ARGV = qw/q1.fa q2.fa/;
        while (<>) {
            push @{$_[0]}, $_ if m/^(?!\>).*\n$/;
        }
    }

    Edit: Ooops, sorry for duplicate posting, I don't know what I did wrong.

    Edit 2: I did not notice that you seem to want to split your data into separate arrays; I thought you wanted all your data in the same array. I only noticed it after reading Not_a_Number's post below. So my solution above is not adequate. Sorry.
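
    One way a single <> loop could still fill separate arrays is to test eof(ARGV), which becomes true at the end of each input file, and start a new array at that point. This is only a sketch under assumptions: the temporary files and their contents are invented, and an empty input file would yield no array of its own and so shift the positions:

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Hypothetical stand-ins for q1.fa and q2.fa.
    my @files;
    for my $content (">a\nACGT\nTTGA\n", ">b\nGGCC\n") {
        my ($tmp_fh, $name) = tempfile(UNLINK => 1);
        print {$tmp_fh} $content;
        close $tmp_fh or die "cannot close $name: $!";
        push @files, $name;
    }

    my @seqs = ( [] );    # one array reference per input file
    {
        local @ARGV = @files;
        while (my $line = <>) {
            push @{ $seqs[-1] }, $line
                if $line =~ /\n\z/ && $line !~ /^>/;
            # eof(ARGV) is true after the last line of the current file;
            # @ARGV is non-empty only while more files are waiting.
            push @seqs, [] if eof(ARGV) && @ARGV;
        }
    }

    my ($seq1, $seq2) = @seqs;
    print "file 1: ", scalar @$seq1, " line(s)\n";
    print "file 2: ", scalar @$seq2, " line(s)\n";
    ```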

Re: use of diamond operator for multiple files
by hdb (Monsignor) on Apr 24, 2014 at 12:16 UTC

    Two comments from my side. First, your subroutine does the same thing twice. I would change it so that it reads the filtered contents of a given file into a given array, and call it twice. Second, your while loop is essentially equivalent to grep, so in summary, I would rewrite your code as:

    my @seq1 = load_seqs( "q1.fa" );
    my @seq2 = load_seqs( "q2.fa" );

    sub load_seqs {
        open my $fh, "<", shift or die "cannot open sequence file: $!";
        return grep { /^(?!\>).*\n$/ } <$fh>;
    }
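
    The grep version works because <$fh> in list context returns all remaining lines at once. A quick way to try it without any files on disk is an in-memory filehandle; the FASTA-like string below is made up for illustration:

    ```perl
    use strict;
    use warnings;

    # Hypothetical FASTA-like content, read through an in-memory filehandle.
    my $fasta = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n";
    open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";

    # In list context <$fh> yields every line; grep keeps those that do not
    # start with '>' and that end with a newline.
    my @seq = grep { /^(?!\>).*\n$/ } <$fh>;
    close $fh;

    print @seq;
    ```

    The trade-off is that the whole file is read into a list before filtering, whereas the while loop handles one line at a time; for small sequence files that rarely matters.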