use of diamond operator for multiple files

aeqr has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: use of diamond operator for multiple files by davido (Cardinal) on Apr 23, 2014 at 17:59 UTC
You are asking for the shortest amount of code needed to gather into two arrays from two files all lines that don't start with a `>` character, and that do end in a newline character? So if the last line in the file doesn't have a newline character at the end, should it be ignored? Does the code you posted do what you want in the first place? Whether optimizing for computational time, code length, memory, or aesthetic beauty: first, the problem must be well defined; second, the algorithm must work; third, the goal of the optimization must be known. None of those criteria have been made clear to us. Dave
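To make the trailing-newline question concrete, here is a minimal sketch of the two readings of the requirement. The sample data is hypothetical (it is not from the original files) and is read through an in-memory filehandle so that nothing on disk is needed.

```perl
use strict;
use warnings;

# Hypothetical sample data: the last line has no trailing newline.
my $content = ">header\nACGT\nTTGA";

# Read the string line by line through an in-memory filehandle.
open my $fh, '<', \$content or die "cannot open in-memory file: $!";
my @all = <$fh>;
close $fh;

# Loose reading: keep every line that does not start with '>'.
my @no_header = grep { !/^>/ } @all;

# Strict reading: additionally require a trailing newline.
my @strict = grep { !/^>/ && /\n\z/ } @all;

printf "without newline requirement: %d line(s)\n", scalar @no_header;
printf "with newline requirement:    %d line(s)\n", scalar @strict;
```

With this input the loose filter keeps two lines and the strict one keeps only one, because the final `TTGA` has no newline. Which behaviour is wanted is exactly what needs to be settled first.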
Re^2: use of diamond operator for multiple files by aeqr (Novice) on Apr 23, 2014 at 18:20 UTC
Hey, thanks for the answer. Yes, the code works, but I wanted to see other solutions. I would say it is to optimize code length. Yes, I am asking exactly this: the shortest amount of code needed to gather into two arrays from two files all lines that don't start with a `>` character, and that do end in a newline character. Thank you!
Re^3: use of diamond operator for multiple files by davido (Cardinal) on Apr 23, 2014 at 18:39 UTC
This isn't what I would consider a golfy solution, but it is briefer: `my @data; for my $file (qw(this.txt that.txt)) { open my $fh, '<', $file or die $!; push @data, []; while( <$fh> ) { push @{$data[-1]}, $_ if /\n\z/ && ! /^>/; } }` If you want that in the form of a subroutine: `my @data = read_files( qw(this.txt that.txt) ); sub read_files { my @data; for my $file ( @_ ) { open my $fh, '<', $file or die $!; push @data, []; while( <$fh> ) { push @{$data[-1]}, $_ if /\n\z/ && ! /^>/; } } return @data; }` (Untested) Dave
Re: use of diamond operator for multiple files by Not_a_Number (Prior) on Apr 23, 2014 at 20:16 UTC
If I understand correctly, you want a separate array to be created for each input file. The following code creates a hash of array references, with the key being the file name and the value a reference to an array containing all the lines that do not begin with '`>`': `my %results; { local @ARGV = ( 'q1.fa', 'q2.fa' ); /^>/ or push @{ $results{ $ARGV } }, $_ while <>; }` To access (i.e. print) this HoA, you could do, for example: `for ( keys %results ) { print "$_:\n"; print @{ $results{ $_ } }; print "\n"; }`
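For anyone who wants to try the `local @ARGV` approach without having q1.fa and q2.fa at hand, here is a minimal self-contained sketch. The temporary files, their contents, and the sub name `load_by_file` are assumptions made for illustration only. The filtering itself follows the code above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample files, written to a temporary directory.
my $dir    = tempdir( CLEANUP => 1 );
my %sample = (
    "$dir/q1.fa" => ">seq1\nACGT\nGGCC\n",
    "$dir/q2.fa" => ">seq2\nTTAA\n",
);
for my $path ( keys %sample ) {
    open my $out, '>', $path or die "cannot write $path: $!";
    print {$out} $sample{$path};
    close $out;
}

# Collect the non-header lines of each file under its own hash key.
# While <> is reading, $ARGV holds the name of the current file.
sub load_by_file {
    my @files = @_;
    my %results;
    local @ARGV = @files;    # restored automatically when the sub returns
    while ( my $line = <> ) {
        next if $line =~ /^>/;
        push @{ $results{$ARGV} }, $line;
    }
    return %results;
}

my %results = load_by_file( sort keys %sample );
for my $file ( sort keys %results ) {
    print "$file: ", scalar @{ $results{$file} }, " line(s)\n";
}
```

Wrapping the loop in a sub keeps the `local @ARGV` confined, so any real command-line arguments are untouched afterwards. Note that a file containing only header lines would get no key at all in `%results`.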
Re^2: use of diamond operator for multiple files by aeqr (Novice) on Apr 23, 2014 at 21:07 UTC
Thanks for the help, I think I see how to come up with a new solution.
Re: use of diamond operator for multiple files by Laurent_R (Canon) on Apr 23, 2014 at 18:48 UTC
If your code is correct, you could reduce it by trying this (untested): `sub load_seqs{ local @ARGV = qw /q1.fa q2.fa/; while(my $line = <>){ if($line =~ m/^(?!\>).*\n$/){push (@{$_[0]},$line)}; } }` or even slightly more concise: `sub load_seqs{ local @ARGV = qw /q1.fa q2.fa/; while(<>) { push @{$_[0]}, $_ if m/^(?!\>).*\n$/; } }` Edit: Ooops, sorry for duplicate posting, I don't know what I did wrong. Edit 2: I did not pay attention to the fact that you seem to want to split your data into separate arrays; I thought that you wanted all your data in the same array. I only noticed it after reading Not_a_Number's post. Well, then, my solution above is not adequate. Sorry.
Re: use of diamond operator for multiple files by hdb (Monsignor) on Apr 24, 2014 at 12:16 UTC
Two comments from my side. First, your subroutine does the same thing twice. I would change it such that it reads the filtered contents of a given file into a given array and call it twice. Second, your while loop is essentially equivalent to grep, so in summary, I would rewrite your code as: `my @seq1 = load_seqs( "q1.fa" ); my @seq2 = load_seqs( "q2.fa" ); sub load_seqs{ open my $fh, "<", shift or die "cannot open sequence file: $!"; return grep { /^(?!\>).*\n$/ } <$fh>; }`
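As far as I can tell, the single lookahead pattern used here accepts the same lines as the two separate tests (`/\n\z/ && ! /^>/`) from davido's reply. A quick sketch to check that on a few hypothetical sample lines, none of which come from the original data:

```perl
use strict;
use warnings;

# Hypothetical lines as readline would return them (at most one trailing "\n").
my @samples = ( ">id\n", "ACGT\n", "\n", "ACGT", ">id" );

my @kept;
my $mismatches = 0;
for my $line (@samples) {
    # One pattern: not starting with '>' and ending in a newline.
    my $lookahead = $line =~ /^(?!>).*\n$/ ? 1 : 0;

    # Two separate tests expressing the same condition.
    my $two_tests = ( $line !~ /^>/ && $line =~ /\n\z/ ) ? 1 : 0;

    $mismatches++ if $lookahead != $two_tests;
    push @kept, $line if $lookahead;
}

print "kept $_" for @kept;
print "mismatches: $mismatches\n";
```

On these samples both forms keep only `ACGT\n` and the empty line `\n`, so the choice between them is a matter of taste. This is only a spot check on a handful of inputs, not a proof.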
Re: use of diamond operator for multiple files by Laurent_R (Canon) on Apr 23, 2014 at 18:42 UTC
If your code is correct, you could reduce it by trying this (untested): `sub load_seqs{ local @ARGV = qw /q1.fa q2.fa/; while(my $line = <>){ if($line =~ m/^(?!\>).*\n$/){push (@{$_[0]},$line)}; } }`