comment on

Instead of creating separate arrays like this

my @col1; ## column 1
my @col_ID; ## column 2
my @col3; ## column 3
[download]

you could use a single array of hashes. ( See perldsc )

 @AoH = (
{
  col1   => "col1",
  col_ID => "col_ID",
  col3   => "col3",
},)
[download]

For example, something like this ;

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dump 'pp';

my $inputfile1 = $ARGV[0];
my $outputfile = 'fasta';

#open IN, '<', $inputfile1 
#   or die "Uh oh.. unable to find file $inputfile1 : $!";

open OUT, '>>',$outputfile 
   or die "Could not open $outputfile : $!";
   
my @match;
while ( my $line = <DATA> ) { ;# use IN
  chomp($line);
  if( $line =~ m/splic/) { 
    my @colsplit = split /,/, $line; # use \t
    my $record = {
      'col3'   => $colsplit[2],
      'col1'   => $colsplit[0],
      'col_ID' => $colsplit[1],
      'col_strand_direction' => $colsplit[5],
    };
    
    ##pulls out + or - and subsequent number and [base change]
    if ( $record->{'col3'} =~ m/([+-]\d+)\w+(\[[ACTG]])/) { 
      $record->{'intron_from_boundary'} = $1; 
      $record->{'baseref'} = $2 ;
      $record->{'offset'}  = 13;
      if ($record->{'col_strand_direction'} =~ /\+/){
        $record->{'offset'} += $record->{'intron_from_boundary'} ;
      } else {
        $record->{'offset'} -= $record->{'intron_from_boundary'} ;
      }
    }
    
    push @match,$record;
  }
}

# show data structure
pp @match;

# need to take each intronmatch value 
# and work out its position relative 
# to intron/exon boundary
foreach my $rec (@match) {
  my $offset = $rec->{'offset'};
  my $string = substr($rec->{'col1'},$offset,20);
  print "offset = $offset : $string\n";
  print OUT '>' . $rec->{'col_ID'} . $string . "\n";    
}
close OUT;

__DATA__
1col1abcdefghijklmnopqrstuvwxyz0123456789,1col_ID,+1col3[A],1col4,spli
+c,+
2col1abcdefghijklmnopqrstuvwxyz0123456789,2col_ID,-2col3[C],2col4,spli
+c,-
3col1abcdefghijklmnopqrstuvwxyz0123456789,3col_ID,+3col3[T],3col4,spli
+c,+
4col1abcdefghijklmnopqrstuvwxyz0123456789,4col_ID,-4col3[G],4col4,spli
+c,-
[download]

poj

In reply to Re: It's all getting messy - remove whitespace by poj
in thread It's all getting messy - remove whitespace by lecb

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.