
Hi,

My file looks something like this (the same block repeats many times, once per gene; \t marks a tab):

//
SPECIES\tCiona intestinalis
DEV_STAGE\tEarly tailbud
PREDICTED_GENE\tci0100148006
GENE_NAME\tci0100148006\Ms4a4b\Ms4a4c\Ms4a4d\
ORIGINAL_ANNOTATION/COMMENTS\tConversion to Aniseed... 
AUTHORS\tSatou Y, Takatori N,...
REFERENCES\tDevelopment. 2001 ;128(15):2893-904
URL_OF_ORIGINAL_ANNOTATION\thttp://ghost.zool.kyoto-u.ac.jp/indexr1.html
ANISEED_ANNOTATION\tSTAINED_REGION:\thead endoderm\tSTAINED_MOL:\tci0100148006
ANISEED_ANNOTATION\tSTAINED_REGION:\ttail nerve cord\tSTAINED_MOL:\tci0100148006
IN_SITU_URL\thttp://aniseed-ibdm.univ-mrs.fr/insitu.php?id=2605647
//
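Each record is a run of KEY<tab>VALUE lines between // lines, and a key such as ANISEED_ANNOTATION can repeat. As a minimal sketch, assuming exactly that layout, one record can be turned into a hash of arrays; parse_record and the sample string are hypothetical, not part of the original code:

```perl
use strict;
use warnings;

# Hypothetical helper: parse one record (the text between two "//" lines)
# into a hash mapping each key to an array of its values, because keys
# such as ANISEED_ANNOTATION can occur more than once per record.
sub parse_record {
    my ($text) = @_;
    my %fields;
    for my $line (split /\n/, $text) {
        next if $line eq '' || $line eq '//';
        # Split only at the first tab: the key, then the rest of the line.
        my ($key, $value) = split /\t/, $line, 2;
        next unless defined $value;
        push @{ $fields{$key} }, $value;
    }
    return \%fields;
}

my $record = join '',
    "DEV_STAGE\tEarly tailbud\n",
    "PREDICTED_GENE\tci0100148006\n",
    "ANISEED_ANNOTATION\tSTAINED_REGION:\thead endoderm\tSTAINED_MOL:\tci0100148006\n",
    "ANISEED_ANNOTATION\tSTAINED_REGION:\ttail nerve cord\tSTAINED_MOL:\tci0100148006\n";

my $fields = parse_record($record);
print "$fields->{PREDICTED_GENE}[0]\n";                  # ci0100148006
print scalar @{ $fields->{ANISEED_ANNOTATION} }, "\n";   # 2
```

Keeping every value in an array means a repeated key never overwrites an earlier one.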

And here is my code:

use strict;

my $ish = $ARGV[0];

open (ISH, "<", $ish) || die "$!";
open (OUT, ">", "out.txt") || die "$!";
{
    $/ = "//";

    while (<ISH>){
        m/DEV_STAGE\t(.*?)\n/g;
        my $a = $1;
        my $b;
        if (m/PREDICTED_GENE\t(.*?)\n/g){
            $b = $1;
        }
        else {
            $b = '#';
        }

        while (m/\tSTAINED_REGION:\t(.*?)\tSTAINED_MOL:\t(.*?)\n/g) {
            my $c = $1;
            my $x = $2;
            print OUT "$b\t$a\t$c\t$x\n";
        }
    }
}

The problem is that I only get the last two columns (STAINED_REGION and STAINED_MOL); I can't get the first two columns (PREDICTED_GENE and DEV_STAGE). Could you point out what I'm doing wrong?

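Found code that works :) The difference is the record separator. With $/ = "//", Perl also ends a record at the // inside each http:// URL, so the ANISEED_ANNOTATION lines landed in a different record from DEV_STAGE and PREDICTED_GENE. With $/ = "//\n", records end only at the // lines that separate genes.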
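The effect of the record separator can be checked on a trimmed copy of the sample. This sketch (describe_records is a hypothetical helper, and the data is an in-memory excerpt rather than the real file) reads the same text with $/ set to "//" and to "//\n" and reports which records contain the gene line and which contain a stained region:

```perl
use strict;
use warnings;

# Hypothetical helper: read $text with the given record separator and
# report, per record, whether it has the gene line and a stained region.
sub describe_records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $separator;
    my @summary;
    while (my $record = <$fh>) {
        my $has_gene   = $record =~ /PREDICTED_GENE\t/ ? 1 : 0;
        my $has_region = $record =~ /STAINED_REGION:/  ? 1 : 0;
        push @summary, "gene=$has_gene region=$has_region";
    }
    close $fh;
    return @summary;
}

# A trimmed copy of one sample record (only the lines that matter here).
my $data = join '',
    "//\n",
    "DEV_STAGE\tEarly tailbud\n",
    "PREDICTED_GENE\tci0100148006\n",
    "URL_OF_ORIGINAL_ANNOTATION\thttp://ghost.zool.kyoto-u.ac.jp/indexr1.html\n",
    "ANISEED_ANNOTATION\tSTAINED_REGION:\thead endoderm\tSTAINED_MOL:\tci0100148006\n",
    "//\n";

for my $sep ("//", "//\n") {
    (my $shown = $sep) =~ s/\n/\\n/;    # make the newline visible
    my @summary = describe_records($data, $sep);
    print scalar(@summary), " records with \$/ = \"$shown\"\n";
    print "  $_\n" for @summary;
}
```

With "//" this excerpt splits into four records, and the gene line and the stained region end up in different ones; with "//\n" it splits into two, and the second holds both.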

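use strict;
use warnings;

my $ish = shift @ARGV or die "Usage: $0 input_file\n";

open(my $in,  '<', $ish)      or die "Cannot open $ish: $!";
open(my $out, '>', 'out.txt') or die "Cannot open out.txt: $!";

{
    # End records only at "//" plus newline; a bare "//" also matches
    # inside the http:// URLs and cuts each record in two.
    local $/ = "//\n";

    while (my $record = <$in>) {
        # Fall back to '#' when a field is missing instead of reading
        # $1 after a failed match.
        my $stage = $record =~ m/DEV_STAGE\t(.*?)\n/      ? $1 : '#';
        my $gene  = $record =~ m/PREDICTED_GENE\t(.*?)\n/ ? $1 : '#';

        # One output line per STAINED_REGION/STAINED_MOL pair:
        # gene, stage, region, molecule.
        while ($record =~ m/\tSTAINED_REGION:\t(.*?)\tSTAINED_MOL:\t(.*?)\n/g) {
            print {$out} "$gene\t$stage\t$1\t$2\n";
        }
    }
}

close $out or die "Cannot close out.txt: $!";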
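A different technique from the one in the post, offered only as a sketch: read the file line by line and leave $/ alone, remembering DEV_STAGE and PREDICTED_GENE as they appear and resetting them at each // line. Like the original regexes, it assumes those two fields come before the ANISEED_ANNOTATION lines in each record; extract_rows and the in-memory sample are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read records from $fh line by line and return one
# [gene, stage, region, molecule] row per STAINED_REGION/STAINED_MOL pair.
# Assumes DEV_STAGE and PREDICTED_GENE precede the annotation lines.
sub extract_rows {
    my ($fh) = @_;
    my ($stage, $gene) = ('#', '#');
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq '//') {
            # Record boundary: forget the previous record's fields.
            ($stage, $gene) = ('#', '#');
        }
        elsif ($line =~ /^DEV_STAGE\t(.*)/) {
            $stage = $1;
        }
        elsif ($line =~ /^PREDICTED_GENE\t(.*)/) {
            $gene = $1;
        }
        elsif ($line =~ /\tSTAINED_REGION:\t(.*?)\tSTAINED_MOL:\t(.*)/) {
            push @rows, [ $gene, $stage, $1, $2 ];
        }
    }
    return @rows;
}

# Usage on an in-memory copy of part of the sample; for a real file,
# open it with open(my $fh, '<', $ARGV[0]) instead.
my $sample = join '',
    "//\n",
    "DEV_STAGE\tEarly tailbud\n",
    "PREDICTED_GENE\tci0100148006\n",
    "URL_OF_ORIGINAL_ANNOTATION\thttp://ghost.zool.kyoto-u.ac.jp/indexr1.html\n",
    "ANISEED_ANNOTATION\tSTAINED_REGION:\thead endoderm\tSTAINED_MOL:\tci0100148006\n",
    "ANISEED_ANNOTATION\tSTAINED_REGION:\ttail nerve cord\tSTAINED_MOL:\tci0100148006\n",
    "//\n";

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @rows = extract_rows($fh);
close $fh;
print join("\t", @$_), "\n" for @rows;
```

On this excerpt it returns two rows, head endoderm and tail nerve cord, both tagged ci0100148006 and Early tailbud. Because nothing depends on $/, a // inside a URL cannot split a record.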

In reply to Extract from text file by mocnii
