ashnator has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, My code parses one file and prints the output with some added information.
My sample input file is:

    >JAVA3_70_303NM:2:1:184:1240
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 34 40 40 40 40 40 40 40 37 40 40 40 40 40 40 24 30 40 40 17
    >PERL3_70_303NM:2:1:234:1166
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 17 40 40 40 40
    >PYTHON3_70_303NM:2:1:202:1171
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 14 40 40 13 18 40 35 38 34 40 40 40 4 37 28 40 40 40 40 2
My output comes out with faulty extra characters at the beginning of each parsed record. My faulty output:
    JAVA3_70_303NM:2:1:184:1240 length=44
    3 70 303 2 1 184 1240 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 34 40 40 40 40 40 40 40 37 40 40 40 40 40 40 24 30 40 40 17
    PERL3_70_303NM:2:1:234:1166 length=44
    3 70 303 2 1 234 1166 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 17 40 40 40 40
    PYTHON3_70_303NM:2:1:202:1171 length=44
    3 70 303 2 1 202 1171 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 14 40 40 13 18 40 35 38 34 40 40 40 4 37 28 40 40 40 40 21
These extra characters (3 70 303 ...) appear at the beginning of every line of numbers. My code is here:
    #!/usr/bin/perl -w
    use strict;

    my $fn = $ARGV[0];
    open(FD, "$fn") || die("Can't open: $!");
    $/ = '>';                      # records are separated by '>'
    while ( <FD> ) {
        chomp;
        if ($_ =~ /(\S+)/xmsg) {
            my $name = $1;
            my @numbers = split /\D+/;
            my $values = @numbers;
            print "$name\tlength=$values\n";
            print "@numbers\n";
        }
    }
    close FD;
Here, length is meant to be the number of values (the 10s, 20s, 40s, ...) in each record.
Can you please help me identify the problem?
Thanks

Re: Extra characters problem while parsing a file
by graff (Chancellor) on Nov 19, 2008 at 07:07 UTC
    Your first input from while (<FD>) is just > (the first character of the input happens to be your input record separator), so after chomp that record is empty.

    The next input starts with JAVA3_70_303NM:2:1:184:1240. After you capture this into your $name variable, it is still at the start of the input string (you have not removed it), so when you do split /\D+/, the digit strings inside the name (3, 70, 303, 2, 1, 184, 1240) end up in @numbers.
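To see this effect in isolation, here is a small sketch. The helper name digit_fields and the shortened sample record are made up for illustration; only split /\D+/ comes from the original code. Since the record begins with non-digits, split also returns an empty leading field, which would explain why the reported length is one more than the number of visible values.

```perl
use strict;
use warnings;

# Hypothetical helper: split a record on runs of non-digits, exactly as
# the original loop does with split /\D+/.
sub digit_fields {
    my ($record) = @_;
    return split /\D+/, $record;
}

# Shortened sample record, as the loop sees it after chomp.
my $record = "JAVA3_70_303NM:2:1:184:1240\n40 40 34\n";
my @fields = digit_fields($record);

# The letters, underscores and colons in the name all act as separators,
# so the name's digits become fields; the leading "JAVA" produces an
# empty first field.
print scalar(@fields), " fields\n";
print join('|', @fields), "\n";
```

With this sample, the script should report 11 fields: one empty string, seven digit strings from the name, and the three real values.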

    When you capture the $name value, you want to do it like this:

    if ( s/^(\S+)\s+// ) { my $name = $1; ...
    That is, remove the "name" string when you capture it.

    UPDATE: Added "\s+" to the pattern -- white space should be removed too, in anticipation of the later split /\D+/ so that you don't get an empty string as the first element in @numbers.
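Putting the suggestion into the whole loop, a minimal sketch could look like the following. The sub name parse_record, the lexical filehandle, and the in-memory sample (shortened data) are assumptions added for illustration; the original script reads the file named in $ARGV[0] instead.

```perl
use strict;
use warnings;

# Hypothetical helper: take one chomped record, strip the leading name
# (and the whitespace after it), then split what is left into numbers.
# Returns (name, numbers...) or an empty list for a blank record.
sub parse_record {
    my ($record) = @_;
    return unless $record =~ s/^(\S+)\s+//;   # capture AND remove the name
    my $name    = $1;
    my @numbers = split /\D+/, $record;
    return ($name, @numbers);
}

# In-memory stand-in for the input file (values shortened).
my $sample = ">JAVA3_70_303NM:2:1:184:1240\n40 40 34\n"
           . ">PERL3_70_303NM:2:1:234:1166\n40 17\n";

open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
{
    local $/ = '>';                       # records are separated by '>'
    while ( my $record = <$fh> ) {
        chomp $record;                    # removes the trailing '>'
        my ($name, @numbers) = parse_record($record);
        next unless defined $name;        # skips the empty first record
        print "$name\tlength=", scalar(@numbers), "\n";
        print "@numbers\n";
    }
}
close $fh;
```

For this sample the lengths come out as 3 and 2, with no digits from the names leaking into the number lines.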

Re: Extra characters problem while parsing a file
by luckypower (Beadle) on Nov 19, 2008 at 06:50 UTC
    I think this is the output you want (I hope so):

    JAVA3_70_303NM:2:1:184:1240 length=37
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 34 40 40 40 40 40 40 40 37 40 40 40 40 40 40 24 30 40 40 17
    PERL3_70_303NM:2:1:234:1166 length=37
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 17 40 40 40 40
    PYTHON3_70_303NM:2:1:202:1171 length=37
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 14 40 40 13 18 40 35 38 34 40 40 40 4 37 28 40 40 40 40 2


    Here are some changes to your code:
    if ($_ =~ /(\S+)(.+)/xmsg) {    # $2 will contain all the digits.
        my $name    = $1;
        my @numbers = split /\s+/, $2;
        my $values  = @numbers;
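One detail worth checking with this variant: $2 starts with the whitespace that follows the name, and split /\s+/ keeps an empty leading field in that case, which may be why length shows 37 for 36 visible values. The sketch below compares it with split ' ', which skips leading whitespace. The helper name count_fields and the short sample string are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count the fields produced by the two split styles.
sub count_fields {
    my ($text) = @_;
    my @by_regex = split /\s+/, $text;   # leading whitespace gives a leading ''
    my @by_space = split ' ',   $text;   # special case: leading whitespace skipped
    return (scalar @by_regex, scalar @by_space);
}

# Stand-in for $2: it begins with the newline that follows the name.
my ($with_regex, $with_space) = count_fields("\n40 40 34\n");
print "split /\\s+/ : $with_regex fields\n";
print "split ' '   : $with_space fields\n";
```

Here split /\s+/ should report 4 fields and split ' ' should report 3, so the second form gives the count of actual values.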


    If I am wrong, please post the exact output you want.

    have a good day :)