Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Can anyone please suggest improvements to the following bit of code:
#! perl -w scipt use strict; use warnings; # open a file for reading in the current directory open(FILEHANDLE,"< input_file.txt"); open(OUTFILE,"+> outfile_file.txt"); my $record_count = 0; while (<FILEHANDLE>){ if ($_ =~ /^(\d{1,5})\s+ (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} (?:([A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7})){0,1} $/gx){ $record_count++; my $1st = $1; my $2st = $2; my $3st = $3; my $4st = $4; my $5st = $5; my $6st = $6; my $7st = $7; my $8st = $8; my $9st = $9; my $10st = $10; my $11st = $11; my $12st = $12; my $13st = $13; my $14st = $14; my $15st = $15; my $16st = $16; my $17st = $17; my $18st = $18; my $19st = $19; my $20st = $20; my $21st = $21; my $22st = $22; my $23st = $23; my $24st = $24; my $25st = $25; my $26st = $26; my $27st = $27; my $28st = $28; my $29st = $29; my $30st = $30; my $31st = $31; my $32st = $32; my $33st = $33; my $34st = $34; my $35st = $35; my $36st = $36; my $37st = $37; my $38st = $38; my $39st = $39; my $40st = $40; my $41st = $41; my $42st = $42; my $43st = $43; my $44st = $44; my $45st = $45; my $46st = $46; my $47st = $47; my $48st = $48; my $49st = $49; my $50st = $50; my $51st = $51; my @array_of_gene_records = ($2st, $3st, $4st, $5st, $6st, $7st, + $8st, $9st, $10st, $11st, $12st, $13st, $14st, $15st, $15, $16st, $1 +7st, $18st, $19st, $20st, $21st, $22st, $23st, $24st, $25st, $26st, $ +27st, $28st, $29st, $30st, $31st , $32st , $33st, $34st, $35st, $36st +, $37st, $38st, $39st, $40st, $41st, $42st, $43st, $44st, $45st, $46s +t, $47st, $48st, $49st, $50st, $51st); my $subrecord_id = 1; foreach (@array_of_gene_records){ print OUTFILE "$record_count\t$1st\t$subrecord_id\t$_\n"; $subrecord_id++; } } }
I get the error messages:
Bareword found where operator expected at sc.pl line 70, near "$1st" (Missing operator before st?) Bareword found where operator expected at sc.pl line 71, near "$2st" (Missing operator before st?) Bareword found where operator expected at sc.pl line 72, near "$3st" (Missing operator before st?) Bareword found where operator expected at sc.pl line 73, near "$4st" (Missing operator before st?) Bareword found where operator expected at sc.pl line 74, near "$5st" (Missing operator before st?) Can't use global $1 in "my" at sc.pl line 70, near "my $1" syntax error at sc.pl line 70, near "$1st " Can't use global $2 in "my" at sc.pl line 71, near "my $2" syntax error at sc.pl line 71, near "$2st " Can't use global $3 in "my" at sc.pl line 72, near "my $3" syntax error at sc.pl line 72, near "$3st " Can't use global $4 in "my" at sc.pl line 73, near "my $4" syntax error at sc.pl line 73, near "$4st " Can't use global $5 in "my" at sc.pl line 74, near "my $5" syntax error at sc.pl line 74, near "$5st " sc.pl has too many errors.

Replies are listed 'Best First'.
Re: Text manipulation
by Roy Johnson (Monsignor) on Jul 08, 2004 at 19:00 UTC
    It looks a little repetitive. I think this is pretty much the same:
    # Somewhere at the top my $chunk = qr/[A-Z0-9()-]{2,15})\s+([A-Z0-9()-]{7}/;\ #... # then instead of your loop body if (/^(\d{1,5})\s+($chunk)/g) { my $1st = $1; my $subrecord_id = 1; ++$record_count; while (/\G($chunk)/g) { print OUTFILE "$record_count\t$1st\t$subrecord_id\t$_\n"; $subrecord_id++; } }
    (unless you might have lines that start with the pattern, but then do something different).

    We're not really tightening our belts, it just feels that way because we're getting fatter.
Re: Text manipulation
by qq (Hermit) on Jul 08, 2004 at 23:05 UTC
    my $1st = $1;

    The errors you are getting are from these variable declarations - user defiend variables should start with a letter or underscore, to avoid conflicts with predefined variables. If you name them "$_1st, $_2nd" youll avoid the error.

    However there is no need to assign the match variables to temporary variables in your case.

    qq

Re: Text manipulation
by ccn (Vicar) on Jul 08, 2004 at 18:20 UTC
    please give us example of your input data.
    and output desired

      Input data:

      94 THI11 YDL244W THI11 YFL058W THI11 YJR156C THI1 +1 YNL332W

      Ouput data:

      102 94 1 THI11 YDL244W 103 94 2 THI11 YFL058W 104 94 3 THI11 YJR156C 105 94 4 THI11 YNL332W
        something like this:
        #!perl use strict; use warnings; # open a file for reading in the current directory open(FILEHANDLE,"< input_file.txt") or die $!; open(OUTFILE,"+> outfile_file.txt") or die $!; my $record_count = 0; while (<FILEHANDLE>){ my @data = split /\s+/; my $first = shift @data; $record_count++; for (my $i = 0; $i < @data; $i+=2) { print OUTFILE "$record_count\t$first\t", $i/2+1, "\t$data[$i] +", $data[$i+1], "\n"; } }
        May be I've missed some details, but the way is correct
        As for me, I prefer -n flag in such cases:
        #!perl -lwn my @data = split /\s+/; my $first = shift @data; for (my $i = 0; $i < @data; $i+=2) { print join "\t", $., $first, $i/2+1, $data[$i], $data[$i+1]; }
        usage:
        this_script.pl input_file.txt >> output_file.txt
        
        Here is the golf version of your program
        #!perl -lawn $f = shift @F; $i = 1; print join "\t", $., $f, $i++, shift @F, shift @F while @F;
        see perldoc perlrun, perldoc perlvar for details