You can simplify the code in two ways: (1) let the command line shell do the file open/close for you, and (2) change the technique for tracking your "target name" field.

For the first point, consider using a command line like this to run the script:

probe.pl < probe_input.file > output.file
(I assume your shell can run the script directly like this; if it can't, use "perl probe.pl < probe_input.file > output.file" instead.) This way, your whole script is just the loop that reads and processes the data, plus the for loop that prints the results: no prompting for file names, and no open or close statements.

As for the second point, your original design reads one line to set an initial target name, then runs a rather twisted (dare I say unnatural) "do { ... } until (eof)" loop to group subsequent lines with that first target, and does something weird when the next line reveals a new target. How about an approach that uses a normal while loop and doesn't try to get the initial target name in any special way, as suggested below?

(BTW, I was a bit confused by your examples of input data: the first had a target name and a probe string on the same line, while the second had them on separate lines. The code shown below should work no matter which way the input is formatted.)

#! C:\Perl\bin\perl

use strict;
use warnings;

my $target_name = '';
my %probes;

while (<>) {
    if ( /:(\d{6})_/ ) {    # this line has a target name
        if ( $target_name ne $1 ) {
            $target_name = $1;
            print STDERR "$target_name\n";
        }
    }
    if ( /\b([ACGT]{20})/ ) {    # this line has a probe string
        my $probe = $1;
        print STDERR "$probe\n";
        push @{$probes{$target_name}}, $probe;
    }
}

# Printing to STDOUT with redirection on the command line
# will save results in an output file, whereas printing to
# STDERR goes to the shell window, unless you redirect STDERR
# to a separate file with "2> errlog.file" (bash and other
# Bourne-style shells support this, and so does Windows cmd.exe).

# List targets with the most probes first (an array in numeric
# context gives its element count).
for ( sort { @{$probes{$b}} <=> @{$probes{$a}} } keys %probes ) {
    print "$_, ", join(", ", sort @{$probes{$_}}), "\n";
}
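To check the claim that the loop doesn't care whether a target name and its probe strings share a line, here is a minimal sketch that wraps the same logic in a subroutine and feeds it both layouts from in-memory strings. The subroutine name group_probes and the sample lines (e.g. "chr1:123456_a") are made up for illustration; I'm only assuming your real target names contain a colon, six digits and an underscore, as the regex requires. The STDERR progress messages are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: the same grouping loop as the script above, but
# reading from a filehandle argument and returning the hash (as a ref).
sub group_probes {
    my ($fh) = @_;
    my $target_name = '';
    my %probes;
    while ( my $line = <$fh> ) {
        # ":" + six digits + "_" marks a line carrying a target name
        if ( $line =~ /:(\d{6})_/ ) {
            $target_name = $1;
        }
        # a run of 20 bases is a probe string; file it under the current target
        if ( $line =~ /\b([ACGT]{20})/ ) {
            push @{ $probes{$target_name} }, $1;
        }
    }
    return \%probes;
}

# Made-up sample data: the same content in the two layouts.
my $same_line = <<'END';
chr1:123456_a ACGTACGTACGTACGTACGT
chr1:123456_a TTTTGGGGCCCCAAAATTTT
chr2:654321_b GGGGCCCCAAAATTTTACGT
END

my $separate_lines = <<'END';
chr1:123456_a
ACGTACGTACGTACGTACGT
TTTTGGGGCCCCAAAATTTT
chr2:654321_b
GGGGCCCCAAAATTTTACGT
END

my %layouts = ( 'same line' => $same_line, 'separate lines' => $separate_lines );

for my $label ( sort keys %layouts ) {
    # read the sample string as if it were a file
    open my $fh, '<', \$layouts{$label} or die "cannot open in-memory data: $!";
    my $probes = group_probes($fh);
    close $fh;

    print "$label:\n";
    # same output step as the script: targets with the most probes first
    for ( sort { @{ $probes->{$b} } <=> @{ $probes->{$a} } } keys %$probes ) {
        print "$_, ", join( ", ", sort @{ $probes->{$_} } ), "\n";
    }
}
```

Both layouts should print the same two lines, with 123456 (two probes) listed before 654321 (one probe).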
Having worked this out based on your post, I saw quite a few typos: missing sigils (in the sort function of the for loop), fictional operators ("!=~"; the negated binding operator is "!~"), and incorrect use of valid operators. For example, $_ = m[:(\d{6})_]x; sets $_ to 1 (true) if the match succeeds and to the empty string (false) otherwise; the captured text in $1 does not get assigned to anything. (And why the "x" modifier at the end, if you don't use whitespace for legibility in the regex?) To assign the captured text to $_, it should be: ( $_ ) = m/:(\d{6})_/; And there was more trouble besides that.
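Here is a small sketch of the difference between the scalar and list forms described above, run against a made-up input line (the sample text is an assumption; only the ":NNNNNN_" part matters to the regex).

```perl
use strict;
use warnings;

# Made-up sample line; the regex only cares about ":" + six digits + "_".
my $sample = "chr1:123456_a ACGTACGTACGTACGTACGT";

# Scalar assignment: the match runs against $_, then $_ is overwritten
# with the match's truth value, so the captured digits are lost.
$_ = $sample;
$_ = m/:(\d{6})_/;
my $scalar_result = $_;

# List assignment: the match returns its captured groups, so $_ gets them.
$_ = $sample;
( $_ ) = m/:(\d{6})_/;
my $list_result = $_;

# A failed match in scalar context yields the empty string, not "0".
$_ = "no target name on this line";
my $failed_result = m/:(\d{6})_/;

# The negated binding operator is !~ ; it is true when the match fails.
my $no_target = ( "no target name on this line" !~ m/:(\d{6})_/ );

print "scalar form: '$scalar_result'\n";
print "list form:   '$list_result'\n";
print "failed:      '$failed_result'\n";
print "!~ result:   '$no_target'\n";
```

The four quoted values come out as '1', '123456', '' and '1': the match operator's return value depends on context, which is why the parentheses around $_ matter.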

So, if you had code that ran at all before you posted, you seem to have posted something very different from what you were running. Anyway, you need to study up a bit more on basic syntax, operators, etc. Hang in there.


In reply to Re: Why is it matching?? by graff
in thread Why is it matching?? by bioinformatics