Re: Why is it matching??
by BrowserUk (Patriarch) on Sep 11, 2003 at 21:41 UTC
Take a close look at the logic of your code as it stands
- Open the file
- Read a line in $_
- Extract the $target_name from the line
- Enter the do loop
- If the line contains $target_name, enter the if block
- Read the next line into $probe
- If that line starts with 20 word chars, push it onto an array in a hash keyed by $target_name
- Exit both nested if blocks
- Test if the current line doesn't contain the target name
I assume this is what you meant, but
if ($_ !=~ [$target_name]) {
doesn't do this! And if that use warnings; was really a part of your program, you would know this, as you would have received warnings something like this:
my $target_name = 'the quick brown fox';
$_ = 'the quick brown fox jumps over the lazy dog';
if ($_ !=~ [$target_name]) { print 'ok' }
Argument "<unprintable bytes: the bitwise complement of the stringified array reference>" isn't numeric in numeric ne (!=) at ...
Argument "the quick brown fox jumps over the lazy dog" isn't numeric in numeric ne (!=) at ...
Disabling warnings and/or strict doesn't fix your problems, it just stops perl telling you about them.
Putting them back before you post code just wastes everyone's time.
- Ignoring the above, let's assume that it doesn't enter your second if block. So..
- It reaches the end of the until loop.
As you've only read 2 lines, we probably haven't reached the end-of-file yet, so we go back to the top and hit the first if condition again.
- As we read a second line, it probably doesn't contain $target_name, so we skip to the second if condition. But this is broken and fails so we don't enter that either.
- We're back to the until condition again, but as we haven't read another line, it is still false so we loop again.... ad nauseam.
If you compare your version of this with the code it is based on at Re: Little pattern problem..., you'll see that you have replaced the inner while loop with an if statement. This means that instead of looping, reading new lines and pushing them onto the array until it finds a line that no longer matches the while condition, it reads one line, pushes it and then does nothing else.
Suggestion: Go back to the original code, read through it and try to understand how it works before you try to modify it. Then, when you start making changes, leave warnings and strict enabled! If you make a change and it gives you a warning, try to understand what the warning means and correct it. Adding use diagnostics; may help you to interpret the messages. Correct any such warnings before you move on to making the next change. In the long run, you'll learn faster and achieve your goal much quicker that way.
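To make the point about !=~ concrete, here is a minimal sketch of how perl reads that condition: a numeric != applied to the bitwise complement (~) of an anonymous array reference. The sub name check_fictional_operator is made up for illustration and is not part of the original program, and the operator is written with a space (!= ~) so the two tokens are visible.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluates the condition the way perl parses "!=~"
# and returns its truth value followed by any warnings it triggered.
sub check_fictional_operator {
    my ($line, $target_name) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    # "!=~" is not one operator: it is numeric "!=" followed by unary "~".
    # Both operands are non-numeric strings here, so both count as 0.
    my $result = ( $line != ~ [$target_name] ) ? 1 : 0;

    return ( $result, @warnings );
}

my ( $result, @warnings ) = check_fictional_operator(
    'the quick brown fox jumps over the lazy dog',
    'the quick brown fox',
);
print "condition is ", ( $result ? 'true' : 'false' ), "\n";
print "warning: $_" for @warnings;
```

For a line that does not start with a number, both sides end up as 0, so the test is false whatever the line contains.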
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.
LOL....Thank you for the harsh rebuke:-). As for the warnings issue, I failed to mention that, but hadn't figured out what it was referring to until checking things out in Programming Perl. So, yes I do use warnings thank you very much. No offense, but I couldn't get your program to work as stated. It gave me the same problem I am asking about in this node. $target_name doesn't change, so only one key is present in the hash. I need the keys to continue changing with $target_name. Since that does not occur in either program, the question still stands....
Bioinformatics
The rebuke wasn't intended to come across as harsh. Sorry that it did.
Moving right along. Could you explain a little more of what you mean by ...but I couldn't get your program to work as stated...? I just downloaded the code again and it produced the output I listed, which shows that two keys were created. The first with three probes found:
1415671 :
GGAACAGGAATGTCGCAACATCGTA,
ACATCGTATGGATTGCTGAGTGCAT,
GGCTGATCACATCCAAAAAGTCATG
And the second with 10:
1415670 :
GAGGAAACGTTCACCCTGTCTACTA,
GTTCACCCTGTCTACTATCAAGACA,
TACTATCAAGACACTCGAAGAGGCT,
CTGTGGGCAATATTGTGAAGTTCCT,
GAATGCATCCTTGTGAGAGGTCAGA,
GAGAGGTCAGACAAAGTGCCAGAAA,
AAAACAAGAACACCCACACGCTGCT,
ACACGCTGCTGCTAGCTGGAGTATT,
TATCTTGTCCAACACTACGTCGAAG,
TTGTCACCATGCCTGCAAGGAGAGA
This is as expected from the sample data you provided on that original post, although I've manually wrapped it to prevent it getting confused by the autowrapper.
If you are getting different output when you run my original code, then could you post the output you get please and I'll try to work out what could be different.
The way $target_name gets updated in the original is like this.
do {
    # extract the target name
    $target_name = $1 if m[( \d{7} ) _at: \d{3} : \d{3} ]x;
    while( m[$target_name] ) {
        # process the record containing the current target name
        my $probe = <DATA>; # Read the probe
        chomp $probe;
        # save it in an HoA keyed by the target name
        push @{ $probes{ $target_name } }, $probe;
        # get the next line;
        last unless defined( $_ = <DATA> );
    }
} until eof DATA; # till done
$target_name is set at the top of the outer do..until loop.
The code enters the inner while loop, reads the next line (the probe) and pushes it onto the HoA.
It then gets to the last unless defined... line, where it reads another line. So long as it hasn't reached the eof, then it loops back to the top and tests the while condition again. If it matches, the loop repeats, another probe is read and pushed.
If it doesn't, then it falls out of the while loop and the until eof DATA condition is tested. If it's not at the eof, then it loops back to the top of the do...until loop and the new $target_name is extracted from the last line it read (which failed to match the while condition) and the cycle repeats.
Hopefully, that explains how it works and will allow you to modify it to your needs.
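If it helps to watch the loop run, here is a self-contained sketch of the same do...until structure wrapped in a sub. Everything outside the loop is my assumption for illustration: the sub name group_probes, the in-memory filehandle, and the made-up sample records (only the 7-digit names and the probe strings are borrowed from the output above). It also assumes every header line is followed by exactly one probe line.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above: reads header/probe pairs
# from a filehandle and returns a hash of arrays keyed by target name.
sub group_probes {
    my ($fh) = @_;
    my ( %probes, $target_name );

    local $_ = <$fh>;    # prime $_ with the first header line
    return \%probes unless defined $_;

    do {
        # extract the target name from the header line now in $_
        $target_name = $1 if m[( \d{7} ) _at: \d{3} : \d{3} ]x;
        while ( defined $target_name and m[$target_name] ) {
            my $probe = <$fh>;    # the line after a header is the probe
            last unless defined $probe;
            chomp $probe;
            push @{ $probes{$target_name} }, $probe;
            # read the next header; stop if the input is exhausted
            last unless defined( $_ = <$fh> );
        }
    } until eof $fh;

    return \%probes;
}

# Made-up sample input in the assumed header/probe layout.
my $sample = <<'END';
>probe:1415671_at:123:456
GGAACAGGAATGTCGCAACATCGTA
>probe:1415671_at:234:567
ACATCGTATGGATTGCTGAGTGCAT
>probe:1415670_at:345:678
GAGGAAACGTTCACCCTGTCTACTA
END

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $probes = group_probes($fh);
close $fh;

for my $name ( sort keys %$probes ) {
    print "$name : ", join( ', ', @{ $probes->{$name} } ), "\n";
}
```

Note that the key only changes because the header that failed the while test is still in $_ when control returns to the top of the do block.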
Re: Why is it matching??
by chromatic (Archbishop) on Sep 11, 2003 at 20:25 UTC
You probably want =~ here, though it appears to be a no-op:
$_= m[:(\d{6})_]x;
That is true, thanks. However, it isn't getting far enough into that block for that to affect anything yet. I'm wondering if it never gets past the first if statement, such that it believes that everything matches the initial value of $target_names. Any print statement placed into the second "if" block returns nada, so something is corrupted...
Bioinformatics
Re: Why is it matching??
by tcf22 (Priest) on Sep 11, 2003 at 20:27 UTC
I'm a little confused on what you are doing, but perhaps changing
if ($_ !=~ [$target_name])
to
if ($_ !~ /$target_name/)
or
unless (/$target_name/)
will fix your problem.
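As a quick check that the two suggested forms agree, here is a small sketch. The sub names are hypothetical and the sample lines are invented; each sub returns 1 when the line does not contain the target name and 0 when it does.

```perl
use strict;
use warnings;

# Hypothetical helper using the negated binding operator.
sub lacks_target_negated_match {
    my ( $line, $target_name ) = @_;
    return $line !~ /$target_name/ ? 1 : 0;
}

# Hypothetical helper using unless with an ordinary match.
sub lacks_target_unless {
    my ( $line, $target_name ) = @_;
    unless ( $line =~ /$target_name/ ) {
        return 1;
    }
    return 0;
}

for my $line ( '>probe:244901_at:594:703', '>probe:244902_at:522:511' ) {
    printf "%s  !~: %d  unless: %d\n",
        $line,
        lacks_target_negated_match( $line, '244901' ),
        lacks_target_unless( $line, '244901' );
}
```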
- Tom
Using unless is the same difference (I tried it just for kicks, and it doesn't help). !~ on the other hand throws my program off completely. The program outputs nothing. Thanks for the input!!
Bioinformatics
Hmmmmm...maybe you are right. !=~ could be an issue, depending on what has most priority, =~ or !=. If it is the latter, then what the regex effectively says is, if it is not equal to $target_name in reverse...~ by itself (addressing a non-numeric string) returns a string with identical length but with all the bits of the string complemented. I don't know if that effect is changed by the presence of // though... Could I be facing a double negation? If it is the former, then I would assume it would be the rough equivalent of !~.
Bioinformatics
Re: Why is it matching??
by graff (Chancellor) on Sep 12, 2003 at 05:11 UTC
You can simplify the code in two ways: (1) let the command line shell do the file open/close for you, and (2) change the technique for tracking your "target name" field.
For the first point, consider using a command line like this to run the script:
probe.pl < probe_input.file > output.file
(I assume your shell can just run the script like this.) This way, your whole script is just the loop to read and process data, plus the for loop to print results -- no prompting for file names, no open or close statements.
As for the second point, your original design reads in one line to set an initial target name, then does a rather twisted (dare I say unnatural) "do { ... } until (eof) " sort of loop to group subsequent lines with the first target, and does something weird when the next line reveals a new target. How about an approach that uses a normal while loop, and doesn't try to get the initial key pattern in any special way, as suggested below?
(BTW, I was a bit confused by your examples of input data; the first had a target name and a probe string on the same line, while the latter had them on separate lines; the code shown below should work no matter which way the input is formatted.)
#! C:\Perl\bin\perl
use strict;
use warnings;
my $target_name = '';
my %probes;
while (<>) {
    if ( /:(\d{6})_/ ) { # this line has a target name
        if ( $target_name ne $1 ) {
            $target_name = $1;
            print STDERR "$target_name\n";
        }
    }
    if ( /\b([ACGT]{20})/ ) { # this line has a probe string
        my $probe = $1;
        print STDERR "$probe\n";
        push @{$probes{$target_name}}, $probe;
    }
}
# printing to STDOUT with redirection on the command line
# will save results in an output file (whereas printing to
# STDERR goes to the shell window -- unless you use a bash
# or bourne-style shell, which allows you to redirect
# STDERR to a separate file, using "2> errlog.file")
for (sort{ @{$probes{$b}} <=> @{$probes{$a}} } keys %probes) {
    print "$_, ", join(", ", sort @{$probes{$_}}), "\n";
}
Having worked this out based on your post, I saw quite a few typos: missing sigils (in the sort function of the for loop), fictional operators ("!=~"), and incorrect use of valid operators. For example, $_ = m[:(\d{6})_]x; sets $_ to "1" (true) if the match succeeds and to an empty string (false) otherwise; the captured text "$1" does not get assigned to anything. (And why the "x" modifier at the end, if you don't use whitespace for legibility in the regex?) BTW, to assign $1 to $_, it should be: ( $_ ) = m/:(\d{6})_/; And there was more trouble besides that.
So, if you had code that ran at all before you posted, you seem to have posted something very different from what you were running. Anyway, you need to study up a bit more on basic syntax, operators, etc. Hang in there.
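The scalar-versus-list distinction mentioned above is easy to check with a short sketch. The two sub names and the sample line are made up for illustration; the pattern is the one from the post.

```perl
use strict;
use warnings;

# Scalar context: the match returns 1 on success, an empty string on failure.
sub match_in_scalar_context {
    my ($line) = @_;
    my $result = $line =~ m/:(\d{6})_/;
    return $result;
}

# List context: the match returns the captured text (nothing on failure).
sub match_in_list_context {
    my ($line) = @_;
    my ($captured) = $line =~ m/:(\d{6})_/;
    return $captured;
}

my $line = '>probe:ATH1-121501:244901_at:594:703';
printf "scalar context: %s\n", match_in_scalar_context($line);
printf "list context:   %s\n", match_in_list_context($line);
```

The parentheses around the left-hand side are what put the match into list context, which is why ( $_ ) = m/.../ picks up the capture and $_ = m/.../ does not.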
Re: Why is it matching??
by InfiniteSilence (Curate) on Sep 11, 2003 at 21:43 UTC
I can't even get this to run. I get the following run-time error:
Argument "<unprintable bytes>" isn't numeric in numeric ne (!=) at checkme.pl line 29, <INPUT> line 2.
So I rewrote it a bit:
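#!/usr/bin/perl -w
use strict;
# one-liner run via backticks: prints "target,probe" for each matching line
my $ex_line = q(perl -ne "if (/(\d{6})_/g){ chomp;print qq($1,),(split /;/,$_)[-1],qq(\n)};" input.dat);
my $stuff = `$ex_line`;
my %hash;
for (split /\n/, $stuff) {
    my @arr = split/,/,$_;
    push @{$hash{$arr[0]}} , $arr[1];
}
foreach (keys %hash) {
    # NB: "join / /" uses the result of a pattern match on $_ as the
    # separator, not a literal space, and qq(\n) is part of the joined
    # list, so the probes come out run together
    print "$_\t" . join / /, @{$hash{$_}} , qq(\n);
}
1;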
With the following data:
>probe:ATH1-121501:244901_at:594:703;Interrogation_Position=56;Antisense;TTGCTGCTATTCTATCTATTTGTGC
Times 11
>probe:ATH1121501:244902_at:522:511;Interrogation_Position=153;Antisense;GGTATTTTCCGTTTCTTCGGATGAT
Times 11
>probe:ATH1121501:244902_at:522:511;Interrogation_Position=153;Antisense;GGTATTTTCCGTTTCTTCGGATGAT
Times 11
>probe:ATH1121501:244902_at:522:511;Interrogation_Position=153;Antisense;GGTATTTrCCGTTTCTTCGGATGAT
Times 11
>probe:ATH1121501:244902_at:522:511;Interrogation_Position=153;Antisense;GGTATTfTCCGTTTCTTCGGATGAT
Times 11
>probe:ATH1121501:244902_at:522:511;Interrogation_Position=153;Antisense;GGTATTTnCCGTTTCTTCGGATGAT
Times 11
>probe:ATH1121501:244902_at:522:511;Interrogation_Position=153;Antisense;GGTATTTTCCGuTTCTTCGGATGAT
Times 11
>probe:ATH1121501:244902_at:522:511;Interrogation_Position=153;Antisense;GGTATTTTiCGTTTCTTCGGATGAT
Times 11
And I get:
244901 TTGCTGCTATTCTATCTATTTGTGC
244902 GGTATTTTCCGTTTCTTCGGATGATGGTATTTTCCGTTTCTTCGGATGATGGTATTTrCCGTTTCTTCGGATGATGGTATTfTCCGTTTCTTCGGATGATGGTATTTnCCGTTTCTTCGGATGATGGTATTTTCCGuTTCTTCGGATGATGGTATTTTiCGTTTCTTCGGATGAT
I also get a nasty little warning about my ``. Something is not escaped :( You should probably rewrite that inside of the body of the code anyway.
Celebrate Intellectual Diversity
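For what it's worth, here is one way the backtick step could be folded into the body of the script, as a rough sketch rather than a drop-in replacement. The sub name collect_probes and the in-memory sample are my own assumptions; the pattern and the split on semicolons are the same as in the one-liner, and this version joins the probes with a real space.

```perl
use strict;
use warnings;

# Hypothetical helper: reads probe records from a filehandle and returns
# a hash of arrays keyed by the 6-digit target name.
sub collect_probes {
    my ($fh) = @_;
    my %hash;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /(\d{6})_/;        # skip lines without a target
        my $target = $1;
        my $probe  = ( split /;/, $line )[-1];  # probe is the last field
        push @{ $hash{$target} }, $probe;
    }
    return \%hash;
}

# Two records taken from the data above, read from an in-memory file.
my $sample = <<'END';
>probe:ATH1-121501:244901_at:594:703;Interrogation_Position=56;Antisense;TTGCTGCTATTCTATCTATTTGTGC
Times 11
>probe:ATH1121501:244902_at:522:511;Interrogation_Position=153;Antisense;GGTATTTTCCGTTTCTTCGGATGAT
END

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $probes = collect_probes($fh);
close $fh;

for my $target ( sort keys %$probes ) {
    print "$target\t", join( ' ', @{ $probes->{$target} } ), "\n";
}
```

Doing the work in-process also sidesteps the shell quoting that the backtick version trips over.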
Re: Why is it matching??
by Theo (Priest) on Sep 12, 2003 at 00:50 UTC
This isn't very elegant, but you could scatter print statements through the suspect areas to see what is being executed and in what order and what your variable values are.
$testing = 1;
.
.
print "At second 'if' in do-loop, \$target_name is $target_name" if ($testing);
.
.
(untested)
Or something similar.
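A slightly tidier variation on the same idea, as a sketch only: the helper name debug_message and the sample value are made up, and the messages go to STDERR so they stay out of the real output.

```perl
use strict;
use warnings;

my $testing = 1;    # set to 0 to silence the debug output

# Hypothetical helper: builds a debug line, or an empty string when disabled.
sub debug_message {
    my ( $enabled, $label, $value ) = @_;
    return '' unless $enabled;
    return "DEBUG $label: $value\n";
}

my $target_name = '244901';    # stand-in value for illustration
print STDERR debug_message( $testing, "second 'if' in do-loop, target_name", $target_name );
```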
Theo