Re^2: Filter and writing error log file

Dear All,

Thank you very much for your time and suggestions.

I agree with Laurent_R that these DNA files could be very long, and loading them into an array at the beginning could cause memory problems. This is why I am trying to read the file in a while loop and check the conditions there. I tried to use an if condition in the program:

    if (($seq =~ /[A|T|G|C]/) && ($lenseq == 19)) {
        print "$seq\n";
    }
    else {
        # here I want to print the fragments whose length is either less than
        # or greater than 19, or which contain bases other than [ATGC]
        print "error log file";
    }

All of this is inside a while loop, so that I can read huge files without worrying about memory.

Could you give me some directions on how to check those conditions, so that the sequences are processed further only if the conditions are true?

Thanks to all of you


Re^3: Filter and writing error log file by choroba (Cardinal) on Jul 23, 2014 at 13:29 UTC
To check whether a string contains something other than A, C, T, or G, search for the offending character; so in your condition, use `$seq !~ /[^ACTG]/`. Note that `|` is not needed in a character class (in fact, it matches a literal `|`, so avoid it unless you want to match that character). لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
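A small sketch comparing the original condition with the suggested one on a few test strings. The helper names `old_check` and `new_check` are made up for illustration. Because the class `[A|T|G|C]` succeeds as soon as a single A, T, G, C or `|` appears anywhere in the string, it also accepts sequences containing N or other characters:

```perl
use strict;
use warnings;

# Hypothetical helper: the original condition. True if $seq contains
# at least one A, T, G, C, or a literal '|' (the '|' is part of the class).
sub old_check { my ($seq) = @_; return $seq =~ /[A|T|G|C]/ ? 1 : 0 }

# Hypothetical helper: the suggested condition. True only if $seq
# contains no character other than A, C, T, G.
sub new_check { my ($seq) = @_; return $seq !~ /[^ACTG]/ ? 1 : 0 }

for my $seq (qw(ACGTACGT ACGNACGT XXAXX |||)) {
    printf "%-10s old=%d new=%d\n", $seq, old_check($seq), new_check($seq);
}
```

Only `ACGTACGT` passes the negated check; the original condition accepts all four strings, including `|||`.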
Re^4: Filter and writing error log file by newtoperlprog (Sexton) on Jul 23, 2014 at 13:44 UTC
Thanks for the suggestions. One question: why do we have to use '^' in the match rather than `[ATGC]`?
Re^5: Filter and writing error log file by choroba (Cardinal) on Jul 23, 2014 at 13:52 UTC
See perlre. The caret negates the class, so the regular expression matches non-ACTG characters, but I used `!~` to negate that. It's like the difference between "The sequence doesn't contain invalid characters" and "The sequence contains valid characters". These two are not equivalent, as the second lacks the word "only".
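To make the "only" distinction concrete, here is a minimal sketch (the sub names are hypothetical) that checks three phrasings against the same strings. The negated class `!~ /[^ACTG]/` behaves the same as the anchored positive class `/\A[ACTG]*\z/`, while a plain `/[ACTG]/` only asks whether some valid character is present:

```perl
use strict;
use warnings;

# "The sequence contains valid characters": at least one A, C, T or G somewhere.
sub contains_valid { return $_[0] =~ /[ACTG]/ ? 1 : 0 }

# "The sequence doesn't contain invalid characters": nothing outside A, C, T, G.
sub no_invalid { return $_[0] !~ /[^ACTG]/ ? 1 : 0 }

# "The sequence contains only valid characters": anchored positive class.
sub only_valid { return $_[0] =~ /\A[ACTG]*\z/ ? 1 : 0 }

for my $seq ('ACGT', 'ACNT', 'NNNN', '') {
    printf "%-6s contains_valid=%d no_invalid=%d only_valid=%d\n",
        "'$seq'", contains_valid($seq), no_invalid($seq), only_valid($seq);
}
```

Both "only" forms also accept the empty string, so the separate `length == 19` check is still needed.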
Re^6: Filter and writing error log file by newtoperlprog (Sexton) on Jul 23, 2014 at 19:55 UTC
Re^7: Filter and writing error log file by choroba (Cardinal) on Jul 24, 2014 at 13:16 UTC
Re^7: Filter and writing error log file by newtoperlprog (Sexton) on Jul 24, 2014 at 12:46 UTC
Re^4: Filter and writing error log file by newtoperlprog (Sexton) on Jul 23, 2014 at 15:20 UTC
I have one loop-related question. I have defined an array of alphabets from ("A" .. "Z"), but after reading a long file the alphabets run out and the program warns about uninitialized values. My question is: how can I define an array of alphabets that continues with AA, BB, CC and so on when "A" .. "Z" ends?
Re^5: Filter and writing error log file by choroba (Cardinal) on Jul 23, 2014 at 15:26 UTC
Alphabet stands for the whole series of letters. Please use the word "letter" for a single character like "A" or "Z", not "alphabet". Don't create an array. Just start with `my $letter = 'A';` and in every iteration do `$letter++;` and let the magic do all the work.
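A short sketch of that "magic" string increment (described in perlop under auto-increment). Note that it counts like an odometer: after `Z` comes `AA`, `AB`, `AC`, ..., `AZ`, `BA`, rather than `AA`, `BB`, `CC`. The 30-iteration loop below just stands in for reading records from a file:

```perl
use strict;
use warnings;

# Start from 'A' and rely on Perl's magic string increment:
# 'A' .. 'Z', then 'AA', 'AB', ... with carry, so no array is needed.
my $letter = 'A';
my @labels;
for my $record (1 .. 30) {    # stands in for reading 30 records from a file
    push @labels, $letter;    # label for the current record
    $letter++;                # 'Z' becomes 'AA', 'AZ' becomes 'BA'
}
print "@labels\n";
```

This prints `A` through `Z` followed by `AA AB AC AD`.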
Re^5: Filter and writing error log file by newtoperlprog (Sexton) on Jul 23, 2014 at 17:14 UTC
Thanks for explaining the '^' behaviour and how to loop through the letters. I have some more questions about another program I have written; it is very crude and I would like some help making it more robust. Is it OK to post it here and get some help? Thanks
Re^3: Filter and writing error log file by Laurent_R (Canon) on Jul 23, 2014 at 18:00 UTC
Hi, you could have a number of `next` statements to discard records that are not good. For example:

    while (<$IN_FILE>) {
        chomp;
        next if /[^ACTG]/;      # removes lines with other letters
        next if length != 19;   # removes lines not 19 char long
        # I just made up the next rule for the example
        next if /(.)\1\1/;      # removes lines where the same letter comes three times in a row
        # etc.
        # now start doing the real processing
        # ...
    }

The `next` statement goes directly to the next iteration of the `while` loop, so that faulty lines are effectively discarded early in the process.
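Since the thread is also about writing an error log, here is a minimal sketch that combines the `next` approach with logging the rejected lines. It assumes one sequence per line; the sub name `filter_sequences`, the filehandle names and the log message format are made up for illustration, and in-memory filehandles stand in for real files so the example is self-contained:

```perl
use strict;
use warnings;

# Hypothetical helper: reads sequences from $in, writes good ones to $out
# and logs rejected ones (with a reason and line number) to $err.
sub filter_sequences {
    my ($in, $out, $err) = @_;
    my ($kept, $rejected) = (0, 0);
    while (my $seq = <$in>) {
        chomp $seq;
        my $reason;
        if    ($seq =~ /[^ACTG]/)  { $reason = 'invalid base' }
        elsif (length($seq) != 19) { $reason = 'length ' . length($seq) }
        if (defined $reason) {
            print {$err} "line $.: $reason: $seq\n";   # $. = current input line number
            $rejected++;
            next;                   # skip to the next line, as in the loop above
        }
        print {$out} "$seq\n";      # the real processing would go here
        $kept++;
    }
    return ($kept, $rejected);
}

# Demo with in-memory files instead of real ones.
my $input = "ACGTACGTACGTACGTACG\nACGTNCGTACGTACGTACG\nACGT\n";
my ($good, $log) = ('', '');
open my $in,  '<', \$input or die "Cannot open input: $!";
open my $out, '>', \$good  or die "Cannot open output: $!";
open my $err, '>', \$log   or die "Cannot open error log: $!";
my ($kept, $rejected) = filter_sequences($in, $out, $err);
close $out;
close $err;
print "kept=$kept rejected=$rejected\n";
print "error log:\n$log";
```

With real files you would open something like `open my $err, '>', 'errors.log' or die ...` instead (the file name is hypothetical). Here one line is kept and two are logged: one for the N and one for the short length.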
Re^4: Filter and writing error log file by newtoperlprog (Sexton) on Jul 23, 2014 at 20:31 UTC
Thanks Laurent_R for the suggestion :-)