Benson has asked for the wisdom of the Perl Monks concerning the following question:

Friends,
I have run into a problem with the input below. The last character of the first line, G, and the first character of the next line, C, together form GC, which is in @allowed. In the output, however, G and C are not matched as an allowed pair. This is because in the input there is a space after the last G, and the C is on the next line.

my @allowed = qw[ AA AG GC GT CA CG TT TC ];
my $allowed = join "|", @allowed;
my $regex = qr/ N+ | (?: (?=$allowed) . )* . /x;
$data="TGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGATAG 
C";
print "$_\n" for $data =~ m/$regex/g;

That is, instead of GC being printed together, I get

G C

which is wrong

Please give a solution.

2005-10-21 Retitled by planetscape, as per Monastery guidelines
Original title: 'Reading Substring'

Replies are listed 'Best First'.
Re: How to remove Newline
by Skeeve (Parson) on Oct 20, 2005 at 07:37 UTC
    Why don't you simply remove all newlines from your string beforehand? IIRC there is also a rule in the program others wrote for you that inserts newlines where appropriate.

    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
Re: How to remove Newline
by blazar (Canon) on Oct 20, 2005 at 09:03 UTC
    First off, why did you start a new thread after asking the very same question in another thread (I suppose because that one is screwed up)?

    That said,

    • I see no need for the intermediate @allowed variable: it only clutters your logic, but this is merely cosmetic;
    • Coming to the regex:
      my $regex = qr/ N+ | (?: (?=$allowed) . )* . /x;
      what is that "N"?!? Didn't you say that your string only contains the letters ACGT?
    • I can't understand: it seems that (leaving N+ aside) you want to
      "match zero or more occurrences of a group composed of an allowed sequence and any character, followed by (another) any character (without including the allowed sequences in the match)."
      Is this what you really want?
    Whatever the case, the easiest solution I can think of would be to remove the newlines before processing your data. If the data is huge and, e.g., read from a file, you may want to do so (joining lines or bunches of lines) in a smart way, which you will have to devise.
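
    As a rough illustration of joining lines while reading, here is a minimal sketch. It assumes the whole sequence fits in memory and that no whitespace in the input is meaningful. The in-memory filehandle and the names $text, $fh and $sequence are stand-ins made up for this example, and the short string replaces the real data.

```perl
use strict;
use warnings;

# Hypothetical input: an in-memory filehandle stands in for a real data file.
my $text = "TGGGATAG \nC\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $sequence = '';
while (my $line = <$fh>) {
    $line =~ s/\s+//g;     # drop the newline and any stray spaces on this line
    $sequence .= $line;    # characters from adjacent lines now sit side by side
}
close $fh;

print "$sequence\n";
```

    For data too large to hold in one string, the same loop could hand off the buffer in batches, but a match may straddle two batches, so that part would still need to be worked out.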
Re: How to remove Newline
by GrandFather (Saint) on Oct 20, 2005 at 08:56 UTC

    Remove the newlines from your input. For the sample code you can do it using $data =~ s/\n//; before the print line.
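
    A minimal sketch of that fix applied to a shortened version of the sample. Two things here are assumptions rather than part of the suggestion above: the /g flag, so that every newline is removed and not only the first, and \s instead of \n, so that the space after the last G goes away too. The names $clean and @pieces are made up for the example.

```perl
use strict;
use warnings;

my @allowed = qw[ AA AG GC GT CA CG TT TC ];
my $allowed = join "|", @allowed;
my $regex   = qr/ N+ | (?: (?=$allowed) . )* . /x;

# Shortened stand-in for the sample: a space and a newline separate G and C.
my $data = "TGGGATAG \nC";

# Assumption: no whitespace in the input is meaningful, so strip all of it.
(my $clean = $data) =~ s/\s+//g;

# Collect the matches first so they can be inspected as well as printed.
my @pieces = $clean =~ m/$regex/g;
print "$_\n" for @pieces;
```

    With the whitespace gone, the last piece comes out as AGC, so G and C are matched as an allowed pair.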


    Perl is Huffman encoded by design.