PerlMonks  

Re^4: How to match more than 32766 times in regex?

by FreeBeerReekingMonk (Deacon)
on Dec 01, 2015 at 21:25 UTC


in reply to Re^3: How to match more than 32766 times in regex?
in thread How to match more than 32766 times in regex?

No, it is worse: it seems it's a DNA sequence, and the OP wants a subsequence, as in the longest common subsequence problem. To cut the problem down, you could use the Needleman–Wunsch algorithm. Since Python has some bioinformatics libraries, it can be done there. It can be done in Perl, of course, but then the algorithm needs to be coded.
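To make "the algorithm needs to be coded" concrete, here is a minimal sketch of the textbook dynamic programme for the length of the longest common subsequence. It is not Needleman–Wunsch itself (that adds match, mismatch and gap scores), and `lcs_length` is a hypothetical helper name, not something from the thread or from a module. It keeps only two rows of the table, so memory stays proportional to the shorter dimension you pass second.

```perl
use strict;
use warnings;

# Length of the longest common subsequence of two strings,
# via the classic O(len_a * len_b) dynamic programme.
# lcs_length is a hypothetical name used for illustration only.
sub lcs_length {
    my ($s, $t) = @_;
    my @a = split //, $s;
    my @b = split //, $t;

    # @prev holds the previous row of the DP table, @cur the current one.
    my @prev = (0) x (scalar(@b) + 1);
    for my $i (1 .. scalar @a) {
        my @cur = (0);
        for my $j (1 .. scalar @b) {
            if ($a[$i - 1] eq $b[$j - 1]) {
                # Characters match: extend the LCS of both prefixes.
                $cur[$j] = $prev[$j - 1] + 1;
            }
            else {
                # Otherwise drop one character from either string.
                $cur[$j] = $prev[$j] > $cur[$j - 1] ? $prev[$j] : $cur[$j - 1];
            }
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print lcs_length('AGGTAB', 'GXTXAYB'), "\n";   # prints 4 (e.g. "GTAB")
```

For real DNA-sized inputs the quadratic running time is the limiting factor, which is why a ready-made aligner is usually the better choice.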

Re^5: How to match more than 32766 times in regex?
by karlgoethebier (Abbot) on Dec 02, 2015 at 09:55 UTC
    "...the algorithm needs to be coded"

    It seems like someone did it already: Algorithm::NeedlemanWunsch.

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      wow... totally missed that one. Thanks for pointing that one out. $perl_coolness_factor++;

Re^5: How to match more than 32766 times in regex?
by rsFalse (Chaplain) on Dec 01, 2015 at 22:26 UTC
    Ah yes, in this node, Re^2: Complex regular subexpression recursion limit, I didn't get an answer :/ .
    Today I was solving another problem (and ran into the same limitation). The full problem was: given a string (up to 1e5 in length) consisting of '0' and '1', find the length of the longest alternating subsequence if you are allowed to choose one substring and invert it. For example, given the string '100111', I can invert the substring from the 3rd to the 4th character ( substr $line, 2, 2, (substr $line, 2, 2) =~ y/01/10/r ), and then the string becomes '101011' and has an alternating subsequence of length 5 (indexes: 0,1,2,3,4 or 0,1,2,3,5).
    I wanted to solve that problem with regexes (I knew that I could solve it another way), so I tried to count the matches of /1+/ and /0+/ (this count is the length of the longest alternating subsequence if no inversion is made). I thought that I could do:
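The '100111' example can be checked by brute force on short strings. The sketch below assumes that the longest alternating subsequence takes one character from each run of equal characters, so its length equals the number of runs; `alt_len` and `best_after_one_inversion` are hypothetical names. It tries every substring, inverts it in place through an lvalue `substr` with `tr///`, and keeps the best result. It is cubic in the string length, so it is only a reference for small inputs.

```perl
use strict;
use warnings;

# Longest alternating subsequence of a 0/1 string: one character
# can be picked from each run, so the answer is the number of runs.
sub alt_len {
    my ($s) = @_;
    my $runs = () = $s =~ /0+|1+/g;
    return $runs;
}

# Try every possible substring inversion and return the best length.
# Only practical for short strings (roughly O(n^3) work).
sub best_after_one_inversion {
    my ($s) = @_;
    my $n    = length $s;
    my $best = 0;
    for my $from (0 .. $n - 1) {
        for my $len (1 .. $n - $from) {
            my $t = $s;
            substr($t, $from, $len) =~ tr/01/10/;   # invert in place
            my $cur = alt_len($t);
            $best = $cur if $cur > $best;
        }
    }
    return $best;
}

print alt_len('101011'), "\n";                   # prints 5
print best_after_one_inversion('100111'), "\n";  # prints 5
```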
    $line =~ y/1/,/; $len = split /\b/, $line;
    , but I decided to stay with zeroes and ones, and wrote  () = $line =~ /(.)\1*/g (as I showed). Later I added to $len:  /(.)\1\1|(.)\2.*(.)\3/ + /(.)\1/, because each regex, if it succeeds, adds 1 to the possible length of the subsequence after one inversion.
    I often try to solve problems from competitive programming sites or sites like projecteuler.net, and I practise doing it with Perl.
    When I then calculated the whole sum in one expression:
    $len = + (() = /(.)\1*\1*\1*\1*/g) + /(.)\1\1|(.)\2.*.*.*.*(.)\3/ + /(.)\1/
    - it consumed too much time on the input line '01' x 5e4;

    Update: the example wrongly used reversal; it is now fixed to use inversion.
      $len = + (() = /(.)\1*\1*\1*\1*/g) + /(.)\1\1|(.)\2.*.*.*.*(.)\3/ + /(.)\1/
      You know, that doesn't make any sense whatsoever.
        After some pondering... Is this what you tried to do:
        use strict;
        use warnings;

        my @strs = (
            '010111',
            '0' x 1_000_000,
            '01' x 1_000_000,
            '011' x 1_000_000,
            ( '01' x 1_000_000 ) . '111',
        );

        for my $str (@strs) {
            my $len = ( () = $str =~ m{ 0+ | 1+ }xg )
                    + ( $str =~ m{ 000 | 111 | (.)\1 .* (.)\2 }x ? 2 : 0 );
            print $len, "\n";
        }
        (Perl's regex optimizer is pretty smart about the second regex, btw!)
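For comparison, the "other way" without regexes can be sketched as a single linear scan. This rests on an observation that is not spelled out in the thread: inverting a substring only changes what happens at its two boundaries, so one inversion can add at most two runs, and the result can never exceed the string length. The answer is therefore the smaller of the length and the run count plus two. `best_alternating` is a hypothetical name.

```perl
use strict;
use warnings;

# Longest alternating subsequence after inverting at most one substring.
# Assumes the input consists only of '0' and '1'.
sub best_alternating {
    my ($s) = @_;
    my $n = length $s;
    return 0 if $n == 0;

    # Count runs of equal characters by counting transitions.
    my $runs = 1;
    for my $i (1 .. $n - 1) {
        $runs++ if substr($s, $i, 1) ne substr($s, $i - 1, 1);
    }

    # One inversion gains at most two runs, capped by the length.
    my $best = $runs + 2;
    return $best < $n ? $best : $n;
}

print best_alternating($_), "\n" for '100111', '01' x 50_000, '0' x 10;
# prints 5, 100000 and 3
```

Unlike the regex version with its fixed bonus of 2 or 0, this also covers strings with exactly one adjacent equal pair, such as '011', where the gain is only 1.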
