I want to find patterns dispersed within texts. The search pattern can be any word, and the text can be anything.

So (here goes):

I split a word into character pairs. Say the name is 'helen' (case irrelevant). That's got 5 letters; so it is two pairs and a single letter: 'he', 'le' and 'n'.
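A minimal sketch of that split, assuming the word sits in a variable I'm calling $name here (a made-up name); lc takes care of the case being irrelevant:

```perl
use strict;
use warnings;

my $name  = 'Helen';              # hypothetical example word
my @parts = lc($name) =~ /..?/g;  # two characters at a time, plus a lone last character if the length is odd

print join(', ', @parts), "\n";   # he, le, n
```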

I want to get the words that contain these parts, in sequence. Sometimes the parts are conveniently in the correct order in the text, such as:

1. xxxhexxxxxxxx xle xxxxxx nxxx

From this I want: xxxhexxxxxxxx xle nxxx

But they may not be in quite the right order. There may be repetitions and/or parts in the wrong order:

2. xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx

I'd like to get:

xxxhexxxxxxxx xle xxnxle xnxxhexx xlexxxxxx nxnx

'xxnxle ' appears because it contains the final 'n'. The fact that it contains an additional 'le' just does not matter.

But actually I get:

xxxhexxxxxxxx xle xxnxle xxnxxx xnxxhexx xlexxxxxx nxnx


In other words, it should always get the input sequence in the correct order if it is there. It should get it repeatedly if it is there more than once. It should discard anything extraneous if it can.

Taking an input (for instance, $words = 'xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx'):


my @other_stuff = split (' ', $words);
my @pairs = $string =~ /..?/sg;
my @stuff = grep /\B$pairs[0]|\B$pairs[1]|\B$pairs[2]/, @other_stuff;
# this assumes I know the word length.
# but actually I would not know.
# so this is not ok and needs to be fixed!
for (@stuff) {
    print $_ . " ";
    print OUTPUT $_ . " ";
}
close OUTPUT;
exit;

Prints: 'xxxhexxxxxxxx xle xxnxle xxnxxx xnxxhexx xlexxxxxx nxnx' as above, which is wrong.
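
For comparison, here is a rough sketch of the behaviour described above, not a definitive fix. It walks the words once, remembers which part is expected next, and keeps a word only if it contains that part, so the word length never has to be hard-coded. The sub name pick_in_sequence is made up for illustration, and two things are assumptions on my part: a part may sit anywhere in a word (no \B, since 'nxxx' has to count in the first example), and an unfinished final cycle is thrown away.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words of $text that spell out the
# parts of $name in order, cycle after cycle.
sub pick_in_sequence {
    my ($name, $text) = @_;
    my @parts = lc($name) =~ /..?/g;   # works for any word length
    return () unless @parts;

    my (@kept, @cycle);
    my $want = 0;                      # index of the part expected next
    for my $word (split ' ', $text) {
        # skip words that do not contain the part we are waiting for
        next if index(lc($word), $parts[$want]) < 0;
        push @cycle, $word;
        $want++;
        if ($want == @parts) {         # whole name found: keep this cycle
            push @kept, @cycle;
            @cycle = ();
            $want  = 0;
        }
    }
    return @kept;                      # assumption: an unfinished last cycle is dropped
}

my $words = 'xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx';
print join(' ', pick_in_sequence('helen', $words)), "\n";
```

On the two sample texts above this should give 'xxxhexxxxxxxx xle nxxx' and 'xxxhexxxxxxxx xle xxnxle xnxxhexx xlexxxxxx nxnx'. Each word is counted for one part only, which is why the extra 'le' in 'xxnxle' makes no difference.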

I realise this is a complicated question. But any help gratefully received.


In reply to pattern sequence dispersed within text by nicemank
