in reply to Regex: Matching around a word(s)

My approach would be to read the file and split it into an array of words, then go through the array and detect matches. For each match you can print the subarray $words[$index-$pre .. $index+$post]. I think this is the sort of thing Ovid had in mind. Using modules is always easier than doing things yourself, but if your definition of words is simple enough, it may not be too onerous.
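
A minimal sketch of that idea, assuming a "word" is simply a whitespace-delimited token and that matching is case-insensitive. The name context_windows and its parameters are made up for illustration; the window is clamped so it never runs off either end of the array.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return one context
# window per matching word, each as a single space-joined string.
sub context_windows {
    my ($text, $target, $pre, $post) = @_;
    my @words = split ' ', $text;    # assumption: word = whitespace-delimited token
    my @windows;
    for my $index (0 .. $#words) {
        next unless lc $words[$index] eq lc $target;
        my $start = $index - $pre;
        $start = 0 if $start < 0;            # clamp at the start of the text
        my $end = $index + $post;
        $end = $#words if $end > $#words;    # clamp at the end of the text
        push @windows, join ' ', @words[$start .. $end];
    }
    return @windows;
}

print "$_\n" for context_windows(
    'to be or not to be that is the question', 'not', 2, 2
);
```

Each match gets its own window here, so two matches close together produce overlapping output.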

Of course, if the file is LARGE, you might want to only read sections of the file at a time, and update your 'window' when you have fewer than $post words left in memory.
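
One way this might look, as a rough sketch rather than a tuned implementation: read line by line, remember only the last $pre words, and let each match collect its trailing $post words as they arrive. The name stream_windows, the callback parameter and the whitespace-token definition of a word are all assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: stream words from a filehandle, keeping only a small
# buffer in memory, and call $emit->($snippet) for each completed window.
sub stream_windows {
    my ($fh, $target, $pre, $post, $emit) = @_;
    my @before;     # the last $pre words seen
    my @pending;    # matches still waiting for trailing words, oldest first
    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            # Feed this word to every match that still needs trailing context.
            for my $p (@pending) {
                push @{ $p->{words} }, $word;
                $p->{need}--;
            }
            # The oldest pending match always completes first.
            while (@pending && $pending[0]{need} == 0) {
                $emit->(join ' ', @{ shift(@pending)->{words} });
            }
            if (lc $word eq lc $target) {
                my $match = { words => [@before, $word], need => $post };
                if ($post == 0) { $emit->(join ' ', @{ $match->{words} }) }
                else            { push @pending, $match }
            }
            push @before, $word;
            shift @before while @before > $pre;
        }
    }
    # End of input: flush matches that ran out of trailing words.
    $emit->(join ' ', @{ $_->{words} }) for @pending;
}

# Usage with an in-memory file; a real file handle works the same way.
my $text = "to be or not\nto be that is the question\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
stream_windows($fh, 'be', 1, 2, sub { print "$_[0]\n" });
```

Memory use is bounded by $pre words plus the windows of matches that are still incomplete, regardless of file size.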

The next problem is defining what is a word .... Wonder if Shakespeare had anything to say on the topic?

Tom

--
TTTATCGGTCGTTATATAGATGTTTGCA

Re^2: Regex: Matching around a word(s)
by BrowserUk (Patriarch) on Dec 19, 2005 at 20:17 UTC
    Using modules is always easier than doing things yourself,

    Only if there is a module that does exactly what you want to do. Maybe.

    If there isn't, picking a module with a name vaguely related to the problem description and trying to bend it to the cause is pointless.

    Achieving what you describe--printing the target word and N words either side--is easy using a regex and does not have the overhead of creating huge arrays just to stick 'em all back together for output. It is a bread and butter text processing task and exactly what the much lauded, highly prized, Perl-jewel-in-its-crown regex engine is designed for.
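
    A rough sketch of the kind of regex meant here, not necessarily the one used elsewhere in the thread. It assumes words are whitespace-delimited tokens and matches case-insensitively; regex_windows and its parameters are names invented for the example.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: return the target plus up to $pre/$post
    # whitespace-delimited words of context for each match.
    sub regex_windows {
        my ($text, $target, $pre, $post) = @_;
        my $re = qr{
            (?<!\S)                      # must start at the beginning of a token
            (
                (?: \S+ \s+ ){0,$pre}    # up to pre preceding words
                \Q$target\E (?!\S)       # the target as a whole token
                (?: \s+ \S+ ){0,$post}   # up to post following words
            )
        }xi;
        my @snippets;
        while ($text =~ /$re/g) {
            push @snippets, $1;          # original whitespace is preserved
        }
        return @snippets;
    }

    print "$_\n" for regex_windows(
        'to be or not to be that is the question', 'not', 2, 2
    );
    ```

    Because each match consumes its trailing context, a second occurrence of the target inside that context shows up within the first snippet rather than as a match of its own.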

    However, resolving the issues of amalgamating multiple, closely consecutive matches into single snippets is equally problematic whether done with a regex or your "subarray" solution. (BTW, it should be @array[ n .. m ] for a slice.)
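
    To make the amalgamation problem concrete, here is one possible sketch. It works on word indices rather than with a regex, purely because the merging step is easier to see that way: each match becomes a [start, end] range, and a range that overlaps or touches the previous one extends it. merged_windows is a made-up name, and words are assumed to be whitespace-delimited tokens.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: like a per-match window, but matches whose windows
    # overlap or touch are merged into a single snippet.
    sub merged_windows {
        my ($text, $target, $pre, $post) = @_;
        my @words = split ' ', $text;
        my @ranges;    # [start, end] word-index pairs, in increasing order
        for my $i (grep { lc $words[$_] eq lc $target } 0 .. $#words) {
            my $start = $i - $pre  < 0       ? 0       : $i - $pre;
            my $end   = $i + $post > $#words ? $#words : $i + $post;
            if (@ranges && $start <= $ranges[-1][1] + 1) {
                $ranges[-1][1] = $end;           # extend the previous snippet
            }
            else {
                push @ranges, [$start, $end];    # start a new snippet
            }
        }
        # An array slice, @words[ n .. m ], pulls out each merged range.
        return map { join ' ', @words[ $_->[0] .. $_->[1] ] } @ranges;
    }

    print "$_\n" for merged_windows(
        'to be or not to be that is the question', 'be', 1, 2
    );
    ```

    With one word before and two after, the two windows around "be" touch, so they come out as a single snippet; shrink the trailing context to one word and they stay separate.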


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Regex: Matching around a word(s)
by shotgunefx (Parson) on Dec 20, 2005 at 19:41 UTC
    I thought I had responded to this earlier. Actually, I did, but must have mistaken preview for submit (dammit).

    BrowserUk stated my thoughts pretty well. As an aside, simply splitting the string and matching the tokens like so
    my @text = split /\s+/, $text;
    my @results;
    for (@text) {
        push @results, $1 if m/\b($expr)\b/;
    }
    is pretty slow in comparison to the solution I hit on.


    -Lee

    perl digital dash (in progress)