Hello,

I am a beginner trying to use a regular expression to search for a pattern. The problem is that a match can be split across multiple lines of the input. For example, if I am searching for 'abc' in a file containing the following:


abcdefab
cdefa
bcdef

I should get 3 matches.

The first thing I tried was a while loop that reads the file line by line, chomps each line, and appends it to a single variable, which I then search. This works well on a small sample file, but for some reason it doesn't work on my actual .txt file, which is several hundred MB, so maybe it is not the most efficient way of doing this.

This is my code:

while ($line = <inputfile>) { chomp $line; $string = $string . $line; }
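For comparison, here is a minimal sketch of the same join-then-search idea as the loop above. It reads the whole input in one go and deletes both "\n" and "\r", because chomp only removes the current input record separator and would leave a "\r" behind if the file has Windows-style line endings. Whether that is what goes wrong with the large file is only a guess. The in-memory filehandle and the variable names are assumptions made for the example, standing in for the real file.

```perl
use strict;
use warnings;

# Stand-in for the real file: an in-memory filehandle holding the sample data.
my $sample = "abcdefab\ncdefa\nbcdef\n";
open my $in, '<', \$sample or die "Cannot open sample: $!";

# Slurp the whole input in one read (undefined $/ disables line splitting).
my $string = do { local $/; <$in> };
close $in;

# Delete every line-break character, including stray carriage returns.
$string =~ tr/\r\n//d;

# Count the matches of the literal 'abc' in the joined text.
my $count = () = $string =~ /abc/g;
print "$count matches\n";    # prints "3 matches"
```

For a real file, the open line would take a filename instead of a reference to a scalar.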

A second option, according to Google, is to use the /m or /s modifiers. But I don't know where the text would be broken by a newline, so should I put a . after every character in my regular expression? My actual expression is pretty long, so I don't know if that would be the best way to do it.
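As a rough illustration of the "something after every character" idea, the sketch below builds the pattern programmatically, placing an optional newline (\n?) between the characters of a literal keyword rather than a ., since a . would require an extra character to be present. This is only a sketch for a plain literal such as 'abc'; the variable names are made up for the example, and it does not extend cleanly to parts like .{10}, where removing the newlines before matching is likely simpler.

```perl
use strict;
use warnings;

my $keyword = 'abc';

# Build a\n?b\n?c: each character escaped, with an optional newline between them.
my $pattern = join '\n?', map { quotemeta } split //, $keyword;
my $re      = qr/$pattern/;

# The sample data from the post, line breaks left in place.
my $text = "abcdefab\ncdefa\nbcdef\n";

my $count = () = $text =~ /$re/g;
print "$count matches\n";    # prints "3 matches"
```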

My regular expression searches for a keyword and captures the 5 characters before and after it; the keyword itself contains a random 10-character sequence:

$string =~ /(?<=(.....))abc(.{10})def(?=(.....))/g
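To make the intent of that expression concrete, here is a small sketch that runs it against a made-up string with the newlines already removed. The test string and the @hits array are assumptions for illustration only; .{5} is used as an equivalent spelling of the five dots.

```perl
use strict;
use warnings;

# Hypothetical joined text (newlines already removed), invented for this sketch.
my $string = '12345abcXXXXXXXXXXdef67890';

# Collect [before, middle, after] for every match.
my @hits;
while ($string =~ /(?<=(.{5}))abc(.{10})def(?=(.{5}))/g) {
    push @hits, [$1, $2, $3];
}

for my $hit (@hits) {
    my ($before, $middle, $after) = @$hit;
    print "before=$before middle=$middle after=$after\n";
}
# prints "before=12345 middle=XXXXXXXXXX after=67890"
```

Because the 5-character context sits in lookarounds, it is captured without being consumed, so the context of one match can overlap the next match.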

I am assuming that there is something obvious I am missing here, so I would appreciate any help.

Thanks

In reply to Regular expressions across multiple lines by abcd
