Re: Regexp matching on a multiline file: dealing with line breaks

Re^2: Regexp matching on a multiline file: dealing with line breaks
by BlueStarry (Novice) on Dec 06, 2015 at 09:55 UTC

Many thanks to everyone.

I'll go with sliding windows, but first I have an idea of my own, though I don't know if it's correct. My original file is divided into many "paragraphs", each of them starting with a special line like this:

>Header

Re^3: Regexp matching on a multiline file: dealing with line breaks

by Laurent_R (Canon) on Dec 06, 2015 at 10:01 UTC

Yes, by all means. If you can identify sections or chunks where you can be sure that no match can overlap a chunk boundary, then you don't even need a sliding window: just load and process one chunk after another, the same way you were told before for the whole file. It is even simpler than a sliding window.

Re^3: Regexp matching on a multiline file: dealing with line breaks

by Athanasius (Cardinal) on Dec 06, 2015 at 13:12 UTC

As Laurent_R says, this is an excellent strategy. Have a look at the entry for $INPUT_RECORD_SEPARATOR (usually spelled just $/) in perlvar. For example:

#! perl
use strict;
use warnings;

my $target = 'kitten';
my $count  =  0;

{
    local $/ = ">Header\n";

    while (my $string = <DATA>)
    {
        $string =~ s/\n//g;
        print "string is '$string'\n";
        $count += () = $string =~ /\Q$target/g;
    }
}

print "The target string '$target' occurs $count times in the file\n";

__DATA__
>Header
sushikitten
ilovethekit
tensushithe
kittenisthe
>Header
sushikittAn
ilovethekit
tensushithe
kittBnisthe

Output:

23:11 >perl 1474_SoPW.pl
string is '>Header'
string is 'sushikittenilovethekittensushithekittenisthe>Header'
string is 'sushikittAnilovethekittensushithekittBnisthe'
The target string 'kitten' occurs 4 times in the file

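Note that in the output above the separator itself (">Header") stays attached to each chunk. A minimal variation, assuming the same data layout, uses chomp to strip it, since chomp removes whatever $/ currently holds. The in-memory filehandle and the $data and $chunk names are only there to keep the sketch self-contained; they are not part of the original script.

```perl
#! perl
use strict;
use warnings;

# Hypothetical sample data, same shape as the __DATA__ section above.
my $data = <<'END';
>Header
sushikitten
ilovethekit
tensushithe
kittenisthe
>Header
sushikittAn
ilovethekit
tensushithe
kittBnisthe
END

my $target = 'kitten';
my $count  = 0;

# Read from the string as if it were a file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

{
    local $/ = ">Header\n";

    while (my $chunk = <$fh>)
    {
        chomp $chunk;            # strips a trailing ">Header\n" (the value of $/)
        next if $chunk eq '';    # the first record is nothing but the separator
        $chunk =~ s/\n//g;       # join the lines of the paragraph
        print "chunk is '$chunk'\n";
        $count += () = $chunk =~ /\Q$target/g;
    }
}

close $fh;
print "The target string '$target' occurs $count times\n";
```

With this sample data the count is still 4, but the chunks no longer carry the header text, so a target can never accidentally match across it.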

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Re^4: Regexp matching on a multiline file: dealing with line breaks

by BlueStarry (Novice) on Dec 06, 2015 at 14:02 UTC

Thank you very, very much. I REALLY appreciate your help and dedication on my matter.

Re^4: Regexp matching on a multiline file: dealing with line breaks

by BlueStarry (Novice) on Dec 10, 2015 at 17:19 UTC

I've tried to put the regular expression inside $/ but it doesn't seem to work.
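Note: a likely reason is that, per perlvar, the input record separator $/ is treated as a literal string and not as a regex ($\ is the unrelated output record separator). One possible workaround, assuming the file fits in memory, is to read it whole and split on a regex instead. The sketch below is only an illustration; the varying header names and the $data variable are hypothetical.

```perl
#! perl
use strict;
use warnings;

# Hypothetical sample data: the header lines vary, so a fixed string in $/ is not enough.
my $data = <<'END';
>Header1
sushikitten
ilovethekit
>Header2
tensushithe
kittenisthe
END

my $target = 'kitten';
my $count  = 0;

# Split wherever a line starts with '>'; the header line itself is discarded.
my @chunks = split /^>.*\n/m, $data;

for my $chunk (@chunks)
{
    next if $chunk eq '';    # split yields an empty leading field before the first header
    $chunk =~ s/\n//g;       # join the lines of the paragraph
    $count += () = $chunk =~ /\Q$target/g;
}

print "The target string '$target' occurs $count times\n";
```

Here each paragraph is searched on its own, so a "kitten" that would only appear by gluing the end of one paragraph to the start of the next is not counted.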

Re^5: Regexp matching on a multiline file: dealing with line breaks

by choroba (Cardinal) on Dec 10, 2015 at 17:26 UTC

Re^6: Regexp matching on a multiline file: dealing with line breaks

by Anonymous Monk on Dec 10, 2015 at 21:55 UTC
