in reply to Regexp matching on a multiline file: dealing with line breaks
Hello BlueStarry, and welcome to the Monastery!
If the entire file will fit in memory, a variation on kennethk’s solution is to simply delete the newlines before searching:
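#! perl
use strict;
use warnings;

my $target = 'kitten';

# Slurp the whole input into one string by locally undefining $/
my $string = do { local $/; <DATA>; };

# Delete the newlines so that matches may span the original line breaks
$string =~ s/\n//g;

# Count the matches: list assignment evaluated in scalar context
my $count = () = $string =~ /\Q$target/g;

print "The target string '$target' occurs $count times in the file\n";

__DATA__
sushikitten
ilovethekit
tensushithe
kittenisthe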
Output:
14:28 >perl 1474_SoPW.pl
The target string 'kitten' occurs 3 times in the file

14:28 >
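However, as your input file is 5 GB, slurping it into a single string is probably impractical. In that case you’re going to have to bite the bullet and implement a solution with “strange buffers”, such as a sliding window technique: process the file piece by piece, and carry over just enough of the previous piece to catch a match that straddles a line break. Maybe have a look at Data::Iterator::SlidingWindow.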
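For what it's worth, here is a minimal hand-rolled sketch of the sliding-window idea. It does not use Data::Iterator::SlidingWindow; the subroutine name count_across_lines and the in-memory filehandle are my own choices for illustration. It assumes the file really is broken into lines of manageable length (a single 5 GB line would defeat it), and that the target is a literal string rather than a pattern:

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper (not from any module): count non-overlapping
# occurrences of a literal $target in the text read from $fh, treating
# line breaks as if they were not there.
sub count_across_lines {
    my ($fh, $target) = @_;
    return 0 unless length $target;

    my $keep  = length($target) - 1;   # longest tail that could begin a match
    my $count = 0;
    my $buf   = '';

    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+\z//;        # drop the line ending (LF or CRLF)
        $buf .= $line;

        my $last_end = 0;
        while ($buf =~ /\Q$target\E/g) {
            $count++;
            $last_end = pos($buf);     # offset just past this match
        }

        # Slide the window: keep only a tail too short to hold a whole
        # match, and never keep characters that were already matched.
        my $start = length($buf) - $keep;
        $start = $last_end if $start < $last_end;
        $buf = substr($buf, $start);
    }

    return $count;
}

# Demo on the same data as above, read through an in-memory filehandle.
my $data = "sushikitten\nilovethekit\ntensushithe\nkittenisthe\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $count = count_across_lines($fh, 'kitten');
print "The target string 'kitten' occurs $count times in the file\n";
```

Memory use is bounded by the longest line plus length($target) - 1 characters, so the size of the file no longer matters. For a real file, open it by name instead of using the in-memory handle.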
Hope that helps,
Athanasius <°(((>< contra mundum    Iustus alius egestas vitae, eros Piratica,
Re^2: Regexp matching on a multiline file: dealing with line breaks
by Anonymous Monk on Dec 06, 2015 at 13:17 UTC
by Athanasius (Archbishop) on Dec 06, 2015 at 13:43 UTC