my @regexes = (§\s*[0-9], Art\.\s*[0-9IVX], ...)
Like that, except that each regex needs to be delimited in some way; otherwise Perl will try to parse it as code. You can either enclose each one in quotes or mark it as a regex with the qr// operator, like this:
my @regexes = (qr/§\s*[0-9]/, qr/Art\.\s*[0-9IVX]/, ...)
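A minimal sketch of both forms side by side, assuming the patterns live in the script itself. The `use utf8` pragma is there because § is a non-ASCII character in the source file. The sample text and the variable names other than `@regexes` are invented for illustration.

```perl
use strict;
use warnings;
use utf8;    # the source contains a literal §, so declare it as UTF-8

# Form 1: compiled regex objects built with qr//
my @regexes = (qr/§\s*[0-9]/, qr/Art\.\s*[0-9IVX]/);

# Form 2: single-quoted strings (backslashes survive), compiled afterwards
my @patterns = ('§\s*[0-9]', 'Art\.\s*[0-9IVX]');
my @compiled = map { qr/$_/ } @patterns;

# Both lists behave the same when matched against some text
my $sample   = 'See § 5 and Art. IV.';    # invented sample text
my @hits_qr  = grep { $sample =~ $_ } @regexes;
my @hits_str = grep { $sample =~ $_ } @compiled;
print scalar(@hits_qr), ' ', scalar(@hits_str), "\n";
```

Note the single quotes in the string form: inside double quotes, `\s` would lose its backslash (with an "Unrecognized escape" warning), so the pattern would silently change.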
Then how do I read "a data file into a scalar as a string"?
Mostly as you said you normally do it, but be sure either to concatenate each line onto the string or to read the whole file at once. There are modules which can help with this, such as Path::Tiny, File::Slurper and so on. See the Illumination "How do I read an entire file into a string?" for lots more on this.
my $infile = $ARGV[0];
open my $inh, '<', $infile or die "Cannot open $infile for reading: $!";
my $xml;
{
    local $/ = undef;    # no input record separator: "slurp" mode
    $xml = <$inh>;       # so this reads the whole file in one go
}
close $inh;
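If the XML file is UTF-8 encoded (an assumption; check your data), it is worth adding an encoding layer so that characters such as § are decoded before the regexes see them. A minimal sketch using only core Perl: the sub name `slurp_utf8` is hypothetical, and the demo writes a temporary file with the core File::Temp module purely so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one string, decoding UTF-8
sub slurp_utf8 {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path for reading: $!";
    my $content = do { local $/; <$fh> };    # slurp mode, limited to this block
    close $fh;
    return $content;
}

# Demo: write a small UTF-8 file, then read it back in one go
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':encoding(UTF-8)';
print {$tmp_fh} "line one\n\x{A7} 12\n";    # \x{A7} is the § character
close $tmp_fh;

my $xml = slurp_utf8($tmp_name);
print length($xml), "\n";    # counts characters, not bytes
```

The `do { local $/; ... }` form keeps the change to `$/` confined to that one expression, which is the same idea as the bare block in the code above.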
What kind of loop construct are you thinking of?
I was thinking of a for loop, as that is the simplest way to iterate over an array unless there is a good reason to use something else (which does not appear to be the case here).
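A sketch of such a for loop, counting every match of each regex in the slurped string. Counting with `while (... /g)` and keying a hash by the stringified regex are my own choices here, and the sample string merely stands in for the real file contents.

```perl
use strict;
use warnings;
use utf8;    # literal § in the patterns and the sample

binmode STDOUT, ':encoding(UTF-8)';    # so § prints correctly

my @regexes = (qr/§\s*[0-9]/, qr/Art\.\s*[0-9IVX]/);
my $xml     = '§ 1, §2 and Art. 3, Art.IV';    # stand-in for the slurped file

my %count;
for my $re (@regexes) {
    my $n = 0;
    $n++ while $xml =~ /$re/g;    # scalar-context //g visits each match in turn
    $count{$re} = $n;             # key: the stringified regex
}

print "$_\t$count{$_}\n" for sort keys %count;
```

Because a failed `//g` match resets the search position, each regex in the loop starts again from the beginning of the string.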
Thanks for clarifying about the entities. Those should be fine, as they are just data. You may need to escape any characters which have a special meaning to the regular expression engine, but otherwise they should not cause any problems. Try it and see how you get along.
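If some search terms are literal strings rather than patterns, Perl's built-in `quotemeta`, or its inline form `\Q...\E`, escapes the regex metacharacters for you. A small sketch; the terms in `@terms` are invented examples, not taken from the actual data.

```perl
use strict;
use warnings;

my $literal = 'Art.';    # '.' is a regex metacharacter

# Interpolated as-is, the dot matches any character ...
my $unescaped = ('Arts' =~ /$literal/) ? 1 : 0;
# ... while \Q...\E escapes it, so only a literal dot matches
my $escaped = ('Arts' =~ /\Q$literal\E/) ? 1 : 0;

# quotemeta is the function form of \Q; handy when building a list of regexes
my $quoted  = quotemeta $literal;                 # 'Art\.'
my @terms   = ('&sect;', 'Art.', 'Abs. (1)');     # invented literal terms
my @regexes = map { qr/\Q$_\E/ } @terms;

print "$unescaped $escaped $quoted\n";
```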
🦛
Re^4: Entity statistics
by LexPl (Beadle) on Nov 12, 2024 at 13:15 UTC
by choroba (Cardinal) on Nov 12, 2024 at 13:23 UTC
by LexPl (Beadle) on Nov 12, 2024 at 16:50 UTC
by choroba (Cardinal) on Nov 12, 2024 at 16:55 UTC
by hippo (Archbishop) on Nov 12, 2024 at 13:34 UTC