Thanks to your kind assistance, I now have a working statistics tool :)

But when I run the script listed below on another file, I get the following warning, which really puzzles me:

Use of uninitialized value in concatenation (.) or string at whitespace-stat.pl line 47, <$in> line 1 (#1)
(W uninitialized) An undefined value was used as if it were already defined. It was interpreted as a "" or a 0, but maybe it was a mistake. To suppress this warning assign a defined value to your variables.

To help you figure out what was undefined, perl will try to tell you the name of the variable (if any) that was undefined. In some cases it cannot do this, so it also tells you what operation you used the undefined value in. Note, however, that perl optimizes your program and the operation displayed in the warning may not necessarily appear literally in your program. For example, "that $foo" is usually optimized into "that " . $foo, and the warning will refer to the concatenation (.) operator, even though there is no . in your program.

#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;


#my personal data left out!

print "Generate statistics: Whitespace in context\n";

my $infile = $ARGV[0];

#define regexes as search target (in the array @regexes)
my @regexes = (
    qr/&sect;\s*[0-9]/,
    qr/Art\.\s*[0-9IVX]/,
    qr/Artikel\s*[0-9IVX]/,
    qr/Artikels\s*[0-9IVX]/,
    qr/Artikeln\s*[0-9IVX]/,
);

open my $in, '<', $infile or die "Cannot open $infile for reading: $!";

#read input file in variable $xml
my $xml;
{
  local $/ = undef;
  $xml = <$in>;
}

#define array for frequency values
my @tally;

#count routine for each regex
for my $i (0 .. $#regexes) {
    my $regex = $regexes[$i];
    ++$tally[$i] while $xml =~ /$regex/g;
}    

#define output file
open my $out, '>', 'stats.txt' or die $!;


#output statistics
print {$out} "Statistics: Whitespace in context\n\ninput file: ";
print {$out} "$infile";
print {$out} "\n========================================================================\n\n";

for my $i (0 .. $#regexes) {
    my $regex = $regexes[$i];
    $regex =~ s/^\(\?\^://;
    $regex =~ s/\)$//;
    print {$out} "$regex:\t\t$tally[$i]\n";
}

close $in;
close $out;
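
A possible explanation, offered as a guess rather than a confirmed diagnosis: `@tally` starts out empty, so any regex that never matches in the new file leaves `$tally[$i]` undefined, and interpolating it in the final `print` would trigger exactly this warning. (The "line 47" in the message refers to the full script, which includes the lines left out above, so it does not line up with this listing.) Below is a minimal sketch of pre-filling the counters with 0. The helper name `count_matches` and the sample text are hypothetical and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): count the matches of
# each regex in $text. Every counter starts at 0, so a pattern that never
# matches still yields a defined value.
sub count_matches {
    my ($text, @patterns) = @_;
    my @tally = (0) x @patterns;    # one zero per pattern
    for my $i (0 .. $#patterns) {
        my $regex = $patterns[$i];
        ++$tally[$i] while $text =~ /$regex/g;
    }
    return @tally;
}

# Sample input (an assumption for illustration): the second pattern
# never matches here.
my @regexes = (qr/Art\.\s*[0-9IVX]/, qr/Artikel\s*[0-9IVX]/);
my @tally   = count_matches("see Art. 5 and Art. IV", @regexes);

for my $i (0 .. $#regexes) {
    print "$regexes[$i]:\t\t$tally[$i]\n";    # no warning for the zero count
}
```

In the script above, the equivalent change would be declaring the array as `my @tally = (0) x @regexes;`. Alternatively, printing `$tally[$i] // 0` (Perl 5.10 or later) would have the same effect at output time.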

In reply to Re^6: Entity statistics by LexPl
in thread Entity statistics by LexPl
