Seekers of Perl Wisdom

If you have a question on how to do something in Perl, or you need a Perl solution to an actual real-life problem, or you're unsure why something you've tried just isn't working... then this section is the place to ask.

However, you might consider asking in the chatterbox first (if you're a registered user). The response time tends to be quicker, and if it turns out that the problem/solutions are too much for the cb to handle, the kind monks will be sure to direct you here.

User Questions

wrap abbreviations in XML element
1 direct reply — Read more / Contribute by LexPl
on May 16, 2025 at 05:06

I have got a highly complex, nested XML document in encoding ISO-8859-1 which contains abbreviations.

Each abbreviation has two to three letters and each letter is directly followed by a full stop. The separator between each letter plus full stop might be

!!!emsp14;
!!!hairsp;
\s
nothing!

You could define this as a regex: [a-zA-Z]\.((!!!emsp14;|!!!hairsp;|\s)?[a-zA-Z]\.)+

I would like to wrap each abbreviation into an element <abbrev> and unify the separator whitespace to "!!!hairsp;"

This looks pretty easy, but there are some nasty pitfalls:
If two abbreviations are adjacent to each other, the problem of proper segmentation pops up. For example the string "a. A. z. B." could lead to <abbrev>a.!!!hairsp;A.!!!hairsp;z.</abbrev> which doesn't exist. The correct solution would be <abbrev>a.!!!hairsp;A.</abbrev>_<abbrev>z.!!!hairsp;B.</abbrev> where the underscore stands for a space.

Another issue is the full stop at the end of a sentence and a following abbreviation:
"Hier müssen die richtigen Regeln einbezogen werden. Z.B. ist hier § 42 ...". Of course, there exists no abbreviation "n. Z.B.", but the proper tagging would be: "Hier müssen die richtigen Regeln einbezogen werden. <abbrev>Z.!!!hairsp;B.</abbrev> ist hier § 42 ...".

As the regex captures abbreviations with 2 letters and with 3 letters, care has to be taken that a 3-letter abbreviation such as "m.w.N." isn't split into a 2-letter abbreviation "m.w." followed by "N."

I suppose that you will need a kind of knowledge base in your script for the proper segmentation, but I don't know how to do that.

The easy solution would be a bunch of changes:

#!/usr/bin/perl
use warnings;
use strict;

# for interactive mode
my $infile  = $ARGV[0];
my $outfile = $ARGV[1];

# three-argument open with lexical filehandles
open(my $in,  '<', $infile)  or die $!;
open(my $out, '>', $outfile) or die $!;

while (<$in>)
{
    # wrap "a.A."
    $_ =~ s[a\.!!!hairsp;A\.](<abbrev n='2'>a.!!!hairsp;A.</abbrev>)g;
    $_ =~ s[a\.!!!emsp14;A\.](<abbrev n='2'>a.!!!hairsp;A.</abbrev>)g;
    $_ =~ s[a\.\sA\.](<abbrev n='2'>a.!!!hairsp;A.</abbrev>)g;
    $_ =~ s[a\.A\.](<abbrev n='2'>a.!!!hairsp;A.</abbrev>)g;
    # wrap "a.F."
    $_ =~ s[a\.!!!hairsp;F\.](<abbrev n='2'>a.!!!hairsp;F.</abbrev>)g;
    $_ =~ s[a\.!!!emsp14;F\.](<abbrev n='2'>a.!!!hairsp;F.</abbrev>)g;
    $_ =~ s[a\.\sF\.](<abbrev n='2'>a.!!!hairsp;F.</abbrev>)g;
    $_ =~ s[a\.F\.](<abbrev n='2'>a.!!!hairsp;F.</abbrev>)g;
    # wrap "d.h."
    $_ =~ s[d\.!!!hairsp;h\.](<abbrev n='2'>d.!!!hairsp;h.</abbrev>)g;
    $_ =~ s[d\.!!!emsp14;h\.](<abbrev n='2'>d.!!!hairsp;h.</abbrev>)g;
    $_ =~ s[d\.\sh\.](<abbrev n='2'>d.!!!hairsp;h.</abbrev>)g;
    $_ =~ s[d\.h\.](<abbrev n='2'>d.!!!hairsp;h.</abbrev>)g;
    # wrap "D.h."
    $_ =~ s[D\.!!!hairsp;h\.](<abbrev n='2'>D.!!!hairsp;h.</abbrev>)g;
    $_ =~ s[D\.!!!emsp14;h\.](<abbrev n='2'>D.!!!hairsp;h.</abbrev>)g;
    $_ =~ s[D\.\sh\.](<abbrev n='2'>D.!!!hairsp;h.</abbrev>)g;
    $_ =~ s[D\.h\.](<abbrev n='2'>D.!!!hairsp;h.</abbrev>)g;

    print $out $_;
}

close($in);
close($out);

Do you see a more efficient solution? And if yes, could you kindly show me what this would look like?
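One way the "knowledge base" idea from the post could look is a list of known abbreviations compiled into a single alternation. This is only a minimal sketch: the list @known and the helper wrap_abbrevs are hypothetical names, the entries are just the abbreviations mentioned in the post, the separator tokens are copied as written there, and only ASCII letters are considered.

```perl
use strict;
use warnings;

# Hypothetical knowledge base: the letters of each known abbreviation.
my @known = (
    [qw(m w N)], [qw(a A)], [qw(a F)], [qw(d h)], [qw(D h)], [qw(z B)], [qw(Z B)],
);

# Optional separator between two "letter + full stop" units (tokens as in the post).
my $sep = qr/(?:!!!emsp14;|!!!hairsp;|\s)?/;

# Longest abbreviations first, so "m.w.N." is tried before any shorter entry.
my $alt = join '|',
    map  { join $sep, map { quotemeta("$_.") } @$_ }
    sort { @$b <=> @$a } @known;

sub wrap_abbrevs {
    my ($text) = @_;
    # The lookbehind keeps a match from starting in the middle of a word.
    $text =~ s{(?<![A-Za-z])($alt)}{
        my $hit     = $1;
        my @letters = $hit =~ /([A-Za-z])\./g;
        "<abbrev n='" . @letters . "'>"
            . join('!!!hairsp;', map { "$_." } @letters)
            . '</abbrev>';
    }ge;
    return $text;
}

print wrap_abbrevs('a. A. z. B.'), "\n";
print wrap_abbrevs('einbezogen werden. Z.B. ist hier'), "\n";
```

Because only listed abbreviations can match, "n. Z." at a sentence boundary is never tagged, and adjacent abbreviations are segmented by whatever the list contains. The price is that the list has to be maintained by hand.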

Perl Best Practices -- 20 years later
4 direct replies — Read more / Contribute by pfaut
on May 13, 2025 at 07:26

I've been browsing through the monastery using Random Node. I keep coming across references to Perl::Critic. I downloaded it and ran it against some of my perl code.

Perl::Critic seems to base most of its policies on recommendations from Perl Best Practices. I was intrigued as to the rationale behind many of the policies. There is some documentation within the policy files themselves but I thought it might be useful to have the book. I searched for the book and found it was written in 2005 and hasn't been updated.

How relevant is this book 20 years later? The current version at the time of publication was 5.8. Perl has seen many changes over the last 20 years. Do the book's recommendations still apply? Is there much that's outdated due to new features in the language?

I'll probably buy the book anyway since it does appear to contain a lot of wisdom. I'm just curious how much material would need rewriting if a new edition were to be published today based on perl 5.40.

90% of every Perl application is already written. ⇒

dragonchild

Basic question about Iterator code
2 direct replies — Read more / Contribute by adamsj
on May 12, 2025 at 10:11

Higher Order Perl

# This software is Copyright 2005 by Elsevier Inc.  You may use it
# under the terms of the license at http://perl.plover.com/hop/LICENSE.txt .
###
### upto
###

## Chapter 4 section 2.1

sub upto {
  my ($m, $n) = @_;
  return sub {
    return $m <= $n ? $m++ : undef;
  };
}
my $it = upto(3,5);
my $value->it();
print "$value\n";

sub upto {
  my ($m, $n) = @_;
  return Iterator {
    return $m <= $n ? $m++ : undef;
  };
}
sub Iterator (&) { return $_[0]; }

Undefined subroutine &main::3 called
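A minimal sketch, assuming the error above comes from the second listing: there, Iterator is used before the parser has seen its (&) prototype, so the block is not passed as a code reference, the inner return leaves upto() with the plain value 3, and calling that value as a subroutine fails. Declaring Iterator first avoids this. The while loop is only an illustration of how the iterator is called.

```perl
use strict;
use warnings;

# Declared before upto() so the (&) prototype is already known
# when the parser reaches "Iterator { ... }".
sub Iterator (&) { return $_[0]; }

sub upto {
    my ($m, $n) = @_;
    return Iterator {
        return $m <= $n ? $m++ : undef;
    };
}

my $it = upto(3, 5);

# The iterator is a code reference, so it is called with ->().
while (defined(my $value = $it->())) {
    print "$value\n";
}
```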

They laughed at Joan of Arc, but she went right ahead and built it. --Gracie Allen

Applying logical operators in either complex SQL xor hashes?
3 direct replies — Read more / Contribute by mldvx4
on May 10, 2025 at 08:01

Greetings, PerlMonks,

Thanks for DBI. It works very well with SQLite3. I have a short (for now) Perl script, accessed via CGI::Fast, which will soon run a small but varying number of SQL queries against SQLite3. The queries are targeted at various FTS5 tables. The number of queries is between 1 and n, where n is a "small" number that changes each time. They are executed through prepare and execute statements. The results from each query are collected by the Perl script into its own hash with the records' unique key (recno) as the hash key. For illustration, here is the basic sample query:

SELECT old_keys.recno AS recno, 
        'http://example.org/' || old_keys.file, 
        old_metadata.value AS title 
    FROM old_keys 
    JOIN old_metadata 
    ON old_keys.recno = old_metadata.recno 
    WHERE term='dc.title' AND old_metadata.recno 
    IN (
        SELECT rowid AS recno 
        FROM old_fts5_metadata 
        WHERE old_fts5_metadata MATCH ?);

When there is more than one query, there will be an operator XOR, AND, OR, or NOT applied between them.

My question is about the recommended approach, should I apply the operators to the various hashes after separate queries, or should I have the script build out more complex queries into a single SQL query for each search?
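For comparison, the hash-side approach could be sketched roughly as follows. The helper name combine and the sample data are hypothetical, and NOT is read here as "in the left result but not in the right one", which is an assumption about the intended semantics.

```perl
use strict;
use warnings;

# Combine two result hashes (both keyed by recno) with a set operator.
sub combine {
    my ($op, $left, $right) = @_;
    my %out;
    if ($op eq 'AND') {
        %out = map { $_ => $left->{$_} } grep { exists $right->{$_} } keys %$left;
    }
    elsif ($op eq 'OR') {
        %out = (%$right, %$left);    # on duplicate keys the left record wins
    }
    elsif ($op eq 'NOT') {
        %out = map { $_ => $left->{$_} } grep { !exists $right->{$_} } keys %$left;
    }
    elsif ($op eq 'XOR') {
        %out = (
            (map { $_ => $left->{$_} }  grep { !exists $right->{$_} } keys %$left),
            (map { $_ => $right->{$_} } grep { !exists $left->{$_} }  keys %$right),
        );
    }
    else {
        die "Unknown operator '$op'\n";
    }
    return \%out;
}

# Hypothetical sample results from two separate queries.
my %q1 = (1 => 'Title one', 2 => 'Title two');
my %q2 = (2 => 'Title two', 3 => 'Title three');

for my $op (qw(AND OR NOT XOR)) {
    my $res = combine($op, \%q1, \%q2);
    print "$op: ", join(',', sort { $a <=> $b } keys %$res), "\n";
}
```

The SQL-side alternative would be to join the subqueries with SQLite's compound SELECT operators INTERSECT, UNION and EXCEPT. There is no direct XOR operator there, so that case would have to be composed from the others.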

What's the best way to ask about code from an older book?
3 direct replies — Read more / Contribute by adamsj
on May 09, 2025 at 16:00

Higher Order Perl

They laughed at Joan of Arc, but she went right ahead and built it. --Gracie Allen

Strange Occurrence in Substitution Statement
1 direct reply — Read more / Contribute by roho
on May 06, 2025 at 14:50

"$extra"

The problem is the second and subsequent lines have the backtick removed (i.e., replaced with nothing), while the first line is processed as expected where the backtick is replaced by a single space.

It appears the substitution of "$extra" is taking out the backticks (for data lines 2 and following) before the following substitution replaces backticks with a single space. It's a mystery why only the first data line is processed correctly.

#!/usr/bin/perl
use strict;
use warnings;

while ( <DATA> ) {
    chomp;
    my $fname = $_;
    print qq(\nBefore: $fname\n);
    my $extra = '';
    $fname =~ s/$extra//;
    $fname =~ s/`/ /;
    print qq( After: $fname\n);
}

exit;

__DATA__
2025-05-05`09:22:00            7,674  -rw-rw-rw-  C:\~Zipfile\H\HelpingHand\Software\DevTools\.edit.current
2025-05-05`09:22:00            7,674  -rw-rw-rw-  C:\PerlApps\H\HelpingHand\Software\DevTools\.edit.current
2025-05-05`09:22:00            5,448  -rw-rw-rw-  C:\~Zipfile\B\Bat\pics\.edit.current

"It's not how hard you work, it's how much you get done."
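The behaviour described in this post matches a documented rule: when the pattern of a match or substitution evaluates to the empty string, Perl reuses the last successfully matched pattern instead. From the second line on, that last pattern is the backtick. A minimal sketch of a guard, using a hypothetical helper clean_name and shortened sample lines:

```perl
use strict;
use warnings;

# Hypothetical helper: remove an optional literal $extra,
# then turn the first backtick into a space.
sub clean_name {
    my ($fname, $extra) = @_;
    # An empty pattern would reuse the last successful pattern (the backtick),
    # so only run this substitution when $extra actually contains something.
    $fname =~ s/\Q$extra\E// if length $extra;
    $fname =~ s/`/ /;
    return $fname;
}

print clean_name($_, ''), "\n"
    for '2025-05-05`09:22:00 first', '2025-05-05`09:22:00 second';
```

With the guard in place, every line keeps its backtick until the second substitution replaces it with a space.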

Resource Not Found error
1 direct reply — Read more / Contribute by StorminN61
on May 05, 2025 at 13:48

Recently copied a Perl script to a new server since old one is no longer in compliance. When I run the script on the old server, it creates a text file, aka trigger file, which is used by IBM Workload Scheduler to start processing a particular job. Depending on the trigger file, IWS determines which job should run. I copied the directory structure on the new server, made the IIS entries on the new server match what is on the old server, but when I try to create the triggers, I get the 404 Resource not found error, with no further information as to what resource it is specifically missing.

#!/usr/bin/perl -w

use CGI;

print "Content-type: text/html\n\n";
print "<html>\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\" />\n";
print "<link rel=\"stylesheet\" href=\"http://pctldocs/doc_style.css\" type=\"text/css\" />\n";
print "<title>Trigger Result</title>\n</head>\n";
print "<body>\n";


print "Result:<br />";
my @query = split(/=/, $ENV{'QUERY_STRING'});
my $term = $query[$#query];
my $file = 'd:\\apps\\AETrigger\\' . $term . '.txt';
if (open (FH, '>', $file)) {
    print "Trigger for $term ($query[0]) has been created.";
} else {
    print "Error creating trigger; please contact support.";
}
print "</body>\n</html>";

Cannot Remove Redefined Warnings
4 direct replies — Read more / Contribute by Sukhster
on May 05, 2025 at 11:41

I have recompiled Perl from source (v5.40.2) with DBI (DBI-1.647.tar.gz) and DBD-Oracle (DBD-Oracle-1.90.tar.gz), and am now getting "redefined" warnings for the first time.

I was previously on v5.38.0 with DBI (DBI-1.643.tar.gz) and DBD-Oracle (DBD-Oracle-1.83.tar.gz) and didn't face this issue.

I would like to keep warnings enabled, and either resolve these warnings or suppress them.

I have tried the following, but to no avail. I still get the "redefined" warnings.

use warnings qw(-refine);
no warnings qw(redefine);
no warnings 'redefine';

Any advice, Ye Great Monks of Perl?

########################
# Declare Modules
########################
use strict;
use warnings;
# Other Modules
use POSIX qw(strftime);
use Time::HiRes qw(time);
use Time::Piece;
use Time::Seconds;
no warnings 'redefine';
use DBI;
use DBD::Oracle qw(:ora_types :ora_fetch_orient :ora_exe_modes);

. . . .

sub connect_to_database($$$)
{
        # Declare the variables
        my ($db_uid, $db_pwd, $db_sid) = @_;
        my $dbh;
        my %attribs = (
                PrintError => 0,
                AutoCommit => 0,
                RaiseError => 0
                );

        $dbh = DBI->connect("DBI:Oracle:".$db_sid, $db_uid, $db_pwd , \%attribs )
                or  die "ERROR: Can't connect to database ($db_uid\@$db_sid): ".$DBI::errstr."\n";

        return $dbh;
}

Subroutine DBI::db::ora_lob_read redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::db::ora_lob_write redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::db::ora_lob_append redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::db::ora_lob_trim redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::db::ora_lob_length redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::db::ora_lob_chunk_size redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::db::ora_lob_is_init redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::db::ora_nls_parameters redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::db::ora_can_unicode redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::db::ora_can_taf redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::db::ora_db_startup redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::db::ora_db_shutdown redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::st::ora_fetch_scroll redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::st::ora_scroll_position redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::st::ora_ping redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::st::ora_stmt_type_name redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
Subroutine DBI::st::ora_stmt_type redefined at /applications/app12345/APP_HOME/apps/perl5/lib/site_perl/5.40.2/x86_64-linux-thread-multi/DBI.pm line 1398.
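The warnings pragma is lexically scoped, so "no warnings 'redefine'" in the calling script does not reach warnings that are reported from inside DBI.pm. One possible workaround, sketched below, is a $SIG{__WARN__} handler that drops only these messages. This does not address the underlying cause, which the post does not establish. The names filtered_warn and @passed are hypothetical, and the two warn calls merely stand in for whatever code raises the real warnings.

```perl
use strict;
use warnings;

my @passed;    # warnings that got through the filter (collected for this demo)

# Drop only the "Subroutine DBI::xx::ora_... redefined" messages.
sub filtered_warn {
    my ($msg) = @_;
    return if $msg =~ /^Subroutine DBI::(?:db|st)::ora_\w+ redefined at /;
    push @passed, $msg;    # a real script would probably print to STDERR here
}

{
    # The handler is active only inside this block.
    local $SIG{__WARN__} = \&filtered_warn;
    warn "Subroutine DBI::db::ora_lob_read redefined at DBI.pm line 1398.\n";
    warn "Something else went wrong\n";
}

print scalar(@passed), " warning(s) passed: @passed";
```

In the real script, the block would have to enclose the code that actually triggers the warnings, for example the loading of the driver.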

cpanplus test suite seems to be a bit ruinous
1 direct reply — Read more / Contribute by Intrepid
on May 03, 2025 at 17:01

The Seekers of Perl Wisdom may not be the best place in the known universe for a bug report, but I know the maintainer of the module I am struggling with, bingos, is active here. So here goes.

The test suite for cpanplus is the creature we need to tame.

I am working with cpanplus v0.9916 and on CygwinPerl v5.40.2

First off, Params::Validate reports an error, exact text shown below, in Dist/MM.pm:

Key 'dir' (t/20_CPANPLUS-Dist-MM.t) is of invalid type for 'CPANPLUS::Internals::Utils::_chdir' provided by CPANPLUS::Dist::MM::create at /cygdrive/c/Users/somia/AppData/Local/.cpanp/.cpanplus/5.40.2/build/nDoXKs5RLj/CPANPLUS-0.9916/t/../lib/CPANPLUS/Dist/MM.pm line 610.

I inserted some dumb simple code to check what is being passed, the output is:

----------------------------------------------------------------------------
Args being passed to Params::Validate: dir|t/20_CPANPLUS-Dist-MM.t
----------------------------------------------------------------------------

I will provide a plate of Buffalo chicken wings to the monk or nun that can work out why that error is appearing. ;-) I've looked and looked and cannot figure it out.

The next matter is equally baffling. I have DBD::SQLite installed to my system but the test suite reports skipping test because "SQLite engine not available":

t/031_CPANPLUS-Internals-Source-SQLite.t ...... skipped: SQLite engine not available
t/032_CPANPLUS-Internals-Source-via-sqlite.t .. skipped: SQLite engine not available

Lastly, the tests in the suite themselves seem to be out of order somehow:

Test Summary Report
-------------------
t/20_CPANPLUS-Dist-MM.t                     (Wstat: 0 Tests: 87 Failed: 77)
  Failed tests:  1, 1, 1, 1, 11-83
  Parse errors: Plan (1..1) must be at the beginning or end of the TAP output
                Tests out of sequence.  Found (1) but expected (11)
                Tests out of sequence.  Found (11) but expected (12)
                Tests out of sequence.  Found (12) but expected (13)
                Tests out of sequence.  Found (13) but expected (14)
Displayed the first 5 of 83 TAP syntax errors.
Re-run prove with the -p option to see them all.
Files=20, Tests=1712, 82 wallclock secs ( 0.34 usr  0.47 sys + 20.69 cusr 50.89 csys = 72.40 CPU)
Result: FAIL
Failed 1/20 test programs. 77/1712 subtests failed.

Anyone able to provide suggestions for how to seek and destroy these annoying errors will have my sincere gratitude.

A just machine to make big decisions
Programmed by fellows (and gals) with compassion and vision
We'll be clean when their work is done
We'll be eternally free yes, and eternally young
Donald Fagen —> I.G.Y.
(Slightly modified for inclusiveness)

Regex for hostname validation
2 direct replies — Read more / Contribute by hrcerq
on May 01, 2025 at 23:45

Hello again.

I always used naive regexps for hostname validation. But recently I've been trying to build something more robust and more adherent to related RFCs.

Mostly, I've consulted the following RFCs:

From that I understand that:

Hostnames may be composed of 1 or more labels (separated by dots)
Each label may have at most 63 characters
Regardless of how many labels there are, the hostname may be at most 255 characters long
Each label may contain a combination of letters, numbers and hyphens
No label may begin or end with a hyphen
Hostnames can't be composed only of numbers

If the hostname is qualified (i.e. there are at least 2 labels), then:

There may be 2 or more labels
Last label is a TLD
TLDs must not be 1 character long or composed only of numbers
A trailing dot may be present
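As a cross-check for any regex, the rules listed above can also be written as plain step-by-step code. This is a minimal sketch that follows the rules exactly as stated in this post, not an independent reading of the RFCs. The function name is_valid_hostname is hypothetical, and internationalized names and underscores are deliberately rejected.

```perl
use strict;
use warnings;

sub is_valid_hostname {
    my ($name) = @_;
    return 0 if !defined $name || $name eq '' || length($name) > 255;

    # A single trailing dot is only allowed on qualified names.
    my $has_trailing_dot = $name =~ s/\.\z//;
    my @labels = split /\./, $name, -1;    # -1 keeps empty labels, so "a..b" fails below
    return 0 if $has_trailing_dot && @labels < 2;

    # 1 to 63 letters, digits or hyphens, with no hyphen at either end.
    for my $label (@labels) {
        return 0
            unless $label =~ /\A[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\z/;
    }

    # An unqualified hostname must not consist of digits only.
    return 0 if @labels == 1 && $labels[0] =~ /\A[0-9]+\z/;

    # The TLD must not be a single character or digits only.
    if (@labels > 1) {
        my $tld = $labels[-1];
        return 0 if length($tld) < 2 || $tld =~ /\A[0-9]+\z/;
    }
    return 1;
}

for my $candidate ('example.com', 'example.com.', 'localhost', '-bad.com', '12345', 'host.c') {
    printf "%-14s %s\n", $candidate, is_valid_hostname($candidate) ? 'valid' : 'invalid';
}
```

Running the same list of candidates through both this function and the regex is a cheap way to spot corner cases where the two disagree.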

BTW, consulting RFCs sometimes feels like walking a complex maze full of hidden traps, because there's always some obscure detail you might overlook.

Things get worse if we consider some hostnames in the wild not adherent to these rules (e.g. some use underscores, which is valid for DNS, but not when used in hostnames), and also that there exist internationalized domain names.

I've tested my regex, but chances are there are corner cases I'm not aware of, so maybe one of you can help me find them.

This is how I'm doing it:

my $hname_re = qr/
    ^ (?=(?&validchar){1,255}$) (?!\d+$)
        (?&label)
        (?: (?:\.(?&label))* \.(?&tld) \.? )?
    $
    (?(DEFINE)
        (?<validchar>[a-zA-Z0-9.-])
        (?<alnum>[a-zA-Z0-9])
        (?<alnumdash>[a-zA-Z0-9-])

        (?<label>(?> (?&alnum)
                (?: (?&alnumdash){0,61}
                    (?&alnum) )? ) )

        (?<tld>(?!(\d+|.)\.?$) (?&label) )
    )
/x;

Thanks for any suggestions.

return on_success() or die;
