If
you have a question on how to do something in Perl, or
you need a Perl solution to an actual real-life problem, or
you're unsure why something you've tried just isn't working...
then this section is the place to ask.
However, you might consider asking in the chatterbox first (if you're a
registered user). The response time tends to be quicker, and if it turns
out that the problem/solutions are too much for the cb to handle, the
kind monks will be sure to direct you here.
Hello everyone
I've written a subroutine that replaces a handful of Windows-1252 characters within a UTF8 string with HTML entities. Basically changing ' … ' to ' … '. This code works fine on its own outside of a function, but the regex fails within if( $strng =~ /\xE2/ ) { it doesn't "see" \xE2. Something is changing in $clmnVal once it's passed to the subroutine and I don't understand why or how to fix it.
Thanks, Tux, for Text:CSV. It keeps coming in handy both directly and indirectly.
I have a question about escaped quotes in incoming files. I would have expected the output for the three data lines below to be identical, but the second line misses the escaped single quote. Is an escaped single quote some kind of faux pax or even a mistake within the data or would it be one in something which Text::CSV draws on?
#!/usr/bin/perl
use Text::CSV qw(csv);
use strict;
use warnings;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
my $o = \*STDOUT;
# Read/parse CSV
while (my $row = $csv->getline (\*DATA)) {
my @selected = ( splice(@{$row}, 0, 2), splice(@{$row}, -6, 2) );
$csv->say($o, \@selected);
}
exit(0);
__DATA__
8, 9, NULL, 'Filler', '555-999', '77:88', 0, 0, 0, 0
8, 9, NULL, 'Filler', '555-999', '77:88', 'A \' B , C', 0, 0, 0
8, 9, NULL, 'Filler', '555-999', '77:88', 0, 0, 0, 0
which is similar to my application requirements. I can use DBI of course to connect to db1. And make ANOTHER connection to db2. But then I need to "prepare" - which needs a database connection. How can I create a handle connected to TWO databases as this statement requires? Maybe:
I would like to pass a list of HTML::Element objects as a return value from a subroutine. I thought since the list would be just a list, that the normal approach would apply. Clearly I am misunderstanding something, or perhaps there is a more appropriate approach:
#!/usr/bin/perl
use HTML::TreeBuilder::XPath;
use HTML::Element;
use HTML::Entities qw(decode_entities);
use Data::Dumper;
use strict;
use warnings;
my $html = &layer(3);
print $html->as_XML,"\n";
exit(0);
sub layer {
my ($layer) = (@_);
my $ul = HTML::Element->new('ul');
my $li = HTML::Element->new('li');
$li->push_content("Layer $layer");
$ul->push_content($li);
if($layer--) {
my $h = &layer($layer);
my @c = &unescape_entities($h); # offending line
print "Wrong structure:\n",Dumper(@c),"\n ----\n";
exit(1);
$ul->push_content($h);
} else {
my $foo = ' foo < bar';
my $literal = HTML::Element->new('~literal', text=>$foo);
$li->push_content($literal);
$ul->push_content($li);
}
return($ul);
}
sub unescape_entities {
my ($html) = (@_);
my $tmp = HTML::TreeBuilder::XPath->new;
$tmp->parse(decode_entities($html->as_XML));
my @c = $tmp->findnodes('//body/*');
print "Right structure:\n", Dumper(@c),"\n ----\n";
$tmp->delete;
return(@c); # this is getting transformed
}
I would expect that the script (minus the exit) to produce a nested HTML unordered list. What happens is that the variable actually recovered from the subroutine is basically empty, as seen by comparing the output from the two Dump calls.
I've looked at the manual pages for the modules given above as well as the one for perlrref. Please nudge me in the right direction.
I am writing some tests of a module that fetches a webpage using HTTP::Tiny->get
To test it, I am trying to use Test::Mock::HTTP::Tiny but I've never tried to use a Test::Mock module before. It's been on my radar since kcott mentioned their existence many moons ago. Now I have the need for one...but the documentation is lacking (to put it mildly!)
First I've run this code to get the mock data:
use strict;
use warnings;
use HTTP::Tiny;
use Test::Mock::HTTP::Tiny;
my $http = HTTP::Tiny->new;
my $resp = $http->get('http://www.way-finder.uk/');
open my $fh, '>', 'mock_html.dat';
print $fh Test::Mock::HTTP::Tiny->captured_data_dump;
close $fh;
Then I've renamed all the references to the domain www.way-finder.uk (a real domain) to www.testing.crawl (a mock domain). I've done this because I don't want the tests going out to a live site as it will change over time invalidating the tests.
My test file looks like this:
use strict;
use warnings;
use Test::More;
use Test::Mock::HTTP::Tiny;
use WWW::Crawl;
plan tests => 1;
$/ = undef;
open my $fh, '<', 't/mock_html.dat' or die "Can't open datafile";
my $replay = <$fh>;
close $fh;
die "Nothing to replay" unless $replay;
Test::Mock::HTTP::Tiny->set_mocked_data($replay);
my $crawl = WWW::Crawl->new(
'timestamp' => 'a',
);
my @links = $crawl->crawl('https://www.testing.crawl', \&link);
cmp_ok ( scalar @links, '==', 8, 'Correct link count');
sub link {
diag ($_[0]);
}
Is there an easy way to find out when a feature or keyword was introduced to Perl?
In other words, which version introduced it...
I have a module that uses pos, a keyword I infrequently use. How can I find out when this, or any other keyword, was introduced so I can set the minimum required Perl version?
Obviously I could go through each delta until I find it but that seems rather tedious...
Emacs comes with two different major modes to edit Perl code: perl-mode and cperl-mode.
perl-mode is somewhat stuck with the Perl syntax of 5.14, has less features, but a cleaner implementation. cperl-mode is up to date with Perl 5.38 and has deeper understanding of Perl syntax, but a somewhat arcane implementation, most of it written in the previous century.
With all due respect to TIMTOWTDI, maintaining two major modes turns out to be not enough fun in the long run, and last week Stefan Kangas opened a wishlist item to Making perl-mode.el obsolete.
The mail thread shows that some people prefer perl-mode because it is less "colorful" and intrusive than cperl-mode. Therefore, the idea is to enable cperl-mode to (optionally) look like and behave like perl-mode. That way, perl-mode.el can be obsoleted without making those users uncomfortable: perl-mode would continue to exist as a custom theme of cperl-mode.
Users of perl-mode are now encouraged to try cperl-mode, and to report bugs against cperl-mode where they prefer the behavior of perl-mode. This has already started. The "current" cperl-mode.el is available from the repository: cperl-mode.el and is supposed to work with Emacs 26 and newer.
I very much appreciate the thorough answers I receive here: they have helped me get reacquainted with Perl. The hardest part I still encounter is to understand what the symbols mean, not so much the concepts.
I'm simply asking for a "better" way to accomplish a task I already can do.
An example: I want to know if there's a way to send just one key & value per array/hash - preferably by reference - to a subroutine. In the example code below, I want to send all and only the {Sub_Name}s and their respective values. The code below works and I've come up with two ways to list the Sub_Name. I'm simply wanting to know if there's a better way to do it, so I don't have to make an array or send the entire array of hashes to the subroutine.
#!/usr/bin/env perl
#use 5.36.1;
use strict;
use warnings;
use Data::Dumper;
use autodie;
use File::Find;
use File::Copy;
use File::Rename;
use feature 'fc';
use File::Path qw( make_path );
my $Wait_Time = 10; # Implement Time Delay
# Subscription DATA sets
my @Subscription = (
{
Sub_Name => "Morph",
Archive_File => "Morph Archive.txt",
},
{
Sub_Name => "Analogue",
Archive_File => "Analogue Archive.txt",
},
{
Sub_Name => "Cat",
Archive_File => "Cat Archive.txt",
},
{
Sub_Name => "Zoonotic",
Archive_File => "Zoonotic Archive.txt",
},
{
Sub_Name => "Hydro",
Archive_File => "Hydro Archive.txt",
},
);
#Subscription of Names
my @Subscription_Names_List;
for (@Subscription) {push @Subscription_Names_List, $_->{Sub_Name};}
List_Subscriptions(\@Subscription_Names_List);
for (@Subscription) {
Display_Subs(\%$_);
}
sub Display_Subs {
my ($my_Sub) = @_;
print "Testing Subscription to: $my_Sub->{Sub_Name}\n";
return $my_Sub;
}
sub List_Subscriptions {
my (@Sub_ARRAY) = @{$_[0]};
# my (@Sub_ARRAY) = @_;
print "The Current Subscription List:\n\n";
for (@Sub_ARRAY) { print " \t$_ \n"; }
print "\n\n";
}
Is it better to use a "packaged" easily installable version of Perl such as Strawberry on a Windows OS or to compile a version or there is no practical difference, etc...?
I primarily ask for two reasons: stability and speed. An underlying assumption to both reasons is that compiling code will always be faster, because it will compile based on specific chipsets and its respective native math libraries.
1. Stability. My systems run both AMD CPUs and GPUs. I have wondered if instability in GAMES are sometimes caused by various issues of the binaries being compiled for Intel CPUs and NVidia GPUs. By extension, is there an inherent stability benefit of compiling Perl on a Windows system?
2. Speed. This likely has no real practical value unless one is running a "large" server on Windows; but, I wanted to ask anyway. So, by a similar analogy to GAMES stability, since there is a real benefit to GAMES which have been optimized for either AMD or Intel/NVidia and sometimes specifically a chipset, is there a speed benefit to compiling Perl on Windows?
The was the SO question recently, and as it sometimes happens, when I think "oh, this can be fun to play with, for better algorithm", it brewed for itself somewhere in subconscious, until the eureka moment a couple days ago (I'm in no hurry :-)): "Of course! Be greedy, demand twice as needed!"
There was advice to use brute force, at least as accepted answer in linked question (from 2021) there. Not sure if I got it right, I don't read Ruby, and didn't try other answers. Both brute subroutines below aren't actually used: they are totally unusable for lists with ~15 unique strings or more, plus any decent amount of duplicates. I wrote 2nd one because I couldn't believe it at first (accepted solution??). They are left just in case anyone wants to try (or point at my mistakes in implementation?), and can be ignored.
Back to answers at SO, there are 2 Perl solutions. One (a) doesn't compile; (b) if fixed, emits a warning for un-initialized value; (c) if fixed (or ignored), it seems to work OK. But for (corner-case, of course) input of (b,a,a), it gives answer (b,a). I didn't look further.
Another solution (by esteemed monk) fails randomly for e.g. corner-case (a,a,a,b,b) -- the only answer can be (a,b,a,b,a), of course. Why does it fail? Output list is initialized to e.g. (a,b). If 1st key to iterate is "a", then one of 2 remaining "a"'s is added to give (a,b,a) with no place for 2nd remaining "a". So, easy fix would be kind of "breadth-first" hash consumption. I'm sorry if code I had to add looks ugly to the author.
This fixed version will serve as reference to compare my solution to, it generates truly random lists.
With algorithm I suggest -- ask for twice as many random indexes from remaining pool, then simply reject (comb out) half of them. It guarantees there will be no consecutive dupes (and of course doesn't mean "only odd or even indexes for this value").
One obvious compromise on randomness will be "dupes are never placed at both head and tail" -- except corner-cases such as 'aba' or 'abaca', of course. There are actually 3 cases, depending on size of remaining pool. Cases "2" and "3" restrict randomness further. E.g., for 'aaaabbbcc', the 'c' is never placed at indexes 0 or 1 -- unlike the "reference SO implementation with true randomness".
However, lines with "die" in them can be un-commented (and they were un-commented during benchmarking) if input is not an artificial corner-case -- this code is never reached with realistic data. I mean, other than corner-cases and head/tail restriction, my algorithm seems to produce random enough result.
(In fact, one of "requests" of "RFC" is how to estimate randomness (entropy) for multiple runs of subroutine. Didn't look into that yet.)
Further "requests" are: can it be improved? Both List::MoreUtils::samples and e.g. (unused) Math::Prime::Util::randperm return their result shuffled, which I don't need and have to sort back to order! And more, e.g. samples takes random samples and therefore should know which items were unselected, but I have no better way to find out "which" except with more work using singleton. It feels like huge amount of unnecessary work I do (though it's still much faster than "SO reference solution"). Or maybe, perhaps, someone would suggest even faster solution?
(+ I understand there's sloppiness on my side in e.g. $uniq variable name doesn't actually mean number of unique items which fake_data returns. I hope this (and similar) can be forgiven.)
use strict;
use warnings;
use feature 'say';
use List::Util qw/ shuffle /;
use List::MoreUtils qw/ part samples singleton /;
use ntheory qw/ forperm lastfor /;
use Algorithm::Combinatorics qw/ permutations /;
use Benchmark 'cmpthese';
my @input = shuffle( qw( a a a a b b b c c )); # corner-cases
@input = shuffle( qw( a a a b b )); #
srand 123;
@input = fake_data( 555, 55 );
#say scalar @input; # 2096
sub fake_data {
my ( $uniq, $pivot ) = @_;
my @tmp = map { sprintf '< %06d >', rand 1e9 } 0 ... $uniq;
my @out;
push @out, @tmp[ 0 .. $_ ] for 0 .. $pivot;
@out = shuffle( @out, @tmp[ $pivot + 1 .. $uniq ]);
return @out
}
cmpthese 10, {
SO_fixed => sub { die unless SO_fixed( \@input )},
my_shuffle => sub { die unless my_shuffle( \@input )},
};
sub brute {
my $input_ref = shift;
my @output;
forperm {
my $prev = '';
for ( @_ ) {
return if $prev eq $input_ref-> [ $_ ];
$prev = $input_ref-> [ $_ ]
}
@output = @{ $input_ref }[ @_ ];
lastfor, return
} @$input_ref;
return \@output
}
sub brute2 {
my $input_ref = shift;
my @output;
my $iter = permutations( $input_ref );
PERM: while ( my $p = $iter-> next ) {
my $prev = '';
for ( @$p ) {
next PERM if $prev eq $_;
$prev = $_
}
@output = @$p;
last PERM
}
return \@output
}
sub SO_fixed {
my $input_ref = shift;
my %counts; ++$counts{ $_ } for @$input_ref;
my @strings = shuffle keys %counts;
LOOP: {
my $any = 0;
for my $string ( keys( %counts ) ) {
next if $counts{ $string } == 1;
$counts{ $string } --;
$any = 1;
my @safe =
grep { $_ == 0 || $strings[ $_ - 1 ] ne $string
+ }
grep { $_ == @strings || $strings[ $_ ] ne $string
+ }
0 .. @strings;
return undef unless @safe;
my $pick = $safe[ rand( @safe ) ];
splice( @strings, $pick, 0, $string );
}
redo LOOP if $any
}
return \@strings
}
sub my_shuffle {
my $input_ref = shift;
my @output;
my %counts; $counts{ $_ } ++ for @$input_ref;
my ( $single, $multi ) = part { $counts{ $_ } > 1 } keys %counts;
my @multi = sort { $counts{ $b } <=> $counts{ $a }} @$multi;
my @pool = ( 0 .. $#$input_ref );
for my $str ( @multi ) {
my $count = $counts{ $str };
my @take;
if ( $count <= @pool / 2 ) {
# case 1
my @excess = sort { $a <=> $b } samples( 2 * $count, @pool
+ );
my $n = int rand 2;
my @idx = grep { $n ^ $_ % 2 } 0 .. $#excess;
@take = @excess[ @idx ];
}
elsif ( 2 * $count - 1 == @pool ) {
# case 2
#die 'This code is unreachable for realistic input';
my @idx = grep { not $_ % 2 } 0 .. $#pool;
@take = @pool[ @idx ];
}
else {
# case 3
#die 'This code is unreachable for realistic input';
my $prev = -2;
my @ok = grep {
my $res = $_ - $prev;
$prev = $_;
$res > 1
} @pool;
return undef if $count > @ok;
@take = samples( $count, @ok );
}
@pool = singleton @pool, @take;
@output[ $_ ] = $str for @take;
}
@output[ @pool ] = @$single if @pool;
return \@output;
}
__END__
(warning: too few iterations for a reliable count)
Rate SO_fixed my_shuffle
SO_fixed 2.29/s -- -95%
my_shuffle 42.7/s 1763% --
Snippets of code should be wrapped in
<code> tags not<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).