Losing <br/> tags after parsing with HTML::TreeBuilder
1 direct reply
by Markismus
on Jun 05, 2025 at 04:54
I am extracting keyword and definition pairs from a large HTML document with HTML::TreeBuilder and HTML::Element.
After parsing the HTML string and dumping nodes, I find that the <br/> tags are missing. How can I prevent that?
use HTML::TreeBuilder 5 -weak;
my $tree = HTML::TreeBuilder->new;
# HTML given is read from a file in UTF-8 format. Using parse_file returns garbled characters.
my $html = shift;
# The next 2 lines were added to see whether they affect the loss of <br/> tags
$tree->no_space_compacting(1);
$tree->ignore_unknown(0);
$tree->parse( $html );
$tree->eof();
...
...
$Definition = $DefinitionNode->as_HTML('<>&');
Part of the HTML-string input:
<i>adv.</i><br/><b>1</b> <span lang="pt">maldosamente</span><br/><b>2</b> <span lang="pt">maliciosamente</span><br/><b>3</b> <span lang="pt">intencionalmente</span>
And the resulting output:
<i>adv.</i><b>1</b> <span lang="pt">maldosamente</span><b>2</b> <span lang="pt">maliciosamente</span><b>3</b> <span lang="pt">intencionalmente</span>
Funny-business with Win32 extension module build
1 direct reply
by Intrepid
on Jun 04, 2025 at 17:10
Under StrawberryPerl I'm having some difficulty building Win32::Exe and I'd like to get opinions from those more familiar with XS-using modules than I am.
Here's what the distribution package contains, in tree format, after I did a
'make clean'.
Win32-Exe-0.17-0/
|--Changes
|--MANIFEST
|--META.yml
|--Makefile.PL
|--Makefile.old
|--README
|--insert
| |--InsertResourceSection.xs
| |--Makefile.PL
| |--Makefile.old
| |--t
| | |--0-load.t
| | `--1-basic.t
| `--typemap
|--lib
| `--Win32
| |--Exe
| | |--Base.pm
| | |--DataDirectory.pm
| | |--DebugDirectory.pm
| | |--DebugTable.pm
| | |--IconFile.pm
| | |--InsertResourceSection.pm
| | |--Manifest
| | | `--Parser.pm
| | |--Manifest.pm
| | |--PE
| | | |--Header
| | | | |--PE32.pm
| | | | `--PE32Plus.pm
| | | `--Header.pm
| | |--PE.pm
| | |--Resource
| | | |--GroupIcon.pm
| | | |--Icon.pm
| | | |--Manifest.pm
| | | `--Version.pm
| | |--Resource.pm
| | |--ResourceData.pm
| | |--ResourceEntry
| | | |--Id.pm
| | | `--Name.pm
| | |--ResourceEntry.pm
| | |--ResourceTable.pm
| | |--Section
| | | |--Code.pm
| | | |--Data.pm
| | | |--Debug.pm
| | | |--Exports.pm
| | | |--Imports.pm
| | | `--Resources.pm
| | `--Section.pm
| `--Exe.pm
|--script
| `--exe_update.pl
`--t
|--0-pod.t
|--1-basic.t
|--2-icon.t
|--3-manifest.t
|--4-execupdate.t
|--application.xml
|--empty.xml
|--hd.ico
|--par.ico
|--winexe32.exe
`--winexe64.exe
The error in the make step is this:
gcc -c -std=c99 -DWIN32 -DWIN64 -DPERL_TEXTMODE_SCRIPTS -DMULTIPLICITY -DPERL_IMPLICIT_SYS -DUSE_PERLIO -D__USE_MINGW_ANSI_STDIO -fwrapv -fno-strict-aliasing -mms-bit fields -O2 -DVERSION=\"0.17\" -DXS_VERSION=\"0.17\" "-ID:\SBP\perl\lib\CORE" InsertResourceSection.c
gcc: fatal error: no input files
compilation terminated.
I'm trying not to post too much from the screen so I'll just show this context for the error:
Running Mkbootstrap for InsertResourceSection ()
"D:\SBP\perl\bin\perl.exe" -MExtUtils::Command -e chmod -- 644 "InsertResourceSection.bs"
"D:\SBP\perl\bin\perl.exe" -MExtUtils::Command::MM -e cp_nonempty -- InsertResourceSection.bs ..\blib\arch\auto\Win32\Exe\InsertResourceSection\InsertResourceSection.bs 644
"D:\SBP\perl\bin\perl.exe" "D:\SBP\perl\lib\ExtUtils/xsubpp" -typemap D:\SBP\perl\lib\ExtUtils\typemap -typemap C:\Users\somia\build\strawberry-perl\Win32-Exe-0.17-0\insert\typemap InsertResourceSection.xs > InsertResourceSection.xsc
"D:\SBP\perl\bin\perl.exe" -MExtUtils::Command -e mv -- InsertResourceSection.xsc InsertResourceSection.c
So the obvious thing to do is to look in the file tree for InsertResourceSection.c, and it's right there, at Win32-Exe-0.17-0/insert/InsertResourceSection.c. That's why I say there's "Funny business." ;-(
Jun 04, 2025 at 21:09 UTC
Why does CPANTS show PaxHeaders (no_pax_headers issue) but my tar does not see them?
No replies
by Darkwing
on Jun 04, 2025 at 10:52
Hi Monks,
Looking at CPANTS (https://cpants.cpanauthors.org/), I noticed the issue "no_pax_headers" for a number of modules. I asked ChatGPT, and it told me that you can check a tarball for these headers as follows:
tar --list --verbose --file=My-Module-0.01.tar.gz | grep -i pax
Being curious, I tried this command with a number of modules that have this issue, but I never got any output. I also checked the output of tar -tvf My-Module-0.01.tar.gz and found no PaxHeaders. By the way, for a number of modules my tar (GNU tar 1.34 on Linux) printed warnings such as:
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.com.apple.provenance'
but not always, e.g. running the command for Inline-Python-0.58.tar.gz and Config-INI-RefVars-0.21.tar.gz did not produce such warnings. So these warnings seem not to be related to pax headers issue.
I do not have a problem with this on my machine, but I would like to understand what's going on: How is it that Kwalitee on CPANTS sees these PaxHeaders, but my tar does not?
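One way to look at what a tar listing hides is to walk the raw 512-byte tar headers yourself instead of letting a tar program interpret them. As far as I know, GNU tar consumes pax extended headers (typeflag `x` for a single member, `g` for a global header) while listing, so they never show up as members in `tar -tv`; a reader that does not interpret them would instead see extra entries with `PaxHeader` in their names, which is presumably what the CPANTS check counts. Also, `LIBARCHIVE.xattr.*` is itself a pax extended-header keyword, so those warnings suggest that at least those archives do carry pax headers. The sketch below assumes a gzipped tar; the sub name `tar_members` is made up, it ignores the ustar `prefix` field, and it does not handle GNU base-256 size encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# List every raw header in a gzipped tar file as [typeflag, name],
# including pax 'x'/'g' headers that GNU tar hides when listing.
sub tar_members {
    my ($path) = @_;
    gunzip $path => \my $tar or die "gunzip $path: $GunzipError\n";
    my @members;
    my $pos = 0;
    while ($pos + 512 <= length $tar) {
        my $block = substr $tar, $pos, 512;
        last if $block eq "\0" x 512;    # a zero block marks the end of the archive
        # name: bytes 0-99, size: bytes 124-135 (octal text), typeflag: byte 156
        my ($name, $size_oct, $type) = unpack 'Z100 x24 A12 x20 a1', $block;
        $type = '0' if $type eq "\0";    # old-style regular file
        my $size = oct($size_oct || 0);
        push @members, [ $type, $name ];
        # skip this header plus its data, rounded up to whole 512-byte blocks
        $pos += 512 + 512 * int(($size + 511) / 512);
    }
    return @members;
}

if (!caller) {
    for my $file (@ARGV) {
        for my $m (tar_members($file)) {
            my ($type, $name) = @$m;
            my $tag = $type eq 'x' ? 'pax (per-file)'
                    : $type eq 'g' ? 'pax (global)'
                    :                "type '$type'";
            print "$tag\t$name\n";
        }
    }
}
```

Running it as `perl tar_members.pl Inline-Python-0.58.tar.gz` prints one line per raw header, so any `x` or `g` entries are visible even where `tar -tvf` shows none.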
It's Tuesday, so it must be the day for Trouble With Cpan
1 direct reply
by Intrepid
on Jun 03, 2025 at 14:41
Hello kindly Perl votaries. I installed Strawberry Perl Portable edition to a USB drive yesterday, and today I began using it. I tried to install
a CPAN module I used in the course of trying to create a
CUFP (Almost cool: removable drive "finder" instead of windows autoplay). The module is Win32::DriveInfo. I had
installed that module a few days ago under CygwinPerl and the installation went fine.
Today when I tried to use my newly configured CPAN.pm under
portable Strawberry, I got the oddest failures. Here's what my console showed
me:
cpan[8]> install Win32::DriveInfo
Running install for module 'Win32::DriveInfo'
Fetching with HTTP::Tiny:
https://cpan.org/authors/id/M/MB/MBLAZ/Win32-DriveInfo-0.06.tar.gz
CPAN: Digest::SHA loaded ok (v6.04)
Fetching with HTTP::Tiny:
https://cpan.org/authors/id/M/MB/MBLAZ/CHECKSUMS
CPAN: Compress::Zlib loaded ok (v2.213)
Checksum for C:\Users\somia\AppData\strawberry-perl-sourcecache\authors\id\M\MB\MBLAZ\Win32-DriveInfo-0.06.tar.gz ok
'C:' is not recognized as an internal or external command,
operable program or batch file.
Uncompressed C:\Users\somia\AppData\strawberry-perl-sourcecache\authors\id\M\MB\MBLAZ\Win32-DriveInfo-0.06.tar.gz successfully
Using Tar:C:/ix/cygwin/bin/tar.exe xvf "Win32-DriveInfo-0.06.tar":
Win32-DriveInfo-0.06/
Win32-DriveInfo-0.06/Makefile.PL
Win32-DriveInfo-0.06/DriveInfo.pm
Win32-DriveInfo-0.06/Changes
Win32-DriveInfo-0.06/test.pl
Win32-DriveInfo-0.06/README
Win32-DriveInfo-0.06/MANIFEST
Untarred Win32-DriveInfo-0.06.tar successfully
CPAN: CPAN::Meta::Requirements loaded ok (v2.143)
CPAN: CPAN::Meta loaded ok (v2.150010)
Package contains both files[Win32-DriveInfo-0.06.tar] and directories[Win32-DriveInfo-0.06]; not recognized as a perl package, giving up
Configuring M/MB/MBLAZ/Win32-DriveInfo-0.06.tar.gz with Makefile.PL
Running make for M/MB/MBLAZ/Win32-DriveInfo-0.06.tar.gz
make: *** No targets specified and no makefile found. Stop.
MBLAZ/Win32-DriveInfo-0.06.tar.gz
C:/ix/cygwin/bin/make.exe -- NOT OK
Stopping: 'install' failed for 'Win32::DriveInfo'.
The same fatal error took place just now when I tried to build/install Win32::Env. Does anyone have any idea why I'd see this failure?
Jun 03, 2025 at 18:32 UTC
Introspection into floats/NV
2 direct replies
by LanX
on Jun 03, 2025 at 11:05
$ perl
printf "%5s %.13a\n %s\n", $_, eval($_), unpack("B*",pack("F>",eval $_)) for qw(1 -1 2 -2 1/3 -1/3)
__END__
1 0x1.0000000000000p+0
0011111111110000000000000000000000000000000000000000000000000000
-1 -0x1.0000000000000p+0
1011111111110000000000000000000000000000000000000000000000000000
2 0x1.0000000000000p+1
0100000000000000000000000000000000000000000000000000000000000000
-2 -0x1.0000000000000p+1
1100000000000000000000000000000000000000000000000000000000000000
1/3 0x1.5555555555555p-2
0011111111010101010101010101010101010101010101010101010101010101
-1/3 -0x1.5555555555555p-2
1011111111010101010101010101010101010101010101010101010101010101
2 questions:
- I haven't yet been able to make unpack "B1B11B52" work to separate the sign, exponent and significand (mantissa); because of some byte-alignment problem, bits were missing at the end.
- Does this work on other builds and platforms too? (Tested on Ubuntu, 64-bit perl, Intel processor.)
Update:
Tested on my ARM mobile, which has a different endianness, and got the same result.
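On question 1: as far as I can tell from perlfunc pack, each `B` item starts on a byte boundary and consumes whole bytes, so `B1B11B52` eats 1 + 2 + 7 = 10 bytes from an 8-byte string, which would explain the missing bits at the end. A sketch that sidesteps this by unpacking all 64 bits first and then slicing the resulting string of '0'/'1' characters. It uses `d>` (big-endian double) instead of `F>` so the layout stays 64 bits even on builds where NV is a long double, and it assumes the platform's doubles are IEEE 754 binary64. The helper name `double_fields` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a double into its IEEE 754 binary64 fields as bit strings,
# plus the unbiased exponent (bias 1023; not meaningful for 0, subnormals, Inf, NaN).
sub double_fields {
    my ($x) = @_;
    my $bits = unpack 'B64', pack 'd>', $x;               # all 64 bits, sign bit first
    my ($sign, $exp, $frac) = unpack 'a1 a11 a52', $bits;  # slice the bit *string*
    return ($sign, $exp, $frac, oct("0b$exp") - 1023);
}

for my $expr (qw(1 -1 2 -2 1/3 -1/3)) {
    my ($s, $e, $f, $unbiased) = double_fields(eval $expr);
    printf "%5s  sign=%s  exp=%s (2**%d)  frac=%s\n", $expr, $s, $e, $unbiased, $f;
}
```

Because the second unpack works on a plain character string, there are no byte boundaries left to trip over; the field widths 1/11/52 can be used exactly as in the IEEE layout.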
integer pragma buggy? (ANSWERED)
No replies
by LanX
on Jun 02, 2025 at 18:14
(continuing from Re^7: Largest integer in 64-bit perl (RFC))
According to the integer docs, it adjusts the ** operation to integer arithmetic, even on 64-bit machines.
This seems to break as soon as the 53-bit precision of floats is exceeded°
Am I missing something?
$ perl
$\="\n";
print log(3**34)/log(2);
use integer;
print int(3**34);
__END__
53.8887250245193
16677181699666570
NB: a power of 3 can never be even (odd*odd is odd), and the correct number is 16677181699666569 according to Math::BigInt ...
To answer my own question ...
... it's documented, doh.
> The power operator ** is also not affected,
I was confused by the fact that the synopsis explicitly shows the power operator.
> $a = 2**31 - 1; # Largest positive integer on 32-bit machines
Honestly, I think integer is far too incomplete to be of much use.
°) or, more likely, it was never properly implemented in the first place, because it was developed and tested on 32-bit machines
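Since `**` stays in floating point even under `use integer`, one workaround is to do the exponentiation with integer multiplication, which `use integer` does affect. A minimal sketch of exponentiation by squaring, assuming a perl with 64-bit IVs and a result below 2**63 (overflow under `use integer` is not detected); `ipow` is a hypothetical helper name, and Math::BigInt remains the safe route for anything larger.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Integer power by repeated squaring, using native integer ops only.
# Assumes 64-bit IVs and a result that fits in a signed 64-bit integer.
sub ipow {
    use integer;                        # lexically scoped to this sub
    my ($base, $exp) = @_;
    die "ipow: negative exponent\n" if $exp < 0;
    my $result = 1;
    while ($exp > 0) {
        $result *= $base if $exp & 1;   # fold in the current exponent bit
        $exp >>= 1;
        $base *= $base if $exp;         # square only while bits remain
    }
    return $result;
}

print ipow(3, 34), "\n";
```

On a 64-bit build this prints 16677181699666569, the same value Math::BigInt gives above, because no intermediate value ever passes through a 53-bit double.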
Crash in stack_grow() with SQL::Abstract
1 direct reply
by Casey2255
on Jun 02, 2025 at 15:57
Hi all.
I'm working with a fleet of devices running Perl on an ARM chipset (armhf) that are experiencing an infrequent, random crash in SQL::Abstract (via DBIx::Class).
I haven't yet been able to reproduce it manually; our only clue so far is the following log line:
panic: stack_grow() negative count (-16777216) at /usr/share/perl5/SQL/Abstract.pm line 1493
I've found similar reports at https://rt-cpan.github.io/Public/Bug/Display/108578/ and https://github.com/Perl/perl5/issues/15013, but neither was fruitful.
My first observation was that -16777216 is 0xFF000000 as a 32-bit signed value. This makes me think it's due to an unsigned-to-signed cast gone wrong. However, adding 16 MiB to the stack seems incorrect as well. To me this screams memory corruption or an uninitialized value.
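The arithmetic in that observation can be checked directly in Perl: reinterpreting the 32-bit pattern 0xFF000000 as signed gives -16777216, which is -(2**24), exactly 16 MiB in magnitude. A small sketch using pack/unpack with the 32-bit `L` (unsigned) and `l` (signed) templates; the helper names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reinterpret an unsigned 32-bit bit pattern as a signed 32-bit integer.
sub as_signed32 { return unpack 'l', pack 'L', $_[0] }

# The reverse: show the 32-bit pattern behind a negative count.
sub as_hex32 { return sprintf '0x%08X', $_[0] & 0xFFFFFFFF }

printf "0xFF000000 as signed 32-bit: %d\n", as_signed32(0xFF000000);
printf "-16777216 as a 32-bit pattern: %s\n", as_hex32(-16777216);
printf "16 MiB = %d bytes\n", 16 * 1024 * 1024;
```

The single set high byte (0xFF) with three zero bytes below it is what makes a stray or corrupted top byte a plausible source of such a count.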
I ran it with the -d flag to get the following stack trace:
@ = SQL::Abstract::_join_sql_clauses(ref(DBIx::Class::SQLMaker), 'and', ref(ARRAY), ref(ARRAY)) called from file '/usr/share/perl5/SQL/Abstract.pm' line 678
@ = SQL::Abstract::_where_HASHREF(ref(DBIx::Class::SQLMaker), ref(HASH), undef) called from file '/usr/share/perl5/SQL/Abstract.pm' line 545
@ = SQL::Abstract::_recurse_where(ref(DBIx::Class::SQLMaker), ref(HASH)) called from file '/usr/share/perl5/SQL/Abstract.pm' line 525
@ = SQL::Abstract::where(ref(DBIx::Class::SQLMaker), ref(HASH), ref(HASH)) called from file '/usr/share/perl5/SQL/Abstract.pm' line 469
@ = SQL::Abstract::select(ref(DBIx::Class::SQLMaker), ref(ARRAY), 'me.id, me.name, me.value, me.modified_at', ref(HASH), ref(HASH)) called from file '/usr/share/perl5/DBIx/Class/SQLMaker.pm' line 172
@ = DBIx::Class::SQLMaker::select(ref(DBIx::Class::SQLMaker), ref(ARRAY), ref(ARRAY), ref(HASH), ref(HASH)) called from file '/usr/share/perl5/DBIx/Class/Storage/DBI.pm' line 1679
@ = DBIx::Class::Storage::DBI::_gen_sql_bind(ref(DBIx::Class::Storage::DBI::Pg), 'select', ref(ARRAY), ref(ARRAY)) called from file '/usr/share/perl5/DBIx/Class/Storage/DBI.pm' line 1666
@ = DBIx::Class::Storage::DBI::_prep_for_execute(ref(DBIx::Class::Storage::DBI::Pg), 'select', ref(ARRAY), ref(ARRAY)) called from file '/usr/share/perl5/DBIx/Class/Storage/DBI.pm' line 1810
@ = DBIx::Class::Storage::DBI::_execute(ref(DBIx::Class::Storage::DBI::Pg), 'select', ref(ARRAY), ref(ARRAY), ref(HASH), ref(HASH)) called from file '/usr/share/perl5/DBIx/Class/Storage/DBI.pm' line 2409
@ = DBIx::Class::Storage::DBI::_select(ref(DBIx::Class::Storage::DBI::Pg), ref(ARRAY), ref(ARRAY), ref(HASH), ref(HASH)) called from file '/usr/share/perl5/DBIx/Class/Storage/DBI.pm' line 2586
@ = DBIx::Class::Storage::DBI::select_single(ref(DBIx::Class::Storage::DBI::Pg), ref(ARRAY), ref(ARRAY), ref(HASH), ref(HASH)) called from file '/usr/share/perl5/DBIx/Class/ResultSet.pm' line 1104
This puts us at the following function and line of the crash:
https://github.com/dbsrgits/sql-abstract/blob/2972827e573b0217735b901088e69c994ba8d226/lib/SQL/Abstract.pm#L1493
sub _join_sql_clauses {
my ($self, $logic, $clauses_aref, $bind_aref) = @_;
if (@$clauses_aref > 1) {
my $join = " " . $self->_sqlcase($logic) . " ";
my $sql = '( ' . join($join, @$clauses_aref) . ' )';
return ($sql, @$bind_aref);
}
elsif (@$clauses_aref) {
##### CRASH TRIGGERED ON LINE 1493 BELOW #####
return ($clauses_aref->[0], @$bind_aref); # no parentheses
}
else {
return (); # if no SQL, ignore @$bind_aref
}
}
However, setting a breakpoint on stack_grow() fails. From further research, this is an internal C-level symbol, so there is no Perl subroutine to match.
My questions are:
- How does stack_grow get called when returning an array like in the above _join_sql_clauses?
- Is there a way to include internal symbol calls (stack_grow) in the Perl debugger? If not, how would you go about tracking that call?
- Does this indicate a memory corruption? If so, are there any Perl tools to help debug such an issue?
Versions:
Perl version: 5.32.1
DBIx::Class version: 0.082841
SQL::Abstract version: 1.87
I appreciate any and all feedback, thank you all in advance.
- Casey
Connecting to a database in AWS using SSL
2 direct replies
by vitoco
on May 29, 2025 at 23:10
Hello. I built a small local database in Postgres and use some perl scripts to load and update data, and then to extract for some reports. My scripts use the standard DBI module:
use DBI;
$dbh = DBI->connect("dbi:Pg:dbname=mydb;host=localhost;port=5432;",
$username,
$password,
{AutoCommit => 0, RaiseError => 1, PrintError => 0}
);
Now I have to move my local database model to an Amazon AWS Postgres database, which uses an SSL tunnel (a second hostname and port). Also, since my machine is outside HQ, I had to connect it to the HQ network through a VPN in order to reach AWS.
I was given some credentials and a PEM file, so I can connect to that database server using the pgAdmin tool, but I cannot figure out how to set up that kind of connection in my scripts. I'm not sure whether the sslmode connect-string option is useful in this case...
Any idea on how to configure the connect string? Should I use another module as a wrapper?
Thanks...
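A minimal sketch, assuming the PEM file is the CA certificate bundle for the AWS database server. As I understand the DBD::Pg documentation, the DSN's key=value pairs are handed to libpq, so libpq's `sslmode` and `sslrootcert` connection parameters can go directly into the connect string. The host name and file path below are placeholders, and `build_pg_dsn` / `connect_pg` are hypothetical helpers. If the PEM file is instead a client certificate, libpq's `sslcert`/`sslkey` parameters would be the relevant ones.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a DBD::Pg DSN; sslmode and sslrootcert are libpq connection parameters.
sub build_pg_dsn {
    my (%opt) = @_;
    my @keys = grep { defined $opt{$_} } qw(dbname host port sslmode sslrootcert);
    return 'dbi:Pg:' . join ';', map { "$_=$opt{$_}" } @keys;
}

# Connect with the same attributes as the original local script.
sub connect_pg {
    my ($dsn, $user, $pass) = @_;
    require DBI;    # loaded at runtime so build_pg_dsn works without DBI installed
    return DBI->connect($dsn, $user, $pass,
        { AutoCommit => 0, RaiseError => 1, PrintError => 0 });
}

my $dsn = build_pg_dsn(
    dbname      => 'mydb',
    host        => 'mydb.example.rds.amazonaws.com',  # placeholder host
    port        => 5432,
    sslmode     => 'verify-full',          # encrypt and verify server cert + hostname
    sslrootcert => '/path/to/aws-ca.pem',  # placeholder: the PEM file you were given
);
print "$dsn\n";
# my $dbh = connect_pg($dsn, $username, $password);
```

`verify-full` is the strictest libpq mode; `require` only asks for encryption without checking the certificate, which can help isolate whether a failure is about the certificate or the network path.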
Zipping the contents of a directory by filename
3 direct replies
by justin423
on May 29, 2025 at 10:55
What am I missing? I am trying to zip a few hundred PDFs by the first 7 letters of the filename to keep each zip a manageable size.
It is zipping all of them into just one file and I know it must be something simple that I am missing.
#!/usr/bin/perl
use IO::Compress::Zip qw(:all);
$path='/DATA/DOCUMENTS/';
opendir my $dh, $path;
my @files = readdir $dh;
foreach my $files (@files){
print "$files\n";
$zipfilename=substr($files,7);
$zipfilename1=$path.$zipfilename;
zip [ glob("$zipfilename1*.pdf") ] => "$zipfilename1.zip"
or die "Cannot create zip file: $ZipError" ;
}
closedir $dh;
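The likely culprit is `substr($files,7)`: with two arguments, substr returns everything from offset 7 to the end, not the first 7 characters, which would be `substr($files, 0, 7)`. For `.` and `..` from readdir, offset 7 is past the end of the string, so `$zipfilename1` becomes just `/DATA/DOCUMENTS/`, the glob becomes `/DATA/DOCUMENTS/*.pdf`, and every PDF goes into one archive, which would explain the symptom. A sketch that groups the PDFs by prefix first and writes each archive once; `zip_by_prefix` is a hypothetical helper, and the archives are written into the same directory as in the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use IO::Compress::Zip qw(zip $ZipError);

# Zip every PDF in $dir into one archive per filename prefix.
# Returns the list of archive paths written.
sub zip_by_prefix {
    my ($dir, $prefix_len) = @_;
    $prefix_len //= 7;
    opendir my $dh, $dir or die "Cannot open $dir: $!\n";
    my %groups;
    for my $name (readdir $dh) {
        next unless $name =~ /\.pdf\z/i;              # skips '.', '..' and non-PDFs
        my $path = File::Spec->catfile($dir, $name);
        next unless -f $path;
        my $prefix = substr($name, 0, $prefix_len);   # FIRST n chars: offset 0, length n
        push @{ $groups{$prefix} }, $path;
    }
    closedir $dh;

    my @archives;
    for my $prefix (sort keys %groups) {
        my $zipfile = File::Spec->catfile($dir, "$prefix.zip");
        zip [ sort @{ $groups{$prefix} } ] => $zipfile
            or die "Cannot create $zipfile: $ZipError\n";
        push @archives, $zipfile;
    }
    return @archives;
}

if (!caller && @ARGV) {
    print "$_\n" for zip_by_prefix($ARGV[0], 7);
}
```

Run it as `perl zip_by_prefix.pl /DATA/DOCUMENTS`; it prints the path of each archive it wrote. Collecting the groups in a hash first means each archive is written exactly once, instead of once per matching file as in the original loop.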
File::XDG on varying platforms
2 direct replies
by Intrepid
on May 26, 2025 at 14:13
This is going to seem like a very esoteric line of inquiry, I suspect, but I will
put it out here for the fine monks and nuns to give me feedback.
I use a computer with Windows 10 and with Cygwin installed (as I often mention on
PMo). I have this computer and 5 other computers to configure for file locations under
my home dir for the vim editor, meaning places to keep backup files and to keep swap
files. All those 5 systems run Debian or some derivative distro of Debian Gnu/Linux. So
things as they stand will be relatively predictable when I get around to configuring
them. Yes, I could do this "by hand" on each system, but automating such a repetitive
chore is exactly one of the compelling reasons Perl exists.
Here are the outputs of my script using File::XDG at this stage of writing
it (see the code below):
According to File::XDG on Cygwin, the following directories would be used for "vim":
Config for app vim is C:/Users/somia/config/vim
Data for app vim is C:/Users/somia/AppData/share/vim
Cache for app vim is C:/Users/somia/cache/vim
According to File::XDG on MSWin32, the following directories would be used for "vim":
Config for app vim is C:/Users/somia/AppData/Local/.config/vim
Data for app vim is C:/Users/somia/AppData/Local/.local/share/vim
Cache for app vim is C:/Users/somia/AppData/Local/.cache/vim
According to File::XDG on Linux, the following directories would be used for "vim":
Config for app vim is /home/somian/.config/vim
Data for app vim is /home/somian/.local/share/vim
Cache for app vim is /home/somian/.cache/vim
Pretty unpredictable, huh! Note in particular the distinction between the Cygwin results and the MSWin results. I could use CygwinPerl for this, or Strawberry. In the end, maybe it's basically an esthetic choice. Which looks better to you? I'm leaning towards the choice made by Strawberry (Win32) perl.
The Code
EDIT: The original code I posted (unfinished) is placed in READMORE tags:
The (probably) final code
#!/usr/bin/env perl
# Last modified: Tue May 27 2025 03:02:52 PM -04:00 [EDT]
use strict;
use v5.18;
use utf8;
use warnings;
=head1 SYNOPSIS
perl Emplace-XDG-dirs
=cut
use File::XDG 1.00;
use File::Spec;
use File::Path qw(mkpath rmtree);
use subs qw/tellMe/;
my ($XDGUser, $XDGData, $XDGCache);
my $appName = 'vim';
my $xdgEmp = File::XDG->new( name => $appName , api => 1 );
$XDGUser = $xdgEmp->config_home;
$XDGData = $xdgEmp->data_home;
$XDGCache = $xdgEmp->cache_home;
my @branches = (File::Spec->catdir($XDGData => 'backups'),
File::Spec->catdir($XDGData => 'swapfiles'));
say "We could make these dirs for you:";
say join qq[\n]=>@branches, '';
if (tellMe("making those dirs for $appName")) {
mkpath (@branches, {verbose => 'true', mode => 0775});
} else {
say "No? OK, aborting now";
}
=head2 Vim settings
Put in our .vimrc config file:
set backup
set backupcopy=auto
set backupdir= ... (dir created by script)
set directory= ... (dir created by script)
=cut
sub tellMe {
my $gummy = $_[0];
my $ans = "Y";
printf "Do you want to proceed with: %s? [Y/n]\n", $gummy;
chomp ($ans = <STDIN> // '');   # treat EOF like an empty answer
if ($ans =~ /^y/i) {            # accept y, Y, yes, ... but not e.g. "nay"
return 1;
} elsif ($ans eq '') {
return 1;
} else {
return 0;
}
}
# vim: ft=perl et sw=4 ts=4 :
Thanks for your interest!
May 27, 2025 at 19:04 UTC
A just machine to make big decisions
Programmed by fellows (and gals) with compassion and vision
We'll be clean when their work is done
We'll be eternally free yes, and eternally young
Donald Fagen —> I.G.Y.
(Slightly modified for inclusiveness)