If
you have a question on how to do something in Perl, or
you need a Perl solution to an actual real-life problem, or
you're unsure why something you've tried just isn't working...
then this section is the place to ask.
However, you might consider asking in the chatterbox first (if you're a
registered user). The response time tends to be quicker, and if it turns
out that the problem/solutions are too much for the cb to handle, the
kind monks will be sure to direct you here.
Dear Monks, I'm passing an octal file mode to a sub like sub(0666) but
in the sub I'm seeing it as 438, and 0777 becomes 511. Quoting the mode
causes it to lose the leading zero so that sub(q(0666)) becomes 666. I
can put the leading 0 back and continue but that feels messy. Why is
0666 changing to 438? Thank you!
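For what it's worth, a leading 0 on a numeric literal marks it as octal, so 0666 and 438 are the same number written in two bases. A minimal sketch, assuming the mode sometimes arrives as a string; describe_mode is a made-up helper for illustration, not something from the post:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): show one mode value
# in both decimal and octal notation.
sub describe_mode {
    my ($mode) = @_;
    return sprintf "decimal %d, octal %04o", $mode, $mode;
}

# An octal literal: the number itself is 438, only the notation is octal.
my $from_literal = describe_mode(0666);

# A string such as "0666" must be converted explicitly with oct().
my $from_string = describe_mode( oct("0666") );

print "$from_literal\n";
print "$from_string\n";
```

Both lines print the same text, which suggests the value inside the sub was never altered; it was only displayed in decimal.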
I keep running into an uninitialized-value warning that doesn't make sense to me. I should be writing the state in column 6, but I keep getting:
Use of uninitialized value $t6 in string eq at Project1_6.pl line 48, <FH> line 2551.
Use of uninitialized value $t6 in string eq at Project1_6.pl line 42, <FH> line 2552.
I ran perl -c and it says the syntax is OK.
I tried declaring $t in line 35, "$worksheet->write($rowCount1+1, $_, my $t$_);"
perl -c said that was okay, but I get "syntax error at Project1_6.pl line 37, near "$t[""
use Excel::Writer::XLSX;
use strict;
use warnings;
my $rowCount = 0;
my $filename = "Output2022.xlsx";
my $workbook = Excel::Writer::XLSX->new( $filename );
open(FH, "<", "SRC185.xlsx" ) or die;
my $worksheet = $workbook->add_worksheet('List');
$worksheet->write(0, 0, "source_id" );
$worksheet->write(0, 1, "first_name" );
$worksheet->write(0, 2, "middle" );
$worksheet->write(0, 3, "last_name" );
$worksheet->write(0, 4, "address1" );
$worksheet->write(0, 5, "city");
$worksheet->write(0, 6, "state");
$worksheet->write(0, 7, "postal_code");
$worksheet->write(0, 8, "phone_number");
$worksheet->write(0, 9, "address3");
$worksheet->write(0, 10,"province");
$worksheet->write(0, 11, "email");
my $rowCount1 = 0;
my $t=0;
my @z = 90005;
while (<FH>){
    chomp;
    my @t=(',',$_);
    if(defined($t[8])){
        my $Count=0;
        $worksheet->write($rowCount1+1, $_, $t[$_]);
        $Count++;
    } elsif($t[6] eq "CA" && $t[7] eq !defined) {
        $worksheet->write($rowCount1+1, 7, $z[7]);
    }
    $rowCount1++;
}
$workbook->close();
close(FH);
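A possible direction, offered as a sketch rather than a fix for the exact script: (',', $_) builds a two-element list instead of splitting the line, so $t[6] can never be set, and "eq !defined" does not test for a missing value. The example below assumes the input is plain comma-separated text (a real .xlsx file is a zip archive and cannot be read line by line), that the state is field 6 and the postal code field 7, and that 90005 is the intended default. parse_row is a hypothetical name.

```perl
use strict;
use warnings;

# Assumption: the default postal code is the 90005 from the original @z.
my $default_zip = '90005';

# Hypothetical helper: split one comma-separated line into fields and
# fill in the postal code for CA rows where it is missing or empty.
sub parse_row {
    my ($line) = @_;
    chomp $line;
    my @t = split /,/, $line, -1;    # -1 keeps trailing empty fields
    if (   defined $t[6]
        && $t[6] eq 'CA'
        && ( !defined $t[7] || $t[7] eq '' ) )
    {
        $t[7] = $default_zip;
    }
    return @t;
}

# Made-up sample row, for illustration only.
my @row = parse_row("17,Ann,Q,Lee,1 Main St,Los Angeles,CA,,555-0100\n");
print join( '|', @row ), "\n";
```

Each element of @row could then be written with $worksheet->write($row_number, $column, $row[$column]) inside a loop over the column indexes.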
Can a lady or gentleman please help me find the correct resource. I am looking for information that I can use in an app I'm building.
It will be run on a LAN behind a firewall. It would be far more convenient if users could be logged in automatically when they are already logged in to the network it is running on. So I'm seeking information on accessing a network user's user ID and login status. I'm not looking for specific code, just guidance on the proper keywords or terms I need to research to achieve this goal, if it is even possible.
I don't mind doing the legwork, I just don't know what the specific legwork is that I must do right now.
I've created a helper function for my own purposes and thought it would be useful to others. So CPAN seems a sensible place to put it so others can use it if they want to...
Its function is simple: to go to the homepage of a website and return an array of URIs within that site that use the http or https scheme, being careful not to stray outside it. It ignores things that aren't plain text or that it cannot parse, such as PDFs or CSS files, but includes JavaScript files, as links (things like window.open or document.location.href) might be lurking there. It deliberately doesn't try to follow the action attribute of a form, as that is probably meaningless without the form data.
As the Monastery has taught me that all published modules should have tests, I want to do it properly and provide those tests...
But, given that there is only one function and it makes HTTP requests, what should I test?
The obvious (to me) test is that it returns the right number of URIs from a website. But that number will likely change over time, so I cannot hardcode the 'right' answer into the tests. So beyond the necessary dependencies and their versions, I'd like some ideas of what should be in the tests, please.
In case you're interested, this came about from wanting to automate producing and installing sitemap files.
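One idea, sketched under assumptions: test properties that must always hold (same site, http or https only, no duplicates) against fixed input, instead of a count taken from a live website. filter_site_links below is a hypothetical stand-in for the filtering part of the module, not its real interface, and the URIs are made up.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the module's filtering step: keep only unique
# http/https URIs that belong to the given host.
sub filter_site_links {
    my ( $host, @uris ) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
           grep { $_ =~ m{^https?://\Q$host\E(?:[:/]|$)}i } @uris;
}

# Fixed input, so the expected result never changes over time.
my @found = filter_site_links(
    'example.com',
    'https://example.com/',
    'http://example.com/about',
    'https://example.com/',          # duplicate
    'https://other.example.org/',    # different site
    'mailto:me@example.com',         # wrong scheme
    'ftp://example.com/file',        # wrong scheme
);

is( scalar(@found), 2, 'duplicates and off-site links are dropped' );
ok( !( grep { $_ !~ m{^https?://}i } @found ), 'only http/https survive' );
done_testing();
```

The fetching part could be exercised separately, for instance against a small local fixture site, so the test suite does not depend on network access.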
However, I get the following error message when I try to run it via SSH from my local machine:
leudwinus@localmachine:~$ ssh user@remotemachine './json_test'
Can't locate JSON.pm in @INC (you may need to install the JSON module) (@INC contains: /home/user/perl5/lib /usr/local/lib/perl5/site_perl/mach/5.32 /usr/local/lib/perl5/site_perl /usr/local/lib/perl5/5.32/mach /usr/local/lib/perl5/5.32) at ./json_test line 4.
BEGIN failed--compilation aborted at ./json_test line 4.
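A small diagnostic might help narrow this down. The sketch below prints which perl is running and where it looks for modules, so the output of a local login session can be compared with the output over ssh. A command run through ssh often does not read the same shell startup files as a login session, so PERL5LIB or PATH may differ, though nothing shown here confirms that this is the cause. module_available is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded, else 0.
sub module_available {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

print "perl:     $^X\n";
print "PERL5LIB: ", ( defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(unset)' ), "\n";
print "\@INC:     $_\n" for @INC;
print "JSON:     ", ( module_available('JSON') ? 'found' : 'not found' ), "\n";
```

Running it both ways, for example once in a normal login shell on the remote machine and once as ssh user@remotemachine 'perl diag.pl' (diag.pl being whatever name the file was saved under), should show whether the two environments differ.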
I have a large number of legacy Perl CGI scripts, which were made to be executed by Apache mod_perl.
What I want to do is move to an HTTP server inside a Perl script (like AnyEvent::HTTPD) that supports calling the Perl CGI scripts, and then package the entire thing as a single exe using PAR. This is so users can run it without having to have either Apache or Perl installed.
Now CGI seems fairly straightforward (content to STDIN, headers to ENV, and STDOUT is the HTTP response), but I figured someone has probably done this before? Or are there better ideas?
I am essentially doing this to execute the cgi in the httpd request callback:
{
    local *STDOUT;
    local *STDIN;
    #local %ENV;
    #$ENV{HTTP_COOKIE} = $requestCookie;
    open(STDIN, "<", \$requestBody);
    open (STDOUT, '>>', \$response);
    do './cgi/foobar.pl';
}
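The same idea can be wrapped in a helper, shown here as a rough sketch only. run_cgi is a hypothetical name, and a code ref stands in for do './cgi/foobar.pl' so the example is self-contained. One caveat: in-memory filehandles have no file descriptor, so output written by child processes or by C code directly to descriptor 1 would not be captured this way.

```perl
use strict;
use warnings;

# Hypothetical helper: run CGI-style code with STDIN and STDOUT redirected
# to in-memory scalars and with a private, localized copy of %ENV.
sub run_cgi {
    my ( $code, $request_body, %cgi_env ) = @_;
    my $response = '';
    {
        local %ENV = ( %ENV, %cgi_env );
        local *STDIN;
        local *STDOUT;
        open STDIN,  '<', \$request_body or die "Cannot redirect STDIN: $!";
        open STDOUT, '>', \$response     or die "Cannot redirect STDOUT: $!";
        $code->();
        close STDOUT;
    }
    return $response;
}

# Stand-in for a legacy script: reads the body, prints headers and content.
my $out = run_cgi(
    sub {
        my $body = do { local $/; <STDIN> };
        print "Content-Type: text/plain\r\n\r\n";
        print "method=$ENV{REQUEST_METHOD} body=$body";
    },
    'a=1&b=2',
    REQUEST_METHOD => 'POST',
    CONTENT_LENGTH => 7,
);

print $out, "\n";
```

Scripts that call exit would still need separate handling, since exit inside do FILE ends the whole server process.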
The error references the List::Util module, but doesn't occur unless I add the "use File::Copy", which is a completely separate module. What is going on here?
Hello, monks.
With your help I've written a script that processes a large number of text files, efficiently. I run this script inside directories containing 1K to 10K files, usually less than 5K.
However, I've noticed that when I process a larger number of files, i.e. several directories at once, the script slows down far more than linearly. For example, while a run on 3.5K files takes around 4.5 seconds, a run on 35K files takes 90 seconds instead of the expected 45, and on 350K files it runs for hours.
This has baffled me, as I'm using subdirectories to organize the data, and filesystem operations shouldn't impact performance negatively; additionally, the data filenames are glob()bed into an array which is looped over and not slurped in at once and processed in bulk (although, in my tests I tried that approach which exhibited the same behavior).
What's very interesting is that when I put a counter to stop processing at 1000 files, I got increasingly longer processing times with each subdirectory added to the list, despite only processing 1000 files from it. Also, I always copy my data to /tmp which is mounted as tmpfs to reduce SSD wear and achieve maximum read/write performance.
Testing:
wget http://www.astro.sunysb.edu/fwalter/AST389/TEXTS/Nightfall.htm
html2text-cpp Nightfall.htm >nightfall.txt
mkdir 00; for i in `seq -w 0 3456`; do head -$((RANDOM/128)) nightfall.txt >00/data-$i; done
This will create a directory ("00") with 3,457 randomly sized files inside (data-0000 through data-3456). Perl script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.36.0;
use Env;
use utf8;
use POSIX ":sys_wait_h"; #for waitpid FLAGS
use Time::HiRes qw(gettimeofday tv_interval);
use open ':std', ':encoding(UTF-8)';
my $benchmark = 1; # print timings for loops
my $TMP='/tmp';
my $HOME = $ENV{HOME};
my $IN;
my $OUT;
my @data = glob("data-* ??/data-*");
my $filecount = scalar(@data);
die if $filecount < 0;
say "Parsing $filecount files";
my $wordfile="data.dat";
truncate $wordfile, 0;
#$|=1;
# substitute whole words
my %whole = qw{
going go
getting get
goes go
knew know
trying try
tried try
told tell
coming come
saying say
men man
women woman
took take
lying lie
dying die
};
# substitute on prefix
my %prefix = qw{
need need
talk talk
tak take
used use
using use
};
# substitute on substring
my %substring = qw{
mean mean
work work
read read
allow allow
gave give
bought buy
want want
hear hear
came come
destr destroy
paid pay
selve self
cities city
fight fight
creat create
makin make
includ include
};
my $re1 = qr{\b(@{[ join '|', reverse sort keys %whole ]})\b}i;
my $re2 = qr{\b(@{[ join '|', reverse sort keys %prefix ]})\w*}i;
my $re3 = qr{\b\w*?(@{[ join '|', reverse sort keys %substring ]})\w*}i;
truncate $wordfile, 0;
my $maxforks = 64;
print "maxforks: $maxforks\n";
my $forkcount = 0;
my $infile;
my $subdir = 0;
my $subdircount = 255;
my $tempdir = "temp";
mkdir "$tempdir";
mkdir "$tempdir/$subdir" while ($subdir++ <= $subdircount);
$subdir = 0;
my $i = 0;
my $t0 = [gettimeofday];
my $elapsed;
foreach $infile(@data) {
    $forkcount -= waitpid(-1, WNOHANG) > 0 while $forkcount >= $maxforks;
    # do { $elapsed=tv_interval($t0); print "elapsed: $elapsed\n"; die; } if $i++ >1000; # 1000 files test
    $i++; # comment out if you uncomment the above line
    $subdir = 1 if $subdir++ > $subdircount;
    if (my $pid = fork) { # $pid defined and !=0 -->parent
        ++$forkcount;
    } else { # $pid==0 -->child
        open my $IN, '<', $infile or exit(0);
        open my $OUT, '>', "$tempdir/$subdir/text-$i" or exit(0);
        while (<$IN>) {
            tr/-!"#%&()*',.\/:;?@\[\\\]”_“{’}><^)(|/ /; # no punct "
            s/^/ /;
            s/\n/ \n/;
            s/[[:digit:]]{1,12}//g;
            s/w(as|ere)/be/gi;
            s{$re2}{ $prefix{lc $1} }g; # prefix
            s{$re3}{ $substring{lc $1} }g; # part
            s{$re1}{ $whole{lc $1} }g; # whole
            print $OUT "$_";
        }
        close $OUT;
        close $IN;
        defined $pid and exit(0); # $pid==0 -->child, must exit itself
    }
}
### now wait for all children to finish, no matter who they are
1 while wait != -1; # avoid zombies this is a blocking operation
local @ARGV = glob("$tempdir/*/*");
my @text = <>;
unlink glob "$tempdir/*/*";
open $OUT, '>', $wordfile or die "Error opening $wordfile";
print $OUT @text;
close $OUT;
$elapsed = tv_interval($t0);
print "regex: $elapsed\n" if $benchmark;
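One detail that may be worth ruling out: the waitpid(-1, WNOHANG) loop polls without blocking, so the parent spins while all 64 slots are busy. Whether that contributes to the slowdown is unclear from the post. Below is a sketch of a throttle that blocks in wait() instead, and also checks for a failed fork. run_limited and its arguments are hypothetical names, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: run $worker->($job) in a child process for every job,
# with at most $max_children running at once. Returns the number of children
# that exited with a non-zero status.
sub run_limited {
    my ( $max_children, $worker, @jobs ) = @_;
    my $running = 0;
    my $failed  = 0;
    for my $job (@jobs) {
        if ( $running >= $max_children ) {
            my $reaped = wait();    # blocks until one child exits
            if ( $reaped > 0 ) {
                $running--;
                $failed++ if $? != 0;
            }
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {          # child: do the work, then leave
            $worker->($job);
            exit 0;
        }
        $running++;                 # parent: one more child in flight
    }
    # Reap whatever is still running; wait() returns -1 when none are left.
    while ( ( my $reaped = wait() ) != -1 ) {
        $failed++ if $? != 0;
    }
    return $failed;
}

# Usage with made-up jobs: job 3 exits non-zero on purpose.
my $failed = run_limited( 4, sub { my ($n) = @_; exit 1 if $n == 3 }, 1 .. 10 );
print "$failed job(s) failed\n";
```

In the script above, the worker would be the per-file open, substitute, and print loop, and the jobs would be the entries of @data.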