Re: nesting loops help?
by GrandFather (Saint) on Mar 09, 2022 at 20:43 UTC
|
You really need to use strict (use strict;). Consider what happens when the following line invokes the die.
open my $wave, '>', 'Wave' or die "Can't open $wave: $!";
Using three parameter open and error handling is laudatory, but omitting strict and a few other areas of your code could use some TLC. Consider the following light do over:
#!usr/bin/perl
use strict;
use warnings;
open my $wave, '>', 'Wave' or die "Can't open 'Wave': $!
+";
open my $keywords, '<', 'Agents' or die "Can't open 'Agents':
+$!";
open my $search_file, '<', 'Definitions' or die "Can't open 'Definitio
+ns': $!";
my $keyword_or = join '|', map {chomp; qr/\Q$_\E/} <$keywords>;
my $regex = qr|\b($keyword_or)\b|;
while (defined (my $line = <$search_file>)) {
while ($line =~ /$regex/g) {
next if $line =~ /(SCRIPTNAME|DESCRIPTION)/;
print $wave $line;
last;
}
}
Things to note:
- use of strict
- error handling mentions the file name rather than the file function
- Avoid using $_ where a manifest variable can be used instead (c.f. $line)
- use a statement modifier if to skip in the inner loop to make logic flow clearer
- terminate inner loop once the line has been printed to avoid multiple copies
- remove the explicit file close statement. With lexical file handles the files are closed when the handle variable goes out of scope so it unusual to need to explicitly close files
Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
| [reply] [d/l] [select] |
|
|
Thanks! I really appreciate the break down of improvements, trying to do better I just don't always understand the suggestions and as a bad habit sometimes I turn off strict until I get something working then go back and polish it, a lot room for improvement still.
If you have a moment I've run into a desire to join the script you helped with to another that's similar, I'm not following the logic of how to iterate over multiple sources very well. I'll repost what I'm using now for example. In the first script that you helped with it takes a value from $search file to see if its in the $schedule file that's read into a hash at the beginning, which is great thanks to your your help.
I have another function which was done separately because I generally find doing tasks in smaller chunks easier to understand and get right, but I'd like to bring them together, this one reads $schedule line by line, when it matches a regex pattern it pushes that value into an array, when it reaches the value from the $search_file, I have it print the last value stored in the array, because its not the $search_file value I need next, its actually a value several lines above it that I captured with regex so this is kind of a look behind procedure?
Basically my confusion hit turbulence trying to do both, does it make sense to then change the first script to iterate line by line instead before trying to pair them, or is there a way to still do the regex capture if I'm instead using the existing hash already saved in the first script?
use warnings;
use strict;
open my $schedule, '<', 'Schedule';
my %schedule;
$schedule{$_} = 1 while (<$schedule>);
close $schedule;
open my $wave, '>', 'Wave' or die "Can't open 'Wave': $!
+";
open my $keywords, '<', 'Agents' or die "Can't open 'Agents':
+$!";
open my $search_file, '<', 'Definitions' or die "Can't open 'Definitio
+ns': $!";
my $keyword_or = join '|', map {chomp; qr/\Q$_\E/} <$keywords>;
my $regex = qr|\b($keyword_or)\b|;
open my $schedule, '<', 'Schedule' or die "Can't open 'Schedule': $!";
while (defined (my $line = <$search_file>)) {
while ($line =~ /$regex/g) {
next if $line =~ /(SCRIPTNAME|DESCRIPTION)/;
my $lineout = $line;
chomp $lineout;
print $wave $lineout;
print $wave exists $schedule{$line} ? " | Yes \n" : " | No
+\n";
print $wave exists $schedule{$line} ? " | Yes \n" : " | No
+\n";
# at this point my output looks something like this: VALUE_FRO
+M_SEARCH_FILE | Yes
# want to use the value in $line as the input variable to the
+other script where marked as $value_from_line_in_search_file
# want to print the result of other script back here
# at this point I would like the output like this: VALUE_FROM_
+SEARCH_FILE | Yes | VALUE_FROM_SCHEDULE_FILE
last;
}
}
use warnings;
use strict;
my $value_from_line_in_search_file = 'SAMPLE#DATA';
open my $schedule, '<', 'Schedule' or die "Can't open 'Schedule': $!";
my @values=();
while (<$schedule>) {
if (/(^SCHEDULE)(.*)/) { # SCHEDULE SCHEDULENAME, I only want the na
+me.
push(@values, $2);
}
if (/$value_from_line_in_search_file/) {
print $values[-1];
last;
}
}
| [reply] [d/l] [select] |
|
|
I always start with strictures on and never turn them off. At the very least my editor then warns me about typos and variables that are either out of scope or haven't been declared yet. Having that heads up before you run the code generally saves a few debugging iterations straight off the bat!
At this point some sample data would help a lot. You can include it in a short self contained example script like this:
use warnings;
use strict;
my $scheduleData = <<SDATA;
Person 1: Joe
SDATA
my $agentData = <<ADATA;
Joe
Moe
Flo
ADATA
my $defsData = <<DDATA;
SCRIPTNAME It's a wonderful life
Person 1: Joe
Person 2: Moe
DDATA
open my $schdIn, '<', \$scheduleData or die "Can't open 'Schedule': $!
+";
my %scheduleMatch = map {chomp; $_ => 1} <$schdIn>;
open my $agentsIn, '<', \$agentData or die "Can't open 'Agents': $!";
open my $defsIn, '<', \$defsData or die "Can't open 'Definitions':
+$!";
my $agentsList = join '|', map {chomp; qr/\Q$_\E/} <$agentsIn>;
my $agentsMatch = qr|\b($agentsList)\b|;
while (defined(my $defLine = <$defsIn>)) {
chomp $defLine;
if ($defLine =~ $agentsMatch && $defLine !~ /(SCRIPTNAME|DESCRIPTI
+ON)/) {
my $lineout = $defLine;
my $flag = $scheduleMatch{$defLine} ? 'Yes' : 'No';
print "$defLine | $flag\n";
}
}
which prints:
Person 1: Joe | Yes
Person 2: Moe | No
Then tell us where the test script doesn't work as expected or, in this case, how the output should look for the additional functionality you want.
Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
| [reply] [d/l] [select] |
|
|
|
|
|
Re: nesting loops help?
by choroba (Cardinal) on Mar 09, 2022 at 17:28 UTC
|
If the new file isn't huge, read it into a hash at the beginning. Then, check each value against the hash while processing the lines in the existing loop.
Without samples of the input and expected file sizes we can't provide more specific help.
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
| [reply] [d/l] |
|
|
Thanks, that seems simple enough, it works and only takes about 8 seconds so should be good. I don't know if my interpretation of the suggestion is optimal but it got me to a solution, I just used an array as my data didn't need a keyed hash. The files are fairly basic text, less than 40,000 lines.
#!usr/bin/perl
use warnings;
use List::MoreUtils qw(any);
open my $wave, '>', 'Wave' or die "Can't open $wave: $!";
open my $keywords, '<', 'Agents' or die "Can't open keywords: $!";
open my $search_file, '<', 'Definitions' or die "Can't open search f
+ile: $!";
open my $schedule, '<', 'Schedule' or die "Can't open search file: $
+!";
@sched = <$schedule>; # read entire file into an array at the start
my $keyword_or = join '|', map {chomp;qr/\Q$_\E/} <$keywords>;
my $regex = qr|\b($keyword_or)\b|;
while (<$search_file>)
{
while (/$regex/g)
{
$line = $_;
chomp $line;
if ( $line =~ /(SCRIPTNAME|DESCRIPTION)/ ) {
next;
}
print $wave $line;
if (any { $_ =~ $line } values @sched) { # check if line we're
+ on is also in the @sched array
print $wave " | Yes!\n";
}
else{
print $wave " | No!\n";
}
}
}
$_->close for $wave, $keywords, $search_file, $schedule;
| [reply] [d/l] |
|
|
Would you show partial contents of the files? Maybe first 10 lines or whatever you think will give an accurate subset of the data. Also you should "use strict;". That would among other things spot this problem "Can't open $wave: $!".
| [reply] |
Re: nesting loops help?
by BillKSmith (Monsignor) on Mar 09, 2022 at 18:50 UTC
|
No nested loops are required. Create a hash whose keys are the lines of 'schedule' and whose values indicate whether or not the corresponding line exists in $wave. (untested:)
use strict;
use warnings;
use autodie;
open my $schedule, '<', 'schedule_file.txt';
my %schedule;
$schedule{$_} = 1 while (<$schedule>);
close $schedule;
open my $search, '<', 'search_file.txt';
my %search;
$search{$_} = exists $schedule{$_} while (<$search>);
close $search;
my @yes = ('','yes');
my $regex = qr/tbsl/;
open my $keywrds, '<', 'keywrds.txt';
open my $wave, '>', 'wavefile.txt';
while (<$keywrds>) {
if (exists($search{$_}) and (/$regex/) ) {
print $wave $_, ' ', $yes[$search{$_}], "\n";
}
}
close $wave;
close $keywrds;
| [reply] [d/l] |
Re: nesting loops help?
by ikegami (Patriarch) on Mar 09, 2022 at 18:11 UTC
|
Does the inner loop even make sense? You'll just end up printing the same line over and over again. Don't you want
while (<$search_file>)
{
print $wave $_ if /$regex/ && !/SCRIPTNAME|DESCRIPTION/;
}
As for the new requirement, Is there any way you can build a hash or regex pattern from it? | [reply] [d/l] |
Re: nesting loops help?
by talexb (Chancellor) on Mar 10, 2022 at 14:00 UTC
|
use autodie;
in a bunch of my scripts -- that takes care of catching any open failures. Keep in mind that failure to open a file is a fatal error -- if that doesn't work, it's not worth continuing.
Alex / talexb / Toronto
Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
| [reply] [d/l] [select] |
Re: nesting loops help?
by cavac (Prior) on Mar 11, 2022 at 15:49 UTC
|
I was going to look into this, but i absolutely hate, hate implicit the use of $_. It makes the code so much harder to read and so much harder to understand and so much harder to follow. I prefer it more like this:
#!usr/bin/perl
use strict;
use warnings;
open my $wave, '>', 'Wave' or die "Can't open $wave: $!";
open my $keywords, '<', 'Agents' or die "Can't open keywords: $!";
open my $search_file, '<', 'Definitions' or die "Can't open search f
+ile: $!";
# Urgh, that following line needs serious reformatting,
# It's still an ugly mess that is very hard to understand at a glance
my $keyword_or = join '|', map {chomp;qr/\Q$_\E/} <$keywords>;
my $regex = qr|\b($keyword_or)\b|;
while((my $line = <$search_file>)) {
if($line =~ /$regex/g) {
if ( $line =~ /(SCRIPTNAME|DESCRIPTION)/ ) {
next;
}
print $wave $line;
}
}
close $wave;
close $keywords;
close $search_file;
Still less the ideal, but at least i could start to follow the data around.
perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'
| [reply] [d/l] [select] |
Re: nesting loops help?
by perlfan (Parson) on Mar 10, 2022 at 15:24 UTC
|
In addition to the suggestion to use strict;, I recommend you not rely on implicit variables.
#!/usr/bin/perl
#^^^ added / after !
use warnings;
open my $wave, '>', 'Wave' or die "Can't open $wave: $!";
open my $keywords, '<', 'Agents' or die "Can't open keywords: $!";
open my $search_file, '<', 'Definitions' or die "Can't open search f
+ile: $!";
my $keyword_or = join '|', map {chomp;qr/\Q$_\E/} <$keywords>;
close $keywords;
my $regex = qr|\b($keyword_or)\b|;
OUTER:
while (my $line = <$search_file>) {
# ^^^^^^^ using just $_ is for chodes
INNER:
while ($line =~ m/$regex/g) #<~ why is this a "while" an not an "i
+f"?
{
# skip if...
if ( $line =~ m/(?:SCRIPTNAME|DESCRIPTION)/ ) {
#^ if you're not capturing $1, don't keep it i
+n memory
next INNER;
# ^^^^^ labels when using "nested" loops can be very help
+ful
}
print $wave $line;
}
}
close $wave;
close $search_file;
<code>
Final note, you're opening yourself up to some adversarial input attac
+ks by trusting <code>$keyword_or
and blindly iterating over $search_files. Idk I think I also found an issue with your next and question your use of that inner while altogether. | [reply] [d/l] [select] |
|
|
In addition to the suggestion to use strict;
you didnt do that and your code doesnt work with strict
why is this a "while" an not an "if"?
read the fine manual
using just $_ is for chodes
big man
| [reply] |