Re: Please critique this script -- Read emails and write URLs into bookmarks
by talexb (Chancellor) on Oct 06, 2016 at 18:39 UTC
|
Hmm .. well, I don't see any error-checking, you're using a regex for something that isn't clear (re-formatting the content of the subject line?), and you're assuming there is only going to be a single URI in each message.
Alex / talexb / Toronto
Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
| [reply] |
|
|
There is only one valid URL per message. This is a situational constraint (also written in the spec). I know that because these are 'Emails-to-self' that contain at max two URLs (one in the plain text part, one in the HTML part if it exists)
Maybe it's a bit difficult to read but the regex is transforming the subject into a suitable filename, escaping problematic characters.
I tried to be sensible about the error checking and assumed the modules do error handling for their part, outside of that there's hardly any code in its own right.
Now that I think of it though, input should have been checked. That was a mistake.
What else do you think needs explicit error handling?
| [reply] |
|
|
| [reply] [d/l] [select] |
|
|
|
|
|
|
|
|
|
Re: Please critique this script -- Read emails and write URLs into bookmarks
by kcott (Archbishop) on Oct 07, 2016 at 21:02 UTC
|
"This is my first Perl script."
++
This is an excellent, first effort!
"I'm looking for any kind of feedback, especially pitfalls and 'better' idioms etc."
Here's some observations; along with various suggestions and recommendations:
-
I'd personally avoid hard-coding a path to perl in the shebang line.
Consider using the current environment's perl:
#!/usr/bin/env perl
This allows you to change the version of Perl in the current environment
and run your script under this new version.
In particular, it's useful when testing how your script runs under different versions of Perl
(perlbrew is a common choice for this).
-
You're using the '/r' modifier in your regex substitution.
The way you're using it is fine.
The problem is, it was introduced in v5.14[perl5140delta: Non-destructive substitution]:
anyone using an earlier version will have problems.
The earliest version I have installed is v5.14.2, so I can't test this.
Whenever I use the '/r' modifier, I typically start my scripts like this:
#!/usr/bin/env perl
use 5.014;
use strict;
use warnings;
...
It's fine to use advanced features;
however, if you do, it's a good idea to keep an eye on their statuses (e.g. stable, experimental, and so on).
Take a look at the perl5delta and perl5vvsdelta[vv=version; s=subversion] manpages
in the perldoc: perl: Miscellaneous section.
New features are typically added in the first version (i.e. subversion=0):
I typically start searches in the perl5vv0delta pages.
Here's an example of the types of changes that can occur over just a few versions:
v5.24 - works as expected; v5.22 and v5.20 - can be made to work with two features, warnings silenced and experimental status; v5.18 - fatal error.
If '$x->@*' is uncharted territory, take a look at
"perlref: Postfix Dereference Syntax".
$ alias perle
alias perle='perl -Mstrict -Mwarnings -Mautodie=:all -E'
$ perlbrew switch perl-5.24.0t
$ perl -v | head -2 | tail -1
This is perl 5, version 24, subversion 0 (v5.24.0) built for darwin-th
+read-multi-2level
$ perle 'my $x = [qw{a b c}]; say "@{$x}"; say "$x->@*"'
a b c
a b c
$ perlbrew switch perl-5.22.0t
$ perl -v | head -2 | tail -1
This is perl 5, version 22, subversion 0 (v5.22.0) built for darwin-th
+read-multi-2level
$ perle 'my $x = [qw{a b c}]; say "@{$x}"; say "$x->@*"'
a b c
ARRAY(0x7ff4d2805480)->@*
$ perle 'no warnings "experimental::postderef"; use feature qw{postder
+ef postderef_qq}; my $x = [qw{a b c}]; say "@{$x}"; say "$x->@*"'
a b c
a b c
$ perlbrew switch perl-5.20.0t
$ perl -v | head -2 | tail -1
This is perl 5, version 20, subversion 0 (v5.20.0) built for darwin-th
+read-multi-2level
$ perle 'no warnings "experimental::postderef"; use feature qw{postder
+ef postderef_qq}; my $x = [qw{a b c}]; say "@{$x}"; say "$x->@*"'
a b c
a b c
$ perlbrew switch perl-5.18.0t
$ perl -v | head -2 | tail -1
This is perl 5, version 18, subversion 0 (v5.18.0) built for darwin-th
+read-multi-2level
$ perle 'no warnings "experimental::postderef"; use feature qw{postder
+ef postderef_qq}; my $x = [qw{a b c}]; say "@{$x}"; say "$x->@*"'
Unknown warnings category 'experimental::postderef' at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
-
I don't see any reason for $VERSION;
however, if you do need it, first see
"version: SYNOPSIS".
Of particular note:
...
use version; our $VERSION = qv("v1.2.3"); # deprecated
...
our $VERSION = "1.0203"; # recommended
...
-
Consider compiling your regex only once, and doing so at compile time.
By the way, much of your escaping was unnecessary;
I've removed the superfluous backslashes (i.e. all except '\\');
see "perlrecharclass: Special Characters Inside a Bracketed Character Class".
I might have written something along the lines of:
...
{
my $re; BEGIN { $re = qr{(?x: [%/?<>\\:*|":] )} }
for ... {
... s/$re/_/gr;
}
}
-
Take a look at
"perlperf - Perl Performance and Optimization Techniques";
in particular, BENCHMARKS's
"Search and replace or tr" section.
While both substitution and transliteration can be used for this task,
Benchmarking indicates transliteration can do it 25 times faster.
$ perle 'my $re; BEGIN { $re = qr{(?x: [%/?<>\\:*|":] )} } say q{%/?<>
+\\:*|":} =~ s/$re/_/gr'
___________
$ perle 'say q{%/?<>\\:*|":} =~ y{%/?<>\\:*|":}{_}r'
___________
$ for i in {1..5}; do perle 'use Benchmark qw{cmpthese}; my $re; BEGIN
+ { $re = qr{(?x: [%/?<>\\:*\|":] )} }; my $str = q{%/?<>\\:*|":}; cmp
+these 0 => { s => sub { $str =~ s/$re/_/gr }, y => sub { $str =~ y{%/
+?<>\\:*|":}{_}r } }'; done
Rate s y
s 212144/s -- -96%
y 5420361/s 2455% --
Rate s y
s 204213/s -- -96%
y 5632150/s 2658% --
Rate s y
s 201600/s -- -97%
y 5771041/s 2763% --
Rate s y
s 204103/s -- -96%
y 5026909/s 2363% --
Rate s y
s 222651/s -- -96%
y 5600560/s 2415% --
Perhaps a regex substitution was not the best choice.
-
I don't see any need for the intermediary variable '$uri':
you can just write 'push @uris, shift'.
I'd also question the need to create a new coderef on every iteration.
Consider something like one of these two ways (untested):
{
my @uris;
my $finder_code = sub { push @uris, shift };
sub finder_sub { push @uris, shift }
for ... {
...
@uris = ();
my $code_finder = URI::Find::->new($finder_code);
my $sub_finder = URI::Find::->new(\&finder_sub);
...
}
}
If you were wondering about the extra pair of colons I added
('::->'), see
"perlobj: Invoking Class Methods".
There may be other issues, but these were the ones that stood out for me.
Having pointed out all those issues,
I just wanted to reiterate that this is a well-written script
and an excellent first attempt.
See also:
the online perl manpage (which I have bookmarked and refer to often);
"Create A New User".
Update: Fixed typo: s/obversations/observations/
| [reply] [d/l] [select] |
|
|
./foobar.pl # <- perl from shebang
perl foobar.pl # <- whatever perl is found first in $ENV{'PATH'}
/home/me/bin/perl-special foobar.pl # <- a special perl
/usr/local/bin/perl foobar.pl # <- common location for a local perl
/usr/bin/perl foobar.pl # <- common location for system perl
perlrun has an entire chapter "Location of Perl" with recommendations:
- When possible, it's good for both /usr/bin/perl and /usr/local/bin/perl to be symlinks to the actual binary. If that can't be done, system administrators are strongly encouraged to put (symlinks to) perl and its accompanying utilities into a directory typically found along a user's PATH, or in some other obvious and convenient place.
- You are advised to use a specific path if you care about a specific version.
Perl has some surprises when it comes to the shebang line, see perlrun:
- Parsing of the #! switches starts wherever "perl" is mentioned in the line. The sequences "-*" and "-" are specifically ignored
- If the #! line does not contain the word "perl" nor the word "indir", the program named after the #! is executed instead of the Perl interpreter. This is slightly bizarre, but it helps people on machines that don't do #!
Also, while most systems have /usr/bin/env, not all systems have it. Some have the env utility in /bin/env. Some systems use a symlink in one directory pointing to env in the other directory. See also http://www.in-ulm.de/~mascheck/various/shebang/#env.
env is another process that has to be started. env does not just start perl; at least, it has to parse the command line for arguments. See busybox env.c for a quite minimal implementation. Entry point is env_main(). The GNU version does much more, see GNU env.c. So using env in the #! line increases the startup time. Generally, it does not matter that much, especially on modern machines. But the overhead is not zero.
I typically start my scripts like this:
#!/usr/bin/env perl
use 5.014;
use strict;
use warnings;
use strict is redundant here, as use VERSION with a VERSION >= 5.12.0 implies use strict. See use.
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
| [reply] [d/l] [select] |
|
|
++
Thanks for the feedback.
I'm aware of the shebang line points you raise.
Hard-coding paths to perl has, in my (>20 years) experience, many more problems than using env.
Of course, there are occasions where a hard-coded path might be more appropriate.
You quoted me as saying:
"I typically start my scripts like this: ..."
A more accurate quote, that conveyed the point I was making, and didn't hide the text you elided, would have been:
"Whenever I use the '/r' modifier, I typically start my scripts like this: ..."
If I removed the '/r' modifier, but still had code with say, //=, etc.,
I only need to change a '4' to a '0' (i.e. 5.014 → 5.010);
I do not need to remember to add use strict.
| [reply] [d/l] [select] |
|
|
This was a long one to parse, thank you for that :)
I've done almost all you've recommended, especially the trivial ones.
Nice performance considerations as well.
| [reply] |
Re: Please critique this script -- Read emails and write URLs into bookmarks
by eyepopslikeamosquito (Archbishop) on Oct 07, 2016 at 23:23 UTC
|
Modern Perl assumes you're already decent at programming,
so it elides some basic stuff in favor of explaining how
Perl works from philosophy to programming in the large.
Learning Perl assumes you've never programmed before,
so it spends more time on the basics, covers less of the language,
and doesn't explore the philosophy of Perl in as much detail.
-- chromatic (author of Modern Perl) comments on the philosophy behind his book
Given this is your first Perl script, you have demonstrated
that you are an accomplished programmer, who is new to Perl.
Accordingly, I feel Modern Perl
is more appropriate for you than
Learning Perl.
Modern Perl is free and very quick to read.
If you have more time, and want to get inside the head of
the language designer,
Programming Perl
is also good.
As for good "second" books on Perl, I suggest:
In particular, Perl Best Practices,
Chapter 4, "Values and Expressions" in the "Don't mix high- and low-precedence booleans" item,
discusses a stylistic point already raised in this thread.
For flow of control, prefer the low precedence
and and or operators.
A common example of preferring or to || is:
open(my $fh, '<', $file) or die "error opening '$file': $!";
Reserve && and || for logical expressions, not flow of control, for example:
if ($x > 5 && $y < 10) ...
| [reply] [d/l] [select] |
|
|
Thank you for the book recommendations.
"For flow of control, prefer the low precedence and and or operators. A common example of preferring or to || is:"
"Reserve && and || for logical expressions, not flow of control, for example:"
I'm inclined not to follow this. I don't like to rely on subtle precedence rules.
Is there any reasoning to not always use parentheses instead? (Not that I did that in the versions you know.)
| [reply] |
|
|
I'm inclined not to follow this. I don't like to rely on subtle precedence rules. Is there any reasoning to not always use parentheses instead?
(This is a quite generic question, applying not only to Perl, and so the answer does not only apply to Perl.)
There might be reasons, I guess the most common ones are avoiding typing and cleaner code.
And I think both do not apply, for exactly the reason you gave: "I don't like to rely on subtle precedence rules."
Treating operators as having completely random precedence, except for parentheses and the "multiplication and division before addition and subtraction" rule definitively prevents nasty surprises with operator precedence. Adding parentheses does not hurt, the compiler or interpreter should need just a few CPU cycles to handle them. Parentheses clearly indicate the programmer's intended order of evaluation, without forcing the next programmer to memorize precedence rules.
Example:
if (a && b || c && d || e) // wtf? Don't make me guess what you intend
+ed!
if ((a && b) || (c && d) || e) // clear
if (a && (b || c) && (d || e)) // as clear, but a very different condi
+tion
if (a && (b || (c && (d || e)))) // a different condition
if ((((a && b) || c) && d) || e) // another different condition
My current job is developing embedded software for medical and aerospace devices. We have a very strict rule that parentheses are required whereever code might be ambiguous. The first example would never get through the obligatory code review; you will never find that in production code.
And by the way: If you are working with several languages, as I do, you will notice that precedence rules vary from language to language. Most copy or inherit from C, but some are very different. MUMPS evaluates strictly from left to right, unless parentheses are used. In MUMPS, WRITE 1+2*3 writes 9, not 7. WRITE 1+(2*3) writes 7.
Of course, redundant parentheses are rubbish:
if ((((((((a && b))) || (((c)))))))) // make extra-sure that parenthes
+es have precedence ;-)
Some times, lots of parentheses are required, adding some white space and indention clearly helps:
if (
((a == A_MAGIC_1) || (a == A_MAGIC_2))
&&
((b == B_MAGIC_1) || (b == B_MAGIC_2))
&&
(
(c == C_MAGIC_1)
||
((c == C_MAGIC_2) && (a == A_MAGIC_2) && (b == B_MAGIC_2))
)
)
Yes, this will make the C compiler "waste" about thirty precious CPU cycles eating up all of that redundant white space and braces. Just think how long that takes at 3 GHz! And it gets worse, because all of the source code "wastes" CPU cycles. So compiling the enitre project takes 1.5 msec more time than needed. Flashing into the target takes 20 +/- 2 seconds. Deciphering source code "optimized" for better compile times takes minutes per line, hours per file, every time a change is required. This is a real waste of time and money. Computers and compilers become faster every day. A modern desktop runs circles around old super computers even with crap components. "Optimizing" source code for compile time is ridiculous, except in very rare high-performance cases.
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
| [reply] [d/l] [select] |
|
|
Is there any reasoning to not always use parentheses instead?
Yes, clarity and readability.
While I certainly do not endorse memorising the precedence table,
I feel non-casual programmers should know the precedence
of very commonly used operators.
To illustrate,
if you follow the "always use parentheses" rule,
you'd be obliged to write:
( my $x ) = ( $y + 42 );
which I feel is borderline obscene.
This is surely more clearly written,
without the clutter of parentheses, simply as:
my $x = $y + 42;
BTW, I would similarly object to parentheses here in C
or in any language where assignment is an operator.
In Perl, I place the ultra low precedence and and or
operators in the same category as the = operator,
at least for non-novice Perl programmers.
Being very low precedence makes them especially well-suited to
flow of control, and I agree with Conway that reserving
them exclusively for that purpose, for example:
open(my $fh, '<', $file) or die "error opening '$file': $!";
or:
$meaning == 42 or next;
is idiomatic Perl and makes the code clearer and more
enjoyable to read.
| [reply] [d/l] [select] |