in reply to Making this script process 56,000 lines 5 times faster

Hello kris004,

Performance varies between Perl releases. Testing suggests that regular expressions may run slower in Perl v5.20 and higher. Therefore, a subsequent demonstration (for Perl 5.20.x and higher) avoids the regular expression engine altogether.

Update 1: Merging the two regular expressions into one makes the script run fast on any Perl release (suggested by LanX). Of course ;-), minimize the number of regular expression statements when possible.

Update 2: Added -w switch to Perl.

```sh
#!/bin/sh

curl https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts |\
perl -wnl -e '
    if ( /^0\.0\.0\.0 (.*)$/ ) {
        print "local-zone: \"" . $1 . "\" redirect\nlocal-data: \"" . $1 . " A 0.0.0.0\"";
    }
' > ads.conf
```

Perl up to 5.18.x

```sh
#!/bin/sh

curl https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts |\
perl -wnl -e '
    if ( /^0\.0\.0\.0/ ) {
        chomp;
        s/^0\.0\.0\.0 //;
        print "local-zone: \"" . $_ . "\" redirect\nlocal-data: \"" . $_ . " A 0.0.0.0\"";
    }
' > ads.conf
```

Perl 5.20.x and higher

```sh
#!/bin/sh

curl https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts |\
perl -wnal -e '
    # @F is empty on blank lines; check it first to avoid -w warnings
    if ( @F && $F[0] eq "0.0.0.0" ) {
        print "local-zone: \"" . $F[1] . "\" redirect\nlocal-data: \"" . $F[1] . " A 0.0.0.0\"";
    }
' > ads.conf
```
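As a quick sanity check that the three one-liners above produce the same ads.conf text, here is a small sketch that applies the same three extraction strategies (one regex with a capture, a match followed by s///, and the whitespace split that -a performs) to a few input lines. The sample lines and subroutine names are made up for illustration; it only checks that the outputs agree and says nothing about speed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample lines in hosts-file format (not the real StevenBlack data).
my @sample = (
    "# comment line\n",
    "127.0.0.1 localhost\n",
    "0.0.0.0 ads.example.com\n",
    "0.0.0.0 tracker.example.net\n",
);

# Formats one entry the way the one-liners do (-l adds the final "\n").
sub entry {
    my ($host) = @_;
    return "local-zone: \"$host\" redirect\nlocal-data: \"$host A 0.0.0.0\"\n";
}

# Variant 1: a single regex with a capture group.
sub by_capture {
    my ($line) = @_;
    return $line =~ /^0\.0\.0\.0 (.*)$/ ? entry($1) : '';
}

# Variant 2: match first, then strip the prefix with s///.
sub by_substitution {
    my ($line) = @_;
    return '' unless $line =~ /^0\.0\.0\.0/;
    chomp $line;
    $line =~ s/^0\.0\.0\.0 //;
    return entry($line);
}

# Variant 3: split on whitespace (what -a does) and compare with eq.
sub by_split {
    my ($line) = @_;
    my @F = split ' ', $line;
    return ( @F && $F[0] eq '0.0.0.0' ) ? entry( $F[1] ) : '';
}

my %variants = (
    capture      => \&by_capture,
    substitution => \&by_substitution,
    split        => \&by_split,
);

# Run every variant over the whole sample and collect its output.
my %output;
for my $name ( sort keys %variants ) {
    my $code = $variants{$name};
    $output{$name} = join '', map { $code->($_) } @sample;
}

# All variants agree if none differs from the first one.
my @names = sort keys %output;
my $same  = !grep { $output{$_} ne $output{ $names[0] } } @names;

print "variants ", ( $same ? "agree" : "differ" ), "\n";
print $output{capture};
```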

Perl switches

```
$ perl --help

Usage: perl [switches] [--] [programfile] [arguments]
...
  -a                autosplit mode with -n or -p (splits $_ into @F)
  -e program        one line of program (several -e's allowed, omit programfile)
  -l[octal]         enable line ending processing, specifies line terminator
  -n                assume "while (<>) { ... }" loop around program
  -w                enable many useful warnings
...
```
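To make these switches concrete, here is a sketch of roughly what perl -wnal -e 'BODY' wraps around the program body: -n supplies the read loop, -l chomps each line and sets $\ to "\n", and -a splits $_ into @F. Assumptions: the input is a made-up in-memory string instead of <>, so it runs without a file or network, and use warnings stands in for -w (the former is lexical, the latter global).

```perl
#!/usr/bin/perl
# Roughly what  perl -wnal -e 'BODY'  wraps around BODY, written out by hand.
use strict;
use warnings;    # stands in for -w

# Made-up sample input; the real -n loop reads <> (files in @ARGV, else STDIN).
my $sample = "# comment\n\n127.0.0.1 localhost\n0.0.0.0 ads.example.com\n";
open my $IN, '<', \$sample or die "cannot open in-memory input: $!";

$\ = "\n";    # -l with no octal: output record separator = $/ ("\n")

my @zones;
while ( defined( $_ = <$IN> ) ) {    # -n: the implicit read loop
    chomp;                            # -l: strip the input record separator
    my @F = split ' ', $_;            # -a: autosplit $_ into @F

    # BODY: the program given with -e
    if ( @F && $F[0] eq '0.0.0.0' ) {
        push @zones, $F[1];           # kept only so the result can be checked
        print "local-zone: \"$F[1]\" redirect\nlocal-data: \"$F[1] A 0.0.0.0\"";
    }
}
```

To see Perl's own expansion of a one-liner, B::Deparse can print it, e.g. perl -MO=Deparse -wnal -e 'print $F[1]'; the exact output varies between Perl releases.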

Regards, Mario

Re^2: Making this script process 56,000 lines 5 times faster
by marioroy (Prior) on Mar 22, 2018 at 07:02 UTC

    Hi kris004,

    Another option is to have Perl read curl's output directly instead of using LWP::Simple. The "-|" mode for open means the string that follows is interpreted as a command whose output is piped to us. See the documentation for the open function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input  = "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts";
my $output = "ads.conf";

open my $IN,  "-|", "curl $input" or die "open error $input: $!";
open my $OUT, ">",  $output       or die "open error $output: $!";

while ( <$IN> ) {
    if ( /^0\.0\.0\.0 (.*)$/ ) {
        print {$OUT} "local-zone: \"" . $1 . "\" redirect\nlocal-data: \"" . $1 . " A 0.0.0.0\"\n";
    }
}

close $IN;
close $OUT;
```
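    A possible variation, sketched under assumptions: passing open a list instead of a single string runs the command without a shell, and checking the return value of close reports a command that exited non-zero (for curl, e.g. a network error) instead of silently writing an empty ads.conf. To keep the sketch runnable offline, the command is a hypothetical stand-in (the running perl, $^X, printing two made-up lines); in practice it might be something like ( 'curl', '-s', $input ).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for curl: the running perl ($^X) prints made-up
# hosts lines. In the real script this might be ( 'curl', '-s', $input ).
my @cmd = ( $^X, '-e', 'print "127.0.0.1 localhost\n0.0.0.0 ads.example.com\n"' );

# List-form "-|" open: the arguments go straight to the program with no
# shell in between, so characters such as & or ; in a URL need no quoting.
open my $IN, '-|', @cmd or die "cannot start $cmd[0]: $!";

my @zones;
while ( my $line = <$IN> ) {
    push @zones, $1 if $line =~ /^0\.0\.0\.0 (.*)$/;
}

# For a piped open, close waits for the child and returns false if the
# child exited non-zero; the wait status is then in $?.
close $IN or die "command failed, exit status ", $? >> 8, "\n";

print "local-zone: \"$_\" redirect\nlocal-data: \"$_ A 0.0.0.0\"\n" for @zones;
```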

    Regards, Mario