Hi and thank you everyone for all your great suggestions! :)

It sounds like perl is definitely quicker than what I'm doing now.

As a few have asked to see the current code, I'll include it here (be nice...)

It's part of my system for blocking IP addresses when rather dodgy requests come from them (looking for login and admin pages for example). It's been stripped right back to only the slow function and some supporting bits. I think I've managed to remove everything else around it that isn't directly related. It's not pretty but it's easy to read.

A short explanation (which I'm sure you lot won't really need) is as follows:

A list of IP addresses is compiled into a file (for this example it's called input.list). The list is sorted with a key on each octet and duplicate entries removed.
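
For anyone wanting to reproduce the input, here is a minimal sketch of the sort-and-dedupe step using plain `sort`. The function name `sort_ip_list` is made up for illustration and is not part of my real system. One assumption to be aware of: with `-u`, `sort` treats lines whose four numeric keys compare equal as duplicates, so two entries for the same address with different comments collapse into one.

```shell
#!/bin/bash
# Sketch only: sort IPv4 entries numerically on each octet and drop duplicates.
# sort_ip_list is a hypothetical helper name.
sort_ip_list() {
    # -t .      : fields are separated by dots
    # -kN,Nn    : compare field N numerically
    # -u        : keep one line per set of equal keys
    LC_ALL=C sort -t . -k1,1n -k2,2n -k3,3n -k4,4n -u "$@"
}

# Demo: unsorted input with one duplicate entry
printf '%s\n' 2.3.1.2 1.4.3.5 1.2.3.6 1.2.3.4 1.2.3.6 | sort_ip_list
```

The function also accepts a file name, e.g. `sort_ip_list raw.list > input.list` (where `raw.list` is just a placeholder name).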

The script shown below then reads input.list line-by-line.

The output goes into output.list

The basic concept is to condense the input list by using a CIDR mask (0/24) where possible. This is done when 2 or more entries share the same first 3 octets: the 4th octet is replaced with the mask, so instead of single IP addresses being blocked, the whole block of 256 addresses is blocked.

e.g. If we have these 7 lines in the input:

1.2.3.0/24
1.2.3.4
1.2.3.6
1.4.3.5
2.3.1.2
2.3.2.1
2.3.2.10

the output will be these 4 lines:

1.2.3.0/24
1.4.3.5
2.3.1.2
2.3.2.0/24
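
The grouping rule can be sketched with shell builtins only. This is a simplified illustration, not my actual script: the names `condense_list` and `emit_group` are hypothetical, the input is assumed to be sorted already, and a group keeps the comment of its last entry. An existing mask narrower than 24 (such as 0/21) wins over the default.

```shell
#!/bin/bash
# Sketch only, assuming sorted input on stdin. Function names are hypothetical.
TAB=$(printf '\t')

# Print one finished group: a CIDR block if it held 2+ entries, else the lone entry.
emit_group() {   # args: first3 last_ip count mask comment
    [ "$3" -eq 0 ] && return 0            # nothing collected yet
    if [ "$3" -gt 1 ]; then
        entry="$1.0/$4"
    else
        entry="$2"
    fi
    if [ -n "$5" ]; then
        printf '%s\t%s\n' "$entry" "$5"
    else
        printf '%s\n' "$entry"
    fi
}

condense_list() {
    prev_first3="" prev_ip="" prev_comment="" count=0 group_mask=24
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in ''|'#'*) continue ;; esac   # skip blank and # lines
        ip=${line%%"$TAB"*}                       # text before the first tab
        comment=""
        case $line in *"$TAB"*) comment=${line#*"$TAB"} ;; esac
        first3=${ip%.*}                           # first three octets
        octet4=${ip##*.}                          # "6", or "0/24" for a block

        if [ "$first3" != "$prev_first3" ]; then
            # a new group starts, so write out the finished one
            emit_group "$prev_first3" "$prev_ip" "$count" "$group_mask" "$prev_comment"
            prev_first3=$first3 count=0 group_mask=24
        fi
        case $octet4 in
            */*) mask=${octet4#*/}
                 [ "$mask" -lt "$group_mask" ] && group_mask=$mask ;;
        esac
        prev_ip=$ip prev_comment=$comment count=$((count + 1))
    done
    # write out the last group
    emit_group "$prev_first3" "$prev_ip" "$count" "$group_mask" "$prev_comment"
}

# Demo with the 7 example lines
printf '%s\n' 1.2.3.0/24 1.2.3.4 1.2.3.6 1.4.3.5 2.3.1.2 2.3.2.1 2.3.2.10 | condense_list
```

Run on the 7 example lines, this prints the 4 output lines listed above. Because the loop uses no command substitution, it does not start an external process per input line.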

Just to make it more difficult, each line also has a tab char then a comment after the IP address (this comment contains the reason that my IDS chose to block this IP). The comment is enclosed within the C comment structure. This comment also has to be carried over to the output file.

So each input line actually looks like this:

1.161.169.75	/* wp-login.php */
1.168.230.73	/* wp-login.php */
1.174.218.109	/* wp-login.php */
1.192.128.23	/* /manager/ */
1.214.212.74	/* .cgi */
1.234.20.151	/* ZmEu */
1.249.203.135	/* .cgi */
2.61.137.117	/* wp-login.php */
2.77.94.236	/* wp-login.php */
2.90.252.253	/* wp-login.php */
2.139.237.110	/* /manager/ */
2.176.166.94	/* wp-login.php */
2.180.21.24	/* wp-login.php */
2.182.209.107	/* wp-login.php */
2.187.171.182	/* wp-login.php */
2.229.27.202	/* /manager/ */
2.237.24.187	/* wp-login.php */
5.9.136.55	/* SlowLoris */
5.20.156.72	/* SlowLoris */
5.34.57.96	/* GET /?author= */
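
A line in this format can be split into address and comment with parameter expansion alone, which avoids starting an external tool for every line. A minimal sketch, assuming the tab-separated layout described above; `split_entry`, `entry_ip` and `entry_comment` are hypothetical names chosen for illustration:

```shell
#!/bin/bash
# Sketch only: split "IP<TAB>/* reason */" without external tools.
# split_entry, entry_ip and entry_comment are hypothetical names.
TAB=$(printf '\t')

split_entry() {
    entry_ip=${1%%"$TAB"*}            # everything before the first tab
    case $1 in
        *'/*'*)
            rest=${1##*/\*}           # text after the last "/*"
            entry_comment="/*$rest"   # put the comment opener back
            ;;
        *)
            entry_comment=""          # no comment on this line
            ;;
    esac
}

# Demo
split_entry "1.161.169.75${TAB}/* wp-login.php */"
printf 'ip=%s comment=%s\n' "$entry_ip" "$entry_comment"
```

The demo prints `ip=1.161.169.75 comment=/* wp-login.php */`. A line with no comment leaves `entry_comment` empty.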

A complete list can be seen here

And this is the slow script:

#!/bin/bash

debug=false
#debug=true
verbose=true    # progress messages on/off (must be set, it is tested below)

function Init {
    # constants
    cidr_input_file="./input.list"
    cidr_output_file="./output.list"
    default_maskvalue=24

    # variables
    prev_first3=""
    prev_octet4=""
    prev_comment=""
    match_found=false
    prev_match_found=false
    prev_has_been_written=false
    current_maskvalue=0
    active_maskvalue=0
    exitcode=0
}

function CondenseOnThreeOctets {
    $verbose && echo -n "["$(date)"] -- applying CIDR mask where possible ..."

    # delete and create a new output file
    rm -f "${cidr_output_file}" && touch "${cidr_output_file}"

    while read line ; do
        if [ ! -z "$line" ] ; then    # ignore empty lines
            if [[ $line != \#* ]] ; then    # ignore lines that begin with a # character
                # only take first word on each line as IP address
                current_ip=$( cut -f1 <<< "${line}" )

                # take everything after /* as a comment but only if it exists
                [[ ${line} == */\** ]] && comment=$( sed 's|^.*/\*|/\*|' <<< "${line}" ) || comment=""

                while IFS=. read octet1 octet2 octet3 octet4 ; do
                    $debug && echo
                    first3="${octet1}.${octet2}.${octet3}"
                    $debug && echo "-- now checking - IP entry: ${first3}.${octet4}"

                    if [ -z "$prev_first3" ] ; then
                        # first time through the loop - no previous values have been saved yet.
                        SaveThisIpAsPrev
                    else
                        if [ "$first3" = "$prev_first3" ] ; then
                            # if here then first 3 octets matched so we can combine this IP with the previous IP
                            match_found=true
                            SaveThisIpAsPrev
                        else
                            # if here then first3 octets are different so it's OK to save the previous IP
                            match_found=false
                            WriteIpAsCIDR
                        fi
                    fi
                done <<< "${current_ip}"
            fi
        fi
    done < "${cidr_input_file}"

    # write out last IP
    WriteIpAsCIDR

    $verbose && echo " done!"
}

function CalcMasks {
    [[ $active_maskvalue -eq 0 ]] && active_maskvalue=${default_maskvalue}

    if [[ $octet4 == *"0/"* ]] ; then
        current_maskvalue=${octet4:2}
        [[ $current_maskvalue -lt $active_maskvalue ]] && active_maskvalue=${current_maskvalue}
    else
        current_maskvalue=0
    fi
}

function SaveThisIpAsPrev {
    $debug && echo "-- saving current IP as previous IP"
    CalcMasks
    prev_first3=$first3
    prev_octet4=$octet4
    prev_comment="$comment"
    prev_match_found=$match_found
    prev_has_been_written=false
}

function WriteIpAsCIDR {
    if [ ! -z "$prev_first3" ] ; then
        if ! $prev_has_been_written ; then
            if $prev_match_found ; then
                buildline="$prev_first3.0/$active_maskvalue\t$prev_comment"
            else
                buildline="$prev_first3.$prev_octet4\t$prev_comment"
            fi
            echo -e "${buildline}" >> "${cidr_output_file}"
            prev_has_been_written=true
            active_maskvalue=0
        fi
    fi
    SaveThisIpAsPrev
}

echo "["$(date)"] >> started [$0@$HOSTNAME]"
Init
CondenseOnThreeOctets
echo "["$(date)"] << finished [$0@$HOSTNAME]"
exit $exitcode

Hopefully I've included all relevant information. Please let me know if there's something I missed.

Thanks everyone. :)

update: just replaced that previous code with a newer one in response to an idea I saw in Anonymous Monk's solution below (namely: perform a single search for first 3 octets). And it will now correctly parse more restrictive masks such as 0/21. So, it's gone from 18 seconds to 17 seconds. lol...


In reply to Re: perl quicker than bash? by TiffanyButterfly
in thread perl quicker than bash? by TiffanyButterfly
