in reply to Extracting IP address from large text file.

IS THE BELOW CORRECT?
my $str = 'BIG BIG DATA FILE';
if (my @matches = $str =~ m{ ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-
+9]{1,3}:[0-9]{1,5}) }xmsg) {
    print qq{matched @matches};
}

It will extract all occurrences of what you define as an 'ip address' to the array.

BTW again: It is better to reply to a post immediately after (and 'below') the post rather than as an addendum to the OP: makes the conversation a lot easier to follow.

Re^2: Extracting IP address from large text file.
by AnomalousMonk (Archbishop) on Oct 19, 2010 at 22:58 UTC

    The little red plus sign (note that it is, indeed, red in the reply above) at the beginning of +9{1,3}: is a line-wrap flag and is not intended to be included in the regex. The proper way to write this piece of the regex would be [0-9]{1,3} or, better yet, \d{1,3}.
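
To see why the stray flag breaks the pattern, here is a minimal sketch (the helper name compiles_under_x is illustrative, not from the thread). Under /x, whitespace is ignored in most of the pattern but not inside a bracketed character class, so [0- +9] is read as a range from '0' to a space, which is invalid. The sketch compiles each pattern at run time so that the failure can be caught with eval.

```perl
use strict;
use warnings;

# Return true if $pat compiles as a regex under /x.
# (compiles_under_x is a hypothetical helper name.)
sub compiles_under_x {
    my ($pat) = @_;
    # An interpolated pattern is compiled at run time, so a bad one
    # raises an exception that eval can trap.
    my $re = eval { qr/$pat/x };
    return defined $re ? 1 : 0;
}

# First pattern keeps the line-wrap flag; the other two are clean.
for my $pat ('[0- +9]{1,3}', '[0-9]{1,3}', '\d{1,3}') {
    printf "%-14s %s\n", $pat,
        compiles_under_x($pat) ? 'compiles' : 'does not compile';
}
```

Only the first pattern, the one with the flag left in, should be reported as not compiling.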

    Also: Please, Please, Puh-leeeeze use code tags. Please see Markup in the Monastery.

      Sorry, I did miss the +9, but I am still getting an error:

      # ATTEMPT 1 (works but will need to split file into words and test each word)
      my $str = "br>94.198.240.132:60988 asdfasdf 174.142.24.201:3128 asdfasdfasdf";
      ($p1, $p2, $p3 , $p4 , $p5) = ($str =~ /([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3}):([0-9]{1,5})/g);
      print "$p1 $p2 $p3 $p4 $p5 \n";

      # ATTEMPT 2 (still getting an error message)
      my $str = "br>94.198.240.132:60988 asdfasdf 174.142.24.201:3128 asdfasdfasdf";
      if (my @matches = $str =~ m{ ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5}) }xmsg) {
          print qq{matched @matches};

      Missing right curly or square bracket at C:\CC\BUY\ptest2.pl line 16, at end of line
      syntax error at C:\CC\BUY\ptest2.pl line 16, at EOF
      Execution of C:\CC\BUY\ptest2.pl aborted due to compilation errors.

      Thank you again for all your help AnomalousMonk and Moritz

        In 'ATTEMPT 2', I think you are missing the closing } (right-curly-bracket) of the if block (it should be just after the print statement). With this closing curly, the code works for me; see first example below. BTW: Use the [download] link to download whatever is posted within <code> ... </code> or <c> ... </c> tags without inclusion of line-wrap flags!

        As for extracting individual digit fields from each extracted IP address, I think a two-step process would be best: see second example below.

        In general, see perlre, perlrequick, perlreref, perlretut.

        >perl -wMstrict -le
        "my $str = 'foo 94.198.240.132:60988 bar 174.142.24.201:3128 baz';
         if (my @matches = $str =~ m{ ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5}) }xmsg) {
           print qq{matched @matches};
           }
         ;;
         if (my @ips = $str =~ m{ \d{1,3} (?: \. \d{1,3}){3} : \d{1,5} }xmsg) {
           for my $ip (@ips) {
             my ($p1, $p2, $p3, $p4, $p5) = $ip =~ m{ \d+ }xmsg;
             print qq{'$ip': '$p1' '$p2' '$p3' '$p4' '$p5'}
             }
           }
        "
        matched 94.198.240.132:60988 174.142.24.201:3128
        '94.198.240.132:60988': '94' '198' '240' '132' '60988'
        '174.142.24.201:3128': '174' '142' '24' '201' '3128'
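
For a genuinely large file, the same matching idea can be applied one line at a time, so the whole file never has to sit in a single string. The following is only a minimal sketch, assuming each address:port pair sits on a single line; the sub name extract_endpoints and the in-memory sample data are illustrative and not from the thread.

```perl
use strict;
use warnings;

# Collect every "a.b.c.d:port" string from a filehandle, one line at a time.
# (extract_endpoints is a hypothetical helper name.)
sub extract_endpoints {
    my ($fh) = @_;
    my @found;
    while (my $line = <$fh>) {
        # In list context, /g returns every match found on this line.
        push @found, $line =~ m{ \d{1,3} (?: \. \d{1,3} ){3} : \d{1,5} }xg;
    }
    return @found;
}

# Sample data read through an in-memory filehandle; in real use,
# open the actual file instead of a reference to a string.
my $data = "foo 94.198.240.132:60988 bar\nbaz 174.142.24.201:3128 qux\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @endpoints = extract_endpoints($fh);
close $fh;

print "$_\n" for @endpoints;
```

Reading by line keeps memory use roughly proportional to the longest line rather than to the file size, at the cost of missing any address that is split across two lines.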
Re^2: Extracting IP address from large text file.
by Monkomatic (Sexton) on Oct 19, 2010 at 23:06 UTC
    I did try adding the following code but got an error:
    my $str = "br>94.198.240.132:60988 asdfasdf 174.142.24.201:3128 asdfas +dfasdf";
    if (my @matches = $str =~ m{ ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0- +9]{1,3}:[0-9]{1,5}) }xmsg) {
        print qq{matched @matches};
    }

    Invalid [] range "0- " in regex; marked by <-- HERE in m/ ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0- <-- HERE +9]{1,3}:[0-9]{1,5}) / at C:\CC\BUY\ptest.pl line 43.

    I tried removing the enclosing (): $str =~ m{ [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0- +9]{1,3}:[0-9]{1,5} }xmsg) = same error

    I tried adding / /: $str =~ m{ /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0- +9]{1,3}:[0-9]{1,5}/ }xmsg) = same error

    I also tried the below method with limited success.

    my $str = "br>94.198.240.132:60988 asdfasdf 174.142.24.201:3128 asdfasdfasdf";
    ($p1, $p2, $p3 , $p4 , $p5) = ($str =~ /([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3}):([0-9]{1,5})/g);
    print "$p1 $p2 $p3 $p4 $p5 \n";

    output: 94 198 240 132 60988

    But /g does not pick up the second address this way without some kind of looping construct. Sigh... well there is the split the entire file into words and do it for each word method available to me now at least :)
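
One possible looping construct, sketched here with illustrative variable names: in scalar context, /g makes each successful match resume where the previous one ended, so a while loop visits every address in turn and the capture groups are refreshed on each pass. Note that the dots are escaped as \. here, since an unescaped . matches any character.

```perl
use strict;
use warnings;

my $str = "br>94.198.240.132:60988 asdfasdf 174.142.24.201:3128 asdfasdfasdf";

my @rows;    # one array ref of five fields per address found
while ($str =~ /(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})/g) {
    # $1..$4 hold the four octets and $5 the port of the current match.
    push @rows, [$1, $2, $3, $4, $5];
    print "$1 $2 $3 $4 $5\n";
}
```

For the sample string this should print one line per address, with no need to split the input into words first.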

      ...well there is the split the entire file into words and do it for each word method available to me now at least :)

      I think that approach might be best for all concerned.