bayareamonk has asked for the wisdom of the Perl Monks concerning the following question:

I have a file that looks like this:

text.txt yoda yoda yoda ip-compute-10.10.10.1-internal yadda yadda yadda some more stuff ###########

I need to replace ip*[internal] with ip-compute-10.25.25.25-internal.

This does not work:

perl -p -i.bak -e 's/\ip*\internal/ip-10.10.10.1-internal/g' text.txt

Re: Newbie regular expressions question
by davido (Cardinal) on Apr 11, 2014 at 22:52 UTC

    You're making up regular expression syntax. Perl doesn't know what you mean. Perl thinks you mean: replace 'i', followed by any number of 'p' characters, followed by 'internal', with 'ip-10.10.10.1-internal'.
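A small sketch to make the point above concrete. It relies on Perl passing the unrecognized \i escape through as a plain "i" (with an "Unrecognized escape" warning under use warnings), so the pattern is written here without the stray backslashes. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

# /\ip*\internal/ behaves like /ip*internal/: an "i", then zero or more "p",
# then the literal text "internal". The stray backslashes are dropped here.
my $as_parsed = qr/ip*internal/;

# Hypothetical sample strings, chosen only to show what does and does not match.
my @samples = (
    'iinternal',
    'ipinternal',
    'ippppinternal',
    'ip-compute-10.10.10.1-internal',
);

for my $s (@samples) {
    printf "%-32s %s\n", $s, ( $s =~ $as_parsed ? 'matches' : 'no match' );
}
```

The first three strings match; the OP's actual data does not, because "internal" there is preceded by "-" rather than by an "i".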

    I started trying to provide a regexp that would do what you want, but there are inconsistencies between your sample input, sample output, and the replacement portion of your sample code. If I guess, I'll probably guess wrong.

    Perhaps you could follow up here with sample input and sample output that are consistent with one another, and a little clarification on the rules you wish to follow in transforming the input to the output so that we can assist without guessing.

    Also, making up regex syntax is never going to be the path to regex success. You really do need to spend some time with perlintro and perlretut.


    Dave

      text.txt input
      <yoda yoda yoda> <ip-compute-10.10.10.1-internal> <yadda yadda yadda> <some more stuff>
      text.txt output
      <yoda yoda yoda> <ip-compute-10.10.25.18-internal> <yadda yadda yadda> <some more stuff>
      I'm trying to figure out what regular expression I need to get the desired output. I'm using Perl 5.10 on Oracle Enterprise Linux 6.5.

        You still haven't explained what rules should be followed. One could easily just write a regexp that matches exactly your input string, and only that input string. Or one could write a regexp that matches any input string containing what appears to be an IP address. The first would be way too restrictive, the second too permissive. I'd like to get it right, but you're not helping.

        So my wild guess is that you just want to replace any IP address that is nested between the literal text, "<ip-compute-", and "-internal>". Here's an example that accomplishes that:

        use strict;
        use warnings;
        use Regexp::Common 'net';

        my $input = "<yoda yoda yoda> <ip-compute-10.10.10.1-internal> <yadda yadda yadda> <some more stuff>\n";
        my $want  = "<yoda yoda yoda> <ip-compute-10.10.25.18-internal> <yadda yadda yadda> <some more stuff>\n";

        my $replacement_ip = '10.10.25.18';

        $input =~ s/(<ip-compute-)(?:$RE{net}{IPv4})(-internal>)/$1$replacement_ip$2/;

        print $input eq $want ? "Success:\n" : "Failure:\n";
        print "\t$_\n" for split /\n/, $input;

        The output will be this:

        Success:
            <yoda yoda yoda> <ip-compute-10.10.25.18-internal> <yadda yadda yadda> <some more stuff>

        I used Regexp::Common's ::net extension to generate the portion of the regular expression that matches an IPv4 address. I didn't want to use a naive regexp such as (?:\d{1,3}\.){3}\d{1,3}, only to find that it works most of the time but occasionally matches things that couldn't be valid IPs. Nor did I want to trouble myself or you with the pain of coming up with a more robust pattern on my own.
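To illustrate the concern about the naive pattern, here is a sketch that anchors it with \A and \z (an addition for this demonstration, so it must match the whole string) and runs it over a few made-up candidates. It accepts all four, including 999.999.999.999 and 256.1.1.1, neither of which is a valid IPv4 address.

```perl
use strict;
use warnings;

# The naive pattern from the post, anchored so it must match the whole string.
my $naive = qr/\A(?:\d{1,3}\.){3}\d{1,3}\z/;

# Hypothetical candidates; the last two are not valid IPv4 addresses.
for my $candidate ( '10.10.10.1', '10.10.25.18', '999.999.999.999', '256.1.1.1' ) {
    my $verdict = $candidate =~ $naive ? 'accepted' : 'rejected';
    print "$candidate: $verdict\n";
}
```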

        If you're not allowed to use a module, the regular expression generated by $RE{net}{IPv4} is this:

        (?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))

        ...which is precisely why I didn't want to reinvent it myself. ;)
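If the module really is off-limits, the spelled-out pattern can be dropped into the same substitution. The sketch below splits it into a reusable octet piece (the variable names $octet and $ipv4 are illustrative, not part of Regexp::Common) and writes the dot as \. rather than [.], so Perl cannot mistake "$octet[" for an array subscript inside the regex. It assumes the single-line input from the example above.

```perl
use strict;
use warnings;

# One octet of the spelled-out $RE{net}{IPv4} pattern quoted in the post.
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})/;

# Four octets separated by literal dots (\. here, equivalent to [.]).
my $ipv4 = qr/(?:$octet\.$octet\.$octet\.$octet)/;

my $input          = "<yoda yoda yoda> <ip-compute-10.10.10.1-internal> <yadda yadda yadda> <some more stuff>\n";
my $replacement_ip = '10.10.25.18';

# Same substitution as the Regexp::Common version, with no module needed.
$input =~ s/(<ip-compute-)$ipv4(-internal>)/$1$replacement_ip$2/;

print $input;
```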

        It seems possible you're dealing with XML, so you might find an XML parsing module to be a more robust solution in the long run (though the learning curve might be steeper at first). Also, even if the solution I've provided works for you, if you plan to do this more than once, you owe it to yourself, your employer, and the people scattered across the Internet who will help you to spend an hour reading perlretut.


        Dave

Re: Newbie regular expressions question
by kcott (Archbishop) on Apr 11, 2014 at 22:45 UTC

    G'day bayareamonk,

    Please use code tags for your data as well as your code.

    You say you want to replace ip*internal (that's what I see, but I'd guess it's supposed to be ip*[internal], because [internal] would be rendered as a link). However, I don't see either of those forms in what you show as "looks like this", so I'm reduced to making more guesses.

    You probably have one or more of these problems in your regex (/\ip*\internal/):

    • The first \i doesn't need escaping, i.e. just i would be OK.
    • The * (which has a special meaning in a regex) does need escaping, i.e. \* (or use a character class [*]).
    • The second \i is probably missing a left bracket, i.e. \[i.
    • You may be missing a \] after internal.
    • Your data doesn't suggest that the 'g' modifier is needed.

    So, overall, my guess is that your substitution should look like this:

    s/ip\*\[internal\]/ip-10.10.10.1-internal/
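A quick sketch of the suggested substitution on a made-up line containing ip*[internal], next to two equivalent spellings: a [*] character class for the asterisk, and \Q...\E (Perl's quotemeta escape), which is a different technique from the one above and escapes every metacharacter in the interpolated string.

```perl
use strict;
use warnings;

# Hypothetical input line containing the literal text "ip*[internal]".
my $line = "yoda yoda yoda ip*[internal] yadda yadda yadda\n";

# Backslash-escape the metacharacters *, [ and ].
( my $escaped = $line ) =~ s/ip\*\[internal\]/ip-10.10.10.1-internal/;

# Same match, using a character class for the * instead of a backslash.
( my $classed = $line ) =~ s/ip[*]\[internal\]/ip-10.10.10.1-internal/;

# \Q...\E escapes every metacharacter in the interpolated string.
my $literal = 'ip*[internal]';
( my $quoted = $line ) =~ s/\Q$literal\E/ip-10.10.10.1-internal/;

print $escaped, $classed, $quoted;
```

All three produce the same line, so the choice is mostly about readability; \Q...\E is handy when the literal text comes from a variable.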

    If you provide a clear picture of an input line, what output you expect and what output you're actually getting (along with any error or warning messages), we can probably provide a better answer. More information about what to post and how to post it, such that we can better help you, can be found in the "How do I post a question effectively?" guidelines.

    Update: s/left brace/left bracket/ (in 3rd dot point)

    -- Ken

Re: Newbie regular expressions question
by Laurent_R (Canon) on Apr 12, 2014 at 10:12 UTC
    The requirement is not entirely clear (and there is a contradiction between your description of the need and the code you show), but a simple correction to your Perl one-liner could be something like this:
    perl -pi.bak -e 's/ip-compute-[\d.]{7,15}-internal/ip-compute-10.25.25.25-internal/g' test.txt
    Since I am not sure exactly what you need, here is an example run showing what the command does:
    $ cat test.txt
    text.txt yoda yoda yoda ip-compute-10.10.10.1-internal yadda yadda yadda some more stuff
    $ perl -pi.bak -e 's/ip-compute-[\d.]{7,15}-internal/ip-compute-10.25.25.25-internal/g' test.txt
    $ cat test.txt
    text.txt yoda yoda yoda ip-compute-10.25.25.25-internal yadda yadda yadda some more stuff
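For reference, the -p and -i.bak switches are roughly equivalent to the $^I in-place editing idiom (see perlvar) sketched below. The file name demo.txt is hypothetical, and the sketch writes a one-line sample file first so it can run on its own; drop that part when working on real data.

```perl
use strict;
use warnings;

# Hypothetical file name, so a real test.txt is not touched.
my $file = 'demo.txt';

# Write a one-line sample file so the sketch runs on its own (demo only).
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "text.txt yoda yoda yoda ip-compute-10.10.10.1-internal yadda yadda yadda some more stuff\n";
close $out or die "Cannot close $file: $!";

{
    # $^I enables in-place editing with a backup suffix (like -i.bak);
    # @ARGV lists the files that <> will read (like the command-line argument).
    local ( $^I, @ARGV ) = ( '.bak', $file );

    # Like -p: read each line, apply the substitution, print the line back out.
    while ( my $line = <> ) {
        $line =~ s/ip-compute-[\d.]{7,15}-internal/ip-compute-10.25.25.25-internal/g;
        print $line;
    }
}
```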
    Please note that this is a very, very naive approach to IP address matching: the [\d.]{7,15} pattern simply looks for any run of 7 to 15 digits and dots. It might match many things other than IP addresses, such as a phone number (read davido's caveats on the subject in his post above carefully). In this specific case, however, one might argue that it is relatively safe, because the words "ip-compute-" and "-internal" immediately before and after the digits-and-dots part make false matches relatively unlikely. But only you know the real content of the file you want to process, so only you can judge whether this will be sufficient. If you need the match to be more selective, you run into exactly the kind of problem davido describes: we probably don't want to reinvent the wheel.
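To see the caveat in action, here is a sketch running the loose pattern over some made-up strings. Only the first holds a real IPv4 address, yet the digits-and-dots part matches all of them.

```perl
use strict;
use warnings;

# The loose pattern: any 7 to 15 digits and dots between the two literal words.
my $loose = qr/ip-compute-[\d.]{7,15}-internal/;

# Hypothetical strings; only the first contains a valid IPv4 address.
my @samples = (
    'ip-compute-10.10.10.1-internal',
    'ip-compute-555.123.4567-internal',
    'ip-compute-1234567-internal',
    'ip-compute-.........-internal',
);

for my $s (@samples) {
    printf "%-34s %s\n", $s, ( $s =~ $loose ? 'matches' : 'no match' );
}
```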
Re: Newbie regular expressions question
by Anonymous Monk on Apr 11, 2014 at 22:22 UTC