DavidFerrington has asked for the wisdom of the Perl Monks concerning the following question:

I've always considered myself reasonable with regexp and been using Perl since about 1990, but assertions have me beat!

Running Perl 5.8.8. I'm parsing files in a directory tree using File::Find and sucking each file into $str in order to manipulate it in various ways. One of those manipulations is trying to perform conditional code substitution à la CPP. I have a file that reads:

first line
#ifdef ABC
second line conditional on ABC defined
#else
second line conditional on ABC not defined
#endif
third line
That's in the $str var. I have a var, let's call it $macro = 'ABC', and I'm doing
$str =~ m/^#ifdef\s+($macro)$(.*)(^#else$)?(.*)^#endif$/sm;
OK, that parses and I get matches, but there are a number of issues with it:

1) The first (.*) swallows everything before the ^#endif. If it weren't a string with embedded newlines, I could use [^#] to stop before the #else, but I only want to stop before an entire #else at the beginning of a line; a #else in the middle of a line must pass through.

2) Once I'm matching OK, I want to substitute the string between the ABC and the #else, or the string between #else and #endif, depending on whether $macro is ABC.

I think I need to use assertions, but that's as far as I can figure and having pawed over PP 3rd ed I still haven't figured it out.

Sorry if this is covered somewhere; if it is, please just point me to it.

Seasons Greetings

-- David

Re: assertions help
by ikegami (Patriarch) on Dec 22, 2009 at 18:33 UTC
    The following can usually be used to skip over a section that doesn't contain something that matches PATTERN:
    (?(?!PATTERN).)*
    Sounds like you want
    (?(?!^#\s*else\b).)*
    Have you considered
    #ifdef X
    ...a...
    # ifdef Y
    ...b...
    # else
    ...c...
    # endif
    ...d...
    #else
    ...e...
    # ifdef Y
    ...f...
    # else
    ...g...
    # endif
    ...h...
    #endif
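A minimal sketch of the skip-over idiom described in this reply, written here in the commonly seen non-capturing form (?:(?!PATTERN).)*. The sample text is made up for illustration; the point is that a #else in the middle of a line is passed over, while a #else at the start of a line stops the match.

```perl
use strict;
use warnings;

# Illustrative input (an assumption, not the OP's real data).
my $str = <<'END';
first line
#ifdef ABC
keep a #else in mid-line
#else
fallback line
#endif
third line
END

# The group consumes one character at a time, but only while the
# lookahead pattern does not match at the current position.
# /m lets ^ match at line starts; /s lets . cross newlines.
my ($if_branch) = $str =~ m{
    ^\#ifdef\s+ABC\n                   # opening directive on its own line
    ( (?: (?! ^\#\s*else\b ) . )* )    # everything up to a line-initial else
}xms;

print $if_branch;    # prints "keep a #else in mid-line"
```

Because the negative lookahead is anchored with ^, only a directive at the beginning of a line ends the skipped section.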
Re: assertions help
by BrowserUk (Patriarch) on Dec 22, 2009 at 19:07 UTC

    For the simple case:

    #! perl -slw
    use strict;

    my $str = q[
    first line
    #ifdef ABC
    second line conditional on ABC defined
    #else
    second line conditional on ABC not defined
    #endif
    #ifdef PQR
    second line conditional on PQR defined
    #else
    second line conditional on PQR not defined
    #endif
    third line
    ];

    my $macro = 'ABC';

    $str =~ s[
        \#ifdef\s(\S+)\s+
        ([^\n]+)\n
        \#else\s+
        ([^\n]+)\n
        \#endif
    ]{
        $1 eq $macro ? $2 : $3
    }xeg or die 'No match';

    print $str;

    __END__
    C:\test>junk7
    first line
    second line conditional on ABC defined
    second line conditional on PQR not defined
    third line

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Hmmm, thanks to everyone for the replies; something for me to try. In the last case, I think that only allows for one line between the #ifdef and the #else, and likewise between the #else and the #endif, but in reality there can be many lines?
      -- David

        Update: Tweaked to prevent extraneous newlines.

        Good examples are everything:

        #! perl -slw
        use strict;

        my $str = q[
        first line
        #ifdef ABC
        second line conditional on ABC defined
        stuff
        more stuff
        #else
        second line conditional on ABC not defined
        other stuff
        and maybe some more
        and a bit more
        #endif
        #ifdef PQR
        second line conditional on PQR defined
        #else
        second line conditional on PQR not defined
        #endif
        third line
        ];

        my $macro = 'ABC';

        $str =~ s[
            \#ifdef\s(\S+)\s+
            ( (?: [^\n]+ \n )*? [^\n]+ ) \n
            \#else\s+
            ( (?: [^\n]+ \n )*? [^\n]+ ) \n
            \#endif
        ]{
            $1 eq $macro ? $2 : $3
        }xeg or die 'No match';

        print $str;

        __END__
        C:\test>junk7
        first line
        second line conditional on ABC defined
        stuff
        more stuff
        second line conditional on PQR not defined
        third line

Re: assertions help
by kennethk (Abbot) on Dec 22, 2009 at 18:34 UTC
    I think your issue is greedy matching - for the provided code you'll get the expected result if you use *?, i.e.:

    $str =~ m/^#ifdef\s+($macro)$(.*?)(^#else$)?(.*?)^#endif$/sm;

    Combining non-greedy matching with a non-capturing group on the conditional clause will do you:

    $str =~ m/^#ifdef\s+($macro)$(.*?)(?:(^#else$)(.*?))?^#endif$/sm;

    1. Start with the required item #ifdef $macro with variable whitespace on its own line.
    2. Grab the shortest set of characters that still supports the regex.
    3. If an #else is encountered on its own line, use that to delimit a break and grab text following it into the next buffer.
    4. End with the required item #endif on its own line

    Note that the lines will still be surrounded by newline characters, as in your original case. To get rid of those, you could instead use:

    ^#ifdef\s+(ABC)\s+^(.*?)(?:(^#else)\s+^(.*?))?^#endif\s*$

    Note ikegami's point above re: nesting - regular expressions are terrible about dealing with nested structures.
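A small sketch comparing the OP's greedy pattern with the non-greedy, non-capturing variant from this reply. The sample text and the show helper are assumptions added for illustration; the two patterns are taken from the thread.

```perl
use strict;
use warnings;

my $macro = 'ABC';
my $str = <<'END';
first line
#ifdef ABC
defined branch
#else
undefined branch
#endif
third line
END

# Greedy: the first (.*) runs to the end of the string and gives back
# only enough for #endif to match, so it swallows the #else as well.
my @greedy = $str =~ m/^#ifdef\s+($macro)$(.*)(^#else$)?(.*)^#endif$/sm;

# Non-greedy, with the optional #else clause in a non-capturing group.
my @lazy = $str =~ m/^#ifdef\s+($macro)$(.*?)(?:(^#else$)(.*?))?^#endif$/sm;

# Hypothetical helper: print captures with newlines made visible.
sub show {
    my ($label, @caps) = @_;
    my @shown = map {
        my $c = defined $_ ? $_ : 'undef';
        $c =~ s/\n/\\n/g;
        "[$c]";
    } @caps;
    print "$label: @shown\n";
}

show(greedy => @greedy);
show(lazy   => @lazy);
```

On this input the greedy pattern leaves the #else inside the second capture and the third capture undefined, while the non-greedy pattern separates the two branches, each still surrounded by newlines as noted above.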

Re: assertions help
by AnomalousMonk (Archbishop) on Dec 22, 2009 at 20:55 UTC

    DavidFerrington seems to be looking for a form of macro processor. Other replies give good regex solutions to the OPer's problem in terms of its narrow definition in the OP. However, these solutions seem fragile in that they may need substantial revision when faced with new or expanded requirements (which, as we know, always appear).

    Then why not use a macro processor, e.g., m4 or cpp, either invoked from Perl or from the command line? The OPer may be working under Windows, but I'm sure versions could be found for this OS (cpp used to be a pretty standard C compiler utility).

    Update: Changed m4 and cpp above into man page links.

      I'm processing SQL, not C, and CPP states results might be unknown if the parsed file is not in C format. I couldn't find m4 in our production build (I presume it's removed, like a lot of dev tools on a prod build) and, if I remember correctly, m4 doesn't handle
      #if defined (ABC) or defined(DEF)
      something I'm going to tackle next.

      Lastly, I wanted to try to do this in Perl and only Perl. Since I'm writing a new tool and can set limits on its use, plus most of what I'm dealing with is legacy, I think I'm safe here. But yes, I understand about creeping featurism!

      But thanks for the thoughts, they were my first ones too.

      -- David
        ... CPP states results might be unknown if the parsed file is not in C format.

        I'm not sure what this refers to. In my understanding, cpp is at root a text macro processor. It may have a number of features adapting it to C, C++ and ObjectiveC/C++ file processing, but these may be easily turned off. Strictly to satisfy my curiosity, can you give a reference to or example of the warning you mention?

        ... if I remember correctly, m4 doesn't handle
         #if defined (ABC) or defined(DEF)

        I am not very familiar with m4, but I think you may be right, at least in that form. But if you can accept '||' in place of 'or' as your logical operator, cpp would do the trick.

        In any event, you seem to know what you want, so let me wish you every success in your quest.

Re: assertions help
by DavidFerrington (Acolyte) on Dec 22, 2009 at 23:50 UTC
    Well, thank you to all; a combination of your suggestions led to
    $str =~ s/\n^#ifdef\s+(\S+)\s*$(.*?)\n*(?:(?:^#else$)(.*?))?\n*^#endif$/{$1 eq $macro ? $2 : $3}/igsme;
    I had missed the /e modifier after all these years, would have been useful in the past.

    I hadn't used the minimal match or the no-capture options before, but then I hadn't missed them.

    The above solution works very well and fast, so I'm very pleased with that.

    On the comment about nesting, I'm processing existing files, none of which use nesting, so I can state in my docs that nesting isn't supported going forwards. I don't think there will be a need for it in my circumstances.

    -- David
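For anyone who does end up needing nesting, here is a hedged sketch of a different technique from the regex substitutions in this thread: a line-by-line pass that keeps a stack of "is this branch live?" flags. The function name expand_ifdefs and the sample input are hypothetical, and only #ifdef, #else and #endif are handled.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines whose enclosing
# #ifdef/#else conditions all hold for the given set of macros.
sub expand_ifdefs {
    my ($text, %defined) = @_;
    my @active = (1);    # stack; the top flag says whether lines are kept
    my @out;
    for my $line (split /^/m, $text) {
        if ($line =~ /^#\s*ifdef\s+(\S+)/) {
            push @active, ($active[-1] && $defined{$1}) ? 1 : 0;
        }
        elsif ($line =~ /^#\s*else\b/) {
            die "#else without #ifdef\n" if @active < 2;
            my $was = pop @active;
            # The else branch is live only if the parent is live
            # and the if branch was not.
            push @active, ($active[-1] && !$was) ? 1 : 0;
        }
        elsif ($line =~ /^#\s*endif\b/) {
            die "#endif without #ifdef\n" if @active < 2;
            pop @active;
        }
        elsif ($active[-1]) {
            push @out, $line;
        }
    }
    die "missing #endif\n" if @active > 1;
    return join '', @out;
}

my $nested = <<'END';
first line
#ifdef X
a
# ifdef Y
b
# else
c
# endif
d
#else
e
#endif
last line
END

print expand_ifdefs($nested, X => 1);
```

With only X defined, this keeps "first line", a, c, d and "last line"; with nothing defined, it keeps "first line", e and "last line".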