Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hey, there is something that has been bothering me for a long time. Bothering only, not making me unable to use Perl, because usually I found ways to "do it differently".

The problem is simple: I have a long string, with the word that I want to match several times in it.

Example: I open a file, which is itself a Perl Script, I read it in an array and join("",@) it in a single string. Let's say that I want to get rid of the statements that do the print "Content-type:text/html\n\n"; stuff. There can be several statements like that, and they can have different forms (print-one-space-string, print-as-many-spaces-as-you-want-string, print-parenthesis-string, etc.).

My approach is: I see that the expression I want to get rid of starts with print, followed by the relevant string. So I want to match something that starts with print and ends with the string, but how can I be sure that I am matching the print which is the closest to the string? (I don't want to get rid of a big chunk of my file that starts with any print and ends with the string.)

Thanks for your help (it must be an easy question for a real Perl Monk!)

Replies are listed 'Best First'.
Re: A Reg Exp Question
by swiftone (Curate) on Jun 28, 2000 at 19:08 UTC
    This is a job for a full blown parser. You can look into Parse::RecDescent on CPAN.

    This is because it sounds like you'll want to match even:

    print "Bob was $frank \"yes!\" around"."No Way!", $foo, "\n";
    #or
    print "joe" if $bar;
    Which a normal regexp will be unable to follow easily.

    That said, if this is a one-time job with certain rules you can rely on, there are ways to do what you want. For example, if every statement ends with a semi-colon, and the word "print" is never used in strings in the Perl script, you could try matching

    /print[^;]*;/
    But be aware that even something as simple as:
    if($foo){ print "Bar" }
    # or
    $foo= "Ask frank to print the report";
    Would mess that up. If you have many such cases to handle, use a real parser.

    Note also that you might mess up the logic of a program. See:

    print "Foo\n" while &func($bar);
    If your program relies on &func($bar) being called several times, you would have eliminated the entire statement, including those calls, by using the simple regexp.

    Thanks to lhoward for introducing me to Parse::RecDescent.
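The pitfalls described in this reply are easy to reproduce. The following is only a minimal sketch: the helper name strip_prints and the sample lines are made up for illustration, and only the pattern /print[^;]*;/ comes from the reply.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the simple pattern from the reply to one line.
sub strip_prints {
    my ($line) = @_;
    $line =~ s/print[^;]*;//;    # "print", anything but ";", then ";"
    return $line;
}

# Hypothetical sample lines, including the two problem cases from the reply.
my @samples = (
    'print "Content-type:text/html\n\n";',
    'if($foo){ print "Bar" } $x = 1;',
    '$foo= "Ask frank to print the report";',
);

for my $line (@samples) {
    my $stripped = strip_prints($line);
    print "before: [$line]\n after: [$stripped]\n";
}
```

The first line is removed cleanly. The second loses its closing brace and the following assignment, because the match runs on to the next semicolon, and the third is cut in the middle of a string literal because the word "print" appears inside it.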
Re: A Reg Exp Question
by davorg (Chancellor) on Jun 28, 2000 at 19:12 UTC

    Presumably you have an expression something like this:

    $text =~ s/print.*a_string//g;

    The problem is that by default the .* is greedy. That is to say, it takes up as much of the string as it can. What you need to do is change your code to

    $text =~ s/print.*?a_string//g;

    which will make the .* non-greedy. This is covered in more detail in perldoc perlre.
    --
    <http://www.dave.org.uk>

    European Perl Conference - Sept 22/24 2000
    <http://www.yapc.org/Europe/>
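One caveat worth checking against the original question: a non-greedy quantifier shortens the match from the right, but the regex engine still starts at the leftmost "print" it can. If an unrelated print precedes the target, the match can still swallow both. The sketch below compares the non-greedy pattern with a "tempered" variant that refuses to cross another "print". The sub names and the sample text are hypothetical, a_string is the placeholder used in the reply above, and the tempered pattern is a swapped-in technique rather than anything proposed in the thread.

```perl
use strict;
use warnings;

# Non-greedy version from the reply: the match still begins at the first "print".
sub strip_lazy {
    my ($text) = @_;
    $text =~ s/print.*?a_string//g;
    return $text;
}

# Tempered version (an assumption, not from the thread): every character
# consumed between "print" and a_string must not begin another "print".
sub strip_tempered {
    my ($text) = @_;
    $text =~ s/print(?:(?!print).)*?a_string//gs;
    return $text;
}

# Hypothetical input: an unrelated print followed by the one to remove.
my $text = 'print "keep me"; print "a_string";';

print "lazy:     [", strip_lazy($text),     "]\n";
print "tempered: [", strip_tempered($text), "]\n";
```

On this input the non-greedy pattern removes everything from the first print onward, while the tempered one leaves the first statement alone. Neither version removes the trailing quote and semicolon, so a real substitution would still need to account for the rest of the statement.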

RE: A Reg Exp Question
by ZZamboni (Curate) on Jun 28, 2000 at 19:24 UTC
    Like swiftone said, for the general case you really want a full-blown parser. But if you can narrow the possibilities, a regex solution may work.

    My first assumption would be that all the print statements you want to eliminate are contained in a single line. This seems to be the case from your example. In this case, I would suggest processing line-by-line instead of joining the whole thing in a single string and processing that.

    My second assumption would be that the prints you want to eliminate will have one of the following forms:

    print "string"
    print("string")
    print HANDLE "string"
    print(HANDLE "string")
    where spacing can be arbitrary, handles are always all-uppercase, and the string does not contain double quotes.

    Given this, I would do something like the following:

    while(<FILE>) {
        s/print(\s+|\s*\(\s*)([A-Z]+\s+)?"string"(\s*\))?/1/;
        print;
    }
    This is untested. Notice that I am substituting the print statement with a 1, so that it does not break things like:
    print "string" if (somecond)          becomes  1 if (somecond)
    something if print HANDLE "string"    becomes  something if 1
    etc.

    --ZZamboni
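Since the substitution above is described as untested, here is a small sketch that exercises it on the four forms listed in the reply. The sub name neutralize_print and the sample lines are hypothetical, and "string" is the literal placeholder from the post, so a real script would put the actual text there.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the reply above.
sub neutralize_print {
    my ($line) = @_;
    # print, then whitespace or "(", an optional uppercase handle,
    # the quoted string, and an optional closing ")"
    $line =~ s/print(\s+|\s*\(\s*)([A-Z]+\s+)?"string"(\s*\))?/1/;
    return $line;
}

# Hypothetical sample lines covering the four forms, plus one that must survive.
my @samples = (
    'print "string" if ($cond);',
    'print("string");',
    'print STDERR "string";',
    'print( STDOUT "string" );',
    'print "other";',
);

print neutralize_print($_), "\n" for @samples;
```

Each of the first four lines has its print expression replaced by 1, so a trailing "if" modifier still has a statement to attach to, and the last line is left untouched because its string differs.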

Re: A Reg Exp Question
by maverick (Curate) on Jun 28, 2000 at 19:39 UTC
    First off, you can pick up the entire file in one bang by setting $/
    open(IN,"somefile.pl");
    undef $/;
    $string = <IN>;
    save yourself a while and a join.
    $string = q{
    print "Content-type:text/html\n\n";
    print "Some big hunk of code\n";
    print("Content-type:text/html","\n","\n");
    print ('Content-type: text/html',"\n","\n");
    print "Yet another big hunk of code\n";
    print "Yet more big hunks of code\n";
    print ("Content-type: text/html");
    print 'Content-type: text/html',"\n\n";
    print "Some other big hunk of code that does something really cool\n";
    print ("Content-type: text/html");
    print 'CONTENT-type: text/html' , "\n", "\n";
    print "Some code that ends the program\n";
    };

    print $string,"\n";
    $string =~ s/print[\s'"\(]+content-type:\s*text\/html[\\nr\s'"\),]+;//gi;
    print "---------------\n";
    print $string,"\n";
    This covers all the 'normal' ways of printing a line that I could think of. If you run across any more, I'll update the expression.
    /\/\averick
      Of course, good programming style dictates that $/ should be made local in a block, so you don't trample its value for code outside the block:
      {
          open(IN,"somefile.pl");
          local $/;
          undef $/;
          $string = <IN>;
      }
      #previous version of $/ is now restored
        oh yeah, sorry. :)
        I also realized that my expression doesn't handle filehandles (props to ZZamboni)
        I'll fix both here shortly.

        /\/\averick
Re: A Reg Exp Question
by chromatic (Archbishop) on Jun 28, 2000 at 22:36 UTC
    If you want to dump lines that begin with a specific string, this approach works nicely. Instead of slurping the file (with the $/ technique), apply a quick regex on each line:
    # open file, don't do the local $/ trick
    while (<FILE>) {
        next if /^\s*print/;
        next if /^\s*#/;
        push @lines, $_;
    }
    You could, of course, append $_ to a variable called $line if you wished. Feel free to add more small regexes, or modify the existing ones. Sometimes a line-by-line transformation is easier.
Re: A Reg Exp Question
by gaggio (Friar) on Jun 28, 2000 at 20:03 UTC
    You could also just replace the actual string by "" using s///. That way, you will print nothing instead of printing something, that is you will DO nothing. Because if I understood it right, the string is always the same and can't be split!

    Just my 2 cts...
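The idea in this last reply, blanking the string rather than removing the print, could be sketched as follows. This is only an illustration under assumptions: the sub name blank_header is made up, and the pattern assumes the header is always written as Content-type: text/html, optionally followed by literal \n escapes inside the same string.

```perl
use strict;
use warnings;

# Hypothetical helper: empty out the header text but keep the print statement.
sub blank_header {
    my ($code) = @_;
    # Remove the header plus any literal backslash-n escapes right after it
    $code =~ s/Content-type:\s*text\/html(?:\\n)*//gi;
    return $code;
}

# Hypothetical source lines, held as literal text (the \n escapes are not expanded).
my @samples = (
    'print "Content-type:text/html\n\n";',
    q{print 'Content-type: text/html',"\n\n";},
);

print blank_header($_), "\n" for @samples;
```

The first statement becomes a print of the empty string. The second still prints its two newlines, since they live in a separate string, so this approach only makes the statement a complete no-op when the header and its newlines share one literal.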