Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Programmers
I'm trying to parse some text and remove all the quotes. I would do it by hand, but there are a lot of quotes and the text is long (503 pages), so why waste my time when the computer is fully capable of doing it for me (once I tell it how... sigh)? Due to the greediness problem, though, I'm removing a few things that I don't intend to.

For example, given "quote." things. "quote." it removes everything listed.
Sadly, I don't have any real programming background, but I've heard Perl has such great string manipulation controls that it can do this. I just can't seem to understand the syntax of it all.
I'm sorry this is such a lousy question.

  • Comment on I just cannot figure out the greedy regex

Replies are listed 'Best First'.
Re: I just cannot figure out the greedy regex
by Coruscate (Sexton) on Dec 16, 2003 at 05:01 UTC

    I'm going to assume you're using something along the lines of s/".*"//gs; to do this. That would explain why your example text is removed completely: the greedy .* runs from the first quote all the way to the last one. One way to fix this would be to use the '?' character to indicate non-greediness: s/".*?"//gs;. A better way would be to use a negated character class, [^"], which can never match past the next quote. An example using this follows:

    #!c:/perl/bin/perl -w
    $|++;
    use strict;

    my $txt = <<'TXT_DONE';
    This is an "example", where all of the "quotes in this text"
    are removed, therefore cleansing the text of "all visible quotes".
    "So this quote", as well as "this quote here" will be stripped.
    TXT_DONE

    $txt =~ s/"[^"]+"//g;
    print $txt;

    __END__
    This is an , where all of the
    are removed, therefore cleansing the text of .
    , as well as  will be stripped.
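    To see the three patterns side by side, here is a minimal sketch run against the sample from the question. The helper name strip_quotes and its $style argument are made up for illustration; only the three substitutions come from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $text with double-quoted spans
# removed, using one of the three patterns discussed above.
sub strip_quotes {
    my ($text, $style) = @_;
    if    ($style eq 'greedy') { $text =~ s/".*"//gs;   }  # first quote to LAST quote
    elsif ($style eq 'lazy')   { $text =~ s/".*?"//gs;  }  # first quote to NEXT quote
    else                       { $text =~ s/"[^"]+"//g; }  # quote, non-quotes, quote
    return $text;
}

my $sample = '"quote." things. "quote."';

# Brackets make leading and trailing spaces visible in the output.
for my $style (qw(greedy lazy negated)) {
    printf "%-8s [%s]\n", $style, strip_quotes($sample, $style);
}
```

    The greedy pattern leaves nothing at all, while the other two leave " things. " (with its surrounding spaces) behind.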

      Just to add to the post by Coruscate: you might also want to handle escaped quotes, with the following regex:
      s/"(?:\\"|.)*?"//g;

        And just to add to the post by Roger: you might also want to remove quoted strings that contain newlines, unless you wish to end up with very strange results. The /s modifier lets . match a newline:

        s/"(?:\\"|.)*?"//gs;
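        As a sketch of the same idea, here is a commonly used alternative pattern (not the one posted above) that consumes either an ordinary character or a backslash plus whatever follows it, so an escaped quote can never be taken for the closing one. The helper name strip_quoted and the sample text are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove double-quoted spans, treating any
# backslash-escaped character (such as \") as part of the quoted text.
sub strip_quoted {
    my ($text) = @_;
    # [^"\\]  one character that is neither a quote nor a backslash
    # \\.     a backslash followed by any character (/s lets . match "\n")
    $text =~ s/"(?:[^"\\]|\\.)*"//gs;
    return $text;
}

# The quoted span contains two escaped quotes and a newline.
my $sample = qq{He said "she called it \\"odd\\"\nyesterday" and left.\n};
print strip_quoted($sample);
```

        With this sample the whole span from the first quote to the one after "yesterday" is removed in one piece, escaped quotes and newline included.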
Re: I just cannot figure out the greedy regex
by chromatic (Archbishop) on Dec 16, 2003 at 04:21 UTC

    Can you reply with a few more pieces of information? I'd like to see:

    • A sentence or two of text that has quotes you'd like to remove.
    • The same text after you've removed the quotes appropriately.
    • The smallest possible piece of code you've tried.

    If you just want to remove quote marks, something simple would be:

    $text =~ tr/'"//d;

    There's no semantic or syntactical analysis there, though, so it might not be appropriate.
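    A minimal sketch of what that tr does, and of why the lack of analysis can matter: it deletes the quote characters only, keeps the text between them, and also strips apostrophes from contractions. The helper name strip_quote_marks and the sample sentence are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every ' and " character from a copy of
# the text. The words between the marks are left in place.
sub strip_quote_marks {
    my ($text) = @_;
    $text =~ tr/'"//d;
    return $text;
}

# The apostrophe in "Don't" is removed along with the real quote marks.
print strip_quote_marks(q{"Don't go," she said.}), "\n";
```

    This prints: Dont go, she said.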