texuser has asked for the wisdom of the Perl Monks concerning the following question:

I have strings embedded in a file with other text that look like this

{ Name 1.2.3 xxxx xxxxx}

where xxxx and xxxxx could be a-z, 0-9 or : (a single colon).

There could be one or more spaces after { and before the } and there could be one or more spaces between the groups in the string. Also the string could be split over two lines like

{ Name 1.2.3

xxxx xxxxx }

{ Name 1.2.

3 xxxx xxxxx}

{ Name 1.2.3 xxxx

xxxxx }.

So it could be split at a space or after a dot. There could be one or more spaces at the end of the split line before the newline like this

{ Name 1.2.3\ \ \ \

xxxx }

where I use \ to denote a space.

My question is how do I get rid of all these strings in a file. I'm using Perl under Windows 10 if that helps.

Thanks.

Re: Delete a string possibly over two lines
by tybalt89 (Monsignor) on Feb 06, 2018 at 04:41 UTC
    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1208527

    use strict;
    use warnings;

    local $/ = '}'; # use } as line terminator...
    while(<DATA>)
      {
      s/.*\K\{ Name.*\}//s;
      print;
      }

    __DATA__
    { Name 1.2.3 xxxx xxxxx}
    where xxxx and xxxxx could be a-z, 0-9 or : (a single colon).
    There could be one or more spaces after { and before the } and there could be one or more spaces between the groups in the string. Also the string could be split over two lines like
    { Name 1.2.3
    xxxx xxxxx }
    { Name 1.2.
    3 xxxx xxxxx}
    { Name 1.2.3 xxxx
    xxxxx }.
    So it could be split at a space or after a dot. There could be one or more spaces at the end of the split line before the newline like this
    { Name 1.2.3\ \ \ \
    xxxx }
    where I use \ to denote a space.
    My question is how do I get rid of all these strings in a file. I'm using Perl under Windows 10 if that helps.
    Thanks.

    Outputs:

    where xxxx and xxxxx could be a-z, 0-9 or : (a single colon).
    There could be one or more spaces after { and before the } and there could be one or more spaces between the groups in the string. Also the string could be split over two lines like


    .
    So it could be split at a space or after a dot. There could be one or more spaces at the end of the split line before the newline like this

    where I use \ to denote a space.
    My question is how do I get rid of all these strings in a file. I'm using Perl under Windows 10 if that helps.
    Thanks.

      s/.*\K\{ Name.*\}//s;

      I don't understand the purpose of the  .*\K sub-expression in this substitution. Is it perhaps intended as a defensive measure against nested  { ... } groups? I suppose the thing to do is play around with it a bit, but I don't have time ATM. Could you please comment on this?


      Give a man a fish:  <%-{-{-{-<

        Yes, it is defense against nested (or only "first half of") expressions.
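
To make the "first half of" case concrete, here is a minimal sketch comparing the substitution with and without the `.*\K` prefix. The sample record is made up for illustration: it holds an unfinished `{ Name` followed by a complete group.

```perl
use strict;
use warnings;

# Hypothetical record: an unfinished "{ Name" followed by a complete group.
my $record = 'keep { Name 1.2 and then { Name 1.2.3 xxxx xxxxx}';

# Without .*\K the match starts at the FIRST "{ Name" and runs to the last "}".
(my $plain = $record) =~ s/\{ Name.*\}//s;

# With .*\K the greedy .* backtracks to the LAST "{ Name" in the record,
# and \K keeps everything matched before it out of the replaced text.
(my $guarded = $record) =~ s/.*\K\{ Name.*\}//s;

print "plain:   [$plain]\n";
print "guarded: [$guarded]\n";
```

The plain version removes the unfinished fragment as well, while the guarded version removes only the last complete group in the record.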

      Thank you for your quick response and code. It gets rid of the string but leaves things like { } after it runs. Is there a way to remove the enclosing brackets as well? I probably wasn't clear that I wanted to delete them as well.

      Also if there are other strings with similar patterns in the file, like { AnotherName 4.5.6 xxx xxxx}, do I have to write separate Perl scripts or can I do it all in one script?

      Thanks again.

        ... if there are other strings with similar patterns in the file, like { AnotherName 4.5.6 xxx xxxx}, do I have to write separate perl scripts ...

        Ah, what's in a name? A set of names could be defined as
            my $name = qr{ \b (?: Phil | Bob | Hal) \b }xms;
        and used in tybalt89's substitution statement as
            s/.*\K\{ $name.*\}//s;
        (untested). However, you have to define what constitutes a "name".
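
A small sketch of how the name set might be used in a single script, assuming the names are `Name` and `AnotherName` (taken from the examples in the thread) and reading from an in-memory string rather than a real file. The sample text is invented for the demonstration.

```perl
use strict;
use warnings;

# Assumed set of names; replace with whatever counts as a "name" in your data.
my $name = qr{ \b (?: Name | AnotherName ) \b }xms;

# Invented sample text standing in for the real file.
my $text = 'before { Name 1.2.3 xxxx xxxxx} middle { AnotherName 4.5.6 xxx xxxx} after' . "\n";

my $out = '';
{
    # Open the string as an in-memory file so the <> loop works as in the original.
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = '}';    # read up to and including each }
    while (my $record = <$fh>) {
        $record =~ s/.*\K\{ $name.*\}//s;    # same substitution, any listed name
        $out .= $record;
    }
}
print $out;
```

Both groups are removed in one pass; only the text between and around them is printed.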

        Update 1: Similarly, a "dotted decimal" regex could be defined that would match any dotted decimal (with optional embedded whitespace), some subset, or... For instance (also untested):

        my $dotted_decimal = qr{ (?<! \d) \d+ (?: \s* [.] \s* \d+){2,3} (?! \d) }xms;
        This can, I think, be broken over arbitrary lines. Again, you must decide what is needed in your application.

        Update 2: I've changed the  $dotted_decimal definition above to conform to the expanded specification examples provided here: counted quantifier  {2} changed to  {2,3} instead (and it's still untested).
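
A quick way to try the `$dotted_decimal` pattern is to run it against a few sample strings. The samples below are assumptions chosen for illustration, including one that is broken after a dot.

```perl
use strict;
use warnings;

my $dotted_decimal = qr{ (?<! \d) \d+ (?: \s* [.] \s* \d+){2,3} (?! \d) }xms;

# Illustrative samples: plain, split after a dot, spaced out, four parts, too short.
my @samples = ('1.2.3', "1.2.\n3", '1 . 2 . 3', '1.2.3.4', '1.2');

my @results;
for my $s (@samples) {
    my ($hit) = $s =~ /($dotted_decimal)/;
    push @results, defined $hit ? 1 : 0;
    (my $shown = $s) =~ s/\n/\\n/g;    # make the embedded newline visible
    printf "%-10s %s\n", $shown, defined $hit ? 'matches' : 'no match';
}
```

Only the two-part string `1.2` fails, because the counted quantifier requires at least two dotted groups after the first number.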



        No, it does not leave { } in the output. Did you mis-copy it?

        Looks like you are changing the spec. Please give exact definition of "other strings with similar patterns".
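
One possible way to write down an exact definition is as a set of small regexes. This is only a sketch under assumptions drawn from the original question: a group is `{`, whitespace, a name, a dotted number that may be broken after a dot, zero or more words made of a-z, 0-9 or colon, optional whitespace, then `}`. The name list and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Assumed grammar, pieced together from the description in the question.
my $name  = qr{ (?: Name | AnotherName ) }x;      # hypothetical name list
my $num   = qr{ \d+ (?: [.] \s* \d+ )* }x;        # 1.2.3, possibly split after a dot
my $word  = qr{ [a-z0-9:]+ }x;                    # a-z, 0-9 or colon
my $group = qr{ \{ \s+ $name \s+ $num (?: \s+ $word )* \s* \} }x;

# Invented sample text; a real script would slurp the file instead.
my $text = join "\n",
    'keep this',
    '{ Name 1.2.',
    '3 ab:c 12x}',
    'and {  AnotherName 4.5.6   xxx',
    'xxxx } too',
    '{ not a group }',
    '';

# Working on the whole text at once lets a group span a line break.
$text =~ s/$group//g;
print $text;
```

Because the pattern includes both braces, nothing of a matched group is left behind, and braces that do not enclose a recognised name are left alone.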

Re: Delete a string possibly over two lines
by AnomalousMonk (Archbishop) on Feb 06, 2018 at 05:08 UTC
    I'm using Perl under Windows 10 if that helps.

    For future reference, the version of Perl you're using is usually more pertinent than the OS or OS version. For instance, the \K regex operator that tybalt89 uses (it does have a purpose there) was introduced with Perl version 5.10. That's not likely to be a problem here since 5.10 was released in December 2007, a little over a decade ago. Just sayin'...
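
If the version matters, a script can state its requirement and report what it is running on. A minimal sketch:

```perl
use strict;
use warnings;
use 5.010;    # stop with a clear error on Perls older than 5.10, which lack \K

# $^V holds the running interpreter's version; %vd prints it in dotted form.
my $version = sprintf '%vd', $^V;
print "Running Perl $version\n";
```

From the command line, `perl -v` reports the same information.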



Re: Delete a string possibly over two lines
by Anonymous Monk on Feb 06, 2018 at 14:44 UTC
    Depending on the complexity of your situation, tools like Parse::RecDescent can also be useful. This is a very efficient, industrial-strength parser which allows you to define the structure of your input using a grammar. You might not need it here, but, "when you do need it, you will know."