thanos1983 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have spent many hours on this and I still cannot figure out how to do it. I have a string that contains timestamps that I want to remove. As a second step, I want to remove all the whitespace between each timestamp and the next "[" character. I have written a sample of code, but it is not even close to what I want to achieve:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $initialStr = "[20150302 22:01:05] [1, 2, 3, 4] String0\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String1\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String2\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String3\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String4\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String5\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String6\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String7\n";

my @matches = ( $initialStr =~ /\d+:\d+:\d+/ );
print Dumper \@matches;

__END__
$VAR1 = [
          1
        ];

I was thinking of first using one regex to remove the timestamps and the newline characters, and then another regex to remove the remaining whitespace up to the first occurrence of the "[" character in each substring. That is roughly what I had in mind. I have not used regexes in over a year and I am really bad at them. Does anyone have an idea how to do that?
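A minimal sketch of that two-step plan, assuming every record sits on its own line and begins (after optional whitespace) with a bracketed timestamp in the YYYYMMDD HH:MM:SS form shown: split on newlines first, then strip the timestamp and the whitespace before the next "[" from each piece. The `/r` substitution flag needs Perl 5.14 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened sample input in the same shape as the question's string.
my $initialStr = "[20150302 22:01:05] [1, 2, 3, 4] String0\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String1\n";

# Step 1: one piece per line (split drops the trailing empty field).
my @lines = split /\n/, $initialStr;

# Step 2: remove leading whitespace, the timestamp, and the whitespace
# up to the next "[". /r returns a modified copy and leaves $_ alone.
my @matches = map { s/^\s*\[\d{8} \d\d:\d\d:\d\d\]\s*//r } @lines;

print "$_\n" for @matches;
```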

Sample of desired output:

$VAR1 = [
          [1, 2, 3, 4] String0,
          [1, 2, 3, 4] String1,
          [1, 2, 3, 4] String2,
          [1, 2, 3, 4] String3,
          [1, 2, 3, 4] String4,
          [1, 2, 3, 4] String5,
          [1, 2, 3, 4] String6,
          [1, 2, 3, 4] String7
        ];
Seeking for Perl wisdom...on the process of learning...not there...yet!

Re: regex remove blank/whitespace until first occurrence of specific character "["
by choroba (Cardinal) on Mar 02, 2015 at 21:56 UTC
    Data::Dumper can't produce the output you desire exactly: it wraps each string element in quotes.
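To illustrate: with its default settings, Data::Dumper prints string elements as single-quoted Perl literals, so the closest it gets to the desired output is a quoted list. If the goal is just the bare lines, printing the array directly avoids the quotes. The two sample strings are made up to match the question's format.

```perl
use strict;
use warnings;
use Data::Dumper;

my @matches = ('[1, 2, 3, 4] String0', '[1, 2, 3, 4] String1');

# Dumper output: each element appears as a quoted string literal.
my $dumped = Dumper \@matches;
print $dumped;

# Plain output without quotes, one element per line.
print "$_\n" for @matches;
```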

    To get all the matches from the m// operator, you need list context (which you already have), but also the /g option:

    my @matches = $initialStr =~ /\d+:\d+:\d+/g;
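A quick check of the /g version against a shortened copy of the question's data: each time of day in the string becomes one element.

```perl
use strict;
use warnings;
use Data::Dumper;

# Two records instead of eight, same format as the question.
my $initialStr = "[20150302 22:01:05] [1, 2, 3, 4] String0\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String1\n";

# List context plus /g: every match is returned, not just a success flag.
my @matches = $initialStr =~ /\d+:\d+:\d+/g;

print Dumper \@matches;
```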

    Or, to get closer to your desired output:

    my @matches = $initialStr =~ /\[ .*? \] \s* ( \[ .*? \] .* )/gx;
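A sketch of the capturing version as a complete script, using three of the question's records. Each capture starts at the second "[" of a line and runs to the end of that line, because "." does not match a newline without /s.

```perl
use strict;
use warnings;
use Data::Dumper;

# Shortened copy of the question's input (three records instead of eight).
my $initialStr = "[20150302 22:01:05] [1, 2, 3, 4] String0\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String1\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String2\n";

# Skip the first bracketed group (the timestamp) and the whitespace after it,
# then capture from the second "[" to the end of the line.
my @matches = $initialStr =~ /\[ .*? \] \s* ( \[ .*? \] .* )/gx;

print Dumper \@matches;
```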
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Hello choroba,

      Thank you for your time and effort answering my question. You are absolutely right, it works perfectly. I have so many things to learn. :D

Re: regex remove blank/whitespace until first occurrence of specific character "["
by AnomalousMonk (Archbishop) on Mar 02, 2015 at 22:06 UTC
    I have a string that contains timestamps that I want to remove. As a second step I want to remove ...

    I don't understand how the array enters into it, but taking "remove" to mean "substitute with an empty string", i.e., "delete", here's an approach:

    c:\@Work\Perl>perl -wMstrict -le
    "my $s = qq{[20150302 22:01:05] [1, 2, 3, 4] String0\n [20150302 22:01:05] [1, 2, 3, 4] String1\n};
     print qq{<<<$s>>>};
     ;;
     my $t_stamp = qr{ \d{8} [ ] \d\d (?: :\d\d){2} }xms;
     my $timestamp = qr{ \[ $t_stamp \] }xms;
     ;;
     $s =~ s{ $timestamp \s* }{}xmsg;
     print qq{<<<$s>>>};
    "
    <<<[20150302 22:01:05] [1, 2, 3, 4] String0
     [20150302 22:01:05] [1, 2, 3, 4] String1
    >>>
    <<<[1, 2, 3, 4] String0
     [1, 2, 3, 4] String1
    >>>


    Give a man a fish:  <%-(-(-(-<

      Hello AnomalousMonk,

      Thank you for your time and effort; I had not seen this syntax before, interesting. The reason I need an array is that I want the output split up so I can use the pieces in different places. But again, thank you; it is always nice to see new ideas and approaches, since you never know where I might apply them.
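Since an array is what's wanted, the same qr// building blocks from AnomalousMonk's reply can also be used in a list-context match with /g instead of a substitution, so each capture becomes one element. This combination is a suggestion, not something posted in the thread; the two-record input is a shortened copy of the question's data.

```perl
use strict;
use warnings;

my $s = "[20150302 22:01:05] [1, 2, 3, 4] String0\n"
      . " [20150302 22:01:05] [1, 2, 3, 4] String1\n";

# Building blocks taken from the reply above.
my $t_stamp   = qr{ \d{8} [ ] \d\d (?: :\d\d){2} }xms;
my $timestamp = qr{ \[ $t_stamp \] }xms;

# Capture the rest of each line after the timestamp and its trailing
# whitespace. This outer pattern has no /s, so "." stops at "\n".
my @matches = $s =~ m{ $timestamp \s* (.+) }xg;

print "$_\n" for @matches;
```

The /s flag inside the qr// objects only affects those sub-patterns, which contain no ".", so the outer (.+) still ends at each newline.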

Re: regex remove blank/whitespace until first occurrence of specific character "["
by Anonymous Monk on Mar 02, 2015 at 21:53 UTC

    Have a look at the /m and /g switches (perlre):

    my @matches = $initialStr =~ /^\s*\[.+?\]\s*(.+)$/mg;
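A small comparison of the same pattern with and without /m, on a shortened two-record copy of the question's data. Without /m, ^ anchors only at the start of the string and $ only at its end (or before a final newline), and since "." cannot cross a newline, no match can satisfy both anchors on multi-line input.

```perl
use strict;
use warnings;

my $initialStr = "[20150302 22:01:05] [1, 2, 3, 4] String0\n"
               . " [20150302 22:01:05] [1, 2, 3, 4] String1\n";

# Without /m: ^ and $ refer to the whole string only.
my @without_m = $initialStr =~ /^\s*\[.+?\]\s*(.+)$/g;

# With /m: ^ and $ also match at every internal line boundary.
my @with_m = $initialStr =~ /^\s*\[.+?\]\s*(.+)$/mg;

printf "without /m: %d match(es)\n", scalar @without_m;
printf "with /m:    %d match(es)\n", scalar @with_m;
```

With this input, the first count is 0 and the second is 2.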

      Hello Anonymous Monk,

      Sometimes it is so simple, and I am overcomplicating it. Thanks a lot for your time and effort; it works like a charm. :D

Re: regex remove blank/whitespace until first occurrence of specific character "["
by kroach (Pilgrim) on Mar 02, 2015 at 22:47 UTC

    The following modification to the regular expression gives your desired output (I made no assumptions about the format of the text following a timestamp):

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $initialStr = "[20150302 22:01:05] [1, 2, 3, 4] String0\n"
                   . " [20150302 22:01:05] [1, 2, 3, 4] String1\n"
                   . " [20150302 22:01:05] [1, 2, 3, 4] String2\n"
                   . " [20150302 22:01:05] [1, 2, 3, 4] String3\n"
                   . " [20150302 22:01:05] [1, 2, 3, 4] String4\n"
                   . " [20150302 22:01:05] [1, 2, 3, 4] String5\n"
                   . " [20150302 22:01:05] [1, 2, 3, 4] String6\n"
                   . " [20150302 22:01:05] [1, 2, 3, 4] String7\n";

    my @matches = $initialStr =~ /\[\d+ \d{2}:\d{2}:\d{2}\]\s+(.+)\n/g;
    print Dumper \@matches;

    For each line, this captures the text that follows the timestamp and its trailing whitespace, up to the newline, and stores it as one element of @matches.

    As to why the following regular expression doesn't work:

    my @matches = ( $initialStr =~ /\d+:\d+:\d+/ );

    What happens here is that $initialStr =~ /\d+:\d+:\d+/ is evaluated in list context, because its result is assigned to an array (the parentheses don't change that). Without /g, the match stops after the first occurrence, and since the pattern has no capture groups, a successful match in list context returns the list (1). That is why you get [ 1 ]; a failed match would give an empty list.
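A small script to check what the match operator returns in each context (perlop documents these return values under m//). The two-record input is a shortened copy of the question's data.

```perl
use strict;
use warnings;

my $s = "[20150302 22:01:05] [1, 2, 3, 4] String0\n"
      . " [20150302 22:01:05] [1, 2, 3, 4] String1\n";

# List context, no /g, no capture groups: returns (1) on success.
my @no_g = ( $s =~ /\d+:\d+:\d+/ );

# List context with /g: returns every matched string.
my @with_g = ( $s =~ /\d+:\d+:\d+/g );

# Scalar context, no /g: true (1) on success, false on failure.
my $matched = $s =~ /\d+:\d+:\d+/;

# Counting matches: assign the /g list to an empty list and take
# that assignment in scalar context.
my $count = () = $s =~ /\d+:\d+:\d+/g;

print "no /g:   (@no_g)\n";
print "with /g: (@with_g)\n";
print "scalar:  $matched\n";
print "count:   $count\n";
```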

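    Furthermore, with /g but no capture groups you get each complete match. To put only part of each match into the array (here, the text after the timestamp), which is what you're trying to do, you need to add a capture group to the regular expression.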
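A sketch of the difference a capture group makes under /g, on two of the question's records; the timestamp sub-pattern `[\d:]+` is a simplification chosen for this example.

```perl
use strict;
use warnings;

my $s = "[20150302 22:01:05] [1, 2, 3, 4] String0\n"
      . " [20150302 22:01:05] [1, 2, 3, 4] String1\n";

# No capture group: /g returns each whole match, timestamp included.
my @whole = $s =~ /\[\d+ [\d:]+\]\s+.+/g;

# One capture group: /g returns only the captured part of each match.
my @part = $s =~ /\[\d+ [\d:]+\]\s+(.+)/g;

print "whole: $_\n" for @whole;
print "part:  $_\n" for @part;
```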

      Hello kroach,

      Thank you for your time and effort answering my question. Everything is starting to make sense, step by step, thanks to your explanation. I still have so many things to learn.
