Split string in groups with non white space using regex

thanos1983 has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow Monks,

I have a problem that I am trying to solve. I am not very good with regexes, and this is a case that I have to solve with a regex alone.

I have a string, e.g. 'Thanos1983+|Thanos1983+', that I want to split into 3 pieces: group 1 'Thanos1983+', group 2 '|' and group 3 'Thanos1983+'.

This can be easily done with the following code:

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

my $test = 'Thanos1983+|Thanos1983+';
say "I found:\t\$1: '$1'\t\$2: '$2'\t\$3: '$3'"
    if $test =~ /(^[\w+\+]+)(\|)([\w+\+]+$)/;

__END__

$ perl test.pl
I found:    $1: 'Thanos1983+'    $2: '|'    $3: 'Thanos1983+'

My problem is this: when the string ends at the separator, e.g. 'Thanos1983+|', how can I still capture the last (empty) group? What I would like to see is:

$ perl test.pl
I found:    $1: 'Thanos1983+'    $2: '|'    $3: ''

I tried the regex /(^[\w+\ +]+)(\|)([\w+\+ ]+$)/, but it only works if the string contains white space after the pipe, e.g. 'Thanos1983+| '.

Unfortunately I need to solve this using only a regex. The reason is that the regex is used by another script, which I am not able to change and which will only accept a regex.

Any suggestions are greatly appreciated.

Thanks in advance for everyone's time and effort, BR.

Seeking for Perl wisdom...on the process of learning...not there...yet!


Replies are listed 'Best First'.
Re: Split string in groups with non white space using regex by tybalt89 (Monsignor) on Jan 04, 2018 at 11:46 UTC
#!/usr/bin/perl -l

# http://perlmonks.org/?node_id=1206677

use strict;
use warnings;

for ( 'Thanos1983+|Thanos1983', 'Thanos1983+|' ) {
    print;
    /([^|]*)(\|)([^|]*)/
        and print "I found:\t\$1: '$1'\t\$2: '$2'\t\$3: '$3'";
}

Outputs:

Thanos1983+|Thanos1983
I found:    $1: 'Thanos1983+'    $2: '|'    $3: 'Thanos1983'
Thanos1983+|
I found:    $1: 'Thanos1983+'    $2: '|'    $3: ''
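A minimal sketch, not part of the original reply, of why the quantifier matters here: a group built with `+` needs at least one character, so it cannot match the empty piece after a trailing pipe, while a group built with `*` can. The helper `split3` and the anchored patterns `$plus` and `$star` are illustrative assumptions, not code from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): apply a pattern and return
# its captures, or an empty list when the pattern does not match.
sub split3 {
    my ($rx, $str) = @_;
    my @captures = $str =~ $rx;
    return @captures;
}

my $plus = qr/^([\w+]+)(\|)([\w+]+)$/;   # third group needs at least one character
my $star = qr/^([^|]*)(\|)([^|]*)$/;     # third group may be empty

for my $s ('Thanos1983+|Thanos1983+', 'Thanos1983+|') {
    for my $pair ([ '+', $plus ], [ '*', $star ]) {
        my ($label, $rx) = @$pair;
        my @got    = split3($rx, $s);
        my $result = @got ? join(', ', map { "'$_'" } @got) : 'no match';
        print "$label on '$s': $result\n";
    }
}
```

With the `+` pattern the second string does not match at all; with the `*` pattern it yields three captures, the last of which is the empty string.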
Re^2: Split string in groups with non white space using regex by thanos1983 (Parson) on Jan 04, 2018 at 11:53 UTC
Hello tybalt89,

Works perfectly, thanks for your time and effort.

BR / Thanos
Re^3: Split string in groups with non white space using regex by AnomalousMonk (Archbishop) on Jan 04, 2018 at 18:12 UTC
Note that the regex string `'([^|]*)(\|)([^|]*)'` also successfully parses `'|'`, `'||'` and `'|||'`:

c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(pp);
 ;;
 my $rx_string = '([^|]*)(\|)([^|]*)' ;
 ;;
 for my $s (
   'Thanos1983+|', 'Thanos1983+| ', 'Thanos1983+| ',
   'Thanos1983+|Thanos1983+',
   '|Thanos1983+', ' |Thanos1983+', ' |Thanos1983+',
   '+++|+++',
   '|', '||', '|||',
   ) {
   my $parsed = my @captured = $s =~ $rx_string;
   if ($parsed) {
     print qq{'$s' -> }, pp \@captured;
     }
   else {
     print qq{failed to parse '$s'};
     }
   }
 "
'Thanos1983+|' -> ["Thanos1983+", "|", ""]
'Thanos1983+| ' -> ["Thanos1983+", "|", " "]
'Thanos1983+| ' -> ["Thanos1983+", "|", " "]
'Thanos1983+|Thanos1983+' -> ["Thanos1983+", "|", "Thanos1983+"]
'|Thanos1983+' -> ["", "|", "Thanos1983+"]
' |Thanos1983+' -> [" ", "|", "Thanos1983+"]
' |Thanos1983+' -> [" ", "|", "Thanos1983+"]
'+++|+++' -> ["+++", "|", "+++"]
'|' -> ["", "|", ""]
'||' -> ["", "|", ""]
'|||' -> ["", "|", ""]

Give a man a fish: `<%-{-{-{-<`
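If a lone pipe or a run of pipes should be rejected rather than parsed, one possible tightening is to anchor the pattern and require a non-empty first field. This is only a sketch: the thread never says whether an empty first field or a second pipe is valid input, so both restrictions are assumptions, and `parse_strict` is a hypothetical name.

```perl
use strict;
use warnings;

# Assumption (not stated in the thread): the first field must be non-empty
# and the whole string must contain exactly one pipe.
my $strict_rx = qr/\A([^|]+)(\|)([^|]*)\z/;

# Hypothetical helper: returns the three captures, or an empty list on failure.
sub parse_strict {
    my ($s) = @_;
    my @captured = $s =~ $strict_rx;
    return @captured;
}

for my $s ('Thanos1983+|', 'Thanos1983+|Thanos1983+', '|', '||', '|||') {
    my @c    = parse_strict($s);
    my $line = @c
        ? "'$s' -> [" . join(', ', map { "'$_'" } @c) . "]"
        : "failed to parse '$s'";
    print "$line\n";
}
```

The first two strings still parse, with an empty third capture for the trailing-pipe case, while the three pipe-only strings are rejected.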
Re^4: Split string in groups with non white space using regex by thanos1983 (Parson) on Jan 05, 2018 at 12:39 UTC
Re^5: Split string in groups with non white space using regex by AnomalousMonk (Archbishop) on Jan 05, 2018 at 14:14 UTC
Re: Split string in groups with non white space using regex by karlgoethebier (Abbot) on Jan 04, 2018 at 12:00 UTC
"...I am not really good with regex..."

#MeToo - but didn't you miss a group:

#!/usr/bin/env perl

use strict;
use warnings;
use feature qw(say);

my $string = q'Thanos1983+|Thanos1983+|Thanos1983+';

$string =~ m/(.+)\|(.+)\|(.+)/;

say for ($1,$2,$3);

__END__

Untested. And I hope I didn't miss the point.

Best regards, Karl

«The Crux of the Biscuit is the Apostrophe»

`perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'`
Re^2: Split string in groups with non white space using regex by thanos1983 (Parson) on Jan 04, 2018 at 12:12 UTC
Hello karlgoethebier,

To be honest I was just about to post another possible solution:

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

my @tests = ('Thanos1983+|Thanos1983+', 'Thanos1983+| ', 'Thanos1983+|');

for (@tests) {
    say "I found:\t\$1: '$1'\t\$2: '$2'\t\$3: '$3'"
        if /(.*)(\|)(.*)/;
}

__END__

$ perl test.pl
I found:    $1: 'Thanos1983+'    $2: '|'    $3: 'Thanos1983+'
I found:    $1: 'Thanos1983+'    $2: '|'    $3: ' '
I found:    $1: 'Thanos1983+'    $2: '|'    $3: ''

Thanks for the tip of using `m/(.+)(\|)(.+)/`; it also works for the first two cases, but not for the third. Sample code:

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

my @tests = ('Thanos1983+|Thanos1983+', 'Thanos1983+| ', 'Thanos1983+|');

for (@tests) {
    say "I found:\t\$1: '$1'\t\$2: '$2'\t\$3: '$3'"
        if /(.+)(\|)(.+)/;
}

__END__

$ perl test.pl
I found:    $1: 'Thanos1983+'    $2: '|'    $3: 'Thanos1983+'
I found:    $1: 'Thanos1983+'    $2: '|'    $3: ' '

Thank you for your time and effort.

BR / Thanos
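One difference between a dot-based group and a negated character class only shows up when the input holds more than one pipe. The sketch below assumes the two-pipe string from karlgoethebier's example as input; `caps_greedy` and `caps_negated` are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; the names are not from the thread.
sub caps_greedy  { my ($s) = @_; my @c = $s =~ /(.*)(\|)(.*)/;       return @c; }
sub caps_negated { my ($s) = @_; my @c = $s =~ /([^|]*)(\|)([^|]*)/; return @c; }

my $input = 'Thanos1983+|Thanos1983+|Thanos1983+';   # two pipes

print "greedy:  ", join(' ', map { "'$_'" } caps_greedy($input)),  "\n";
print "negated: ", join(' ', map { "'$_'" } caps_negated($input)), "\n";
```

The greedy `.*` runs to the last pipe, so its first capture is 'Thanos1983+|Thanos1983+'. The negated class stops at the first pipe, so its first capture is 'Thanos1983+'. For inputs with exactly one pipe the two patterns capture the same pieces.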
Re^3: Split string in groups with non white space using regex by karlgoethebier (Abbot) on Jan 04, 2018 at 12:41 UTC
`(.+)\|?` AKA one or none?
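A small sketch of what "one or none" does depending on where the `?` sits. Both placements are shown as assumptions about what was meant, and `first_capture` is a hypothetical helper. When the `?` applies to the pipe itself, the greedy `.+` is free to swallow the pipe. When it applies to a trailing group, the pipe is still required and only the last piece becomes optional.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the first capture of a match.
sub first_capture {
    my ($rx, $str) = @_;
    my ($first) = $str =~ $rx;
    return $first;
}

my $s = 'Thanos1983+|';

# '?' on the pipe: the greedy .+ takes the whole string, pipe included.
print "optional pipe:  '", first_capture(qr/(.+)\|?/, $s), "'\n";

# '?' on the last group: .+ has to give the pipe back so that \| can match.
print "optional group: '", first_capture(qr/(.+)\|(.+)?/, $s), "'\n";
```

Note that with the optional last group, an unmatched third capture is undefined rather than an empty string, which may matter to the calling script.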
Re: Split string in groups with non white space using regex by lapazzo (Novice) on Jan 04, 2018 at 13:08 UTC
Brother thanos, in accordance with Lao Tzu ch. 2, I would do `perldoc -f split` and then

print join( '-', split( /\|/, $test, -1) ), "\n";
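For readers who can use `split`, a short sketch of what the `-1` limit changes: without it, `split` drops trailing empty fields, so the empty piece after a final pipe disappears. Putting the separator in capturing parentheses also returns the separator itself. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; the names are not from the thread.
sub split_default  { my ($s) = @_; my @f = split /\|/,   $s;     return @f; }
sub split_keep     { my ($s) = @_; my @f = split /\|/,   $s, -1; return @f; }
sub split_with_sep { my ($s) = @_; my @f = split /(\|)/, $s, -1; return @f; }

my $test = 'Thanos1983+|';

for my $case (
    [ 'default limit',    \&split_default  ],
    [ 'limit -1',         \&split_keep     ],
    [ 'limit -1 + group', \&split_with_sep ],
) {
    my ($name, $code) = @$case;
    my @fields = $code->($test);
    printf "%-17s %d field(s): %s\n",
        $name, scalar(@fields), join(' ', map { "'$_'" } @fields);
}
```

On 'Thanos1983+|' the three calls return one, two and three fields respectively. The last form gives the same three pieces the original question asks for.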
Re^2: Split string in groups with non white space using regex by thanos1983 (Parson) on Jan 04, 2018 at 14:12 UTC
Hello lapazzo,

Thank you for your time and effort. The script works perfectly, but in my case I can only supply a regex; I cannot use split.

Thanks again for your time and effort.

BR / Thanos