perl regex to match newline followed by number and text

arunkumarzz has asked for the wisdom of the Perl Monks concerning the following question:

Re: perl regex to match newline followed by number and text by GrandFather (Saint) on May 31, 2019 at 10:24 UTC
Care to show us some sample data and how you'd like the result to look? Note that you can use `(?:...)` to group stuff without capturing, which can clean things up quite a bit. As a hint, I've added a little white space to your regex to make the groupings more obvious and added a digit at the start of each capture group. Maybe those numbers are not quite what you expect?

```perl
s/((\n) ([^0-9])+ (-)* (Aa-Zz)) \| ((\n) (\d{3}) (-) (Aa-Zz))/$2$3/gx;
# 12    3         4    5           67    8       9   0
```

Update: Maybe what you want to achieve is something like this:

```perl
use strict;
use warnings;

my $wholeBallOfWax = do {local $/; <DATA>};
my @records = split /(?<=\n)(?=\d+-)/, $wholeBallOfWax;

s/\n+$/\n/s for @records;
print join "---\n", @records;

__DATA__
1-12 last non-blank field
2-10 data more data
3-21 stuff more stuff Lots of stuff so much stuff there is no following empty field
4-73 Sneeky record with a blank field in the middle!
5-00 Last record
```

Which prints:

```
1-12 last non-blank field
---
2-10 data more data
---
3-21 stuff more stuff Lots of stuff so much stuff there is no following empty field
---
4-73 Sneeky record with a blank field in the middle!
---
5-00 Last record
```

In the split regex there is a look-behind (`(?<=\n)`), which matches a newline before the current search point, and a look-ahead (`(?=\d+-)`), which matches one or more digits followed by a hyphen. Neither match "consumes" the string that was matched, so the split doesn't drop any characters.

As an aside, the `do {local $/; <DATA>}` bit suspends end-of-line detection and reads everything from `<DATA>` into `$wholeBallOfWax` (although maybe that was obvious?).

Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
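A minimal sketch of the zero-width split described above, run on a hypothetical in-memory string instead of `<DATA>` so the result is easy to check. It contrasts the look-behind/look-ahead split with a split on a literal `\n`, which consumes (and therefore drops) the newline it matches.

```perl
use strict;
use warnings;

# Hypothetical sample: each record starts with "digits-" at the start of a
# line; continuation lines do not.
my $text = "1-12\nfirst record\n2-10\nsecond\nrecord\n3-21\nthird\n";

# Zero-width split: (?<=\n) requires a newline just before the split point and
# (?=\d+-) requires "digits-" just after it. Neither consumes any characters.
my @kept = split /(?<=\n)(?=\d+-)/, $text;

# For comparison: splitting on a literal "\n" consumes (drops) that newline.
my @lost = split /\n(?=\d+-)/, $text;

print join("---\n", @kept);
printf "zero-width split lost %d chars, consuming split lost %d\n",
    length($text) - length(join '', @kept),
    length($text) - length(join '', @lost);
```

Joining `@kept` back together reproduces the input exactly, which is the practical payoff of using lookarounds in `split`.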
Re^2: perl regex to match newline followed by number and text by arunkumarzz (Novice) on May 31, 2019 at 15:12 UTC
Hello GrandFather, the example you have provided is a little bit different from my issue. I have updated my question with some sample data. Could you please have a look at it? Thanks in advance.
Re^3: perl regex to match newline followed by number and text by poj (Abbot) on May 31, 2019 at 15:48 UTC
```perl
#!/usr/bin/perl
use strict;

my $record;
while (<DATA>){
  s/\n/ /;
  if (/^\d+~/){
    $record =~ s/ +$//; # trim trailing spaces
    printf "%s\n",$record if ($record);
    $record = $_;
  } else {
    $record .= $_;
  }
}
$record =~ s/ +$//;
printf "%s\n",$record if ($record);

__DATA__
99~Arun~Kumar~Mobilenum:
1234-567 , from Earth Human

98~Mahesh~Babu~Mobilenum:
5678-901 , from Earth Human
```

poj
Re^4: perl regex to match newline followed by number and text by arunkumarzz (Novice) on Jun 02, 2019 at 17:58 UTC
Re^3: perl regex to match newline followed by number and text by Marshall (Canon) on Jun 01, 2019 at 06:54 UTC
I see your Edit 1. Can you enclose the data in code tags so that we can see the newlines? The better the problem is described, the better the result will be. Your regex doesn't make much sense to me: `s/((\n)([^0-9])+(-)(Aa-Zz))\|((\n)(\d{3})(-)(Aa-Zz))/$2$3/g` My brain hurts.
Re: perl regex to match newline followed by number and text by AnomalousMonk (Archbishop) on May 31, 2019 at 20:05 UTC
`s/((\n)([^0-9])+(-)(Aa-Zz))\|((\n)(\d{3})(-)(Aa-Zz))/$2$3/g`

Note that the `Aa-Zz` regex subexpression in the quoted regex matches a literal `'Aa-Zz'` sequence of these five characters. This subexpression within a (capturing!) group with a `*` quantifier means that this sequence may be matched zero or more times. Perhaps what was meant was a `[a-zA-Z]` character class, in which case `[a-zA-Z]*` (or perhaps better `[a-zA-Z]+`) would have been appropriate, since the capturing group seems completely unneeded. (But there are many other problems with the original regex, so going back to the beginning and starting from scratch seems the best course; see other suggestions in this thread.)

Give a man a fish: `<%-{-{-{-<`
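A small sketch illustrating the points above; the sample strings are made up. Unbracketed `Aa-Zz` matches only those five literal characters, a `[a-zA-Z]+` class matches a run of letters, and `(?:...)` groups without shifting the capture numbers.

```perl
use strict;
use warnings;

# Without brackets, "Aa-Zz" is five literal characters, not a letter range.
my ($literal) = "xAa-Zzy" =~ /(Aa-Zz)/;            # captures "Aa-Zz"
my $name_hit  = ("Kumar" =~ /Aa-Zz/) ? 1 : 0;       # 0: a plain name never matches

# A character class matches one ASCII letter; + takes a whole run of them.
my ($word) = "123-Kumar" =~ /([a-zA-Z]+)/;          # captures "Kumar"

# (?:...) groups without capturing, so the digits are still $1 ...
my ($digits) = "\n123-Kumar" =~ /(?:\n)(\d{3})-/;   # captures "123"

# ... whereas a capturing group for the newline pushes the digits to $2.
my @caps = "\n123-Kumar" =~ /(\n)(\d{3})-/;         # ("\n", "123")

print "literal=$literal name_hit=$name_hit word=$word digits=$digits\n";
```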
Re^2: perl regex to match newline followed by number and text by arunkumarzz (Novice) on Jun 02, 2019 at 06:17 UTC
Sorry, my regex might work in Oracle, but it's different in Perl! Thanks for your response.
Re: perl regex to match newline followed by number and text by hippo (Archbishop) on May 31, 2019 at 15:27 UTC
Per your data set as it stands just now, here is an SSCCE:

```perl
use strict;
use warnings;
use Test::More tests => 1;

my @have = (
    '99~Arun~Kumar~Mobilenum:
1234-567 , from Earth Human',
    '98~Mahesh~Babu~Mobilenum:
5678-901 , from Earth Human'
);
my @want = (
    '99~Arun~Kumar~Mobilenum: 1234-567 , from Earth Human',
    '98~Mahesh~Babu~Mobilenum: 5678-901 , from Earth Human'
);

for (@have) {
    s/\n/ /sg;
}
is_deeply (\@have, \@want);
```

Since you haven't said what your input record separator is, I have created the array by hand. See also How to ask better questions using Test::More and sample data.
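A variation on the same idea, assuming (hypothetically) that the newlines sit where shown: writing the test data as double-quoted strings with explicit `\n` escapes keeps the line breaks visible even if a post gets reflowed. The helper `flatten_record` is invented for this sketch.

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Hypothetical helper: fold every line break in a record into one space.
sub flatten_record {
    my ($rec) = @_;
    $rec =~ s/\s*\n\s*/ /g;   # newline plus any surrounding blanks -> one space
    return $rec;
}

# Explicit "\n" escapes make the embedded newlines visible in the source.
my @have = (
    "99~Arun~Kumar~Mobilenum:\n1234-567 , from Earth Human",
    "98~Mahesh~Babu~Mobilenum:\n5678-901 ,\nfrom Earth Human",
);
my @want = (
    '99~Arun~Kumar~Mobilenum: 1234-567 , from Earth Human',
    '98~Mahesh~Babu~Mobilenum: 5678-901 , from Earth Human',
);

my @got = map { flatten_record($_) } @have;
is_deeply(\@got, \@want, 'newlines inside a record become single spaces');
```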
Re: perl regex to match newline followed by number and text by GrandFather (Saint) on May 31, 2019 at 22:50 UTC
So something more like:

```perl
use strict;
use warnings;

my $wholeBallOfWax = do {local $/; <DATA>};
my @records = split /\s(?<=\n)(?=\d+~)/, $wholeBallOfWax;

s/\n+/ /gs for @records;
s/\s+\z//gs for @records;
print join "\n", @records;

__DATA__
99~Arun~Kumar~Mobilenum:
1234-567 , from Earth Human

98~Mahesh~Babu~Mobilenum:
5678-901 , from Earth Human

97~Grand~Father~Mobilenum:
2734-567 , from Mars Ape
```

Prints:

```
99~Arun~Kumar~Mobilenum: 1234-567 , from Earth Human
98~Mahesh~Babu~Mobilenum: 5678-901 , from Earth Human
97~Grand~Father~Mobilenum: 2734-567 , from Mars Ape
```

where the only substantive change from my suggested earlier code was to replace hyphen with tilde and replace internal newlines with spaces.

Update: Or, per hippo's suggestion:

```perl
use strict;
use warnings;
use Test::More tests => 1;

my @want = (
    '99~Arun~Kumar~Mobilenum: 1234-567 , from Earth Human',
    '98~Mahesh~Babu~Mobilenum: 5678-901 , from Earth Human',
    '97~Grand~Father~Mobilenum: 2734-567 , from Mars Ape'
);

my $wholeBallOfWax = do {local $/; <DATA>};
my @records = split /\s*(?<=\n)(?=\d+~)/, $wholeBallOfWax;

s/\n+/ /gs for @records;
s/\s+\z//gs for @records;
is_deeply (\@records, \@want);

__DATA__
99~Arun~Kumar~Mobilenum:
1234-567 , from Earth Human

98~Mahesh~Babu~Mobilenum:
5678-901 , from Earth Human

97~Grand~Father~Mobilenum:
2734-567 , from Mars Ape
```
Re: perl regex to match newline followed by number and text by Marshall (Canon) on Jun 01, 2019 at 04:47 UTC
I am having trouble understanding the problem statement. Your data was not enclosed in `<code>..</code>` tags, and that is a problem. I don't really understand what "removing the newlines in the 4th field" means. This is just a wild guess on my part: I guess that the extra newlines are spacers between these records, but maybe not?

```perl
use strict;
use warnings;

my $line;
while (<DATA>)
{
    chomp;
    if ( (/:$/../^\s*$/) =~ /^\d+$/) #exclude endpoint.
    {
        s/\s,\s/,/;
        $line .= " $_";
    }
    elsif (defined $line)
    {
        $line =~ s/^\s//;
        print "$line\n";
        $line = undef;
    }
}
print "$line\n" if defined $line; # just to be sure
# all output is done

=prints
99~Arun~Kumar~Mobilenum: 1234-567,from Earth Human
98~Mahesh~Babu~Mobilenum: 5678-901,from Earth Human
98~Mahesh~Babu~Mobilenbbb: 5678-901,from Earth Human
=cut

__DATA__
99~Arun~Kumar~Mobilenum:
1234-567 , from Earth Human

98~Mahesh~Babu~Mobilenum:
5678-901 , from Earth Human

98~Mahesh~Babu~Mobilenbbb:
5678-901 , from Earth Human

```

I guess something very, very simple like this is possible? 3 input lines to one output line?

```perl
use strict;
use warnings;

my $input = do {local $/; <DATA>};
my @lines = $input =~ m/(.*\n.*\n.*\n)/g;
foreach my $line (@lines)
{
    $line =~ s/\n/ /g;
    $line =~ s/ , /,/g;
    print "$line\n";
}

=Prints
99~Arun~Kumar~Mobilenum: 1234-567,from Earth Human
98~Mahesh~Babu~Mobilenum: 5678-901,from Earth Human
98~Mahesh~Babu~Mobilenbbb: 5678-901,from Earth Human
=cut

__DATA__
99~Arun~Kumar~Mobilenum:
1234-567 , from Earth Human

98~Mahesh~Babu~Mobilenum:
5678-901 , from Earth Human

98~Mahesh~Babu~Mobilenbbb:
5678-901 , from Earth Human

```
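The `#exclude endpoint` trick relies on what the scalar `..` (flip-flop) operator returns. A small sketch on hypothetical lines shows the sequence numbers, including the `E0` suffix on the final line of the range, which is why `=~ /^\d+$/` skips the blank terminating line.

```perl
use strict;
use warnings;

# Hypothetical lines: a header ending in ":" opens a block, a blank line ends it.
my @lines = ("99~Arun~Kumar~Mobilenum:", "1234-567 , from Earth Human", "", "noise");

my @seq;
for (@lines) {
    # scalar() forces the flip-flop meaning of "..": "" while outside the
    # range, then 1, 2, ... inside it; the last number gets "E0" appended.
    push @seq, scalar(/:$/ .. /^\s*$/);
}

# Pure digits means "inside the range but not its final line".
my @inside = grep { $seq[$_] =~ /^\d+$/ } 0 .. $#seq;

print join('|', @seq), "\n";
print "inside: @inside\n";
```

Note that `scalar()` matters here: inside `push`'s argument list, a bare `..` would be evaluated in list context and act as the list-range operator instead.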
Re^2: perl regex to match newline followed by number and text by holli (Abbot) on Jun 01, 2019 at 11:57 UTC
"I guess something very, very simple like this is possible? 3 input lines to one output line?"

Good idea, but why then still use a regex?

```perl
while (<DATA>) {
    chomp;
    print;
    print $. % 3 ? " " : "\n";
}

__DATA__
99~Arun~Kumar~Mobilenum:
1234-567 , from Earth Human

98~Mahesh~Babu~Mobilenum:
5678-901 , from Earth Human

```

Output:

```
99~Arun~Kumar~Mobilenum: 1234-567 , from Earth Human
98~Mahesh~Babu~Mobilenum: 5678-901 , from Earth Human
```

holli

You can lead your users to water, but alas, you cannot drown them.
Re^3: perl regex to match newline followed by number and text by arunkumarzz (Novice) on Jun 01, 2019 at 18:07 UTC
The number of newlines is not fixed; there might be one, two, three or more newlines in the last field.
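Since the number of lines per record varies, counting lines with `$.` cannot work. One alternative, sketched here under the assumption that every record (and only a record) starts with a line matching `^\d+~`, is to key on that header line and append everything else to the current record. This restates the idea behind poj's and GrandFather's replies on a hypothetical in-memory string; the sample text is invented.

```perl
use strict;
use warnings;

# Hypothetical input: the last field may span any number of lines, including
# blank ones; a new record starts only where a line begins with "digits~".
my $text = "99~Arun~Kumar~Mobilenum:\n1234-567 ,\nfrom Earth\nHuman\n\n"
         . "98~Mahesh~Babu~Mobilenum:\n5678-901 , from Earth Human\n";

my @records;
for my $line (split /\n/, $text) {
    if ($line =~ /^\d+~/) {
        push @records, $line;          # header line opens a new record
    } elsif (@records) {
        $records[-1] .= " $line";      # anything else continues the current one
    }                                  # (lines before the first header are ignored)
}
for (@records) {
    s/\s+/ /g;     # collapse whitespace runs left behind by blank lines
    s/\s+\z//;     # trim the end of the record
}
print "$_\n" for @records;
```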
Re^4: perl regex to match newline followed by number and text by holli (Abbot) on Jun 01, 2019 at 18:29 UTC