multi line matching problem

jcpunk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: multi line matching problem by Roger (Parson) on Dec 16, 2003 at 02:07 UTC
Perhaps you are looking for something like this instead? `use strict; use warnings; my $data = do {local $/; <DATA>}; $data =~ y/ \n/ /s; # add the following if you want to strip space after > # and before < characters. # $data =~ s/(?<=>)\s|\s(?=<)//gm; print "$data\n"; __DATA__ <thing> condition condition randomness other junk more junk </thing>` And the output: `<thing> condition condition randomness other junk more junk </thing>`
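A minimal sketch of what the `y/ \n/ /s` transliteration does on its own, for anyone who wants to try it without a `__DATA__` section. The helper name `squeeze_tr` and the sample string are made up for illustration. Both space and newline map to a space, and the `s` flag squeezes each run of translated characters into one. Tabs are not in the search list, so they pass through untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse runs of spaces and newlines into one space.
sub squeeze_tr {
    my ($text) = @_;
    # space -> space, newline -> space; the s flag squeezes each run of
    # translated characters down to a single space.
    $text =~ tr/ \n/ /s;
    return $text;
}

my $sample = "<thing>\n   condition\n   other junk\n</thing>\n";
print "[", squeeze_tr($sample), "]\n";
```

Note that the final newline also becomes a space, so the result ends with one trailing space.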
Re: Re: multi line matching problem by jcpunk (Friar) on Dec 16, 2003 at 02:45 UTC
Wow, that's perfect, and answered in record time. Thanks.
Re: multi line matching problem by Zaxo (Archbishop) on Dec 16, 2003 at 04:10 UTC
One more way, `$text = join ' ', split ' ', $text;`, uses magical split. There is one difference in the result: magical split will remove leading and trailing whitespace instead of replacing it with a single space. After Compline, Zaxo
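The difference between magical split and a plain substitution can be seen side by side in a short sketch. The helper names `squeeze_split` and `squeeze_regex` and the sample string are assumptions made for the example, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: the join/split idiom from the post above.
sub squeeze_split {
    my ($text) = @_;
    # split ' ' (a one-space string, not a regex) skips leading whitespace
    # and splits on runs of whitespace; trailing empty fields are dropped.
    return join ' ', split ' ', $text;
}

# Hypothetical helper: the substitution approach, for comparison.
sub squeeze_regex {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    return $text;
}

my $sample = "  <thing>\n\tcondition\n</thing>\n";
printf "split: [%s]\n", squeeze_split($sample);
printf "regex: [%s]\n", squeeze_regex($sample);
```

The brackets in the output make the edges visible: the split version has no leading or trailing space, while the regex version keeps one of each.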
Re: multi line matching problem by SquireJames (Monk) on Dec 16, 2003 at 02:18 UTC
`$data =~ y/\n/ /; # This will remove all newlines and replace them with spaces` Better still, if you want to collapse every run of more than one space and take care of any newlines at the same time, you can do so with this regex: `$data =~ s/\s+|\n/ /g;` The main key here is that \s* has been replaced with \s+, so it's no longer greedy and will replace only multiple space characters. The difference between s and m is that s is used to substitute an expression, whilst m is used to test for a pattern match. The m is not used much, as it is implied when you place a regular expression between two slashes (i.e. $data =~ /\n/g is the same as $data =~ m/\n/g). The y operator is a simple character replacement (transliteration). Update: With much thanks to Enlil for the explanation, the whole regex could actually be written without the \n (i.e. $data =~ s/\s+/ /g).
Re: Re: multi line matching problem by welchavw (Pilgrim) on Dec 16, 2003 at 02:50 UTC
Part of your explanation is flatly erroneous. That \s+ is still greedy. The \n is never matched in your s/\s+|\n/ regex.
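One way to check the claim that the `\n` branch never fires is to capture each alternative separately and count which one matched. This is only a sketch: the helper name `branch_counts` and the shortened sample data are invented for the example. Because `\s` already includes `\n`, the first alternative should win at every position where the second could match.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each branch of /(\s+)|(\n)/ matches.
sub branch_counts {
    my ($text) = @_;
    my ($first, $second) = (0, 0);
    while ($text =~ /(\s+)|(\n)/g) {
        # $1 is defined only when the \s+ branch matched.
        defined $1 ? $first++ : $second++;
    }
    return ($first, $second);
}

my $data = "15 65\n35 6\n\nStuff...";
my ($first, $second) = branch_counts($data);
print "\\s+ branch: $first, \\n branch: $second\n";

# Both substitutions should therefore produce the same string.
(my $with_alt = $data) =~ s/\s+|\n/ /g;
(my $without  = $data) =~ s/\s+/ /g;
print $with_alt eq $without ? "same result\n" : "different result\n";
```

The double newline is also consumed as a single run, which shows the `\s+` is still greedy.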
Re: Re: Re: multi line matching problem by SquireJames (Monk) on Dec 16, 2003 at 02:57 UTC
For the record, this is the testing that I did, which works fine for me: `$data = "15 65\n35 6\n445 34,546 59034584\n54 3,450 805;5409 + 8534\n\nStuff..."; print ($data); print ("\nChainging now\n\n"); $data =~ s/\s+|\n/ /g; print ($data);` Sorry if I misunderstood the question. I'll take the hit on the greedy statement; perhaps I should have said greedy only for space characters, which is what is wanted....
Re: Re: Re: Re: multi line matching problem by welchavw (Pilgrim) on Dec 16, 2003 at 03:15 UTC
Re: Re: Re: Re: Re: multi line matching problem by SquireJames (Monk) on Dec 16, 2003 at 03:26 UTC
Re: Re: multi line matching problem by jcpunk (Friar) on Dec 16, 2003 at 02:47 UTC
Thanks for the tip on the implied m///. I will keep looking around at these things.
Re: multi line matching problem by Enlil (Parson) on Dec 16, 2003 at 02:22 UTC
What else are you doing to `$text`? Your code, `$text =~ s/\n\s/ /g;`, replaces each newline plus the one whitespace character after it with a single space, which, looking at your test case, seems to do what you want (in this case): `my $this = '<thing> condition condition randomness other junk </thing>'; print "before: $this\n\n"; $this =~ s/\n\s/ /g; print "after: $this\n"; __END__ before: <thing> condition condition randomness other junk </thing> after: <thing> condition condition randomness other junk </thing>` One thing though is that \n is in the set of \s characters, so `$text=~s/\s+/ /g;` should suffice. -enlil
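The practical difference between the two substitutions shows up once the lines are indented by more than one character. Here is a rough sketch with a made-up sample string; the original test data lost its line breaks, so this is not the poster's exact input.

```perl
use strict;
use warnings;

# Assumed sample: four spaces of indentation after each newline.
my $text = "<thing>\n    condition\n    randomness\n</thing>";

# \n\s consumes the newline plus exactly one following whitespace character,
# so the rest of the indentation survives, and a newline followed by a
# non-whitespace character is left alone.
(my $one_char = $text) =~ s/\n\s/ /g;

# \s+ consumes the whole run, newline included.
(my $whole_run = $text) =~ s/\s+/ /g;

print "s/\\n\\s/ /g : [$one_char]\n";
print "s/\\s+/ /g  : [$whole_run]\n";
```

In the first result the newline before `</thing>` is still there, because the character after it is `<` rather than whitespace.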
Re: multi line matching problem by doom (Deacon) on Dec 16, 2003 at 06:56 UTC
Rather than stripping newlines, I'll try to answer your other question, about doing a "multiline matching expression". Do you know about the "s///ms" trick? Typically you use the m and s modifiers when working on a string with embedded newlines: one (s) changes the meaning of . so that it also matches a \n, the other (m) changes the meaning of ^ and $ so that they match the beginning and end of lines (most people I know use them both together and don't bother remembering which one does which...): `my $string = <<ENDSTRING; <thing> condition condition condition condition randomness other junk </thing> ENDSTRING print "$string\n"; $string =~ s{ <thing>.*?</thing> } {<THANG>blah</THANG>}msx; print "$string\n";` That should output: <thing> condition condition condition condition randomness other junk </thing> <THANG>blah</THANG> I'd recommend reading the "matching within multiple lines" recipe in the Perl Cookbook (that's recipe 6.6 in both the 1st and 2nd editions). And by the way... you're not rolling your own code to parse HTML or XML, are you? You should be looking for already existing modules out on CPAN.
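For anyone who does want to remember which modifier does which, a small sketch that tests each one in isolation may help. The sample string and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# Assumed sample: opening and closing tags on different lines.
my $string = "<thing>\ncondition\n</thing>\n";

# Without /s, "." stops at a newline, so the two tags never pair up.
my $dot_plain = $string =~ /<thing>.*?<\/thing>/  ? 1 : 0;
# With /s, "." also matches "\n".
my $dot_s     = $string =~ /<thing>.*?<\/thing>/s ? 1 : 0;

# Without /m, "^" anchors only at the start of the whole string.
my $caret_plain = $string =~ /^condition/  ? 1 : 0;
# With /m, "^" also anchors right after each embedded newline.
my $caret_m     = $string =~ /^condition/m ? 1 : 0;

print "dot:   plain=${dot_plain} s=${dot_s}\n";
print "caret: plain=${caret_plain} m=${caret_m}\n";
```

In the `<thing>.*?</thing>` substitution above, it is the `s` modifier doing the work; `m` has no effect there because the pattern contains no `^` or `$`.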
Re: Re: multi line matching problem by jcpunk (Friar) on Dec 16, 2003 at 08:33 UTC
Ah ha! In my foolish frustration I forgot about the Perl Cookbook and its wealth of useful data. Sadly I consulted Google (with very poor search terms no less (shame upon me)), and when it turned up crap that wasn't helpful, I got lazy and posted here, after checking the tutorials section. The "s///ms" trick is going into my working memory. I thank thee for pointing it out to me. In regards to rolling my own code for parsing out HTML/XML, I sort of am, but not without good reason. The program in progress needs to have some files in HTML format for no terribly good reason (insert boss module), and so it shall be. And in regards to CPAN, let's just say an overly paranoid sysadmin stands between me and that alternative. I could of course go for the `use lib '/bla/foo/meh'` option, but for other reasons too complex to waste your time on, that also isn't workable (see site inconsistency). Ending rant before I realize what a crappy assignment I have and quit. Again, thanks a lot for the help. jcpunk all code is tested, and doesn't work so there :p (variant on common PM sig for my own amusement)
Re: Re: Re: multi line matching problem by chanio (Priest) on Dec 16, 2003 at 14:48 UTC
Perl By Example (on-line) and the Perl Cookbook (on-line). ...And then, He rested. :) NOTE: I AM HAPPY TO HAVE STUDIED FROM THE PERL BY EXAMPLE BOOK AT MY PUBLIC LIBRARY (you should also try your public library, you'd be amazed! And it is always so quiet... And perhaps you'll find the Camel!).
Re: Re: Re: Re: multi line matching problem by jcpunk (Friar) on Dec 18, 2003 at 02:19 UTC

