Re: multi line matching problem
by Roger (Parson) on Dec 16, 2003 at 02:07 UTC
|
Perhaps you are looking for something like this instead?
use strict;
use warnings;
my $data = do {local $/; <DATA>};
$data =~ y/ \n/ /s;
# add the following if you want to strip space after >
# and before < characters.
# $data =~ s/(?<=>)\s|\s(?=<)//gm;
print "$data\n";
__DATA__
<thing>
condition
condition
randomness
other junk
more junk
</thing>
and the output -
<thing> condition condition randomness other junk more junk </thing>
| [reply] [d/l] [select] |
|
wow, that's perfect and answered in record time.
thanks
| [reply] |
Re: multi line matching problem
by Zaxo (Archbishop) on Dec 16, 2003 at 04:10 UTC
|
One more way
$text = join ' ', split ' ', $text;
uses magical split. There is one difference in the result: magical split will remove leading and trailing whitespace, instead of replacing it with a single space.
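To make that difference concrete, here is a small sketch comparing the two approaches. The sample text and variable names are made up for illustration; only the tr/// and split idioms come from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample text with leading and trailing whitespace.
my $text = "  <thing>\n  condition\n  other junk\n</thing>\n";

# tr/// with /s squeezes each run of spaces/newlines into one space,
# so whitespace at either end survives as a single space.
(my $squeezed = $text) =~ tr/ \n/ /s;

# split ' ' skips leading whitespace and drops trailing empty fields,
# so the joined result has no space at either end.
my $joined = join ' ', split ' ', $text;

print "[$squeezed]\n";   # [ <thing> condition other junk </thing> ]
print "[$joined]\n";     # [<thing> condition other junk </thing>]
```

The brackets in the print statements are only there to make the edge whitespace visible.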
| [reply] [d/l] |
Re: multi line matching problem
by SquireJames (Monk) on Dec 16, 2003 at 02:18 UTC
|
$data =~ y/\n/ /; # This replaces every newline with a space
Better still, if you want to collapse every run of more than one space into a single space and take care of any newlines at the same time, you can do so with this regex:
$data =~ s/\s+|\n/ /g;
The main key here is that \s* has been replaced with \s+. Both are greedy, but \s+ needs at least one whitespace character, so it can no longer match the empty string and only replaces actual runs of whitespace.
The difference between s and m is that s is used to substitute an expression, whilst m is used to test for a pattern match. The m is rarely written out, as it is implied when you place a regular expression between two slashes (i.e. $data =~ /\n/g is the same as $data =~ m/\n/g). The y operator is not a regex modifier at all; it is a simple character-for-character replacement (transliteration), also spelled tr.
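For anyone who wants to see the three operators side by side, here is a rough sketch. The sample string and variable names are invented for the example.

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree";   # hypothetical sample

# m// (or bare //) only tests for a match; $data is left unchanged.
my $has_newline = $data =~ /\n/ ? 1 : 0;

# In list context with /g a match returns every hit; the empty-list
# assignment turns that into a count.
my $newline_count = () = $data =~ m/\n/g;

# s/// rewrites the string (here a copy of it).
(my $subst = $data) =~ s/\n/ /g;

# y/// (same as tr///) maps characters one-for-one; no regex is involved.
(my $trans = $data) =~ y/\n/ /;

print "$has_newline $newline_count\n";  # 1 2
print "$subst\n";                       # one two three
print "$trans\n";                       # one two three
```

For a plain one-character swap like this, s/// and y/// give the same result; y/// just skips the regex engine.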
Update: With much thanks to Enlil for the explanation, the whole regex could actually be written without the \n (i.e. $data =~ s/\s+/ /g).
| [reply] [d/l] [select] |
|
For the record, this is the testing that I did, which works fine by me.
my $data = "15 65\n35 6\n445 34,546 59034584\n54 3,450 805;5409 8534\n\nStuff...";
print($data);
print("\nChanging now\n\n");
$data =~ s/\s+|\n/ /g;
print($data);
Sorry if I misunderstood the question. I'll take the hit on the greedy statement; perhaps I should have said greedy only for space characters, which is what is wanted....
| [reply] [d/l] |
|
Thanks for the tip on the implied m///. I will keep looking around at these things.
| [reply] |
Re: multi line matching problem
by Enlil (Parson) on Dec 16, 2003 at 02:22 UTC
|
What else are you doing to $text? Your code, $text =~ s/\n\s*/ /g;, replaces each newline and any whitespace that follows it with a single space, which, looking at your test case, seems to do what you want (in this case):
my $this = '<thing>
condition
condition
randomness
other junk
</thing>';
print "before: $this\n\n";
$this =~ s/\n\s*/ /g;
print "after: $this\n";
__END__
before: <thing>
condition
condition
randomness
other junk
</thing>
after: <thing> condition condition randomness other junk </thing>
One thing though: \n is in the set of \s characters, so $text =~ s/\s+/ /g; should suffice.
-enlil
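A quick sketch of where the two substitutions differ, using a made-up string that has a double space and a tab in the middle of a line:

```perl
use strict;
use warnings;

# Hypothetical sample: double space, indented second line, a tab.
my $text = "a  b\n   c\td\n";

# Only whitespace that starts at a newline is collapsed.
(my $after_newline = $text) =~ s/\n\s*/ /g;

# Every run of whitespace, newlines included, is collapsed.
(my $any_run = $text) =~ s/\s+/ /g;

print "[$after_newline]\n";   # [a  b c<TAB>d ]  (tab and double space kept)
print "[$any_run]\n";         # [a b c d ]
```

So the two agree whenever the only extra whitespace is indentation right after a newline, as in the test case above.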
| [reply] [d/l] [select] |
Re: multi line matching problem
by doom (Deacon) on Dec 16, 2003 at 06:56 UTC
|
Rather than stripping newlines, I'll try and answer your other question, about doing a "multiline
matching expression". Do you know about the "s///ms" trick?
Typically you use the m and s modifiers when working on a
string with embedded newlines: s changes the meaning of
. so that it also matches a \n, and m changes the meaning of ^ and $ so that they match the beginning and end
of lines (most people I know use them both together and
don't bother remembering which one does which...):
my $string = <<ENDSTRING;
<thing>
condition condition
condition condition
randomness
other junk
</thing>
ENDSTRING
print "$string\n";
$string =~ s{ <thing>.*?</thing> }
{<THANG>blah</THANG>}msx;
print "$string\n";
That should output:
<thing>
condition condition
condition condition
randomness
other junk
</thing>
<THANG>blah</THANG>
I'd recommend reading the "matching within multiple lines"
recipe in the Perl Cookbook (that's recipe 6.6 in both the 1st and 2nd editions).
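If you do want to see which modifier does which, here is a minimal sketch that tries each one on its own. The sample string and variable names are invented for the example.

```perl
use strict;
use warnings;

my $string = "<thing>\ncondition\n</thing>\n";   # hypothetical sample

# Without /s, "." stops at a newline, so the two tags cannot be bridged.
my $plain  = $string =~ /<thing>.*<\/thing>/  ? 1 : 0;

# With /s, "." also matches "\n".
my $dotall = $string =~ /<thing>.*<\/thing>/s ? 1 : 0;

# Without /m, "^" matches only at the very start of the string.
my @no_m   = $string =~ /^(\w+)$/g;

# With /m, "^" and "$" match at the start and end of every line.
my @with_m = $string =~ /^(\w+)$/mg;

print "$plain $dotall\n";                 # 0 1
print scalar(@no_m), " @with_m\n";        # 0 condition
```

Using both together, as in the s{...}{...}msx example, is harmless when the pattern has no ^ or $ in it.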
And by the way... you're not rolling your own code to parse
HTML or XML are you? You should be looking for already
existing modules out on CPAN.
| [reply] [d/l] |
|
Perl By Example (on-line), the Perl Cookbook (on-line). ...And then, He rested. :)
NOTE: I AM HAPPY TO HAVE STUDIED FROM THE PERL BY EXAMPLE BOOK AT MY PUBLIC LIBRARY (you should also try your public library, you'd be amazed! And it is always so quiet... And perhaps you will find the Camel!).
| [reply] |
|