hurix_01 has asked for the wisdom of the Perl Monks concerning the following question:

How can I remove enter marks within lines?

For example, this code is not removing the enter marks:

$text=~s#<p>(.+?)<\/p>#"<p>".&enter($1)."<\/p>"#gsie;

sub enter {
    my $new;
    ($new)=$1;
    $new=~s/\n//gsi;
    return $new;
}

Can anyone help me out? What do I have to modify so that enter marks are removed within all paragraphs found in the document?

Replies are listed 'Best First'.
Re: Removing Enter marks between Lines
by GrandFather (Saint) on Jun 20, 2007 at 10:56 UTC

    It seems to work as expected and, as far as I can tell, as you wish for me:

    use strict;

    my $text = <<TEXT;
    <p>Before new line.
    After new line.</p>
    TEXT

    print "Before: |$text|\n";
    $text=~s#<p>(.+?)<\/p>#"<p>".&enter($1)."<\/p>"#gsie;
    print "After: |$text|\n";

    sub enter {
        my $new;
        ($new)=$1;
        $new=~s/\n//gsi;
        return $new;
    }

    Prints:

    Before: |<p>Before new line.
    After new line.</p>
    |
    After: |<p>Before new line.After new line.</p>
    |

    If that is not what you want then perhaps you should provide your own sample and expected output?

    That aside, there are a few things worth mentioning:

    • it is generally considered bad practice to call subs using & these days
    • you pass a parameter to the sub, but don't use it (see below)
    • you use $1 as a "global" to the sub which is bad in itself, but also there is no need! (see below)
    • hand parsing HTML is a bad, mad and dangerous thing to do. Use HTML::TreeBuilder or one of the other modules designed for the purpose
    • (minor) you use # to delimit the parts of the substitution, but still escape the /. A / is not special when it is not the delimiter, so it doesn't need to be escaped
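To illustrate why relying on $1 inside the sub is fragile, here is a minimal sketch. The sub names fragile and robust and the 'id:42' string are made up for the demonstration; the point is only that any successful match inside the sub replaces the caller's $1, while a copied parameter is unaffected.

```perl
use strict;
use warnings;

# Reads the caller's $1 instead of its argument (the pattern warned about above).
sub fragile {
    my $label = 'id:42';
    $label =~ /(\d+)/;    # a successful match here overwrites $1
    return $1;            # now '42', not what the caller captured
}

# Copies its argument first, so later matches cannot disturb it.
sub robust {
    my $new   = shift;
    my $label = 'id:42';
    $label =~ /(\d+)/;
    return $new;
}

my $text = "<p>one</p>";

(my $via_global = $text) =~ s#<p>(.+?)</p>#'<p>' . fragile($1) . '</p>'#e;
(my $via_param  = $text) =~ s#<p>(.+?)</p>#'<p>' . robust($1) . '</p>'#e;

print "$via_global\n";    # <p>42</p>
print "$via_param\n";     # <p>one</p>
```

The original enter sub happens to work because it runs no other match before reading $1, but that is easy to break during later edits.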

    The sub would be better written:

    sub enter {
        my $new = shift;
        $new=~s/\n\s*/ /gi;
        return $new;
    }

    Note the alteration to the regex: a space is now inserted in place of each new line, and any leading white space on the following line is removed:

    After: |<p>Before new line. After new line.</p>
    |
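Putting the suggestions above together, a complete script might look like the sketch below. The two-paragraph sample string is invented for illustration; the sub is called without &, takes its input as a parameter, and the / in </p> is left unescaped because # is the delimiter.

```perl
use strict;
use warnings;

# Replace each newline, plus any indentation that follows it, with one space.
sub enter {
    my $new = shift;
    $new =~ s/\n\s*/ /g;
    return $new;
}

# Hypothetical sample: two paragraphs, each broken across lines.
my $text = "<p>Before new line.\n   After new line.</p>\n<p>Second\nparagraph.</p>\n";

# /s lets . match newlines, /e evaluates the replacement as Perl code.
$text =~ s#<p>(.+?)</p>#'<p>' . enter($1) . '</p>'#gsie;

print $text;
```

The newlines between paragraphs sit outside the matched text, so they are left alone and each paragraph ends up on its own line.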

    DWIM is Perl's answer to Gödel
Re: Removing Enter marks between Lines
by ww (Archbishop) on Jun 20, 2007 at 13:53 UTC

    Is it possible that you are confusing HTML paragraph tags with the newline value(s) used by your system? What you see in your browser is rendered (created, especially for you!) in response/obedience to the paragraph tags (<p> and </p>).

    As is, your code does nothing for me, except spew warnings and errors.

    The HTML in your $text contains NO newlines. What's in the original; i.e., what did you assign to $text?

    Your phrase, "enter marks," coupled with your $text, and the .&enter($1) (concatenated sub call ?!??) on the replacement side of the substitution leave /me confused and guessing.
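One way to answer the question of what $text really contains is to make the line-ending characters visible. This is only a sketch: the sample string, with a Windows-style \r\n in the middle, is an assumption standing in for the real data.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the real contents of $text.
my $text = "<p>Before new line.\r\nAfter new line.</p>\n";

# Show carriage returns and line feeds as literal \r and \n.
(my $visible = $text) =~ s/\r/\\r/g;
$visible =~ s/\n/\\n/g;
print "$visible\n";

# Count each kind of line-ending character.
my $cr = () = $text =~ /\r/g;
my $lf = () = $text =~ /\n/g;
print "CR: $cr, LF: $lf\n";
```

If carriage returns show up, s/\n//g alone would leave them behind, and a pattern such as s/\r?\n//g may be closer to what is wanted.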

    Update: rephrased, added (I hope) clarity to the far-too-telegraphic original.

Re: Removing Enter marks between Lines
by starX (Chaplain) on Jun 20, 2007 at 13:31 UTC
    I guess I'm not understanding how this is supposed to work. When reading in a line of input, generally $_ will contain everything up to the next newline character, in which case you can just chomp that, as in...
    open(FILE, '<', $filename) or die "Cannot open $filename: $!";
    while (<FILE>) {
        chomp($_);    # removes \n.
        # do something to $_
    }
    I can't think of a situation where I've ever encountered a newline char in the middle of a line. Could you elaborate a little bit more on the type of input data you're working with?

    Taking another look at your post, it seems as if what you actually want to do is remove paragraph tags from your HTML. In which case, apart from what others have said, you'll be wanting to look at a regex that looks more like: $new=~s/<p>|<\/p>//gsi
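The line-by-line reading described above also hints at one possible cause, though this is a guess about the original program: if the document is read one line at a time, a <p>...</p> pair that spans several lines is never held in a single string, so the substitution cannot match it. A sketch of reading the whole input at once instead, using an in-memory filehandle and made-up contents so it stays self-contained:

```perl
use strict;
use warnings;

# Hypothetical file contents; a real script would open a file name instead.
my $data = "<p>Before new line.\nAfter new line.</p>\n";

open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };    # with $/ undefined, <$fh> returns everything
close $fh;

# Join the lines inside a paragraph body with single spaces.
sub join_lines {
    my $body = shift;
    $body =~ s/\n\s*/ /g;
    return $body;
}

$text =~ s#<p>(.+?)</p>#'<p>' . join_lines($1) . '</p>'#gsie;

print $text;
```

With the whole document in $text, the /s flag lets (.+?) span the newline inside the paragraph.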