PerlMonks  

Replacing \n even when it shouldn't be :/

by ultranerds (Hermit)
on May 18, 2010 at 07:08 UTC

ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Got a bit of a weird one.

I'm trying to write some code to "clean up" my HTML pages (so they are smaller and more SEO friendly).

I've got this line:

            $html =~ s/\s{2,100}/ /sig;

...and:

            $html =~ s/\s+/ /sig;

...but that seems to replace the newlines too (so the WHOLE page ends up on one line).

These are the other lines (but I've commented them out, just to check which line was causing the problem)

            $html =~ s/\r+/\n/sig;
            $html =~ s/\t+/ /sig;
            $html =~ s/[\n]{2,5}/\n/sig;


I'm at a real loss as to what's going on.

Can anyone suggest anything? Surely it should be a simple case of s/\s+/ /?

I've also done a little test script, and have exactly the same issue:

#!/usr/bin/perl

print "Content-Type: text/html \n\n";

my $html = q| some stuff sdfsdfsdf dfgd fg sdfsdfsf |;

# $html =~ s/\r+/\n/sig;
# $html =~ s/\t+/ /sig;
$html =~ s/\s+/ /sig;
# $html =~ s/[\n]{2,5}/\n/sig;

print $html;


..and that outputs:
perl ./cgi-bin/test.cgi
Content-Type: text/html

some stuff sdfsdfsdf dfgd fg sdfsdfsf
TIA

Andy

Replies are listed 'Best First'.
Re: Replacing \n even when it shouldn't be :/
by cdarke (Prior) on May 18, 2010 at 08:04 UTC
    \s includes all whitespace characters, including a newline. From perlretut:

    \s matches a whitespace character, the set [\ \t\r\n\f] and others

    You will need to create a character class that contains just the characters you wish to delete. [[:blank:]] might be suitable.
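To make the difference concrete, here is a minimal sketch comparing \s with [[:blank:]]. The sample string and variable names are made up for illustration and are not taken from the original pages.

```perl
use strict;
use warnings;

# Hypothetical sample input: extra spaces, tabs and blank lines.
my $html = "<p>some   stuff</p>\n\n\n<p>more\t\tstuff</p>\n";

# \s also matches "\n", so every line break is swallowed.
(my $flattened = $html) =~ s/\s+/ /g;

# [[:blank:]] matches spaces and tabs but not "\n",
# so the line structure is left alone.
(my $tidied = $html) =~ s/[[:blank:]]+/ /g;

print "flattened: $flattened\n";
print "tidied:\n$tidied";
```

With the second substitution only runs of spaces and tabs are squeezed; collapsing the repeated newlines would still need its own substitution.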
      Hi,

      Ah wow - never knew that! This seems to do the trick:

      $html =~ s/[ ]{2,5}/ /sig;

      Thanks for your help :) This one was driving me nuts!

      Cheers

      Andy
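One thing that may be worth checking with the {2,5} quantifier: a run longer than five spaces is matched in chunks, so it is not squeezed down to a single space. A small sketch (the sample string is invented) comparing it with an open-ended {2,}:

```perl
use strict;
use warnings;

# Hypothetical input: "a", seven spaces, "b".
my $line = "a" . (" " x 7) . "b";

# Bounded quantifier: the first match takes 5 spaces, the next takes
# the remaining 2, and each match is replaced by one space.
(my $bounded = $line) =~ s/[ ]{2,5}/ /g;

# Open-ended quantifier: the whole run is one match.
(my $open_ended = $line) =~ s/[ ]{2,}/ /g;

print "bounded:    [$bounded]\n";
print "open-ended: [$open_ended]\n";
```

The /s and /i modifiers are left off here because the pattern contains neither a dot nor any letters, so they would have no effect.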
Re: Replacing \n even when it shouldn't be :/ ([^\S\n])
by tye (Sage) on May 18, 2010 at 18:16 UTC
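The class named in this reply's title, [^\S\n], can be read as "any character that is neither non-whitespace nor a newline", i.e. all whitespace except "\n". A minimal sketch of how it might be used, with an invented sample string:

```perl
use strict;
use warnings;

# Hypothetical input mixing tabs, spaces and a stray carriage return.
my $html = "<div>\t  some \r stuff</div>\n\n<p>more</p>\n";

# Squeeze every run of non-newline whitespace to a single space.
(my $tidied = $html) =~ s/[^\S\n]+/ /g;

print $tidied;
```

Note that "\r" is covered by this class, so input with CRLF line endings would probably want the carriage returns handled first.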
Re: Replacing \n even when it shouldn't be :/
by poolpi (Hermit) on May 18, 2010 at 12:34 UTC
Re: Replacing \n even when it shouldn't be :/
by JavaFan (Canon) on May 18, 2010 at 15:03 UTC
    Why the /s and /i modifiers?

    Basically what you want is to replace any non-empty string of whitespace with either a newline (if the string contains a newline), or with a space. Here's a 5.10 way of doing it:

    s/(?:\s*\n\s*(*:N))|(?:\s{2,}(*:S))/{"N","\n","S"," "}->{our $REGMARK}/ge;
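For Perls older than 5.10, or if the backtracking verbs feel too clever, roughly the same idea can be sketched with a plain /e replacement. This is an assumption-laden alternative, not the code above: squeeze_whitespace is a hypothetical helper name, and unlike the \s{2,} branch it also turns a single tab into a space.

```perl
use strict;
use warnings;

# Hypothetical helper: each run of whitespace becomes "\n" if it
# contains a newline, otherwise a single space.
sub squeeze_whitespace {
    my ($text) = @_;
    $text =~ s/(\s+)/ index($1, "\n") >= 0 ? "\n" : " " /ge;
    return $text;
}

# Invented sample input.
my $html   = "<p>some   stuff</p>  \n\n  <p>more\tstuff</p>\n";
my $result = squeeze_whitespace($html);

print $result;
```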
