PerlMonks  

formatting text

by texuser74 (Monk)
on Jun 20, 2008 at 02:24 UTC ( [id://693043]=perlquestion )

texuser74 has asked for the wisdom of the Perl Monks concerning the following question:

Please help me optimize my code.
I have an input file, test.txt, in which I need to remove the extra newlines and spaces from the lines that start with <item>.

input: test.txt

<item>sample item text one <item>sample item text two <item>sample item text three <item>sample item text four
My present code does this, but I want to do it with fewer lines of code:
open(IN, "test.txt") || die "\nCan't open test.txt \n";
open(OUT, ">test.out");
$/="";
{
    local $/ = '<item>';
    print OUT scalar <IN>;
    for (<IN>) {
        s@([\d\D]*?<item>)@
            my $var = $1;
            $var =~ s!\s+! !g;
            $var
        @e;
        print OUT;
    }
}
close(IN);
close(OUT);
The required output is:
<item>sample item text one <item>sample item text two <item>sample item text three <item>sample item text four

Replies are listed 'Best First'.
Re: formatting text
by ikegami (Patriarch) on Jun 20, 2008 at 03:00 UTC

    Two quick points first.

    /[\d\D]/? That looks like a weird way to say /./s or even /(?s:.)/.
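
    A minimal sketch of that equivalence. The sample string and the helper count_matches are made up for illustration; the point is only that all three spellings match a newline, while a plain dot does not.

```perl
use strict;
use warnings;

# Hypothetical sample: three characters, the middle one a newline.
my $text = "a\nb";

# Count how many single characters a pattern matches in $text.
sub count_matches {
    my ($re) = @_;
    my $n = () = $text =~ /$re/g;
    return $n;
}

printf "[\\d\\D] : %d\n", count_matches(qr/[\d\D]/);  # includes the newline
printf "./s     : %d\n", count_matches(qr/./s);      # includes the newline
printf "(?s:.)  : %d\n", count_matches(qr/(?s:.)/);  # includes the newline
printf ".       : %d\n", count_matches(qr/./);       # skips the newline
```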

    for (<IN>) is similar to while (<IN>) except it takes way more memory since it reads the entire file into memory. Use while (<IN>).
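
    One way to see the difference, as a sketch rather than a benchmark: check eof on the first pass through each loop. The in-memory file and the helper eof_on_first_iteration are assumptions made for the demonstration. With for, the handle is already exhausted because every line was read up front; with while, only one line has been read.

```perl
use strict;
use warnings;

# Returns 1 if the filehandle is already at end-of-file during the
# first loop iteration, 0 otherwise. $style is 'for' or 'while'.
sub eof_on_first_iteration {
    my ($style) = @_;
    my $data = "one\ntwo\nthree\n";    # hypothetical three-line file
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";
    my $at_eof;
    if ($style eq 'for') {
        # List context: all lines are read before the loop body runs.
        for my $line (<$fh>) { $at_eof = eof($fh) ? 1 : 0; last; }
    }
    else {
        # Scalar context: one line is read per iteration.
        while (my $line = <$fh>) { $at_eof = eof($fh) ? 1 : 0; last; }
    }
    close $fh;
    return $at_eof;
}

print "for:   ", eof_on_first_iteration('for'),   "\n";
print "while: ", eof_on_first_iteration('while'), "\n";
```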

    Now back to the question,

    open ...;
    open ...;
    $/ = '';
    while (<IN>) {
        s/^\s+//;
        s/\s(?=\s)//g;
        print OUT;
    }

    Or as a one-liner

    perl -pe"BEGIN{$/=''} s/^\s+//; s/\s(?=\s)//g" test.txt > test.out
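
    A self-contained sketch of the same idea, run against an in-memory string instead of test.txt. The sample text and the sub name squeeze_paragraphs are assumptions for illustration. Setting $/ to the empty string puts readline in paragraph mode, so each record is one blank-line-separated block, and s/\s(?=\s)//g deletes every whitespace character that is followed by another one, leaving the last character of each run.

```perl
use strict;
use warnings;

# Read $input in paragraph mode and reduce every run of whitespace
# to its final character.
sub squeeze_paragraphs {
    my ($input) = @_;
    open my $in, '<', \$input or die "Can't open in-memory file: $!";
    local $/ = '';                  # paragraph mode
    my $out = '';
    while (my $para = <$in>) {
        $para =~ s/^\s+//;          # drop leading whitespace
        $para =~ s/\s(?=\s)//g;     # drop whitespace followed by whitespace
        $out .= $para;
    }
    close $in;
    return $out;
}

# Hypothetical input: two items separated by blank lines.
my $sample = "<item>sample   item\n   text one\n\n\n"
           . "<item>sample item\n text   two\n";
print squeeze_paragraphs($sample);
```

    Note that a lone newline inside a paragraph survives, because only whitespace that is followed by more whitespace is removed.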

    I'm sure there are many HTML formatters out there, though. You'd probably be better off using one of them.

Re: formatting text
by Narveson (Chaplain) on Jun 20, 2008 at 03:22 UTC

    Using your own global substitution s!\s+! !g, but shortening everything else:

    while (<DATA>) {
        next if !/^<item>/;
        s!\s+! !g;
        print "$_\n";
    }
    __DATA__
    <item>sample item text one <item>sample item text two <item>sample item text three <item>sample item text four
Re: formatting text
by waldner (Beadle) on Jun 20, 2008 at 07:40 UTC
    A simple one-liner is:
    perl -ne 'if (/^<item>/){s/[ \t]+/ /g;print;}' test.txt
    I don't use \s because that would remove the newline character at the end of the line too.
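
    A short sketch of that difference, using a made-up input line: \s+ swallows the trailing newline along with the spaces before it, while [ \t]+ leaves the newline in place.

```perl
use strict;
use warnings;

# Hypothetical line with extra spaces, a tab, and trailing blanks.
my $line = "<item>sample   item\ttext one  \n";

# \s matches the newline, so the line ending is replaced by a space.
(my $with_s = $line) =~ s/\s+/ /g;

# [ \t] matches only spaces and tabs, so the newline is kept.
(my $with_class = $line) =~ s/[ \t]+/ /g;

print "with \\s:     [$with_s]\n";
print "with [ \\t]:  [$with_class]\n";
```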

    Edit: made it a little shorter and added /g (which is of course the most important thing).
Re: formatting text
by jwkrahn (Abbot) on Jun 20, 2008 at 03:51 UTC

    Perhaps this will do what you require:

    open IN, '<', 'test.txt' or die "Can't open 'test.txt' $!";
    open OUT, '>', 'test.out' or die "Can't open 'test.out' $!";
    while ( <IN> ) {
        next unless /\S/;
        tr/ \t/ /s;
        print OUT;    # Updated -- thanks Narveson
    }
    close IN;
    close OUT;
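
    For readers unfamiliar with the tr line in that loop, here is a small sketch on a made-up string. tr/ \t/ /s maps both space and tab to a space, and the /s modifier squeezes each run of translated characters down to one. Newlines are not in the search list, so they are left alone.

```perl
use strict;
use warnings;

# Hypothetical line mixing spaces and a tab.
my $line = "<item>sample \t item   text\n";

# Translate tabs and spaces to a space, squeezing runs to one character.
(my $squeezed = $line) =~ tr/ \t/ /s;

print $squeezed;
```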

      Revised: I set up test.txt, ran the first version of this code, and got no output. I added use warnings and posted the warning messages here with no comment. Sorry if that came across as unfriendly.
