Greetings fellow monks,

I'm in the process of building a few manuals for my department which are basically compilations of "stuff" from various sources in various formats. When scraping things with copy-n-paste from intranet pages, I was always annoyed at how the text when pasted somewhere would always have extra spaces, line breaks, and other general formatting badness.

Perl to the rescue. The little snippet below I shortcutted to my desktop, so whenever I'm entering "scrape-mode," I just double-click. Then all is well.

#!/usr/bin/perl -w

use strict;
use Win32::Clipboard;

my $clipboard = Win32::Clipboard();

print "Text auto-format-clean when copied to the clipboard is active.\n";
print "To exit, CNTL-C out of this window.\n";

while (1) {
  $clipboard->WaitForChange();
  my $text = $clipboard->GetText();

  $text =~ tr/\r\n/ /;             # line breaks become spaces
  while ($text =~ s/  / /g) {};    # two spaces -> one, until no doubles remain
  $text =~ s/^ //;
  $text =~ s/ $//;

  $clipboard->Set($text);
}

exit;

Although this seems like a really simple thing (actually, I guess it is), this little script has saved me extremely large amounts of time already today.

God, I love Perl.

-gryphon
code('Perl') || die;

Replies are listed 'Best First'.
Re: Auto text format cleanup via Win32::Clipboard
by gryphon (Abbot) on Aug 17, 2001 at 02:06 UTC

    Greetings again,

    Here's version 2 of this little thing. It deals with multiple paragraphs better, and it gets rid of those stupid while loops that were employed to cover my regex ineptness.

    #!/usr/bin/perl -w
    use strict;
    use Win32::Clipboard;

    my $clipboard = Win32::Clipboard();

    print "\nText auto-format-clean when copied to the clipboard is active.\n";
    print "To exit, CNTL-C out of this window.\n\n\n";
    print "Usage\n", '='x75, "\n";

    while (1) {
      $clipboard->WaitForChange();
      my $text = $clipboard->GetText();
      $text =~ tr/\r//d;
      $text =~ s/\n{2}/\r/g;
      $text =~ s/[ \n\t\f]+/ /g;
      $text =~ s/\s*\r\s*/\r/g;
      $text =~ s/\r/\r\n/g;
      $text =~ s/^\s*//;
      $text =~ s/\s*$//;
      $clipboard->Set($text);

      my $now = localtime();
      print $now, ' => ', substr($text, 0, 45), "...\n";
    }

    exit;

    Anyone know why, when this is first run, the program generates a whole bunch (20+) report lines? Funny thing is, it doesn't happen all the time, and I can't seem to isolate why based on user behavior. Weird.

    -gryphon
    code('Perl') || die;

      You could try setting the clipboard to a known state before entering your while loop.

      Or check if $text eq "" after the regexes and continue on to the next iteration of the loop.

      --
      The Snowman
      snowman@notreally.co.uk
      

        Thanks!

        I added $clipboard->Empty();
        after my $clipboard = Win32::Clipboard();
        and added next if ($text eq '');
        before $clipboard->Set($text);, and that seems to have done the trick.
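
The empty-text guard described above can be tried without a real clipboard. This is only a sketch: `FakeClipboard` and `process_once` are made-up names for illustration, the fake object merely mimics the `GetText` and `Set` calls used in the thread, and treating an undefined result as empty text is an assumption rather than documented Win32::Clipboard behaviour. A single substitution stands in for the full cleanup.

```perl
use strict;
use warnings;

# Stand-in for illustration only: mimics the two Win32::Clipboard methods
# the loop body calls, so the guard can be exercised on any platform.
package FakeClipboard;
sub new     { my ($class, $text) = @_; return bless { text => $text, sets => 0 }, $class }
sub GetText { return $_[0]{text} }
sub Set     { my ($self, $text) = @_; $self->{text} = $text; $self->{sets}++; return 1 }

package main;

# Hypothetical helper: one pass of the loop body, minus WaitForChange().
# Returns 1 if the clipboard was rewritten, 0 if the pass was skipped.
sub process_once {
    my ($clipboard) = @_;

    my $text = $clipboard->GetText();
    $text = '' unless defined $text;    # assumption: treat "no text" as empty

    $text =~ s/[ \n\t\f]+/ /g;          # stand-in for the full cleanup
    return 0 if $text eq '';            # the "next if" guard, placed before Set

    $clipboard->Set($text);
    return 1;
}

my $empty = FakeClipboard->new('');
my $full  = FakeClipboard->new("two\n  lines");
print process_once($empty), ' ', process_once($full), ' ', $full->GetText(), "\n";
```

In the real script the same check sits inside `while (1)` as `next if ($text eq '');`, so an empty clipboard never reaches `Set` and never produces a report line.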

        -gryphon
        code('Perl') || die;

Re: Auto text format cleanup via Win32::Clipboard
by OeufMayo (Curate) on Aug 20, 2001 at 03:44 UTC

    Here's a Win32::Clipboard Auto Formatter boosted with Good Dr Conway's Steroids!

    Text::Autoformat does a really good job of finding patterns in unstructured text and formatting it properly. I use it on a daily basis (mail, documentation), along with perltidy. These two tools save me hours of painful reformatting.

    #!/usr/bin/perl -w
    use strict;
    use Win32::Clipboard;
    use Text::Autoformat;

    my $clipboard = Win32::Clipboard();
    $clipboard->Empty();

    print "Text auto-format-clean when copied to the clipboard is active.\n";
    print "To exit, CNTL-C out of this window.\n";

    while (1) {
      $clipboard->WaitForChange();
      my $text = $clipboard->GetText();
      next if ($text eq '');

      $text = autoformat($text, {all=>1}); # The magic kicks in.

      $clipboard->Set($text);
    }

    exit;

    Untested, but it should work (having said that, I won't be surprised if it dies horribly).

    --
    my $OeufMayo = new PerlMonger::Paris({http => 'paris.mongueurs.net'});

Re: Auto text format cleanup via Win32::Clipboard
by EvanK (Chaplain) on Aug 19, 2001 at 11:05 UTC

    ++ man... but how about s/\t/ /g or something to filter out tabs?
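
One way that suggestion could be folded into the first version, as a rough sketch: tabs join the `tr` list, and a single `s/ {2,}/ /g` replaces the while loop. `flatten_text` is a made-up name, and the sample string is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: version 1's single-paragraph cleanup plus tab handling.
sub flatten_text {
    my ($text) = @_;

    $text =~ tr/\r\n\t/ /;    # line breaks and tabs all become spaces
    $text =~ s/ {2,}/ /g;     # collapse each run of spaces in one pass
    $text =~ s/^ //;          # at most one leading space is left by now
    $text =~ s/ $//;          # same for the trailing end

    return $text;
}

# Invented sample: indented lines with tabs and Windows line endings.
print flatten_text("  Name:\tAlice\r\n  Site:\tExample  \r\n"), "\n";
```

Like version 1, this flattens everything into one paragraph; it does not try to preserve blank lines.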

    ______________________________________________
    RIP
    Douglas Noel Adams
    1952 - 2001

      Greetings EvanK,

      Actually, on line 16 (if you count blank lines), the line that reads:

      $text =~ s/[ \n\t\f]+/ /g;

      will sub-out all tabs and other stuff for spaces. I would have put \s except I used \r as a paragraph delimiter, so I had to spell out the \s set without \r in it.
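
To see the \r-as-paragraph-marker trick in isolation, here is a sketch that applies the same substitutions as version 2 to a plain string instead of the live clipboard. `clean_text` is a made-up name and the sample input is invented; the regexes themselves are copied from the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): the version 2 cleanup
# steps applied to a string rather than to the clipboard contents.
sub clean_text {
    my ($text) = @_;

    $text =~ tr/\r//d;            # drop carriage returns from CRLF input
    $text =~ s/\n{2}/\r/g;        # blank line => temporary paragraph marker
    $text =~ s/[ \n\t\f]+/ /g;    # collapse whitespace, but leave \r alone
    $text =~ s/\s*\r\s*/\r/g;     # trim whitespace around each marker
    $text =~ s/\r/\r\n/g;         # marker => Windows line ending
    $text =~ s/^\s*//;            # trim leading whitespace
    $text =~ s/\s*$//;            # trim trailing whitespace

    return $text;
}

# Invented sample: two wrapped paragraphs with a tab and stray spaces.
my $sample = "First  line\r\nwraps\there.\r\n\r\n  Second\r\nparagraph. \r\n";
print clean_text($sample), "\n";
```

With this input, the two wrapped paragraphs come out as "First line wraps here." and "Second paragraph.", separated by a single \r\n. Had the third substitution used \s, it would have swallowed the \r markers too and merged the paragraphs.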

      -gryphon
      code('Perl') || die;

      Update: Oops. Sorry, just noticed you were replying to "version 1" not the updated code. You're right, I should have added that...