Melk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I've created a find/replace script that works successfully by finding <TList.Bullet><TList.Item>sometext</TList.Item></TList.Bullet> and replacing it with <ul><li>sometext</li></ul>.

See code excerpt below:

my %findReplaceH = (
q[<TList.Bullet>]=>q[<ul>],
q[</TList.Bullet>]=>q[</ul>],
q[<TList.Item>]=>q[<li>],
q[</TList.Item>]=>q[</li>],

However, in the output, I'm now getting blank spaces between tags, such as <ul> space <li> and </li> space </ul>.

Can I add something to this line in the script, q[<TList.Bullet>]=>q[<ul>], so the blank space will no longer appear after <ul>? Or, do I need to do the find/replace, and then remove the blank space?

Thanks for your help!

Replies are listed 'Best First'.
Re: Remove blank space during find/replace
by jethro (Monsignor) on Nov 27, 2009 at 13:46 UTC
    You seem to have a nice long script and you have shown us a small code excerpt that is somewhat related, but really has nothing to do with your problem.

    The problem should be either in a regular expression statement or in the code before that statement, where the hash %findReplaceH is handled and modified for use in that regex statement.

Re: Remove blank space during find/replace
by BioLion (Curate) on Nov 27, 2009 at 14:14 UTC

    As others have put - it is hard to tell where the spaces are coming from without more code. However, your replacement hash looks fine, so you probably have a sneaky space in your replacement code along the lines of (but a '#' instead of ' '):

    my $string = '<TList.Bullet><TList.Item>sometext</TList.Item></TList.Bullet>';
    print "Before : \'$string\'\n";

    my %find_replace = (
        q[<TList.Bullet>]  => q[<ul>],
        q[</TList.Bullet>] => q[</ul>],
        q[<TList.Item>]    => q[<li>],
        q[</TList.Item>]   => q[</li>],
    );

    for (keys %find_replace){
        $string =~ s/\Q$_\E/\#$find_replace{$_}/;
    }
    print "After : \'$string\'\n";

    ## Gives :
    # Before : '<TList.Bullet><TList.Item>sometext</TList.Item></TList.Bullet>'
    # After : '#<ul>#<li>sometext#</li>#</ul>'

    But it would help to see the replacement code and some real input / output.

    Update : Forgot to include output... Now included with code.

    Just a something something...

      Thanks for the suggestions, I've inserted my code below. Please let me know if you see the code that is generating the blank space.

      use strict;
      use File::Find;
      use File::Path;
      use File::Copy;
      my $folderPath = "C:/Folder1";

      my %findReplaceH = (
      q[<TList.Bullet>]=>q[<ul>],
      q[</TList.Bullet>]=>q[</ul>],
      <TList.Item>]=>q[<li>],
      q[</TList.Item>]=>q[</li>],
      my $useRegexQ = 0;
      $folderPath =~ s[/$][];

      $folderPath =~ m[/(\w+)$];
      my $previousDir = $`;
      my $lastDir = $1;

      my $totalFileChangedCount = 0;
      sub fileFilterQ ($) {
      my $fileName = $_[0];

       if ($fileName =~ m{\.xml$}) {
      print "processing: $fileName\n";
      return 1;};

      return 0;
      };

      sub processFile {
       my $currentFile = $File::Find::name; # full path spect
      my $currentDir = $File::Find::dir;
      my $currentFileName = $_;

      if (not fileFilterQ($currentFile)) {
        return 1;
      }
      if (not(open FILE, "<$currentFile")) {die("Error opening file:
      $!");};
       my $wholeFileString;
        {local $/ = undef; $wholeFileString = <FILE>;};
        if (not(close(FILE))) {die("Error closing file: $!");};

      # do the replacement.
       my $replaceCount = 0;

       foreach my $key1 (keys %findReplaceH) {
          my $pattern = ($useRegexQ ? $key1 : quotemeta($key1));
        $replaceCount = $replaceCount + ($wholeFileString =~
      s/$pattern/$findReplaceH{$key1}/g);
       };

      if ($replaceCount > 0) { # replacement has happened
      $totalFileChangedCount++;
      # get the file mode.
      my ($mode, $uid, $gid) = (stat($currentFile))[2,4,5];

        # write out a new file.
      if (not(open OUTFILE, ">$currentFile")) {die("Error opening file: $!");};

       print OUTFILE $wholeFileString;
      if (not(close(OUTFILE))) {die("Error closing file: $!");};

       # set the file mode.
       chmod($mode, $currentFile);
      chown($uid, $gid, $currentFile);

       print "$replaceCount replacements made at\n";
        print "$currentFile\n";
      }
      };
        I'm not sure if you have a 'cut and paste' issue, but the code you supply has syntax errors. The hash declaration for %findReplaceH is not terminated with a ), and you have a q[ missing on one line. I get:

        "my" variable $folderPath masks earlier declaration in same scope at C:\gash.pl line 17.
        "my" variable $folderPath masks earlier declaration in same scope at C:\gash.pl line 19.
        Unmatched right square bracket at C:\gash.pl line 14, at end of line
        syntax error at C:\gash.pl line 14, near "<TList.Item>]"
        syntax error at C:\gash.pl line 68, near "print"
        Execution of C:\gash.pl aborted due to compilation errors.

        I added use warnings;

        Update 2
        Fixing these issues, I then found that neither subroutine is actually called; I guess you are missing a call to File::Find's find().
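        For reference, a minimal sketch of what such a call might look like. It does not reuse the posted script: process_file, @seen and the temporary directory are stand-ins invented for the example, and only find() and $File::Find::name come from File::Find itself.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Stand-in for $folderPath: a throwaway directory holding one .xml and one .txt file.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.xml', 'b.txt') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh or die "Cannot close $dir/$name: $!";
}

my @seen;

# Stand-in for processFile. Before each call, find() sets $_ to the
# base name and $File::Find::name to the full path of the entry.
sub process_file {
    return unless /\.xml$/;          # same filter idea as fileFilterQ
    push @seen, $File::Find::name;
}

# Without a call like this one, the callback is defined but never run.
find(\&process_file, $dir);

print "visited: $_\n" for @seen;
```

        In the posted script the equivalent would presumably be a find() call with processFile and $folderPath, placed after the subroutine definitions.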
        Anyway, I extracted the RE and tried a simple test. This time I assumed that the data had embedded spaces, so I altered the RE:
        my %findReplaceH = (
            q[<TList.Bullet>\\s*]  =>q[<ul>],
            q[\\s*</TList.Bullet>]=>q[</ul>],
            q[<TList.Item>\\s*]    =>q[<li>],
            q[\\s*</TList.Item>]  =>q[</li>],
        );
        my $useRegexQ = 1;

        my $wholeFileString='<TList.Bullet><TList.Item> sometext </TList.Item></TList.Bullet>';

        my $replaceCount = 0;

        foreach my $key1 (keys %findReplaceH) {
            my $pattern = ($useRegexQ ? $key1 : quotemeta($key1));
            $replaceCount = $replaceCount + ($wholeFileString =~
                s/$pattern/$findReplaceH{$key1}/g);
        };
        print "$wholeFileString\n";
        and it appears to work correctly without extra spaces:
        <ul><li>sometext</li></ul>
        Note that I set $useRegexQ to 1 so that the keys are used as regexes and quotemeta is skipped.
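        If the keys should stay literal strings, another option is to do the tag swap first and strip the whitespace in a second step. The following is only a sketch of that idea: convert_tags and $alternation are names made up for the example, and it assumes whitespace sitting directly between two tags is never significant in the files being processed.

```perl
use strict;
use warnings;

# Tag map from the thread; keys are literal strings, not regexes.
my %find_replace = (
    '<TList.Bullet>'  => '<ul>',
    '</TList.Bullet>' => '</ul>',
    '<TList.Item>'    => '<li>',
    '</TList.Item>'   => '</li>',
);

# One alternation of the quoted keys, longest first so that no key
# can shadow a longer key that begins the same way.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) } keys %find_replace;

# Hypothetical helper (not part of the original script): swap the tags in
# a single pass, then drop whitespace that sits directly between two tags.
sub convert_tags {
    my ($text) = @_;
    my $count = ($text =~ s/($alternation)/$find_replace{$1}/g) || 0;
    $text =~ s/>\s+</></g;   # assumption: inter-tag whitespace is insignificant
    return ($text, $count);
}

my ($out, $n) = convert_tags(
    "<TList.Bullet> <TList.Item>sometext</TList.Item>\n</TList.Bullet>"
);
print "$n replacements: $out\n";   # prints "4 replacements: <ul><li>sometext</li></ul>"
```

        The second substitution touches every pair of adjacent tags in the string, not just the list tags, so it would need narrowing if other parts of the XML rely on that whitespace.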
Re: Remove blank space during find/replace
by cdarke (Prior) on Nov 27, 2009 at 14:12 UTC
    As jethro says, show us yer RE. Most likely you have an embedded space in it.
    But: are you sure the extra output is a space? Could it be that you are trying to display an unprintable character? Have you tried running the script on the command-line rather than to a browser? You might then get to see the real output, particularly if you pipe it through a utility like od(1).
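    If od is not to hand, something similar can be done from Perl itself. This is a rough sketch only: show_invisibles is a made-up helper and the sample string is invented to stand in for a chunk of the real output. It rewrites every character outside printable ASCII, and the space itself, as a visible escape.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string in which every character
# outside the range 0x21-0x7E (so spaces, tabs, CR, LF, ...) is shown as an
# escape such as \x{20}.
sub show_invisibles {
    my ($text) = @_;
    $text =~ s/([^\x21-\x7E])/sprintf('\\x{%02X}', ord $1)/ge;
    return $text;
}

# Invented sample: a tab and a CR/LF pair hiding between the tags.
my $sample = "<ul>\t<li>sometext</li>\r\n</ul>";
print show_invisibles($sample), "\n";
# prints: <ul>\x{09}<li>sometext</li>\x{0D}\x{0A}</ul>
```

    Running the real file contents through something like this would show whether the "space" is really \x{20} or a tab or line ending that was already in the input.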
Re: Remove blank space during find/replace
by Anonymous Monk on Nov 27, 2009 at 13:38 UTC
    No, there is nothing you can add to that line to prevent extra spaces, that line contains no spaces.

    FYI, all spaces are blank :D