michael.kitchen has asked for the wisdom of the Perl Monks concerning the following question:

Edited on 8/20, 2017

The Solution:

for (@loot) {s/\xD7//g};

The Problem:

I really am a novice at this. I learned some basic CGI programming using Perl 5 from a book. My needs have always been pretty simple, so I have not signed up for any in-depth training. Sorry for only supplying one line of code in my original post. I will supply more code in this post. Before doing so, I would like to explain a little more about what I am doing, in the hope that the additional information will help.

First of all, I am creating a fansite for a game that I play. I copy information from a wiki site and paste it into my makeCREATURE.cgi script, which saves pre-formatted data to a file that my creatureDETAIL.cgi script uses when visitors want to view the details of a creature from the game. I use Firefox to view the wiki information and perform my copy. I then paste it into Chrome (because that is where my bookmarks are for my stuff). I came up with the necessary substitution statements to take care of everything that needed to be eliminated, HTML tags added, etc. I just could not come up with the substitution for the multiplication sign. Viewing the source of the wiki page shows that &times; is being used for the multiplication sign. I created a small program (lootALTERATION.cgi) to handle all the substitutions I could figure out how to do. I then remove the multiplication signs manually and copy/paste into my makeCREATURE.cgi script.
An example of the information I am copying/pasting is: 0-100× Gold Coin

I will supply the entire code from lootALTERATION.cgi:

#!/usr/bin/perl
use CGI qw/:standard/;
push(@INC, "/cgi-bin");
require("cgi-lib.pl");
&ReadParse(*input);
@loot = $input{'loot'};
print "Content-type: text/html\n\n";
print qq{
<HTML>
<HEAD>
<TITLE>Loot Alteration</TITLE>
</HEAD>
<BODY>
<form method=post action="../cgi-bin/lootALTERATION.cgi">
<table border=2 width=100%>
};
print "<tr>\n";
print "<td>Loot</td>\n";
print "<td>Copy / Paste Paragraph.<br><textarea name=loot cols=70 rows=10>";
for (@loot) {s/ //g};
for (@loot) {s/×//g};
for (@loot) {s/ \(semi-rare\)//g};
for (@loot) {s/ \(rare\)//g};
for (@loot) {s/ \(very rare\)//g};
for (@loot) {s/ \(extremely rare\)//g};
for (@loot) {s/\R/<br>/g};
print "@loot";
print "</textarea></td>\n";
print "</tr>\n";
print qq{
<tr>
<td><input type=submit></td>
<td><input type=reset></td>
</tr>
</table>
</form>
</BODY>
</HTML>
};

I'm sure there are better and more efficient ways of doing things. Not against learning new ways, but this way works and it is what I know. Please be kind. :)
Adding for (@loot) {s/\N{MULTIPLICATION SIGN}//g}; only causes error 500.

Replies are listed 'Best First'.
Re: replace multiplication symbol (&times;)
by kcott (Archbishop) on Aug 20, 2017 at 08:32 UTC

    G'day michael.kitchen,

    Welcome to the Monastery.

    You may need additional code to handle encoding. Without any context to the one line of code you posted, it's hard to make recommendations. When dealing with "UTF-8" — as part of input, output or source code — I generally find these two lines, near the top of my code, handle most cases.

    use utf8;
    use open IO => qw{:encoding(utf8) :std};

    See the open and utf8 pragmata; the binmode function; and any of the perldoc pages with names starting with perluni (choose as appropriate for your Perl/Unicode knowledge: there's lots from introductory to advanced levels).

    In the example (one-liner) code I provide below, I've used this alias:

    $ alias perlu
    alias perlu='perl -Mstrict -Mwarnings -Mautodie=:all -Mutf8 -C -E'

    See perlrun for any options you're unfamiliar with.

    If you thought to check for '&times;', you should probably also check for other formats that they might appear in: '&#xd7;', '&#xD7;' and '&#215;'. In fact, you should probably check that the character really is "\N{MULTIPLICATION SIGN}", because a lot of characters, like this selection, look very similar to that:

    $ perlu 'my @x = qw{ x × ⨉ 🗙 }; say sprintf "U+%06X", ord for @x'
    U+000078
    U+0000D7
    U+002A09
    U+01F5D9
    

    Those are (using the builtin module Unicode::UCD):

    $ perlu 'use Unicode::UCD "charinfo"; my @x = qw{ x × ⨉ 🗙 }; say sprintf "U+%06X : %s", ord($_), charinfo(ord $_)->{name} for @x'
    U+000078 : LATIN SMALL LETTER X
    U+0000D7 : MULTIPLICATION SIGN
    U+002A09 : N-ARY TIMES OPERATOR
    U+01F5D9 : CANCELLATION X
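
The entity forms mentioned earlier ('&times;', '&#xd7;', '&#xD7;' and '&#215;') could be handled in one pass along with the character itself. This is only a minimal sketch: strip_times is a hypothetical helper name, not something from the original post, and it assumes the text has already been decoded to characters.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the original script):
# removes the multiplication sign whether it arrives as the character
# itself or as one of its HTML entity spellings.
sub strip_times {
    my ($text) = @_;
    # &times;            named entity
    # &#xd7; / &#xD7;    hexadecimal character reference
    # &#215;             decimal character reference
    # \x{D7}             the character U+00D7 itself
    $text =~ s/&times;|&#[xX]0*[dD]7;|&#0*215;|\x{D7}//g;
    return $text;
}

print strip_times("0-100&times; Gold Coin"), "\n";
print strip_times("0-3&#215; Meat"), "\n";
```

Look-alike characters such as LATIN SMALL LETTER X are deliberately left untouched here, since removing every "x" would damage ordinary words.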
    

    Anyway, when you have determined the character, both substitution (s///) and transliteration (y///) should work just fine. Transliteration is faster, if that matters to you (see "perlperf: BENCHMARKS: Search and replace or tr"). Here are some examples:

    $ perlu 'my $x = "|\N{MULTIPLICATION SIGN}|"; say $x; say $x =~ s/\N{MULTIPLICATION SIGN}//r'
    |×|
    ||
    
    $ perlu 'my $x = "|\N{MULTIPLICATION SIGN}|"; say $x; say $x =~ y/\N{MULTIPLICATION SIGN}//dr'
    |×|
    ||
    
    $ perlu 'my $x = "|\N{MULTIPLICATION SIGN}|"; say $x; say $x =~ s/\x{d7}//r'
    |×|
    ||
    
    $ perlu 'my $x = "|\N{MULTIPLICATION SIGN}|"; say $x; say $x =~ y/\x{d7}//dr'
    |×|
    ||
    
    $ perlu 'my $x = "|\N{MULTIPLICATION SIGN}|"; say $x; say $x =~ s/×//r'
    |×|
    ||
    
    $ perlu 'my $x = "|\N{MULTIPLICATION SIGN}|"; say $x; say $x =~ y/×//dr'
    |×|
    ||
    

    [Note: Although we generally prefer code and data within <code>...</code> tags, when posting Unicode, <pre>...</pre> tags will show the actual characters (instead of entity references like &#x1f5d9;). The downside of using <pre> is that you have to manually format special characters ('<' to '&lt;', '&' to '&amp;', and so on) and you don't get a "Download" link.]

    Finally, I'm using Perl 5.26, which supports Unicode 9.0. If you have an earlier Perl version, it will support an earlier Unicode version, which may give you different results to the ones I've shown. I posted a discussion about this a couple of months ago: "Re: printing Unicode works for some characters but not all".

    — Ken

Re: replace multiplication symbol (&times;)
by Corion (Patriarch) on Aug 20, 2017 at 07:22 UTC

    Depending on what the encoding of your input and your script is, you cannot easily use a literal multiplication symbol without decoding your input.

    Personally, I would always use the named Unicode characters:

    s/\N{MULTIPLICATION SIGN}//g;

    Especially if you pull in user input from an HTML form, you should use Encode to decode the input from the character set that your user sends you, and encode the output again when you inline it into your HTML.
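
The decode, substitute, encode round trip described above might look roughly like this. It is a sketch under assumptions: clean_loot is a hypothetical helper name, and the form is assumed to submit UTF-8, which depends on how the page is served.

```perl
use strict;
use warnings;
use charnames ':full';            # needed for \N{...} before Perl 5.16; harmless later
use Encode qw(decode encode);

# Hypothetical helper: $raw holds the bytes received from the form.
# ASSUMPTION: the browser sent UTF-8; change the encoding name if it did not.
sub clean_loot {
    my ($raw) = @_;
    my $text = decode('UTF-8', $raw);         # bytes -> characters
    $text =~ s/\N{MULTIPLICATION SIGN}//g;    # matches the single character U+00D7
    return encode('UTF-8', $text);            # characters -> bytes for output
}

my $raw = "0-100\xC3\x97 Gold Coin";          # U+00D7 encoded as two UTF-8 bytes
print clean_loot($raw), "\n";
```

If the form actually arrives in another encoding (Latin-1, for example), the same structure applies with a different name passed to decode.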

Re: replace multiplication symbol (&times;)
by Laurent_R (Canon) on Aug 20, 2017 at 08:34 UTC
    Perhaps having the pragma:
    use utf8;
    might be sufficient for the multiplication symbol to be correctly recognized within the regex of your code.

    But you might still have trouble with the encoding of your input, on which you say nothing.
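
A small sketch of what the utf8 pragma changes for a literal in the source. It assumes the script file itself is saved as UTF-8; the variable names are only illustrative.

```perl
use strict;
use warnings;
use utf8;   # tells perl the source file is UTF-8 (assumes it is saved that way)

my $sign = "×";

# With "use utf8" the literal is one character; without it, the same
# literal in a UTF-8 file would be seen as two separate bytes.
print length($sign), "\n";
printf "U+%04X\n", ord($sign);
```

This only affects literals in the script. Data read from a form or a file still has to be decoded separately.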

Re: replace multiplication symbol (&times;)
by huck (Prior) on Aug 20, 2017 at 05:21 UTC

    The star ("*") is a special character in a regexp. It means match 0 or more times. To remove the special significance, you need to escape it to turn it into literal text.

    for (@input) {s/\×//g};
    Another trick is to make it part of a character class
    for (@input) {s/[×]//g};
    In this case the things inside the [] are each considered single characters (except for maybe a ^ or -), where any one of them can match a character. In this case [*] is the character class consisting of only *
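
For the asterisk case described above, the two approaches (plus \Q...\E for an interpolated string) can be compared side by side. This is only an illustrative sketch with made-up variable names. The multiplication sign itself is not a regexp metacharacter, so none of this is needed for it.

```perl
use strict;
use warnings;

my $text = "2*3*4";

(my $escaped = $text) =~ s/\*//g;          # backslash makes the * literal
(my $class   = $text) =~ s/[*]//g;         # inside [] the * has no special meaning

my $star = '*';
(my $quoted  = $text) =~ s/\Q$star\E//g;   # \Q...\E escapes an interpolated string

print "$escaped $class $quoted\n";         # prints "234 234 234"
```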

    Sorry, I misread the OP's multiplication symbol ("×") as an asterisk ("*"). Guess I'm getting old.

      Sorry, I'm confused: why do you identify the characters x and *?

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

      Update: The above was posted before huck's correction