ttlgreen has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm trying to find the function(?) that takes some sort of string and prints every character (even things like \n).

I stumbled across it a long time ago but can't remember what it's called.

An example:

If I were to give this function a variable/string such as:

$var = "Some random text.\nMore\nEven More\n";

It would output something like the following to stdout:

S o m e   r a n d o m   t e x t . \n M o r e \n E v e n   M o r e \n

The purpose: I am collecting some "formatted" output from a program and then trying to write a regex to sift through the data. However, I can't make the regex work because of all the "invisible" characters in that output. So I want to dump the collected data in some way like the above example, so that I can adjust my regex to account for the invisible characters.

I hope that makes sense :)

Thanks in advance for any help!

Re: What is the function that does the following..?
by almut (Canon) on Apr 15, 2010 at 22:12 UTC

    perluniintro shows a function nice_string() which essentially does what you want (i.e. render special characters as \x... escapes), and can easily be adapted to your liking. For example, here in slightly modified form:

    #!/usr/local/bin/perl -l

    sub nice_string {
        join(" ",
          map { $_ > 255 ?                      # if wide character...
                sprintf("\\x{%04X}", $_) :      # \x{...}
                chr($_) =~ /[[:cntrl:]]/ ?      # else if control character ...
                sprintf("\\x%02X", $_) :        # \x..
                chr($_)                         # else as themselves
          } unpack("U*", $_[0]));               # unpack Unicode characters
    }

    my $var = "Some random text.\nMore\nEven More\n";

    print nice_string($var);

    __END__
    S o m e   r a n d o m   t e x t . \x0A M o r e \x0A E v e n   M o r e \x0A

    (It expects a decoded character string.)

Re: What is the function that does the following..?
by ikegami (Patriarch) on Apr 15, 2010 at 19:45 UTC
    You are (most likely) working with UTF-16le text. Decode it. A convenient way for files:
    # With LF<->CRLF conversion
    open(my $fh, '<:raw:perlio:encoding(UTF-16le):crlf', $fn)

    # Without LF<->CRLF conversion
    open(my $fh, '<:raw:perlio:encoding(UTF-16le)', $fn)
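
    For data that is already in memory rather than in a file, the same decoding step can be sketched with the core Encode module. The sample bytes below are an assumption made up for illustration (they simulate what a UTF-16LE-emitting program would produce); they are not the original poster's data.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: simulate the bytes a UTF-16LE-emitting program hands us.
my $bytes = encode('UTF-16LE', "Some random text.\r\nMore\r\n");

# Undecoded, every ASCII character is followed by a NUL byte,
# so a pattern written for plain text does not match.
my $raw_matches = $bytes =~ /random text/ ? 1 : 0;

# After decoding, the string holds characters and the same pattern matches.
my $text = decode('UTF-16LE', $bytes);
my $decoded_matches = $text =~ /random text/ ? 1 : 0;

print "raw: $raw_matches, decoded: $decoded_matches\n";
```

    If the regex only starts matching after the decode, the "invisible" characters were most likely the NUL halves of 16-bit code units.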
Re: What is the function that does the following..?
by youlose (Scribe) on Apr 15, 2010 at 20:34 UTC
    I think your regexp doesn't work with such strings because you don't use the 'm' or 's' modifiers. If you want to process the string char by char, you can 'split' it into an array and work with that array.
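
    A small sketch of what those modifiers change, using the string from the question. The patterns are only illustrative assumptions, since the original regex was never posted.

```perl
use strict;
use warnings;

my $var = "Some random text.\nMore\nEven More\n";

# Without /m, ^ anchors only at the very start of the string.
my $plain = () = $var =~ /^(\w+)/g;

# With /m, ^ also matches right after each embedded newline.
my $multi = () = $var =~ /^(\w+)/mg;

# Without /s, "." refuses to match a newline; with /s it does.
my $dot_plain = $var =~ /text\..More/  ? 1 : 0;
my $dot_s     = $var =~ /text\..More/s ? 1 : 0;

# Char-by-char processing via split.
my @chars = split //, $var;

print "plain=$plain multi=$multi dot_plain=$dot_plain dot_s=$dot_s\n";
print "chars=", scalar(@chars), "\n";
```

    Here /m lets the anchored pattern find the first word of every line instead of only the first one, and /s lets "." step across the line break.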
Re: What is the function that does the following..?
by ssandv (Hermit) on Apr 15, 2010 at 21:02 UTC
    If you're on Linux, the od utility can be quite handy for such things.
      Just out of curiosity, I tried the Linux od utility and the ppt od utility. I created a file called "text":
      Some random text.\nMore\nEven More\n
      Then for the Linux utility:
      od -N 36 -c /root/Desktop/text
      Then for the ppt version of od:
      /usr/local/ppt/bin/od -c /root/Desktop/text
      The Linux od let me control the number of bytes that were dumped; however, with the ppt od, there is no -N option.

      Thanks for all the help everyone! With this od utility and a better regex I was able to cobble together a solution. Sorry I only have 4 votes to give ;)

Re: What is the function that does the following..?
by cdarke (Prior) on Apr 16, 2010 at 09:14 UTC
    The following is based on an od.pl script I did some time ago. It does not handle Unicode characters, but I'm not sure you need that:
    use warnings;
    use strict;

    my $var = "Some random text.\nMore\nEven More\n";
    print process_text($var),"\n";

    sub process_text {
        my $input = shift;

        # Get a space between each char
        my @text = split //,$input;
        local $" = ' ';
        my $text = "@text";

        # Table of special characters
        my %trans = ("\n" => '\n',
                     "\t" => '\t',
                     "\b" => '\b',
                     "\r" => '\r',
                     "\f" => '\f');

        my $pattern = join ('|', keys %trans);

        # substitute special characters
        $text =~ s/($pattern)/$trans{$1}/g;

        return $text;
    }
    Produces:
    S o m e   r a n d o m   t e x t . \n M o r e \n E v e n   M o r e \n
Re: What is the function that does the following..?
by Marshall (Canon) on Apr 16, 2010 at 05:09 UTC
    I am not sure exactly what your end goal is.

    In Perl regexes, \s stands for any whitespace character (space, \n, \r, \f, \t). \s* means zero or more of these and \s+ means at least one of them, possibly more. \s is an easy shorthand for any of these "non-printable" space characters.
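
    As a rough illustration of that shorthand (the sample text and field names below are hypothetical, not taken from the original poster's program):

```perl
use strict;
use warnings;

# Hypothetical "formatted" output with a mix of tabs, spaces, CRs and newlines.
my $var = "Host:\t alpha\r\nPort:\t\t8080\r\n";

# \s+ absorbs whatever run of whitespace separates label and value,
# so the pattern need not know which invisible characters are present.
my %field;
while ($var =~ /(\w+):\s+(\S+)/g) {
    $field{$1} = $2;
}

print "$_ => $field{$_}\n" for sort keys %field;
```

    Because \S+ stops at the first whitespace character, the trailing \r is left out of the captured value.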

    It could very well be, as other posters have suggested, that the problem lies with 16 bit vs 8 bit chars. But then again, perhaps not, if the issue is just "blank space" characters vs some other character.

    Consider the following code. Common names for a yes/no test of this kind are isprint() or isprintable(). My sub get_printable() below instead always returns something: either the character itself or a [hexNN] tag.

    #!/usr/bin/perl -w
    use strict;

    my $var = "Some random text.\n\tMore\n\tEven More.\n\r\t And yet More.";

    print "VAR is:$var\n";
    print "Output of loop:\n";

    foreach my $char (split//,$var)
    {
       print get_printable($char);
    }
    print "\n";

    sub get_printable
    {
       my $char = shift;
       return $char if ($char =~ tr/A-Za-z0-9. //);
       return sprintf ("[hex%.2X]",ord($char));
    }
    __END__
    Prints:
    VAR is:Some random text.
            More
            Even More.
             And yet More.
    Output of loop:
    Some random text.[hex0A][hex09]More[hex0A][hex09]Even More.[hex0A][hex0D][hex09] And yet More.
Re: What is the function that does the following..?
by graff (Chancellor) on Apr 16, 2010 at 17:45 UTC
    This might not be exactly what you want, but it might give you some ideas on how to do what you want: tlu -- TransLiterate Unicode (pay special attention to the "-o uf" option).