in reply to How to detect non printable characters and non white space characters? [RESOLVED]

G'day thanos1983,

I see your condition issue has been resolved. I'm currently working on a private project in a similar area (in my case, it is intended to handle any Unicode® character). This uses the \p{} construct to determine character type. You may be interested in this approach.
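As a minimal illustration of \p{} on its own, here is a sketch that maps a code point to a top-level General_Category group. The helper name general_group and the group labels are made up for this example; they are not part of the test code in this post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return a label for the first General_Category group the character matches.
# \p{L}, \p{N}, etc. are the one-letter General_Category groups.
sub general_group {
    my ($code_point) = @_;
    my $char = chr $code_point;
    return 'Letter'      if $char =~ /\p{L}/;
    return 'Number'      if $char =~ /\p{N}/;
    return 'Punctuation' if $char =~ /\p{P}/;
    return 'Symbol'      if $char =~ /\p{S}/;
    return 'Separator'   if $char =~ /\p{Z}/;
    return 'Mark'        if $char =~ /\p{M}/;
    return 'Other';      # \p{C}: control, format, unassigned, and so on
}

printf "U+%04X => %s\n", $_, general_group($_)
    for 0x0041, 0x0031, 0x0021, 0x002B, 0x0020, 0x0300, 0x0009;
```

\p{} always uses Unicode rules, so this behaves the same for ASCII and non-ASCII code points.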

I've posted test code below. Be aware that this is a work-in-progress: the code I've posted works fine but is incomplete. There are more detailed notes after the code.

#!/usr/bin/env perl

use 5.025009;
use strict;
use warnings;
use utf8;
use open IO => qw{:encoding(utf8) :std};
use charnames ':full';

BEGIN {
    my @types = qw{CONTROL PRINT COMBINE UNKNOWN};
    eval 'use enum @types';
    sub type_name { $types[$_[0]] }
}

use Test::More;

my @tests = (
    [ '0000' => CONTROL ],
    [ '0009' => CONTROL ],
    [ '000a' => CONTROL ],
    [ '0020' => PRINT ],
    [ '0021' => PRINT ],
    [ '0030' => PRINT ],
    [ '0040' => PRINT ],
    [ '0041' => PRINT ],
    [ '0060' => PRINT ],
    [ '0061' => PRINT ],
    [ '007e' => PRINT ],
    [ '007f' => CONTROL ],
    [ '00a0' => PRINT ],
    [ '0300' => COMBINE ],
    [ '034f' => COMBINE ],
    [ '2000' => PRINT ],
    [ '200d' => CONTROL ],
    [ '2028' => CONTROL ],
    [ '2029' => CONTROL ],
    [ 'fe00' => CONTROL ],
    [ '1f3fb' => PRINT ],
    [ 'e0100' => CONTROL ],
    [ '10ffff' => CONTROL ],
);

plan tests => scalar @tests;

for my $test (@tests) {
    my ($input, $exp) = $test->@*;
    is(type_of($input), $exp,
        "Checking '@{[sprintf q{%04X}, hex $input]}'"
        . " is of type '@{[type_name($exp)]}'"
        . " (Got: '@{[type_name(type_of($input))]}')."
    );
}

sub type_of {
    my ($input) = @_;
    my $char = chr hex $input;
    return CONTROL if $char =~ / [ \p{C} \p{Zl} \p{Zp} \p{VS} ] /xx;
    return PRINT   if $char =~ / [ \p{L} \p{N} \p{P} \p{S} \p{Zs} ] /xx;
    return COMBINE if $char =~ / [ \p{M} ] /xx;
    return UNKNOWN;
}

Notes:

All tests pass. Here's an extract of the output:

1..23
ok 1 - Checking '0000' is of type 'CONTROL' (Got: 'CONTROL').
...
ok 23 - Checking '10FFFF' is of type 'CONTROL' (Got: 'CONTROL').

See also Unicode::UCD. I found this useful for checking property values of individual characters.
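A minimal sketch of that kind of lookup, using charinfo() from Unicode::UCD. The describe helper and its output format are made up for this example. Note that charinfo() returns undef for unassigned code points and noncharacters, so that case needs handling.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Unicode::UCD qw{charinfo};

# Summarise a code point (given as a hex string) from its UCD record.
sub describe {
    my ($hex)  = @_;
    my $number = hex $hex;
    my $info   = charinfo($number);

    # No record: unassigned code point or noncharacter.
    return sprintf('U+%04X: no charinfo record', $number)
        unless defined $info;

    # 'code', 'name' and 'category' are fields of the charinfo hash.
    return sprintf('U+%s: %s [%s]',
        $info->{code}, $info->{name}, $info->{category});
}

print describe($_), "\n" for qw{0041 0300 200d 10ffff};
```

The category field holds the two-letter General_Category value (for example Lu or Mn), which corresponds to the one-letter groups used with \p{} in type_of().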

— Ken

Re^2: How to detect non printable characters and non white space characters? [\p{} character classes]
by thanos1983 (Parson) on Feb 24, 2017 at 16:16 UTC

    Hello kcott,

    Sorry for the late reply; I only just saw your post. This looks like a great idea (although a work in progress, as you said).

    The only problem is that I don't have such a recent version of Perl on the test bed. It has v5.22.1, and I think some nodes are running even lower versions :P due to old OS releases.

    Nevertheless, thanks again for your time and effort in reading and replying to my question.
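For older Perls, a minimal sketch of the same classification might look like the following. It assumes three substitutions that are not in the posted code: the core constant pragma instead of the enum module, @$test instead of the postfix dereference $test->@*, and no /xx modifier (which needs Perl 5.26). Without /xx, the spaces inside the bracketed classes have to go, because under plain /x a space inside [...] is still matched literally.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stand-in for 'use enum': same names and values as in the posted code.
use constant { CONTROL => 0, PRINT => 1, COMBINE => 2, UNKNOWN => 3 };

my @types = qw{CONTROL PRINT COMBINE UNKNOWN};

sub type_of {
    my ($input) = @_;
    my $char = chr hex $input;
    # No whitespace inside [...]: it would be significant without /xx.
    return CONTROL if $char =~ /[\p{C}\p{Zl}\p{Zp}\p{VS}]/;
    return PRINT   if $char =~ /[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/;
    return COMBINE if $char =~ /\p{M}/;
    return UNKNOWN;
}

# A few of the original test cases, unpacked with a plain @$test.
for my $test ([ '0009' => CONTROL ], [ '0020' => PRINT ], [ '0300' => COMBINE ]) {
    my ($input, $exp) = @$test;
    printf "%s: expected %s, got %s\n",
        $input, $types[$exp], $types[type_of($input)];
}
```

The order of the checks matters here just as in the posted code: variation selectors such as U+FE00 are also marks, so the \p{VS} test has to run before the \p{M} test.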

    Seeking for Perl wisdom...on the process of learning...not there...yet!