in reply to Re: Regular Expressions on Unicode
in thread Regular Expressions on Unicode

I took your advice and put those two lines at the top of the code I provided, but frustratingly it didn't work. I just began learning Perl about a month ago, so I apologize for sounding dense. To clarify, does it look like I did as you suggested? I am also looking through the website you linked; hopefully something more will turn up there. My code now looks like:
    #!/usr/bin/perl
    #I want to take a file of text as input, split it into an array of words
    #then search through the array for a word that matches the regular
    #expression, printing all matches.
    binmode STDIN, ":decoding(UTF-8)";
    binmode STDOUT, ":decoding (UTF-8)";
    use utf8;
    use charnames ':full';
    while ($line=<>){
        @array = split(/ /, $line);
        foreach $x (@array){
            if ($x=~ /\x{02c0}/){#glottal stop
                print "$x\n";
            }
        }
    }
Thanks so much for your help!

Re^3: Regular Expressions on Unicode
by moritz (Cardinal) on Dec 14, 2009 at 19:19 UTC
    That's roughly how I would have done it, except that :decoding(UTF-8) is wrong. The layer is called :encoding(UTF-8) in both directions: it decodes on input and encodes on output.

    Here is a working example of how to search for that character:

    use strict;
    use warnings;
    use charnames qw(:full);
    binmode STDOUT, ':encoding(UTF-8)';
    my $filename = 'test.txt';
    if (@ARGV) {
        open my $handle, '>:encoding(UTF-8)', $filename
            or die "Can't write to file '$filename': $!";
        print $handle <<"OUT";
    The next line
    contains a\N{MODIFIER LETTER GLOTTAL STOP}
    Really!
    OUT
        close $handle or warn $!;
    } else {
        open my $handle, '<:encoding(UTF-8)', $filename
            or die "Can't open file '$filename' for reading: $!";
        for (<$handle>) {
            print if /\N{MODIFIER LETTER GLOTTAL STOP}/;
        }
        close $handle;
    }

    When you call it with command-line arguments it writes a test file; when called without any, it reads that test file back:

    $ perl sample.pl gen
    $ perl sample.pl 
    contains aˀ
    

    I hope this helps. You can gradually morph it into the program you want; if you change one thing at a time, you know what's wrong when it breaks.
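As one possible next step in that morphing, here is a minimal sketch of the original word-by-word search with the layer name corrected to :encoding. It is an illustration, not the one right way to do it. The helper name words_with_glottal_stop is made up for this example, and it splits on any whitespace (split ' ') rather than on single spaces, which is an assumption about what counts as a word.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use charnames qw(:full);

# Encode everything we print as UTF-8 (the layer is :encoding, not :decoding).
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: return the whitespace-separated words of $line
# that contain U+02C0 MODIFIER LETTER GLOTTAL STOP.
sub words_with_glottal_stop {
    my ($line) = @_;
    return grep { /\N{MODIFIER LETTER GLOTTAL STOP}/ } split ' ', $line;
}

# Read each file named on the command line, decoding it as UTF-8,
# and print every matching word on its own line.
for my $file (@ARGV) {
    open my $in, '<:encoding(UTF-8)', $file
        or die "Can't open file '$file' for reading: $!";
    while (my $line = <$in>) {
        print "$_\n" for words_with_glottal_stop($line);
    }
    close $in;
}
```

Run it as perl words.pl test.txt (the script name is arbitrary). The input file is opened explicitly with the :encoding(UTF-8) layer, so the text is already decoded into characters by the time the regex sees it.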

Re^3: Regular Expressions on Unicode
by Anonymous Monk on Dec 14, 2009 at 15:09 UTC
    Why did you change it to decoding?
    use open IO => ":encoding(UTF-8)";
      IO sets the default layers for handles you open, but it doesn't include the already-open STDOUT/STDIN:
      perl -Mopen=IO,:encoding(UTF-8) -le"print join q! !, $$_[0], PerlIO::get_layers($$_[0], output => $$_[1]) for [*STDOUT,1], [*STDIN,0], [*STDERR,1] "
      *main::STDOUT unix crlf
      *main::STDIN unix crlf
      *main::STDERR unix crlf
      You want use open qw! :std :encoding(UTF-8) !; ex:
      perl -Mopen=:std,:encoding(UTF-8) -le"print join q! !, $$_[0], PerlIO::get_layers($$_[0], output => $$_[1]) for [*STDOUT,1], [*STDIN,0], [*STDERR,1] "
      *main::STDOUT unix crlf encoding(utf-8-strict) utf8
      *main::STDIN unix crlf encoding(utf-8-strict) utf8
      *main::STDERR unix crlf encoding(utf-8-strict) utf8
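The same check can be written as a small script instead of a one-liner. This is only a sketch: the helper name layers_of is invented for the example, and the base layers reported (for instance unix crlf versus unix perlio) depend on the platform, so no exact output is shown.

```perl
use strict;
use warnings;

# :std applies the default layers to STDIN, STDOUT and STDERR as well,
# not only to handles opened later in this lexical scope.
use open qw(:std :encoding(UTF-8));

# Hypothetical helper: list the PerlIO layers on a handle as one string.
# $is_output selects the output side of the handle (1) or the input side (0).
sub layers_of {
    my ($fh, $is_output) = @_;
    return join ' ', PerlIO::get_layers($fh, output => $is_output);
}

print 'STDOUT: ', layers_of(\*STDOUT, 1), "\n";
print 'STDIN:  ', layers_of(\*STDIN,  0), "\n";
print 'STDERR: ', layers_of(\*STDERR, 1), "\n";
```

If an encoding(...) layer appears in each line, the standard handles are covered and the separate binmode calls are no longer needed.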