From The Weekly Challenge 329.1: You are given a string containing only lower case English letters and digits. Write a script to replace every non-digit character with a space and then return all the distinct integers left.

Replacing the non-digit characters with spaces seemed kind of pointless, since you can extract the integers without doing that. But I challenged myself to make the replacement meaningful. For amusement purposes, what I did was this: after converting the letters to spaces, I wrote the string to a PNG file and OCR'd it with a hosted OCR service, via OCR::OcrSpace. That service returns JSON listing each "word" along with its bounding box information and other data. I ignore everything except the extracted integer "words", push them into an array, and done!
use GD;
use JSON;
use OCR::OcrSpace;

# Render the string as black text on a white PNG and return the file's path.
sub write_image{
    my($s) = @_;
    my $width = 500;
    my $height = 500;
    my $image_file = q#/tmp/output_image.png#;
    my $image = GD::Image->new($width, $height);
    my $white = $image->colorAllocate(255, 255, 255);
    my $black = $image->colorAllocate(0, 0, 0);
    $image->filledRectangle(0, 0, $width - 1, $height - 1, $white);
    my $font_path = q#/System/Library/Fonts/Courier.ttc#;
    my $font_size = 14;
    $image->stringFT($black, $font_path, $font_size, 0, 10, 50, $s);
    open my $fh, q/>/, $image_file or die qq/Cannot write $image_file: $!\n/;
    binmode $fh;
    print $fh $image->png;
    close $fh;
    return $image_file;
}

# Blank out the letters, OCR the rendered image, and collect the recognized words.
sub counter_integers{
    my($s) = @_;
    my(@numbers, %seen);
    $s =~ tr/a-z/ /;
    my $image = write_image($s);
    my $ocrspace = OCR::OcrSpace->new();
    my $ocrspace_parameters = {
        file              => qq/$image/,
        apikey            => q/XXXXXXX/,
        filetype          => q/PNG/,
        scale             => q/True/,
        isOverlayRequired => q/True/,
        OCREngine         => 2
    };
    my $result = $ocrspace->get_result($ocrspace_parameters);
    $result = decode_json($result);
    my $lines = $result->{ParsedResults}[0]
                       ->{TextOverlay}
                       ->{Lines};
    for my $line (@{$lines}){
        for my $word (@{$line->{Words}}){
            # the task asks for distinct integers, so skip repeats
            push @numbers, $word->{WordText} unless $seen{$word->{WordText}}++;
        }
    }
    return join q/, /, @numbers;
}

MAIN:{
    print counter_integers q/the1weekly2challenge2/;
    print qq/\n/;
    print counter_integers q/go21od1lu5c7k/;
    print qq/\n/;
    print counter_integers q/4p3e2r1l/;
    print qq/\n/;
}
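For comparison, here is a minimal sketch of the direct route mentioned above, with no image and no OCR. The helper name distinct_integers is hypothetical and not part of the original solution. Treating "01" and "1" as the same integer is my assumption about what "distinct" should mean.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the distinct
# integers found in $s, in order of first appearance.
sub distinct_integers{
    my($s) = @_;
    my %seen;
    $s =~ tr/a-z/ /;    # replace every lower case letter with a space
    # split ' ' breaks on runs of whitespace and skips leading whitespace;
    # leading zeros are stripped so "01" and "1" count as the same integer
    # (an assumption); s///r needs Perl 5.14 or later
    return grep { !$seen{$_}++ }
           map  { s/^0+(?=\d)//r }
           split q/ /, $s;
}

print join(q/, /, distinct_integers($_)), qq/\n/
    for qw/the1weekly2challenge2 go21od1lu5c7k 4p3e2r1l/;
# prints:
# 1, 2
# 21, 1, 5, 7
# 4, 3, 2, 1
```

Here the spaces do real work as separators for split, which is about as meaningful as the replacement step gets without involving an OCR service.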
(A longer blog on this is here.)
<jc> Why do people persist in asking me stupid questions?
<Petruchio> <insert mutually recursive response>
--an exchange from #perlmonks on irc.slashnet.org(2 March 2009 1345 EST)

In reply to OCRing out the Digits by adamcrussell
