From The Weekly Challenge 329.1: You are given a string containing only lower case English letters and digits. Write a script to replace every non-digit character with a space and then return all the distinct integers left.

Replacing the non-digit characters with spaces seemed kind of pointless, since you can extract the integers without doing that. But I challenged myself to make the replacement step meaningful.

For amusement purposes, what I did was this: after converting the letters to spaces, I wrote the string to a PNG file and OCR'd it using a hosted OCR service, via OCR::OcrSpace. That service returns JSON with each "word" along with its own bounding box information and other data. I ignore everything except the extracted integer "words", push them into an array, and done!
use GD;
use JSON;
use OCR::OcrSpace;

sub write_image{
    my($s) = @_;
    my $width = 500;
    my $height = 500;
    my $image_file = q#/tmp/output_image.png#;
    my $image = GD::Image->new($width, $height);
    my $white = $image->colorAllocate(255, 255, 255);
    my $black = $image->colorAllocate(0, 0, 0);
    $image->filledRectangle(0, 0, $width - 1, $height - 1, $white);
    my $font_path = q#/System/Library/Fonts/Courier.ttc#;
    my $font_size = 14;
    $image->stringFT($black, $font_path, $font_size, 0, 10, 50, $s);
    open TEMP, q/>/, qq/$image_file/;
    binmode TEMP;
    print TEMP $image->png;
    close TEMP;
    return $image_file;
}

sub counter_integers{
    my($s) = @_;
    my @numbers;
    $s =~ tr/a-z/ /;
    my $image = write_image($s);
    my $ocrspace = OCR::OcrSpace->new();
    my $ocrspace_parameters = {
        file              => qq/$image/,
        apikey            => q/XXXXXXX/,
        filetype          => q/PNG/,
        scale             => q/True/,
        isOverlayRequired => q/True/,
        OCREngine         => 2};
    my $result = $ocrspace->get_result($ocrspace_parameters);
    $result = decode_json($result);
    my $lines = $result->{ParsedResults}[0]
                       ->{TextOverlay}
                       ->{Lines};
    for my $line (@{$lines}){
        for my $word (@{$line->{Words}}){
            push @numbers, $word->{WordText};
        }
    }
    return join q/, /, @numbers;
}

MAIN:{
    print counter_integers q/the1weekly2challenge2/;
    print qq/\n/;
    print counter_integers q/go21od1lu5c7k/;
    print qq/\n/;
    print counter_integers q/4p3e2r1l/;
    print qq/\n/;
}
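
For comparison with the OCR route, here is a minimal sketch of the direct extraction the post says is possible without the replacement step. The name distinct_integers is hypothetical, and the sketch assumes that digit runs are compared as strings (so "01" and "1" would count as different) and that the order of first appearance should be kept.

```perl
use strict;
use warnings;
use List::Util qw(uniq);   # uniq needs List::Util 1.45 or later

# Hypothetical helper: collect every run of digits, then drop repeats
# while keeping the order of first appearance.
sub distinct_integers {
    my ($s) = @_;
    my @found = $s =~ /[0-9]+/g;
    return uniq @found;
}

print join(q/, /, distinct_integers(q/the1weekly2challenge2/)), "\n";   # prints "1, 2"
print join(q/, /, distinct_integers(q/go21od1lu5c7k/)), "\n";           # prints "21, 1, 5, 7"
print join(q/, /, distinct_integers(q/4p3e2r1l/)), "\n";                # prints "4, 3, 2, 1"
```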
(A longer blog on this is here.)
<jc> Why do people persist in asking me stupid questions?
<Petruchio> <insert mutually recursive response>
--an exchange from #perlmonks on irc.slashnet.org(2 March 2009 1345 EST)

Re: OCRing out the Digits
by ysth (Canon) on Jul 17, 2025 at 21:04 UTC
    I was also bothered by the needless replace step.

    Using that as a rationale for OCR is awesome.
Re: OCRing out the Digits
by Anonymous Monk on Jul 17, 2025 at 22:12 UTC

    (a nitpick: the challenge asks for "distinct integers": your output for the 1st test is wrong)
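
One possible way to address the nitpick is to filter the OCR words while collecting them. The sketch below is only an illustration: the JSON is a hand-written sample that mimics just the fields the original code reads (ParsedResults, TextOverlay, Lines, Words, WordText), not real service output, and distinct_words is a hypothetical name.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical, hand-written sample containing only the fields the
# original code reads; it is NOT real output from the OCR service.
my $json = <<'JSON';
{"ParsedResults":[{"TextOverlay":{"Lines":[
  {"Words":[{"WordText":"1"},{"WordText":"2"},{"WordText":"2"}]}
]}}]}
JSON

# Keep each all-digit word once, in order of first appearance.
sub distinct_words {
    my ($result) = @_;
    my $lines = $result->{ParsedResults}[0]{TextOverlay}{Lines} // [];
    my (%seen, @numbers);
    for my $line (@{$lines}) {
        for my $word (@{ $line->{Words} // [] }) {
            my $text = $word->{WordText};
            next unless defined $text && $text =~ /\A[0-9]+\z/;
            push @numbers, $text unless $seen{$text}++;
        }
    }
    return @numbers;
}

print join(q/, /, distinct_words(decode_json($json))), "\n";   # prints "1, 2"
```

The digits-only check also skips any word the engine misreads as something other than a number, though it cannot detect digits that were grouped wrongly.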

    Either you are very lucky, or I'm being negative as usual. At first I just wanted to check whether the engine applies any heuristics when all glyphs are obviously spaced apart a little (e.g. receipts made easy to read for the elderly, or similar). I only did a dozen tests with manual image uploading and didn't apply for a free API key.

    use strict;
    use warnings;
    use GD;

    my $width = 1000;
    my $height = 100;
    my $image_file = q#output_image.png#;
    my $image = GD::Image->new($width, $height);
    my $white = $image->colorAllocate(255, 255, 255);
    my $black = $image->colorAllocate(0, 0, 0);
    $image->filledRectangle(0, 0, $width - 1, $height - 1, $white);
    my $font_path = q#c:/windows/fonts/cour.ttf#;
    my $font_size = 14;
    $image->stringFT($black, $font_path, $font_size, 0, 20, 75, '2 5 5 2 5 5 2 5 5');
    open TEMP, q/>/, qq/$image_file/;
    binmode TEMP;
    print TEMP $image->png;
    close TEMP;

    >>>

    ****** Result for Image/Page 1 ****** 255 255 255

    Obviously not applicable for PWC 329.1. If, as I'd do anyway, the "s" transliteration modifier had been used, i.e. the string becomes "2 5 5 2 5 5 2 5 5":

    ****** Result for Image/Page 1 ****** 25525 52 5 5

    And another Mars Orbiter kaput. Actually, the (wrong) digit groupings also differ with the canvas size and string position used in your example, so that's a factor too. I changed to a larger canvas because at some point I tried a larger font size. Then some glyphs are simply ignored, up to the point where NO text is found in the image at all (string as above, 500x500 canvas, position 10x50, 16-point font, OCR engine 2, auto-enlarge on). So YOU ARE LUCKY to get correct results from them.
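
To make the "s" modifier point concrete, here is a small sketch of what tr produces with and without squeezing, using one of the inputs from the original post. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $input = q/go21od1lu5c7k/;

# Without /s every letter becomes its own space, so runs of letters
# leave runs of spaces; with /s each run collapses to a single space.
(my $plain    = $input) =~ tr/a-z/ /;
(my $squeezed = $input) =~ tr/a-z/ /s;

print "[$plain]\n";      # prints "[  21  1  5 7 ]"
print "[$squeezed]\n";   # prints "[ 21 1 5 7 ]"
```

The spacing difference is exactly what ends up rendered into the image, which is why it matters for how the OCR engine groups glyphs.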

Re: OCRing out the Digits
by smile4me (Beadle) on Aug 12, 2025 at 20:56 UTC

    I too was concerned with the "distinct integers" requirement. While not as clever as the OCR approach, I appreciate a good one-liner from time to time:

    $ echo '2 Here 4 is 4 the week 329.1 test 2 as 1 perl 3 onliner. 1' |\
      perl -anE '$s = "@F"; $s =~ tr/0-9/ /c; $s =~ s/ +/ /gm;
        @n = do { my %h; @h{ split(/ /, $s) } = (); sort {$a <=> $b} keys %h; };
        say "result: @n";
      '
    result: 1 2 3 4 329

    Smile!
