Hi!

Sorry if this isn't a pure Perl question; there might be Perl involved in automating the solution.

I need to screen scrape a Flash webapp -- yep, there's no HTML for me to get at.

The webapp presents tables of data I'm authorized to view, but I'd like to put the data in a spreadsheet so I can sort and plot it.

I can probably use Perl to drive the Flash app through IE using samie (haven't tried yet), but the Flash app doesn't offer any way to dump data out of the darned thing.

OK, this is indeed horrible, but if I had to page through the data screens and save screenshots as JPEG images or something, is there any way to pull the text out of those images using OCR or something? Quite horrible, indeed -- screen scraping at the pixel level -- but these data are worth it.

Ack!

Suggestions / ideas / comments most welcome --

Thanks!

water

(PS The folks providing this app won't take the time to modify it in any way or talk to me at this point, so the obvious "ask for a clean data dump" doesn't work here.)

(PPS The other fallback is to have an employee type in the data from the screen -- that might take a few days of effort -- so there's a reasonable human fallback solution.)


In reply to Pixel scraping flash to extract text by water
