PerlMonks  

Re: scalable chomping

by ccn (Vicar)
on Oct 29, 2008 at 14:02 UTC ( [id://720229]=note )


in reply to scalable chomping

Read the file line by line with $/ = '+', so that each '+' ends a line.

    Apply regexps to each line to remove \n and to replace the record separators.

Here is a one-liner as an example:

perl -l -0x2B -pe 's/\n//g;s/[XYZ]/;/g' corruptedfile > recoveredfile
where 'X', 'Y', 'Z' are the characters to be replaced with the record separator ';'.
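For readers who prefer a script, the same steps can be sketched roughly as below. This is only an illustration under the same assumptions as the one-liner: '+' ends each line, and X, Y, Z stand in for the damaged separator characters. The sample string and the recover() helper are made up for the demo. In the one-liner, -0x2B sets $/ to '+' and -l chomps that terminator from each line read and prints a newline after each line written.

```perl
use strict;
use warnings;

# Hypothetical sample: each line ends in '+', stray newlines are scattered
# through the data, and X, Y, Z stand in for the damaged ';' separators.
my $corrupted = "aaXbb\nbYcc+ddZe\neYff+";

# recover() is an illustrative helper, not part of the original post.
sub recover {
    my ($fh) = @_;
    local $/ = '+';              # same effect as -0x2B
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;             # strips the trailing '+', as -l does
        $line =~ s/\n//g;        # remove stray newlines
        $line =~ s/[XYZ]/;/g;    # put the ';' separator back
        push @lines, $line;
    }
    return @lines;
}

# Read the sample through an in-memory filehandle so the demo needs no files.
open my $in, '<', \$corrupted or die "Cannot open in-memory file: $!";
print "$_\n" for recover($in);
close $in;
```

With the sample data this prints aa;bbb;cc and dd;ee;ff on separate lines. To process a real file, open it instead of the in-memory string.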

Re^2: scalable chomping
by TGI (Parson) on Oct 29, 2008 at 17:14 UTC

    If X, Y, and Z can legitimately appear in the file, you are going to have to do more work. Keep track of the values in which you have made "fixed" substitutions, and of what the original character was. You will then have a list of 'known suspect values' as well as a way to get the original value back.
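One possible way to keep that record, sketched under the assumption that every suspect character is swapped for a single ';', so string offsets stay valid after the fix. The names fix_and_log and restore_at and the sample value are hypothetical.

```perl
use strict;
use warnings;

# Replace every suspect character with ';' and remember what it was and
# where it sat, so the change can be reviewed or undone later.
sub fix_and_log {
    my ($value) = @_;
    my @log;
    # Each replacement is one character for one character, so the offsets
    # recorded here remain valid in the fixed string.
    while ($value =~ /([XYZ])/g) {
        push @log, { offset => $-[1], original => $1 };
    }
    (my $fixed = $value) =~ s/[XYZ]/;/g;
    return ($fixed, \@log);
}

# Undo a single logged substitution.
sub restore_at {
    my ($fixed, $entry) = @_;
    substr($fixed, $entry->{offset}, 1) = $entry->{original};
    return $fixed;
}

# Hypothetical value in which the 'Z' of "Zurich" is legitimate data.
my ($fixed, $log) = fix_and_log('nameXZurichY42');
print "$fixed\n";
printf "offset %d was '%s'\n", $_->{offset}, $_->{original} for @$log;
print restore_at($fixed, $log->[1]), "\n";
```

Here the automatic fix yields name;;urich;42, the log lists the three replaced characters, and restoring the second entry gives name;Zurich;42. In practice the log would probably be written to a file next to the recovered data.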

    The best approach (short of retrieval from a backup) would be to do as much parsing and sanity checking on the data as you can while you process the file. Trivial or obvious fixes can be automated, but anything questionable needs to be flagged for human intervention.
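A minimal sketch of such a check. The actual data layout is not shown in this thread, so the rule used here (each recovered line should hold exactly three ';'-separated values) is purely an assumption, as are the classify() helper and the sample lines.

```perl
use strict;
use warnings;

# Assumption: a healthy line holds exactly this many ';'-separated values.
my $EXPECTED_VALUES = 3;

# Return 'ok' for lines that pass the check and 'review' for the rest.
sub classify {
    my ($line) = @_;
    my @values = split /;/, $line, -1;   # -1 keeps trailing empty values
    return @values == $EXPECTED_VALUES ? 'ok' : 'review';
}

# Hypothetical recovered lines: one clean, two that need a human look.
for my $line ('a;b;c', 'name;;urich;42', 'a;b') {
    if (classify($line) eq 'ok') {
        print "$line\n";                 # passes the check
    }
    else {
        warn "NEEDS REVIEW: $line\n";    # flagged on STDERR
    }
}
```

Real checks would depend on what the values are supposed to look like (numeric ranges, dates, known codes); a count of values is only the cheapest first filter.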

    Good luck. I think you'll need it :/.


    TGI says moo
