PerlMonks  

I'm trying to parse the bodies of some automated emails. One would think this would be easy, but for some reason the program that generates the emails does NOT break lines with a \n; it pads them out with spaces instead, and the number of spaces seems to vary.

Initially I broke the email into an array by splitting on \s{15,}, but this isn't ideal and drops some of the values. I'm considering splitting on the colons instead, but I'm not convinced that's a great idea; it seems to lead to more headaches. Any ideas for a reasonably robust, straightforward way to parse this?
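A minimal sketch of one alternative, assuming (hypothetically) that every field is separated from the next by at least two whitespace characters and that no value contains two spaces in a row: split on \s{2,} instead of \s{15,}, so shorter padding still counts as a break, then split each chunk on its first colon only. The $body string below is an invented sample built from the fields in the post, not the real email.

```perl
use strict;
use warnings;

# Hypothetical sample body: fields separated by runs of 2+ spaces of varying
# length (a run shorter than 15 spaces is what \s{15,} would miss).
my $body = 'Security : BULGY N V-' . ' ' x 12
         . 'Item Overridden : Earnings Per Share' . ' ' x 30
         . 'Initial Value : (USD)' . ' ' x 5
         . 'Overridden Value : 160 (USD)' . ' ' x 2
         . 'ISIN : NL0006122988';

# Break the body into "key : value" chunks at any run of 2+ whitespace chars.
my @chunks = split /\s{2,}/, $body;

# Split each chunk at its FIRST colon only (LIMIT of 2), trimming the spaces
# around the colon, and collect the pairs in a hash.
my %field;
for my $chunk (@chunks) {
    my ($key, $value) = split /\s*:\s*/, $chunk, 2;
    next unless defined $value;    # skip empty chunks or chunks with no colon
    $field{$key} = $value;
}

print "$_ => $field{$_}\n" for sort keys %field;
```

Because split is given a limit of 2, a value that itself contains a colon (a time of day, say) stays in one piece, which avoids most of the headaches of splitting the whole body on colons.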

Security : BULGY N V-               Item Overridden : Earnings Per Share               Initial Value : (USD)               Current Value : ()               Overridden Value : 160 (USD)               Effective : 08/20/1999 through 08/20/2000               Override Type : Data SecurityID : 1076665 Sedol : 2451234 Cusip : N66696606 ISIN : NL0006122988
Update: Ah, I've found a possible way: while ($bdy =~ /(.*?):\s(.*?)\s\s/g) seems to work all right. Any comments on this approach?
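The loop in the update works, but as written any padding beyond the two spaces consumed by \s\s ends up at the start of the next key, the space before each colon stays on the end of the key, and the last field is only matched if the body happens to end in two whitespace characters. Below is a sketch of a tightened version, under the same assumption that fields are padded by two or more spaces and that keys contain no colons; $bdy here holds an invented sample body, not the real email.

```perl
use strict;
use warnings;

# Hypothetical sample body with 2+ spaces of varying length between fields.
my $bdy = 'Security : BULGY N V-' . ' ' x 14
        . 'Current Value : ()' . ' ' x 3
        . 'Effective : 08/20/1999 through 08/20/2000' . ' ' x 20
        . 'Override Type : Data' . ' ' x 2
        . 'Sedol : 2451234';

my %field;
# \s*                skip padding left over from the previous field
# (\S[^:]*?)         key: starts at a non-space, cannot cross a colon
# \s*:\s*            the separator, with optional spaces on either side
# (.*?)              value: the shortest text up to...
# (?=\s{2,}|\s*\z)   ...a run of 2+ whitespace chars, or the end of the body
while ($bdy =~ /\s*(\S[^:]*?)\s*:\s*(.*?)(?=\s{2,}|\s*\z)/g) {
    $field{$1} = $2;
}

printf "%-15s %s\n", $_, $field{$_} for sort keys %field;
```

The lookahead lets the final field match without any trailing whitespace, and because the padding is skipped rather than captured, the keys come out clean enough to use directly as hash keys.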



-----------------
s''limp';@p=split '!','n!h!p!';s,m,s,;$s=y;$c=slice @p1;so brutally;d;$n=reverse;$c=$s**$#p;print(''.$c^chop($n))while($c/=$#p)>=1;

In reply to Parsing semi-erratic text by SamCG
