I'm trying to parse the bodies of some automated emails. One would think it would be easy, but for some reason the generator of the emails does NOT break lines with a \n; instead it pads them with spaces, and the number of spaces seems to vary. Initially I broke the email into an array by splitting on \s{15,}, but this isn't ideal and drops some of the values. I'm considering splitting on the colons, but I'm not convinced that's a great idea, and it seems to lead to more headaches. Any ideas for a somewhat robust, straightforward way to parse this?
Security : BULGY N V-
Item Overridden : Earnings Per Share
Initial Value : (USD)
Current Value : ()
Overridden Value : 160 (USD)
Effective : 08/20/1999 through 08/20/2000
Override Type : Data
SecurityID : 1076665
Sedol : 2451234
Cusip : N66696606
ISIN : NL0006122988
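One way to avoid guessing a fixed padding width like \s{15,} is sketched below. It assumes that fields are separated by at least two whitespace characters or by a newline, that no value contains two consecutive spaces, and that the first colon in each field separates the key from the value. The sample string is hypothetical: it is modeled on the body above, but the padding widths are made up. The body is cut into chunks, and each chunk is then split on its first colon only, so a colon inside a value (a time, say) would survive.

```perl
use strict;
use warnings;

# Hypothetical sample body: the real padding widths are unknown, so the
# runs of spaces below are made up. The last few fields are assumed to be
# separated by plain newlines, as the sample in the post suggests.
my $bdy = "Security : BULGY N V-" . (" " x 40)
        . "Item Overridden : Earnings Per Share" . (" " x 23)
        . "Initial Value : (USD)" . (" " x 31)
        . "Current Value : ()" . (" " x 50)
        . "Overridden Value : 160 (USD)" . (" " x 17)
        . "Effective : 08/20/1999 through 08/20/2000" . (" " x 9)
        . "Override Type : Data\n"
        . "SecurityID : 1076665\n"
        . "Sedol : 2451234\n"
        . "Cusip : N66696606\n"
        . "ISIN : NL0006122988\n";

# Read bottom-up:
#   split: cut the body wherever there are 2+ whitespace chars or a newline
#   grep:  keep only the chunks that look like "key : value"
#   map:   split each chunk on its FIRST colon only (limit 2), eating the
#          spaces around that colon, which yields one (key, value) pair
my %field = map  { my ($k, $v) = split /\s*:\s*/, $_, 2; ($k, $v) }
            grep { /:/ }
            split /\s{2,}|\n/, $bdy;

printf "%-18s => %s\n", $_, $field{$_} for sort keys %field;
```

The weak spot of this sketch is the assumption about values: if a value can itself contain two spaces in a row, it will be cut short at that point.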
Update: Ah, I've found a potential way: while ($bdy=~/(.*?):\s(.*?)\s\s/g) seems to work all right. Any comments on this approach?
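That loop is close, but as far as I can tell it has a few rough edges if the body really looks like the sample. Matching resumes right after the two whitespace characters the previous match consumed, so every key after the first picks up the leftover padding, plus the space before the colon. The last field is dropped unless the body ends with at least two whitespace characters. And because `.` does not match a newline, a field followed by just a single \n is skipped. Below is a tightened variant, only a sketch checked against a made-up sample modeled on the one above. It skips leading padding before the key, keeps the key free of colons and newlines, and ends the value with a lookahead so the separator is not consumed and the final field still matches.

```perl
use strict;
use warnings;

# Hypothetical sample body modeled on the one in the post;
# the padding widths are made up.
my $bdy = "Security : BULGY N V-" . (" " x 40)
        . "Item Overridden : Earnings Per Share" . (" " x 23)
        . "Initial Value : (USD)" . (" " x 31)
        . "Current Value : ()" . (" " x 50)
        . "Overridden Value : 160 (USD)" . (" " x 17)
        . "Effective : 08/20/1999 through 08/20/2000" . (" " x 9)
        . "Override Type : Data\n"
        . "SecurityID : 1076665\n"
        . "Sedol : 2451234\n"
        . "Cusip : N66696606\n"
        . "ISIN : NL0006122988\n";

my %field;
# \s*                skip leftover padding so it doesn't end up in the key
# ([^:\n]+?)         key: shortest run of non-colon, non-newline chars
# \s*:[ ]?           the colon, plus at most one space after it, so an
#                    empty value can't swallow the padding and next field
# (.*?)              value: shortest run of chars up to...
# (?=\s{2,}|\n|\z)   ...2+ whitespace chars, a newline, or end of string
#                    (a lookahead, so the separator is left unconsumed)
while ($bdy =~ /\s*([^:\n]+?)\s*:[ ]?(.*?)(?=\s{2,}|\n|\z)/g) {
    $field{$1} = $2;
}

printf "%-18s => %s\n", $_, $field{$_} for sort keys %field;
```

It shares the same assumption as any whitespace-based approach: a value containing two consecutive spaces will be truncated there.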