Hello all!

I'm trying to parse something *like* the following string...

joe(lots of spaces) 0.0000E(one space)000 (spaces) 9.0720E-001 (lots of spaces) d23(lots of spaces) 9.0208E-001(no space)

I would like to capture the following like so: "joe", "0.00E 000", "d23", "9.0720E-001". The reason I say *like* is that I have a script that needs to parse strings like the above, but they are not always the same. These strings are generated, and I have a whole lot of them to parse :( I can't for the life of me figure out a regex that works... The strings have these characteristics:

1. Starts with some string, a name

2. Followed by a whole bunch of spaces (the amount of spacing between each "group of characters" is irregular, but always more than a single space)

3. Followed by a number written in scientific notation. It will look either like this: #.####E(space)### or like this: #.####E-###. The number always has 4 decimal places ("decimal places"... is this the right term? I've been out of school for too long...)

4. More spacing

5. Followed by another number in the same scientific notation described above

6. Followed by a whole bunch of spacing (again irregular, but always more than just a single space)

7. Followed by a "mashup word" of letters and/or digits (like d9s00, e893, 887.9 or irtw) of unknown length

8. Followed again by a whole bunch of spaces

9. Followed by another number in scientific notation

10. End of string. No spaces follow the last "word".
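Putting the ten characteristics above together, one option (a sketch, not the only way) is a single anchored regex with one capture group per field. It assumes the name and the "mashup word" contain no whitespace, and that the numbers come in exactly the two forms described (`#.####E ###` or `#.####E-###`). `parse_line` and `$num` are illustrative names, not anything from an existing module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One number in the notation described above: a digit, a decimal point,
# four decimal places, "E", then a space or a minus sign, then three digits.
my $num = qr/\d\.\d{4}E[ -]\d{3}/;

# The whole line, anchored at both ends (one capture group per field):
#   name, 2+ spaces, number, spaces, number, 2+ spaces, word, 2+ spaces, number
# ASSUMPTION: the name and the "mashup word" contain no whitespace.
my $line_re = qr/^(\S+)\s{2,}($num)\s+($num)\s{2,}(\S+)\s{2,}($num)$/;

# Hypothetical helper: returns an array ref of the five fields,
# or undef if the line does not fit the format.
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ $line_re;    # list context returns the captures
    return @fields ? \@fields : undef;
}

my $sample = "joe      0.0000E 000   9.0720E-001      d23      9.0208E-001";
if (my $fields = parse_line($sample)) {
    print join(' | ', @$fields), "\n";
}
else {
    print "no match\n";
}
```

Lines that don't fit the pattern come back as undef, which makes the odd ones easy to spot when running through a large batch. If the number format turns out to vary (say, a `+` in the exponent), only `$num` needs to change.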

I've been trying all sorts of things all day, and the closest I've managed is this: @words = split(/\s\s\S/, $some_string). This gets me the first three "words" ("joe", "0.00E 000", and "d23"), but not the last. I'm stumped. Any help? I'd be eternally grateful!
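The `split` attempt is close. A likely reason it misbehaves: `/\s\s\S/` makes the first non-space character of each field part of the separator, so `split` throws that character away. Splitting on two or more whitespace characters, `/\s{2,}/`, avoids this and leaves the single space inside `0.0000E 000` alone. This sketch assumes every gap between fields is at least two spaces wide, which the list above states for most gaps but not explicitly for the one between the two numbers; `$line` is just a sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line in the format described above (at least two spaces between fields).
my $line = "joe      0.0000E 000   9.0720E-001      d23      9.0208E-001";

# Original attempt: the \S in the separator swallows the first character of
# the next field, and all but the last two spaces of each gap stay attached
# to the preceding field.
my @broken = split /\s\s\S/, $line;

# Alternative: split on runs of two or more whitespace characters. The single
# space inside "0.0000E 000" is not a separator, so that field stays whole.
my @fields = split /\s{2,}/, $line;

print "broken: ", join(' | ', map { "[$_]" } @broken), "\n";
print "fixed:  ", join(' | ', map { "[$_]" } @fields), "\n";
```

Note that `\s` also matches tabs, so a gap made of a tab and a space counts as a separator too.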


In reply to How to extract these groups of characters? by Anonymous Monk
