I have a medium sized file of about 6,000 records. They are diagnosis codes for emergency room patient visits, but that's beside the point. Consider them a string of numbers arranged like:
396.04,305.1,894.2,V321,908.3,,,,,
299.5,785,432.1,305.1,,,,,,
112.3,312,685,422.1,V566,433.2,987.5,,,, etc.
There can be as few as 2 numbers in each record or as many as 10. I need to filter out a certain subset of records. In each set (row) of numbers there is at least one number between 296.1 and 314.0. However, if the row contains "305.1", the record can be excluded if "305.1" is the only number between 296.1 and 314.0 in the row. In the sample rows above, row 1 would be discarded and rows 2 and 3 kept.
The person requesting this data suggested that I dump it into an excel spreadsheet, sort the data and manually remove the records that I don't need. That seems way too labor intensive to me, and I would think that Perl would have some quick and easy way to sort this out. I can't quite get my mind around the best way to do it, however.
I was thinking that I'd read each row and use a regular expression to find rows with 305.1 and then check for the existence of another qualifying number. Based on that I could then either delete the rows I don't need or save the ones I want to a new file. I'm a little rusty with Perl right now, and I don't even know how to start. I thought if I organized it into a node and tossed it out here that some discussion might help get me going. I'd appreciate any suggestions. Thanks.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.