Dear monks,

I am trying to filter out archaic entries from a dictionary. They are marked with "(arch)". This is my best regexp so far (sorry, I never quite got used to using /x):

s#(?<=;)(?:\(\S+\) )*\(\d+\) (?:\(\S+\) )*\(arch\).*?;($| \(\d+\))#$1#g

And here is a selection of lines I am trying to filter:

(n,vs) (1) look; glimpse; glance; (vs) (2) to glance; to glimpse; (3) (arch) first meeting; (adv) (4) apparently; seemingly;
(n-t,n-adv) (1) moment; a (short) time; a while; (2) former times; (3) (arch) two-hour period;
(v5s,vt) (1) to pass (time); to spend; (2) to overdo (esp. of one's alcohol consumption); to drink (alcohol); (3) (arch) to take care of; to support; (suf,v5s) (4) to overdo; to do too much; (5) to ... without acting on it;
(pn,adj-no) (1) we; us; (2) (arch) I; me; (3) (arch) you (referring to a group of one's equals or inferiors);
(n) (1) eye; eyeball; (2) (arch) pupil and (dark) iris of the eye; (3) (arch) insight; perceptivity; power of observation; (4) (arch) look; field of vision; (5) (arch) core; center; centre; essence;
(v5m,vt) (1) to step on; to tread on; (2) to experience; to undergo; (3) to estimate; to value; to appraise; (4) to rhyme; (5) (arch) to inherit (the throne, etc.); (6) to follow (rules, morals, principles, etc.);
(v5s,vt) (1) to build up; to establish; (2) to form; to become (a state); (3) to accomplish; to achieve; to succeed in; (4) to change into; (5) to do; to perform; (aux-v) (6) (arch) to intend to; to attempt; to try; (7) (arch) to have a child;
(adv) (1) (uk) that is to say; that is; in other words; I mean; (2) (uk) in short; in brief; to sum up; ultimately; in the end; in the long run; when all is said and done; what it all comes down to; when you get right down to it; (n) (3) (uk) clogging; obstruction; stuffing; (degree of) blockage; (4) (uk) shrinkage; (5) (uk) end; conclusion; (6) (uk) (arch) dead end; corner; (7) (uk) (arch) distress; being at the end of one's rope;
(n,adj-no) (1) inside; within; (2) while; (3) among; amongst; between; (pn,adj-no) (4) we (referring to one's in-group, i.e. company, etc.); our; (5) my spouse; (n) (6) (arch) imperial palace grounds;
(v5r,vi) (1) to rot; to go bad; to decay; to spoil; to fester; to decompose; to turn sour (e.g. milk); (2) to corrode; to weather; to crumble; (3) to become useless; to blunt; to weaken (from lack of practice); (4) to become depraved; to be degenerate; to be morally bankrupt; to be corrupt; (5) to be depressed; to be dispirited; to feel discouraged; to feel down; (suf,v5r) (6) (uk) (ksb:) indicates scorn or disdain for another's action; (v5r,vi) (7) (arch) to lose a bet; (8) (arch) to be drenched; to become sopping wet;
(v5s,vt) (1) to build up; to establish; (2) to form; to become (a state); (3) to accomplish; to achieve; to succeed in; (4) to change into; (5) to do; to perform; (aux-v) (6) (arch) to intend to; to attempt; to try; (7) (arch) to have a child;

The format seems to be: (part-of-speech) (number) (tags), followed by semicolon-separated definitions. The tags may include the (arch) tag, and the part-of-speech seems to be written only once: it is not repeated for the senses that follow until it changes.
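Given that format, one alternative to a single substitution is to split each entry into numbered senses first and then drop the ones tagged (arch). What follows is only a minimal sketch under assumptions, not a tested solution for the whole dictionary: `strip_archaic` is a hypothetical helper name, every sense is assumed to start with a number like "(3) " right after a ";" and a space, and a part-of-speech written only on a removed sense is assumed to carry over to the next kept sense. Sense numbers are left as they are, so gaps remain after removal.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every numbered sense tagged "(arch)" from one entry.
sub strip_archaic {
    my ($entry) = @_;

    # Cut at the space after each ';' that is followed by optional "(tag) "
    # groups and a sense number such as "(3) ", so each chunk is one sense.
    my @senses = split /(?<=;) (?=(?:\(\S+\) )*\(\d+\) )/, $entry;

    my ($current_pos, $emitted_pos) = ('', '');
    my @kept;
    for my $sense (@senses) {
        # Tags in front of the sense number are taken as the part-of-speech.
        my ($pos) = $sense =~ /^((?:\(\S+\) )*)\(\d+\) /;
        $pos = '' unless defined $pos;
        $current_pos = $pos if length $pos;

        # Skip senses whose tags after the number include "(arch)".
        next if $sense =~ /^(?:\(\S+\) )*\(\d+\) (?:\(\S+\) )*\(arch\)/;

        # Assumption: a kept sense inherits a part-of-speech that was only
        # written on a removed sense.
        $sense = $current_pos . $sense
            if !length $pos && $current_pos ne $emitted_pos;
        $emitted_pos = $current_pos;
        push @kept, $sense;
    }
    return join ' ', @kept;
}

my $entry = "(n-t,n-adv) (1) moment; a (short) time; a while; "
          . "(2) former times; (3) (arch) two-hour period;";
print strip_archaic($entry), "\n";
# prints: (n-t,n-adv) (1) moment; a (short) time; a while; (2) former times;
```

Splitting first avoids having to decide inside one pattern where an archaic sense ends, which is awkward when the next sense is preceded by a new part-of-speech such as "(suf,v5s) (4)".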


In reply to Removing with regexps by Anonymous Monk
