I wrote a script to filter "uninteresting" commands (ls, cat, man) from my .bash_history, because I wanted them included in the current session's history but not persisted for future sessions (using Bash's HISTIGNORE variable would exclude them from both).

I've configured Bash to save multiline history entries with embedded newlines, and entries are separated by Unix timestamps, like this:

#1501293767
foo() {
  echo foo
}
#1501293785
ls
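
The post does not show the Bash configuration itself. As a sketch, and assuming a reasonably recent Bash, settings along these lines in ~/.bashrc would produce a history file in the format shown above (the particular HISTTIMEFORMAT value is an arbitrary choice for illustration):

```shell
# Assumed ~/.bashrc settings (not taken from the post).
shopt -s cmdhist          # store a multiline command as a single history entry
shopt -s lithist          # keep embedded newlines instead of joining with semicolons
HISTTIMEFORMAT='%F %T '   # when set, Bash writes a "#<epoch seconds>" line before each entry
```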

I wanted to remove the "uninteresting" single-line entries, but keep all multiline entries. I figure if a command was complex enough to warrant multiple lines, it's worth remembering. So, for example, this entry should be removed:

#1501293785
cat afile

whereas this (somewhat contrived) entry should be kept:

#1501293785
cat afile | while read -r line; do
  echo "line: " $line
done
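
To make the rule concrete, here is a minimal sketch that buffers each entry and decides whether to print it once the next timestamp (or end of input) arrives. This is not the state machine given later in the post: the function name filter_history is hypothetical, every line that looks like a timestamp is treated as a separator, and an entry is dropped only when it has a timestamp, exactly one line, and a first word on the exclusion list.

```shell
#!/bin/sh
# Hypothetical helper: drop single-line "uninteresting" entries, keep the rest.
filter_history() {
  awk '
    # Print the buffered entry unless it is a timestamped single-line entry
    # whose first word is on the exclusion list.
    function flush(    i, words) {
      if (n > 0) {
        split(lines[1], words, " ")
        if (!(ts != "" && n == 1 && words[1] ~ /^(ls?|man|cat)$/)) {
          if (ts != "") print ts
          for (i = 1; i <= n; i++) print lines[i]
        }
      }
      ts = ""
      n = 0
    }
    # A timestamp line closes the previous entry and opens a new one.
    /^#[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/ { flush(); ts = $0; next }
    # Any other line belongs to the entry being collected.
    { lines[++n] = $0 }
    END { flush() }
  '
}

# Demo input built from the examples in this post.
filter_history <<'EOF'
#1501293767
foo() {
  echo foo
}
#1501293785
ls
#1501293790
cat afile | while read -r line; do
  echo "line: " $line
done
#1501293795
cat afile
EOF
```

With this input, the ls and single-line cat afile entries are dropped together with their timestamps, while the function definition and the multiline loop are printed unchanged.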

I implemented it as a finite-state machine in Awk, and was impressed with its performance: it processes a 50,000-line file in about 70 milliseconds. My .bash_history is unlikely to grow beyond 25,000 lines, so that's great, especially since I trigger this in the background when exiting the shell.

Nonetheless, I'm curious whether Perl might be a better tool for the job. The Awk code is not particularly elegant, and I've heard Perl is a performant scripting language. I've never written any, though, so I wanted to check here and see whether this seems like a good use case for Perl.

I'm not necessarily asking how to translate this into Perl, though I'm open to doing so, but wondering if Perl offers other approaches to solving this problem.

A graph of the finite-state machine can be seen here: https://i.stack.imgur.com/fLG4K.png

For reference here's the Awk code:

BEGIN {
    timestamp = ""
    entryline = ""
    timestamp_regex = "^#[[:digit:]]{10}$"
    exclusion_regex = "^(ls?|man|cat)$"
    state = "begin"
}

{
    if (state == "begin") {
        if ($0 ~ timestamp_regex) {
            timestamp = $0
            state = "readtimestamp"
        } else {
            print
            state = "printedline"
        }
    } else if (state == "printedline") {
        if ($0 ~ timestamp_regex) {
            timestamp = $0
            state = "readtimestamp"
        } else {
            print
            state = "printedline"
        }
    } else if (state == "readtimestamp") {
        if ($0 ~ timestamp_regex && $0 >= timestamp) {
            timestamp = $0
            state = "readtimestamp"
        } else if ($1 ~ exclusion_regex) {
            entryline = $0
            state = "readentryline"
        } else {
            print timestamp
            print
            state = "printedline"
        }
    } else if (state == "readentryline") {
        if ($0 ~ timestamp_regex) {
            entryline = ""
            timestamp = $0
            state = "readtimestamp"
        } else {
            print timestamp
            print entryline
            print
            state = "printedline"
        }
    }
}
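
The post does not show how the script is invoked. As one possible sketch, assuming the program above is saved to a file, a small wrapper can write the filtered output to a temporary file and replace the history file only if Awk succeeds. The name rewrite_history and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: filter a history file through an awk program and
# replace the file only when awk exits successfully.
rewrite_history() {
  script=$1     # path to the awk program
  histfile=$2   # path to the history file to rewrite
  tmp=$(mktemp "${histfile}.XXXXXX") || return 1
  if awk -f "$script" "$histfile" > "$tmp"; then
    mv "$tmp" "$histfile"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Possible usage (paths are assumptions):
#   rewrite_history "$HOME/filter_history.awk" "$HOME/.bash_history"
```

Running it in the background, as described earlier, would only require appending & to the call. How that interacts with Bash's own history write at exit is not covered by this sketch.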

In reply to Filtering certain multi-line patterns from a file by ivanbrennan
