I wrote a script to filter "uninteresting" commands (ls, cat, man) from my .bash_history, because I wanted them included in the current session's history but not persisted for future sessions (using Bash's HISTIGNORE variable would exclude them from both).

I've configured Bash to save multiline history entries with embedded newlines, and each entry is preceded by a Unix timestamp line, like so:

#1501293767
foo() {
echo foo
}
#1501293785
ls
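
In case it helps anyone reproduce the format, settings roughly like these should produce it. This is a sketch based on Bash's documented options rather than a copy of my configuration, and the HISTTIMEFORMAT value shown is an arbitrary choice:

```bash
# Assumed ~/.bashrc fragment (illustrative, not my exact settings).
# With HISTTIMEFORMAT set, Bash writes a "#<epoch seconds>" line before
# each entry in the history file.
HISTTIMEFORMAT='%F %T '
# cmdhist keeps a multiline command as one history entry; lithist stores it
# with embedded newlines instead of joining the lines with semicolons.
shopt -s cmdhist lithist
```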

I wanted to remove the "uninteresting" single-line entries, but keep all multiline entries. I figure if a command was complex enough to warrant multiple lines, it's worth remembering. So, for example, this entry should be removed:

#1501293785
cat afile

whereas this (somewhat contrived) entry should be kept:

#1501293785
cat afile | while read -r line; do
  echo "line: " $line
done

I implemented it as a finite-state machine in Awk and was impressed with its performance: it processes a 50,000-line file in about 70 milliseconds. My .bash_history is unlikely to grow beyond 25,000 lines, so that's plenty fast, especially since I run the filter in the background when exiting the shell.

Nonetheless, I'm curious whether Perl might be a better tool for the job. The Awk code is not particularly elegant, and I've heard Perl is a performant scripting language. I've never written any, though, so I wanted to check here whether this seems like a good use case for Perl.

I'm not necessarily asking how to translate this into Perl (though I'm open to doing so); I'm wondering whether Perl offers other approaches to solving this problem.

A graph of the finite-state machine can be seen here: https://i.stack.imgur.com/fLG4K.png

For reference, here's the Awk code:

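BEGIN {
  timestamp = ""    # most recent timestamp line, held until we decide to print it
  entryline = ""    # held first line of an entry whose command is on the exclusion list
  timestamp_regex = "^#[[:digit:]]{10}$"   # "#" followed by a ten-digit epoch time
  exclusion_regex = "^(ls?|man|cat)$"      # matched against the first word ($1)
  state = "begin"
}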
{
  if (state == "begin")  # no entry seen yet: pass lines through until the first timestamp
  {
    if ($0 ~ timestamp_regex)
    {
      timestamp = $0
      state = "readtimestamp"
    }
    else
    {
      print
      state = "printedline"
    }
  }
  else if (state == "printedline")  # inside an entry that is being kept
  {
    if ($0 ~ timestamp_regex)
    {
      timestamp = $0
      state = "readtimestamp"
    }
    else
    {
      print
      state = "printedline"
    }
  }
  else if (state == "readtimestamp")  # holding a timestamp, waiting for the entry's first line
  {
    if ($0 ~ timestamp_regex && $0 >= timestamp)
    {
      timestamp = $0
      state = "readtimestamp"
    }
    else if ($1 ~ exclusion_regex)
    {
      entryline = $0
      state = "readentryline"
    }
    else
    {
      print timestamp
      print
      state = "printedline"
    }
  }
  else if (state == "readentryline")  # holding an excluded first line; keep it only if the entry continues
  {
    if ($0 ~ timestamp_regex)
    {
      entryline = ""
      timestamp = $0
      state = "readtimestamp"
    }
    else
    {
      print timestamp
      print entryline
      print
      state = "printedline"
    }
  }
}
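
One alternative to spelling out the states, sketched in Awk so it can be compared directly with the code above: buffer each entry and make the keep-or-drop decision once, when the next timestamp (or end of file) arrives. This is a simplified sketch under stated assumptions, not a drop-in replacement. `filter_history` is a made-up wrapper name, and every line consisting of `#` plus ten digits is treated as a timestamp, so the `$0 >= timestamp` check from the state machine is left out.

```bash
# filter_history: hypothetical helper name (not part of the script above).
# Reads a history file on stdin and writes the filtered history to stdout.
filter_history() {
  awk '
    # Print the buffered entry unless it is a timestamped, single-line,
    # excluded command, or a timestamp with no command lines at all.
    function emit(    i) {
      boring = (stamp != "" && n == 1 && first ~ /^(ls?|man|cat)$/)
      if (n > 0 && !boring) {
        if (stamp != "") print stamp
        for (i = 1; i <= n; i++) print body[i]
      }
      stamp = ""; n = 0
    }
    BEGIN {
      # "#" plus ten digits, spelled out so that awks without {10}
      # interval support can run this too.
      ts = "^#"
      for (k = 0; k < 10; k++) ts = ts "[0-9]"
      ts = ts "$"
    }
    # A timestamp closes the previous entry and opens a new one.
    $0 ~ ts { emit(); stamp = $0; next }
    # Any other line belongs to the current entry; remember its first word.
    { if (n == 0) first = $1; body[++n] = $0 }
    END { emit() }
  '
}
```

It could be run as `filter_history < ~/.bash_history > filtered_history`, where the output file name is only an example. The trade-off is that a whole entry sits in memory until the next timestamp, which should not matter for entries of ordinary size.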

In reply to Filtering certain multi-line patterns from a file by ivanbrennan
