sometimes the args are one word, sometimes 2 words, sometimes 5 words.
Maybe my answer wasn't clear. The central issue to resolve is the syntactic frame (that is, the arrangement of argument "slots") for your command lines. The frames that can support a mixture of single-word and multi-word args are the ones that have at most one multi-word arg "slot", with an invariant number of mandatory single-word arg "slots" before and/or after the multi-word slot -- e.g. you could have a set of distinct frames for various classes of commands (I'll show "slots" using curly braces):
{command1} {file [file2 ...]} {dest_path}
{command2} {file_name} {dest [dest2 ...]}
{command3} {url} {file} {word [word2 ...]} {email@address}
In these examples, it doesn't matter how many words occur in the slots that have optional extra tokens (the ones with "..."), because the number of slots for each command type is invariant, and when you take away the tokens for slots that are known to contain single-word args, what you have left are the (one or more) words that make up the variable-length arg.
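To make the "take away the single-word slots" idea concrete, here is a minimal sketch. The %frames table and the parse_frame name are made up for illustration; the counts simply restate the three example frames above, and tokens are assumed to be separated by whitespace.

```perl
use strict;
use warnings;

# Hypothetical frame table: for each command, the number of single-word
# slots before and after the one multi-word slot.
my %frames = (
    command1 => { before => 0, after => 1 },    # {file ...} {dest_path}
    command2 => { before => 1, after => 0 },    # {file_name} {dest ...}
    command3 => { before => 2, after => 1 },    # {url} {file} {word ...} {email}
);

sub parse_frame {
    my ($line) = @_;
    my ( $cmd, @tokens ) = split ' ', $line;
    return unless defined $cmd;
    my $frame = $frames{$cmd} or return;        # unknown command
    my ( $n_before, $n_after ) = @{$frame}{qw(before after)};

    # Every fixed slot must be filled, plus at least one word for the
    # variable-length slot.
    return if @tokens < $n_before + $n_after + 1;

    # Peel the fixed slots off both ends; what is left is the
    # variable-length arg.
    my @before = splice @tokens, 0, $n_before;
    my @after  = $n_after ? splice( @tokens, -$n_after ) : ();
    return {
        command => $cmd,
        before  => \@before,
        multi   => [@tokens],
        after   => \@after,
    };
}
```

If the variable slot holds one name containing spaces rather than a list of items, join the words in {multi} back together with a single space (this loses runs of repeated whitespace, which may or may not matter to you).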
If certain slots can be expected to always match distinctive regex patterns (like urls and email addresses), you could support somewhat more complex (flexible) frames by using more elaborate regex matches, as suggested in the first reply (e.g. if the second slot is for a url, but the second token doesn't look like a url, you can treat the url slot as "empty" and know that the second token goes with the third slot).
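Here is a rough sketch of that kind of optional slot, assuming a frame of {command} [{url}] {file} {word [word2 ...]}. The sub name and the URL pattern are my own inventions, and the pattern is deliberately loose (it only looks for a "scheme://" prefix), so tighten it to suit your data.

```perl
use strict;
use warnings;

# Assumed frame: {command} [{url}] {file} {word [word2 ...]}
sub parse_optional_url {
    my ($line) = @_;
    my ( $cmd, @tokens ) = split ' ', $line;
    return unless defined $cmd;

    # If the first arg does not look like a url, treat the url slot as
    # empty and let that token fall through to the file slot.
    my $url;
    $url = shift @tokens
      if @tokens && $tokens[0] =~ m{^[a-z][a-z0-9+.-]*://\S+$}i;

    return if @tokens < 2;    # need a file plus at least one word
    my $file = shift @tokens;
    return { command => $cmd, url => $url, file => $file, words => [@tokens] };
}
```

The same trick only works when the pattern cannot also match whatever is legal in the following slot; a file that happens to be named like a url would be misread.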
But this doesn't work if two or more slots can contain multiple word tokens (e.g. if there are two file name slots, and file names can contain spaces).
The only way this is known is with the @TAB_COMPLETION array. ... So when I receive the scalar from readline, there is no way to know what are multi-word args, and what are a bunch of single word args.
Let's suppose that the syntactic frame idea doesn't work for you, because some commands may have two or more slots that accept multi-word args.
You know the set of initial command words that are allowed, don't you? So start by separating off the command word. And are you saying that you have access to an array that holds the various multi-word args that are possible via tab completion?
If so, you could sort that array by string length (longest string first), and check your input string against the array to find the longest exact match:
# Assume that $_ holds the input line with the command word removed,
# and that @tab_completions is sorted by string length, longest first.
my $i = 0;
while ( $i < @tab_completions ) {
    # index() returns 0 when the completion is a prefix of the input
    last if ( index( $_, $tab_completions[$i] ) == 0 );
    $i++;
}
if ( $i < @tab_completions ) {
    # $i is the index of the element that matched, so pull that string
    # off the front of the command line, and proceed with handling
    # other args, if any
}
else {
    # no completion matched: the user entered a typo, perhaps?
}
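Taking that a step further, here is a fuller sketch that does the sort itself and keeps peeling args off the front of the line until nothing is left. The split_args name is hypothetical, and two details are my own additions rather than anything the module gives you: a match only counts if it is followed by whitespace or the end of the line (so "my" does not match inside "myfile"), and anything that matches no completion is taken as a plain single word.

```perl
use strict;
use warnings;

# $rest is the input line with the command word already removed;
# @completions is whatever list of known multi-word args you have.
sub split_args {
    my ( $rest, @completions ) = @_;

    # Longest first, so "my file.txt" wins over "my".
    my @sorted = sort { length($b) <=> length($a) } @completions;

    my @args;
    $rest =~ s/^\s+//;
    while ( length $rest ) {
        # First (i.e. longest) completion that is a prefix of the
        # remaining input and ends on a word boundary.
        my ($hit) = grep {
            length($_)
              && index( $rest, $_ ) == 0
              && substr( $rest, length $_ ) =~ /^(?:\s|\z)/
        } @sorted;

        if ( defined $hit ) {
            push @args, $hit;
            substr( $rest, 0, length($hit), '' );
        }
        else {
            # No known completion here: fall back to one plain word.
            $rest =~ s/^(\S+)// or last;
            push @args, $1;
        }
        $rest =~ s/^\s+//;
    }
    return @args;
}
```

For example, with completions "my file.txt", "my" and "backup dir", the line "my file.txt backup dir extra" splits into three args: "my file.txt", "backup dir" and "extra". This is still guesswork: if the user really did mean "my" and "file.txt" as two separate args, the longest match will get it wrong.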
This is grasping at straws -- I don't know much about Term::ReadLine::Gnu or the tab_completion array stuff. If all else fails, it's conceivable that this module could be "altered" slightly to put quotes around args (or put backslash escapes in front of special characters like whitespace) as part of its tab-completion logic; in fact, maybe it can do this already, if you use the right parameters... If the docs are no help, you could try looking at the module source code.
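If you do manage to get quotes or backslash escapes into the line, the parsing side becomes easy, because the core Text::ParseWords module already understands both. A small sketch (the split_quoted wrapper is just a name I made up; how to make the completion insert the quotes in the first place is the part I can't vouch for):

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Split a line into args the way a shell would: quoted strings and
# backslash-escaped spaces stay together as a single arg.
sub split_quoted {
    my ($line) = @_;
    return shellwords($line);
}

my @args = split_quoted(q{copy "my file.txt" backup\ dir plain});
print "$_\n" for @args;
```

That prints four lines: copy, my file.txt, backup dir, and plain.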