Intro

My JavaScript (and VB) friends have brought me a question, and I am having trouble determining what the best regular expression for it would be...

They were discussing the need to parse through a string that contained sets of search parameters separated by spaces. Sounds easy enough... "Just split on the spaces in the string and put it into an array!" we'd say.

Of course, if that were the case, they'd be doing that right now. The problem is that, to allow the user to have spaces in any of these search parameters, we must allow them to quote the words with the quote character (").

The original regular expressions we designed for this used a temporary array to hold the values of the quoted strings, but someone suggested just escaping all instances of a backslash (\) with (\\) within the quoted strings, and then replacing all spaces with (\s) within these quoted strings... This sounds much better...

The problem is I can't think of a good regex to look through the string and replace/escape only those spaces within quotes with (\s)... Once we do this, doing a split on spaces will give us the array with all desired atoms.
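For illustration, here is a minimal sketch of the escape-then-split idea described above. The function names (escapeQuoted, unescapeAtom, parseSearchString) are hypothetical, and the sketch assumes a regex engine whose replace accepts a callback function, so it may not carry over to VB unchanged. It also assumes the returned atoms keep their surrounding quotes, as the temporary-array version does.

```javascript
// Hypothetical helpers sketching the escape / split / unescape approach.

// Escape backslashes, then spaces, but only inside "quoted strings".
function escapeQuoted(strSource) {
  return strSource.replace(/"[^"]*"/g, function (strQuoted) {
    return strQuoted
      .replace(/\\/g, '\\\\')  // one backslash becomes two
      .replace(/ /g, '\\s');   // a space becomes backslash + "s"
  });
}

// Undo both escapes in one left-to-right pass, so an escaped backslash
// followed by a literal "s" is not mistaken for an escaped space.
function unescapeAtom(strAtom) {
  return strAtom.replace(/\\(.)/g, function (strMatch, strChar) {
    return strChar === 's' ? ' ' : strChar;
  });
}

function parseSearchString(strSource) {
  // Split on literal spaces only, since only spaces were escaped.
  var aryOutput = escapeQuoted(strSource).split(/ +/);
  for (var i = 0; i < aryOutput.length; i++) {
    // Only quoted atoms were escaped, so only those are unescaped.
    if (aryOutput[i].charAt(0) === '"') {
      aryOutput[i] = unescapeAtom(aryOutput[i]);
    }
  }
  return aryOutput;
}
```

Escaping the backslashes before the spaces is what makes the round trip reversible: a quoted string that already contains a literal backslash followed by "s" comes back unchanged instead of turning into a space.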

Sample String

SearchAtom1 "Quoted Search Atom2" SearchAtom3 "" SearchAtom5Since4WasEmpty "Search Atom 6"

Assumptions

  • The quoted strings cannot themselves contain quote characters (i.e., there will be an even number of quotes in the string).
  • The quoted strings could be empty, or contain just spaces.
  • Each atom is separated by one or more space characters.
  • The string would already have any leading and trailing spaces removed
  • Any multiple spaces found within a quoted string would be untouched.
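
To make these assumptions concrete, here is a small sketch that checks the even-quote assumption and then pulls the atoms out directly with a single match. This is a different technique from the escaping approach discussed above, shown only to pin down the expected output; the function name tokenizeSearchString is hypothetical.

```javascript
// Hypothetical helper: checks the quote assumption, then extracts atoms.
function tokenizeSearchString(strSource) {
  // Assumption: quotes always pair up, so their count must be even.
  var aryQuotes = strSource.match(/"/g) || [];
  if (aryQuotes.length % 2 !== 0) {
    throw new Error('Unbalanced quote in search string');
  }
  // An atom is either a whole quoted string (possibly empty)
  // or a run of non-space characters.
  return strSource.match(/"[^"]*"|[^ ]+/g) || [];
}
```

For the sample string this yields six atoms: SearchAtom1, "Quoted Search Atom2", SearchAtom3, "", SearchAtom5Since4WasEmpty and "Search Atom 6", with the quotes left in place.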

    Old JavaScript Solution

    This was the JavaScript version we originally wrote:
    <SCRIPT LANGUAGE="JavaScript">
    function ParseStringRegExp (strSource) {
        var intMatchCounter = 0;
        var aryStoredValues = new Array();
        var strUniqueID;
        var strWorking = strSource;
        var aryQSMatch;
        var aryQSMatchResults;
        var i;

        // Iterate through each quoted string and replace with UniqueID
        while (aryQSMatch = strWorking.match (/"[^"]*"/)) {
            strUniqueID = "__" + intMatchCounter + "__";               // Generate the UniqueID
            strWorking = strWorking.replace (/"[^"]*"/, strUniqueID);  // Replace the value with UniqueID
            aryStoredValues[intMatchCounter++] = aryQSMatch[0];        // Store removed value into array
        }

        // Split the modified string by spaces
        var aryOutput = strWorking.split (/\s+/);

        // Go through array and replace UniqueIDs with original values.
        for (i = 0; i < aryOutput.length; i++) {
            if (aryQSMatchResults = aryOutput[i].match(/__(\d+)__/)) {
                aryOutput[i] = aryStoredValues[aryQSMatchResults[1]];  // Do replacement here
            }
        }
        return (aryOutput);
    }

    var strSource = 'SearchAtom1 "Quoted Search Atom2" SearchAtom3 "" SearchAtom5Since4WasEmpty "Search Atom 6"';
    var aryOutput = ParseStringRegExp (strSource);
    alert (aryOutput);
    </SCRIPT>

    Perl Solution Wanted

    Basically, the (simple) Perl regular expression(s) for this problem is all we care about at this point, as it will eliminate the need for the temporary aryStoredValues and aryQSMatchResults. We will then take this Perl and convert it to JavaScript... This basically means writing a regex to replace all backslashes and spaces within each quoted string with an appropriate escape... Here's the resulting function we wish to write in JavaScript, with the help of our Perl friends:
    <SCRIPT LANGUAGE="JavaScript">
    function ParseStringRegExp (strSource) {
        var intMatchCounter = 0;
        var aryStoredValues = new Array();
        var strUniqueID;
        var strWorking = strSource;
        var aryQSMatch;
        var i;

        // Escape all backslashes and spaces.
        // ***
        // what's a good regex(es) we can write for this section?
        // ***

        // Split the modified string by spaces.
        var aryOutput = strWorking.split (/\s+/);

        // Go through array and replace UniqueIDs with original values.
        for (i = 0; i < aryOutput.length; i++) {
            // ***
            // Do the unescaping back to regular slashes and spaces for each
            // array value here for aryOutput[i].
            // ***
        }
        return (aryOutput);
    }

    var strSource = 'SearchAtom1 "Quoted Search Atom2" SearchAtom3 "" SearchAtom5Since4WasEmpty "Search Atom 6"';
    var aryOutput = ParseStringRegExp (strSource);
    alert (aryOutput);
    </SCRIPT>
    I'm sure that some of you can think of a good regex for this, but we have to keep in mind that these regexes need to be converted to the 'lesser' languages of JavaScript (and also Visual Basic - PUKE - the VB implementation of regular expressions is SO LAME), so they must be simple, rather than pretty and obfuscated :)

    In reply to Regex for escaping spaces in strings when there are quotes by Incognito
