in reply to Regular expression

A tokenizer is what you want. Here's an implementation:

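    my $line = '"Transcription factor" promoter DNA';
    my $fixed;

    for ($line) {
        # Each rule matches at the current position (\G); /c keeps that
        # position when a rule fails, so the next rule starts from the same spot.
        /\G ( \" [^"]+ \" ) /xgc && do { $fixed .= $1;  redo };  # quoted phrase, kept whole
        /\G ( [^"\s]+ )     /xgc && do { $fixed .= $1;  redo };  # bare word
        /\G \s+             /xgc && do { $fixed .= ','; redo };  # whitespace becomes a comma
        last;  # end of input, or nothing matched
    }

    print("$fixed\n");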
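If you would rather have the tokens as a list than as one rewritten string, the same `\G` / `/gc` loop can push each match onto an array. This is only a sketch of that variation: the `tokenize` sub and its name are my own invention for illustration, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions): split a line
# into tokens, keeping double-quoted phrases together.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    for ($line) {
        # Quoted phrase, quotes included.
        /\G ( " [^"]+ " ) /xgc && do { push @tokens, $1; redo };
        # Bare word: a run of non-quote, non-whitespace characters.
        /\G ( [^"\s]+ )   /xgc && do { push @tokens, $1; redo };
        # Whitespace only separates tokens and is discarded.
        /\G \s+           /xgc && redo;
        last;    # end of input, or text that no rule accepts
    }
    return @tokens;
}

print join(',', tokenize('"Transcription factor" promoter DNA')), "\n";
```

Note that this version behaves a little differently at the edges: leading or trailing whitespace produces no empty field, and a word directly followed by a quoted phrase comes back as two tokens instead of being glued together.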