
Well, if you are in fact dealing with utf8-encoded "xSV" files, then you shouldn't worry about applying a BOM-removal filter on all lines. Just in case there might be any more BOMs scattered throughout the file, you'll want to remove them all, because they should not be treated as if they were part of the actual table data.

In other words, it is OKAY to have a pre-processing filter that removes BOM characters from every line -- something like s/\x{feff}//g as a filter is perfectly sane.
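As a minimal sketch of such a filter (the `strip_bom` helper and the sample data are made up for illustration, and none of this is part of Text::xSV), something along these lines should do. It assumes the input is read through a utf8 decoding layer, so that each BOM shows up as the single character U+FEFF rather than as three raw bytes:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::xSV): remove every U+FEFF from a
# string that has already been decoded from utf8 into Perl characters.
sub strip_bom {
    my ($line) = @_;
    $line =~ s/\x{feff}//g;
    return $line;
}

# Sample data standing in for a real file: the raw utf8 bytes of two lines,
# each starting with a BOM (the bytes EF BB BF in utf8).
my $raw = "\xEF\xBB\xBFname,city\n\xEF\xBB\xBFAda,London\n";

# Read the bytes through an in-memory filehandle with a utf8 decoding layer.
open my $in, '<:encoding(UTF-8)', \$raw
    or die "cannot open in-memory file: $!";
my @clean = map { strip_bom($_) } <$in>;
close $in;

print @clean;
```

With a real file you would open the file name instead of `\$raw`, and hand the cleaned lines to whatever does the actual table parsing.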

Note that there are some situations that really can create a text file with a BOM at the start of every line (I've seen it happen), so having logic that applies that filter to every line might just save you from some real trouble. (And of course, on lines that don't have a BOM, such a filter doesn't do anything at all, so it's quite harmless.)

The Unicode character U+FEFF, commonly called the "BOM", is meant to serve only as the byte-order mark -- at least, that is the purpose it was chosen for. It simply gets in the way and makes trouble if you happen to treat it as if it were data, and it is of course logically useless in a utf8 file anyway, since utf8 has no byte-order ambiguity to resolve. (Even so, some MS-Windows apps insert it routinely when creating utf8 text files -- and, heaven help us, Redmond or MS-centric tool developers may start using a file-initial BOM as a kind of "signature" or "magic number" that they "need" in order to identify files as utf8 text.)*

As for having Text::xSV do anything special with just the first line of a file, it already has logic to treat the first line as containing the "column headings", as opposed to containing actual data. If you need anything special beyond that for just the first line, you'd need to be a little more specific about your intended usage and the nature of what you are trying to accomplish -- e.g. what you've tried, how it failed, etc.

(* update/footnote: Now that I think of it, Notepad, which is one of those apps that automatically puts a BOM at the beginning of every plain-text utf8 file it creates, appears to depend on the BOM already as a "magic number" for identifying utf8 text files -- if you use perl to create a utf8 file with wide characters but without an initial BOM, then open that file in Notepad, it's likely not to display the wide-character text correctly.)
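If a file really does have to open cleanly in Notepad, one workaround is to write the BOM yourself as the first character of the output. This is only a sketch: the sample text is made up, and it writes to an in-memory buffer so that the resulting bytes can be inspected; for a real file you would open a file name instead of `\$bytes`.

```perl
use strict;
use warnings;

# Made-up sample text containing non-ASCII characters.
my $text = "caf\x{e9} \x{263a}\n";

# Write through a utf8 encoding layer into an in-memory buffer.
my $bytes = '';
open my $out, '>:encoding(UTF-8)', \$bytes
    or die "cannot open in-memory file: $!";
print {$out} "\x{feff}", $text;   # U+FEFF first, then the actual content
close $out;

# The buffer now starts with the utf8 form of the BOM: EF BB BF.
printf "%vX\n", substr($bytes, 0, 3);
```

Anything that later reads such a file as table data would still want the BOM filtered out again, as described above.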

In reply to Re: Text::xSV -- how to filter only first line? by graff
in thread Text::xSV -- how to filter only first line? by blahblahblah
