Just another rant ...

Yesterday, I improved a small helper script that reads an SVG file and converts that to code fragments for drawing lines and texts. (The target does not support displaying and manipulating SVG images the way I need it.) The script was (and still is) too stupid for all SVG features, because it only needs to process one image, supplied by our client.

The first image was years old, and all coordinates were absolute. The image was changed, using a different version of the image editor, and suddenly, coordinates were relative. So I had to change the script. Wash, rinse, repeat. Yesterday, the script (and I) learned about horizontal and vertical line shortcuts, and shortcuts for cubic Bézier curves. And about the completely INSANE way of representing paths in SVG.

SVG is structured. It's XML after all, so you can group elements, give each element an ID, apply CSS. Great. And all the code you need for that is already implemented. It's just XML.

You want a rectangle, green with a red border?

<rect x="10" y="20" width="42" height="8" stroke="#ff0000" stroke-width="2" fill="#00ff00" />

A circle is as easy:

<circle cx="10" cy="20" r="8" stroke="#ff0000" stroke-width="2" fill="#00ff00" />

The same principle also applies to text, ellipse, polygon and more.

And then, there is something similar to Logo turtle graphics a.k.a. paths. There is an implicit current point, which is the starting point for the next command, and there are commands: Move to, line to, curve to, close path.

How would you expect to see that in an XML-based vector image format? Like this:

<path x="50" y="50" stroke="#ff0000" stroke-width="2" fill="#00ff00"> <lineto x="75" y="75" /> <lineto x="75" y="25" /> <closepath /> </path>

XML, like all other elements, testable, parseable by any XML parser?

Dream on! It looks like this:

<path d="M396.73,645.98c-49.05,43.57-88.42,71.92-118.1,85.05-1.85.43-17.07,11.54-45.26-.97-9.7-2.33-78.64-37.77-123.94-89.59,0,0-39.21-47.66-50.48-134.8-7.51-58.09-13.85-106.3-19-144.64-14.55-30.82-24.76-38.53-30.62-23.11-3.7,22.95,16.29,117.5,21.48,128.86"/>

No, there was no cat running over the num pad. That is how SVG represents paths. In a f*ing string looking more like a historic modem command than anything structured.

Yes, there is structure. Upper and lower case letters followed by zero or more numbers. The letters are commands, the numbers are the arguments for the commands. Upper case letters indicate absolute coordinates, lower case letters indicate coordinates relative to the end point of the previous command.

"But -1.85.43 does not look like a number" you say? Right! It's not a number, it's two numbers, -1.85 and 0.43.

"And why are there commas between some numbers, and no commas between other numbers?" Well, the comma is only there if there is no implicit way to tell two numbers apart. It could also be a space. A negative number may simply be appended to the previous number: for the reader it must be obvious that a "-" can't be within a number (outside an exponent), so it must start a new number. The same reasoning allows a number between 0 and 1 to drop its leading "0" when it is appended to a non-integer number: it must be obvious for the reader that a number can not contain two ".", so the second one must start a new number.

This garbage can be expressed as BNF: https://www.w3.org/TR/SVG11/paths.html#PathDataBNF

To make sense of that number garbage, you basically have to take as many characters as possible that form a number, take that as an argument, and look for the next one. Whitespace and commas may be used to separate numbers.
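A minimal sketch of that greedy extraction, in Python purely for illustration. The regex and the name split_numbers are my own assumptions, not taken from the SVG spec or any library, and anything that is not part of a number is silently skipped instead of being validated:

```python
import re

# One number: optional sign, digits with an optional fraction (or a bare
# fraction such as ".43"), and an optional exponent.
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")

def split_numbers(args):
    """Greedily pull all numbers out of the argument string of one command."""
    # finditer resumes right after each match, so "-1.85.43" yields
    # "-1.85" first and then ".43". Commas and whitespace are skipped.
    return [float(m.group()) for m in NUMBER.finditer(args)]

print(split_numbers("-118.1,85.05-1.85.43-17.07,11.54"))
# [-118.1, 85.05, -1.85, 0.43, -17.07, 11.54]
```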

More nonsense?

There is a "lineto" command (L/l) that expects pairs of numbers (coordinates) as arguments. There is a shortcut for horizontal lines ("H" or "h") that accepts only x coordinates, y is implicit from the end point of the previous command. And there is a shortcut for vertical lines ("V" or "v") that accepts only y coordinates. Both H/h and V/v accept MORE THAN ONE ARGUMENT in case you might want to draw two, three, four or ten succeeding horizontal (or vertical) lines.
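To make the multi-argument behaviour concrete, here is a small Python sketch. The function name expand_hv and its signature are hypothetical, and it assumes the arguments have already been split into numbers:

```python
def expand_hv(cmd, args, start):
    """Return the end point of every line drawn by one H/h/V/v command."""
    x, y = start
    points = []
    for value in args:
        if cmd == "H":      # absolute x, y stays as it is
            x = value
        elif cmd == "h":    # x relative to the current point
            x += value
        elif cmd == "V":    # absolute y, x stays as it is
            y = value
        elif cmd == "v":    # y relative to the current point
            y += value
        else:
            raise ValueError(f"not a horizontal/vertical command: {cmd!r}")
        points.append((x, y))   # each argument draws one more line
    return points

print(expand_hv("h", [10, 20, -5], (0, 0)))
# [(10, 0), (30, 0), (25, 0)]
```

So "h 10 20 -5" draws three horizontal lines, the last one going backwards.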

The "moveto" command (M/m) expects pairs (PLURAL!) of numbers (coordinates). It moves to the first x-y-position, then switches to "lineto" mode, and all but the first two numbers passed to "moveto" actually behave as if passed to lineto. "M 1 2 3 4 6 5" is exactly equivalent to "M 1 2 L 3 4 6 5".
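One way to cope with that is to rewrite such a moveto into an explicit moveto plus lineto before doing anything else. A Python sketch, with the hypothetical name split_moveto and again assuming the numbers are already parsed; a relative "m" is followed by a relative "l":

```python
def split_moveto(cmd, args):
    """Split a moveto with extra coordinate pairs into moveto + lineto."""
    if cmd not in ("M", "m"):
        raise ValueError(f"not a moveto command: {cmd!r}")
    if len(args) < 2 or len(args) % 2 != 0:
        raise ValueError("moveto needs one or more coordinate pairs")
    commands = [(cmd, args[:2])]            # the real move
    if len(args) > 2:
        line_cmd = "L" if cmd == "M" else "l"
        commands.append((line_cmd, args[2:]))   # the implicit lineto part
    return commands

print(split_moveto("M", [1, 2, 3, 4, 6, 5]))
# [('M', [1, 2]), ('L', [3, 4, 6, 5])]
```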

Then there are cubic Bézier curves. The commands C/c start at the end of the previous command, and expect groups (PLURAL!) of six numbers, the coordinates of two control points and an end point to draw the curve. Each group of six numbers draws one curve. There is also a shortcut command "S/s" that expects groups (PLURAL!) of only four numbers, each group specifying only the second control point and the end point. The first control point is either "the reflection of the second control point on the previous command relative to the current point" (the end point of the previous curve), if the previous command was a cubic Bézier curve command, or the end point of the previous command if the previous command was not a cubic Bézier curve.
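The reflection is just a point mirrored at the current point. A Python sketch, assuming all points are absolute (x, y) tuples; the function name first_control_point and its parameters are made up for this example:

```python
def first_control_point(current, prev_cmd, prev_ctrl2):
    """Implicit first control point for an S/s command.

    current    -- current point (end point of the previous command)
    prev_cmd   -- letter of the previous command
    prev_ctrl2 -- second control point of the previous curve, or None
    """
    if prev_cmd.upper() in ("C", "S") and prev_ctrl2 is not None:
        cx, cy = current
        px, py = prev_ctrl2
        # Mirror the previous control point at the current point.
        return (2 * cx - px, 2 * cy - py)
    # Previous command was not a cubic Bézier: use the current point itself.
    return current

print(first_control_point((10, 10), "c", (8, 7)))
# (12, 13)
```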

SVG also has quadratic Bézier curve commands, Q/q for the full spec, and T/t as shortcut. The same madness.

Note that a command named "E" or "e" would cause problems, because numbers may be given in exponential notation: Is "M0-1.2E3,0" the start of a move command with arguments 0, -1.2E+3, 0 or is it a move command with arguments 0 and -1.2 and an E command with arguments 3 and 0? It must be the former, because numbers must be greedily extracted. So if you want an E command, it needs whitespace in front of the E. "M0-1.2 E3,0" is clearly moveto 0, -1.2 followed by E 3, 0.
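The two readings can be checked with a tiny tokenizer. This is a Python sketch under my own assumptions: the name tokens and the regex are invented for this example, and "E" is accepted as a command letter only because the regex accepts any letter, not because SVG has such a command:

```python
import re

# Try to match a number first (greedily, including an exponent);
# only if that fails, accept a single letter as a command.
TOKEN = re.compile(r"([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)|([A-Za-z])")

def tokens(d):
    """Split path data into command letters and numbers."""
    result = []
    for number, letter in TOKEN.findall(d):
        result.append(float(number) if number else letter)
    return result

print(tokens("M0-1.2E3,0"))   # ['M', 0.0, -1200.0, 0.0]
print(tokens("M0-1.2 E3,0"))  # ['M', 0.0, -1.2, 'E', 3.0, 0.0]
```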

The SVG spec fights for single bytes in paths, by omitting separators for numbers and by nonsense shortcut commands, all in a format that is horribly bloated and repetitive. And the result makes all XML tools completely unusable for paths. XML just sees a string attribute with no more structure.

WHAT WERE THEY SMOKING?

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re: SVG - what were they smoking?
by Corion (Patriarch) on Mar 10, 2026 at 12:51 UTC

    I see this as an example of Conway's Law - you have one team thinking about a document, and one team thinking about turtle-graphics. Of course, they will create something where changes in the document structure don't affect the turtle-graphics-people, and changes in the turtle graphic part don't affect the document people.

    My favourite abuse of XML is the MOSMIX format (pdf), which is a KML dialect with fixed-width additional weather data. As far as I can tell, the justifications are that the fixed-width data already existed from older formats and that wrapping each column in additional XML would have bloated the (uncompressed) file size a lot. But still, it's a funny abuse:

    <kml:Placemark> <!--Beginn einer Vorhersage für einen Punkt --> <kml:name>01025</kml:name> <!--Stations-ID --> <kml:description>TROMSOE</kml:description> <!--Stations Name --> <kml:ExtendedData> <!--Hier beginnt die Vorhersage der einzelnen Groessen --> <dwd:Forecast dwd:elementName="TTT"><dwd:value> 272.35 272.45 272.55 272.85</dwd:value></dwd:Forecast> <!--Eine Vorhersagegroesse-->

    Other than the format, the data feed is nicely stable and has been working with Weather::MOSMIX for six years without changes.

      I see this as an example of Conway's Law - you have one team thinking about a document, and one team thinking about turtle-graphics.

      Probably. Maybe the turtle graphic "modem command language" already existed when SVG was designed, and was simply reused? That's the only way the spec of the SVG path element makes any sense to me. If you "think in XML" and want to specify a vector graphics format from scratch, you would use XML nodes for path elements, not string fragments.

      Alexander

Re: SVG - what were they smoking?
by choroba (Cardinal) on Mar 10, 2026 at 11:59 UTC
    That's awful.

    This reminds me of the SSF or Shakti Standard Format. It looks like XML on the outside, but inside you find numbered lists in CDATA sections, nested parentheses, and when needed, values separated by colons and slashes and equal signs whenever someone felt they needed another level of a structure. If I remember correctly, there were 8 different types of markup, one of them even using < without quoting, making the files ill-formed XML.

    <Sentence id="8"> 1 (( NP <drel=k2:3> 1.1 biddalni NN )) 2 (( VGNF <drel=vmod:1/name=3> 2.1 kanetappudu VM )) 3 (( NP <drel=nmod:2> 3.1 eVMwo INTF 3.2 maMxi CL )) 4 (( NP <drel=k1:1/name=2> 4.1 wallulu NN )) 5 (( VGF <name=1> 5.1 canipowunnAru VM 5.2 . SYM )) </Sentence>

    Do you see the :X parts? They are in fact references to the name=X "anchors"!

    Why do I know? I needed to convert some of their data into a different format. Thankfully, I could use Perl and regexes. Tears of joy.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: SVG - what were they smoking?
by harangzsolt33 (Deacon) on Mar 12, 2026 at 07:09 UTC

    Wow. That is madness! At some point, I thought about writing an SVG decoder, but you discouraged me completely. Recently I looked into how JPEG-XL image headers store the width and height of an image, and let me tell you, it's total madness! ( https://www.ffmpeg.org/doxygen/6.0/jpegxl__probe_8c_source.html ) And then I tried to download the manual that describes the format to understand it better, and lo and behold, it costs 300 US dollars to download.

    I don't understand this. If their goal was to create a new file format that would supersede JPEG and WEBP and AVIF and be used widely by all people, then why not publish the stupid manual for free? I don't understand the logic here.