I don’t know why, but every time I read a specification, I get a headache.
Although I understand that this kind of text is more formal than a tutorial, it takes so much time to extract useful information from it that I feel like I am struggling in the mud and in the dark, except that I have clothes on (at least most of the time when I work).

Last week I started to learn more about the XLIFF format, an XML format for translation tools.
The specification document from OASIS, a group that produces many open formats, strangely reminds me of the W3C documents. You know? The documents you read three times from start to finish before finding the few words you need.

What is wrong with this kind of document?
Let’s read it together...

First screen page:

Date, who’s who, versions of the page.
OK, but why not reduce that to two lines (title, then date and version) and put the details in a chapter at the end of the document?
What reader wants to scroll with the mouse wheel for that?
From an ergonomic point of view, what matters first? Probably a summary.

Second screen page:

  • An abstract (at last).
  • A very interesting paragraph, called Status, that explains in a few (tons of) words that the authors of the document, already mentioned at length earlier, are indeed the authors of the document. What a surprise!
  • Some general links that many readers (by many I mean one percent) will consult.
  • A few notices explaining that the authors (already mentioned) hold a copyright on this document. Who would have presumed they own what they created? Essential information once again.
    In short, it is a license. Apparently a link would not have been enough. We needed a big page right at the front of the document.

Once again, a chapter at the end of the document would have been enough for all this information.

Third screen page:

The table of contents.
Aaaaaaahhhhh! Something useful (I’m not being ironic, for once).
Just a little advice: the enormous vertical spacing was not required. In fact, it makes reading more difficult. I feel like I am playing a game where you must gather spare parts.
Apart from that, horizontal indentation to show the chapter hierarchy would have made reading easier too.

Fourth screen page:

First chapter.
After a short introduction, the jargon begins:

XLIFF is specified in two "flavors"

OK, so it is a "flavored" language. Does it taste good?
In a sense, that is poetic. Unfortunately the word is not very clear. I presume it would have been too ordinary (or too much work) to look for a meaningful word, something like "derived formats" or "variants".
I’ll let you imagine other words, as a game for when you get bored (reading me). I know it is pleasant to follow fashion; I’m lazy too, sometimes, like any human being. But considering the seriousness of this document, I’m a bit surprised to read such a frivolous word here. It is like finding a child’s drawing in the middle of a dictionary.

Chapter 2: General structure

Here there is something I do not understand from an ergonomic point of view (ergonomics is my passion; I use the word at every opportunity):
The author is explaining the structure of an XLIFF document. Like any XML-based format, it is a hierarchical structure. But instead of writing the hierarchy as a nested list (the HTML <ul> tag), he writes long and boring sentences such as "The element A is the root, which contains the element B, which contains the element C, etc."

Personally, instead of this original explanation:

XLIFF is an XML application, as such it begins with an XML declaration. After the XML declaration comes the XLIFF document itself, enclosed within the <xliff> element. An XLIFF document is composed of one or more sections, each enclosed within a <file> element. The <file> element consists of a <header> element, which contains metadata about the <file> , and a <body> element, which contains the extracted translatable data from the <file>. The translatable data within <trans-unit> elements is organized into <source> and <target> paired elements. These <trans-unit> elements can be grouped recursively in <group> elements.

I would have found it more readable to have a structured example with comments, such as:
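
(A minimal sketch, rebuilt from the paragraph quoted above. The version 1.2 namespace, the file name hello.txt, the ids and the sample strings are my own assumptions for illustration; they do not come from the quoted text.)

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Root element: the XLIFF document itself -->
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <!-- One <file> section per extracted source file (one or more) -->
  <file original="hello.txt" source-language="en" target-language="fr" datatype="plaintext">
    <!-- Metadata about this <file> -->
    <header>
      <note>Hypothetical sample file</note>
    </header>
    <!-- The translatable data extracted from this <file> -->
    <body>
      <!-- Optional grouping; <group> elements can be nested -->
      <group id="greetings">
        <!-- One translation unit: a <source> paired with its <target> -->
        <trans-unit id="1">
          <source>Hello</source>
          <target>Bonjour</target>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>
```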

I admit that this way we miss the pleasure of reading a lot of words in nicely formed sentences.

I will not dwell on very useful sentences such as "XLIFF is an XML application, as such it begins with an XML declaration". What developer would have guessed that? (Would you?)

After a long and somewhat boring paragraph, we finally get what we have been waiting for over the last few pages: a practical example.
Unfortunately, the syntax is not colored and there is no indentation. Both would have been a good idea (please add that to our DOs list).

Right after the example, the document says "The complete tree structure is available in Appendix A."
If you follow the link, you will see something like this:

<xliff>1
| |
| +— [Extension Point]
| |
+— <file>+
 |
 +— <header>?
 | |
 | +— <skl>?
 | | |
 | | +— (<internal-file> | <external-file>)1
 | |
 | +— <phase-group>?
 | | |
 | | +— <phase>+
 | | |
 | | +— <note>*
 | |
 | +— <glossary>*
 | | |
 | | +— (<internal-file> | <external-file>)1

I just have a few problems with this formatting:

  1. It is not exactly the most aesthetic formatting I have ever seen.
    In fact, it seems to come from a code converter of the ’70s, when screens were black and white and the GUI was just a research project at Xerox PARC.
  2. If you want to copy and paste this code into your favorite code editor, the one that displays great colors and knows how to indent code, you will be disappointed: most probably prefixes such as " | | +—" won’t be decoded (I might say decrypted), preventing your editor from understanding (and colorizing) the code.

This formatting makes the example both difficult to read as it is and impossible to improve using your own tools. Wow! (as Microsoft would say)

Oh yes, there is a legend:

(legend: 1 = one
 + = one or more
 ? = zero or one
 * = zero, one or more)

That explains the strange formatting.
(Is it an excuse?)
You can see how much clearer it makes things...

I will not go through the rest of the document. Let me just tell you that it was easier for me to understand the XLIFF format by reading its XSD schema directly than by deciphering the specification document.
In a way, the XSD is written efficiently, and it is clear. Of course, its authors had to follow the XSD format. Maybe we should invent a "Specification Document Format" to force writers to follow some common-sense rules.
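
To show what I mean by the schema being direct: the four symbols of the legend above correspond to plain XSD occurrence attributes. The fragment below is only a sketch, not an excerpt from the real XLIFF schema; the element names parent, a, b, c and d are hypothetical placeholders.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="parent">
    <xs:complexType>
      <xs:sequence>
        <!-- "1": exactly one (minOccurs and maxOccurs both default to 1) -->
        <xs:element name="a" type="xs:string"/>
        <!-- "+": one or more -->
        <xs:element name="b" type="xs:string" maxOccurs="unbounded"/>
        <!-- "?": zero or one -->
        <xs:element name="c" type="xs:string" minOccurs="0"/>
        <!-- "*": zero, one or more -->
        <xs:element name="d" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Nothing to decode here: any XML editor will indent and color it without help.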

I mentioned the W3C documents earlier. If you read their specification documents (such as this one), you will understand where OASIS found its model.

What is my advice?

Apart from what has been said above, I think writers should keep some principles in mind when they write a specification document:

  • Examples should be commented inline, just as any developer would do.
  • The examples should be as short as possible, minimalist in fact.
    It is not always necessary to give a complete example. Keep the big examples for an appendix or an external page.
  • The document should begin with a very short summary (2 or 3 sentences).
    Then the table of contents should follow.
  • You should never forget that the reader does not know what he is reading about.
    You have to explain what you are talking about.
  • Always give the context and the reasons.
    Just say what this element is about and what it is for.
  • Never write a general, meaningless description.
    Ex: "The element A is found in its upper level and can contain an element B": OK, but what is the purpose of the element A? Can we put it elsewhere? What elements can it replace? What family of elements does it belong to?
    In short: any description should start by saying what the thing is, what it is for and what we do with it most of the time.
  • A long paragraph is boring; we easily lose track.
    Never forget most readers are either:
    • looking for a specific piece of information, a detail they need, or
    • trying to understand the main concept.
  • For the same reason, you should split your explanation into different levels of detail:
    • The main description that tells what it is.
      • More details about where we use it, in what context.
        • An example
        • The usage, explained in detail.
          • A specific (rare) usage or situation.

These suggestions are very general, and you may think they are obvious.
I agree with you, but when I read specification documents, and in fact most documents on the Internet, I feel that the most obvious rules are not the most applied ones.

It is not easy to write good documentation. It takes time. It needs frequent rewriting, and even, less frequently, a complete rewrite.

When you are stuck on a rephrasing, ask yourself: "How would I explain that to a friend in a few words?"
That’s always preferable to asking yourself "How can I write an indigestible, jargonistic* and robotic description?"

Now I’ll let you digest these (so essential, or even vital) thoughts.

Have a good day (or night).

[ By the way, if you think my English writing is just horrible, you are free to correct my text and send me your fixed HTML document ]

Jargonistic: I just invented that word, for those who love using jargon all the time, everywhere [I wonder if it’s a fashion, or if it’s pathological].