Documentation as Code: generate commented YAML automatically

Keeping documentation in sync with the code is always hard. That’s why there are tools that generate documentation automatically from Go code. Godoc, for example, does a great job of building a well-structured code reference by parsing Go definitions along with their comments.

Tools like this are common for other languages as well. Consider Doxygen, Javadoc, and Pydoc for example.

In Go, however, there is nothing that can generate YAML configs with the comments included.

Talos OS, like Kubernetes, can end up with a pretty complicated configuration file. There are lots of nested structures, some of which are optional, so it can be hard to get a grasp of the different YAML variants you can construct. You can look into the Go sources to read the YAML tags on the various fields, but re-assembling that tree in your head is tedious and error-prone. Another option is to start with the manually written YAML examples in the Talos documentation, but such docs get outdated fast and are subject to human error, so they can contain typos or malformed YAML.

We also have a CLI command, talosctl gen config, that templates out configurations, but it omits fields that are nil by default and does not provide comments on any field. So, given the complexity of our configuration file, as well as the general difficulty in finding the proper options, we decided to implement a documentation generator for YAML.

The goals we wanted to achieve with this generator were:

  • Have a single source of truth both for markdown and for YAML comments.
  • Generate YAML configuration that looks as if it was written by hand: with comments, examples and so on.
  • Define all examples using actual data structures to ensure that the results are always in sync with the code.

We use the gopkg.in/yaml.v3 library to encode configuration data into YAML. Luckily, starting from v3 you can actually add comments to the output YAML. There is one caveat, though: the only way to add comments is through yaml.Node, which means all input structures have to be converted into yaml.Node trees.
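To illustrate the caveat, here is a minimal, self-contained example (not Talos code) that builds a yaml.Node mapping by hand and attaches a line comment to one of its values:

package main

import (
	"fmt"
	"log"

	"gopkg.in/yaml.v3"
)

func main() {
	// a mapping node stores its keys and values as alternating entries in Content
	node := &yaml.Node{
		Kind: yaml.MappingNode,
		Content: []*yaml.Node{
			{Kind: yaml.ScalarNode, Value: "version"},
			{
				Kind:        yaml.ScalarNode,
				Value:       "v1alpha1",
				LineComment: "Indicates the schema used to decode the contents.",
			},
		},
	}

	out, err := yaml.Marshal(node)
	if err != nil {
		log.Fatal(err)
	}

	// prints: version: v1alpha1 # Indicates the schema used to decode the contents.
	fmt.Print(string(out))
}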

So let’s imagine we have a structure defined:

type Config struct {
	// description: |
	//   Indicates the schema used to decode the contents.
	// values:
	//   - v1alpha1
	ConfigVersion string `yaml:"version"`
	// description: |
	//   Enable verbose logging.
	// values:
	//   - true
	//   - yes
	//   - false
	//   - no
	ConfigDebug bool `yaml:"debug"`
	// description: |
	//   Indicates whether to pull the machine config upon every boot.
	// values:
	//   - true
	//   - yes
	//   - false
	//   - no
	ConfigPersist bool `yaml:"persist"`
	// description: |
	//   Provides machine specific configuration options.
	MachineConfig *MachineConfig `yaml:"machine"`
	// description: |
	//   Provides cluster specific configuration options.
	ClusterConfig *ClusterConfig `yaml:"cluster"`
}

Given the layout above, we want to generate a YAML file like the one below:

version: v1alpha1 # Indicates the schema used to decode the contents.
debug: false # Enable verbose logging.
persist: true # Indicates whether to pull the machine config upon every boot.
# Provides machine specific configuration options.
machine:
    # ...
# Provides cluster specific configuration options.
cluster:
    # ...

Converting Input into yaml.Node Tree

So, before anything else, it is necessary to implement a function that can convert an arbitrary type into a yaml.Node tree.

yaml.Node has an Encode method, but for our case it is only usable for primitive types: if you run it over a struct or map, any nested yaml.Node loses its comments.

This means that the recursion over structs, maps and slices has to be implemented in our wrapper module.

The recursive iteration itself is pretty straightforward, as long as you remember that YAML tags can carry options like omitempty, flow, and inline, which need special handling in the iterator.
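For instance, reading the tag options during iteration could look roughly like this (a sketch, not the exact Talos code; the handling of each option is only outlined in comments):

// parse yaml tag options such as `yaml:"extraArgs,omitempty,flow"`
parts := strings.Split(t.Field(i).Tag.Get("yaml"), ",")
fieldName, options := parts[0], parts[1:]

for _, opt := range options {
	switch opt {
	case "omitempty":
		// skip the field entirely when its value is the zero value
	case "flow":
		// render the value in flow style, e.g. valueNode.Style = yaml.FlowStyle
	case "inline":
		// merge the nested struct's key/value nodes into the parent's Content
	}
}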

When building the YAML tree, we had to consider the following (a rough sketch of the resulting converter follows the list):

  • Maps and structs are converted into yaml.MappingNode nodes, and the node’s Content slice is populated with each key followed by its value representation.
  • For slices, each element is converted to a yaml.Node and appended to the parent node’s Content.
  • Types that implement yaml.Marshaler are marshalled first, and the result is then converted to a yaml.Node.
  • Any other values can go through the regular yaml.Node Encode call.
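To make this more concrete, here is a rough sketch of such a converter (heavily simplified compared to the actual Talos implementation, and assuming the reflect and gopkg.in/yaml.v3 packages are imported):

func toYamlNode(in interface{}) (*yaml.Node, error) {
	node := &yaml.Node{}

	// types that implement yaml.Marshaler are marshalled first,
	// and the result is converted instead
	if m, ok := in.(yaml.Marshaler); ok {
		res, err := m.MarshalYAML()
		if err != nil {
			return nil, err
		}

		in = res
	}

	v := reflect.ValueOf(in)
	if v.Kind() == reflect.Ptr {
		v = v.Elem()
	}

	switch v.Kind() {
	case reflect.Struct, reflect.Map:
		node.Kind = yaml.MappingNode
		// iterate the fields (or keys), convert each key and value with toYamlNode,
		// attach comments, and append them to node.Content as key, value, key, value, ...
	case reflect.Slice:
		node.Kind = yaml.SequenceNode

		for i := 0; i < v.Len(); i++ {
			element, err := toYamlNode(v.Index(i).Interface())
			if err != nil {
				return nil, err
			}

			node.Content = append(node.Content, element)
		}
	default:
		// everything else goes through the regular yaml.Node Encode call
		if err := node.Encode(in); err != nil {
			return nil, err
		}
	}

	return node, nil
}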

Once an entry is converted into a yaml.Node, it becomes possible to add three different comment types:

  • HeadComment, which is added on the line(s) before the node.
  • LineComment, which is added at the end of the same line.
  • FootComment, which is added after the line, followed by an extra blank line.

For mapping nodes it’s necessary to attach the comments to the right node of the key/value pair: line comments added to the key node can mess up the YAML, so LineComment must always be set on the value node.

The full implementation of the yaml.Node converter can be viewed here.

So the next question is how to get comment data for any particular node.

There are two options:

  1. Define comments in the tags.
  2. Document with generated Go code.

Define Comments in the Tags

This can be a pretty elegant solution if you don’t need multi-line comments:

type Config struct {
  Enabled bool `yaml:"enabled" head_comment:"Enable or disable." line_comment:"disabled by default"`
}

Then, while iterating over the fields of a struct, it is possible to read all tags for each field:

for i := 0; i < v.NumField(); i++ {
...
  // get the field name from the yaml tag
  tag := t.Field(i).Tag.Get("yaml")
  parts := strings.Split(tag, ",")

  fieldName := parts[0]
  // fall back to the lowercased field name if the yaml tag is not defined
  if fieldName == "" {
    fieldName = strings.ToLower(t.Field(i).Name)
  }
  ...
  // these are empty strings if the tags are not defined
  headComment := t.Field(i).Tag.Get("head_comment")
  lineComment := t.Field(i).Tag.Get("line_comment")
  footComment := t.Field(i).Tag.Get("foot_comment")

  value := v.Field(i).Interface()
  // key yaml node
  keyNode, err := toYamlNode(fieldName)
  ...
  keyNode.HeadComment = headComment
  keyNode.FootComment = footComment
  // convert the value node
  valueNode, err := toYamlNode(value)
  ...
  // line comments always go on the value node
  valueNode.LineComment = lineComment
  ...
}

import (
	"fmt"
	"log"

	"gopkg.in/yaml.v3"
)

func main() {
	c := &Config{
		Enabled: true,
	}

	node, err := toYamlNode(c)
	if err != nil {
		log.Fatalf("failed to convert config to a yaml.Node: %s", err)
	}

	result, err := yaml.Marshal(node)
	if err != nil {
		log.Fatalf("failed to marshal config: %s", err)
	}

	fmt.Printf("result:\n%s", result)
}

This results in the following YAML:

# Enable or disable.
enabled: true # disabled by default

Unfortunately the “tags” approach has significant drawbacks:

  • Poor readability of comments, especially when they span multiple lines.
  • No way to define strict, complex examples (unmarshalling JSON is an option, but it does not enforce validity).
  • It’s not the most obvious place for comments.

Documenting with Generated Go Code

Considering the drawbacks of the “tag” approach, we went a different route.

First, we decided what kind of structure should hold the full documentation for a struct or any other type.

We ended up with the following:

type Doc struct {
  // Comments stores the foot, line and head comments.
  Comments [3]string
  // Fields contains the field documentation if the item is a struct.
  Fields []Doc
  // Examples is the list of example values for the item.
  Examples []*Example
  // Values is only used to render the list of valid values in the documentation.
  Values []string
  // Description represents the full description of the item.
  Description string
  // Name represents the struct name or field name.
  Name string
  // Type represents the struct or field type.
  Type string
  // Note is rendered as a note for the example in the markdown file.
  Note string
}

So we have comments, fields and examples here; these are the only three things used during YAML generation. The other fields are there to generate the markdown documentation.

Each field is documented using the same struct.

Note that Fields is a slice rather than a map, because we don’t need to rely on field names to look up fields in a struct: the reflect package iterates over them by index.

Next, we need to tell the YAML encoder that the input type has documentation defined.

For that case we defined an interface:

type Documented interface {
	GetDoc() *Doc
}
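Inside the node converter we can then check whether the value implements this interface and, if it does, pull per-field comments by index, roughly like this (a simplified illustration, not the exact Talos code; HeadComment, LineComment and FootComment are the constants used to index the Comments array):

// inside toYamlNode, before iterating the struct fields
var doc *Doc

if d, ok := in.(Documented); ok {
	doc = d.GetDoc()
}

// ...then, while processing field i of the struct:
if doc != nil && i < len(doc.Fields) {
	keyNode.HeadComment = doc.Fields[i].Comments[HeadComment]
	keyNode.FootComment = doc.Fields[i].Comments[FootComment]
	// as noted earlier, line comments go on the value node
	valueNode.LineComment = doc.Fields[i].Comments[LineComment]
}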

Any type that provides documentation should then implement this interface, like this:

func (_ Config) GetDoc() *encoder.Doc {
  configDoc := &encoder.Doc{}
  configDoc.Type = "Config"
  configDoc.Fields = make([]encoder.Doc, 5)
  configDoc.Fields[0].Name = "version"
  configDoc.Fields[0].Type = "string"
  configDoc.Fields[0].Description = "Indicates the schema used to decode the contents."
  configDoc.Fields[0].Comments[encoder.LineComment] = "Indicates the schema used to decode the contents."
  configDoc.Fields[0].Values = []string{
    "`v1alpha1`",
  }

  configDoc.Fields[1].Name = "debug"
  configDoc.Fields[1].Type = "bool"
  configDoc.Fields[1].Description = "Enable verbose logging."
  configDoc.Fields[1].Comments[encoder.LineComment] = "Enable verbose logging."
  configDoc.Fields[1].Values = []string{
    "true",
    "yes",
    "false",
    "no",
  }
  ...
  return configDoc
}

This doesn’t look easy to define by hand, right? And that’s only two fields, without any examples.

So let robots do that for you! Go provides a handy tool called go generate.

We generate these documentation metafiles using go generate and our custom docgen script. Docgen relies on the Go AST parser to analyze the source files and then creates a Go metafile using the text/template package. For more details about docgen you can refer to this file.

First, you define the following go:generate directive in the file you want to generate documentation for:

//go:generate docgen ./v1alpha1_types.go ./v1alpha1_types_doc.go Configuration

Here docgen is the command, the first argument is the input file, the second is the output file, and the last one is a unique id for the generated documentation set.

Then if you run:

$ go generate pkg/machinery/config/types/v1alpha1/v1alpha1_types.go

it will create a new file:

pkg/machinery/config/types/v1alpha1/v1alpha1_types_doc.go

That file will contain all documentation definitions as global Go structures and will define the GetDoc() method for each documented structure.

Generate Examples

Generating a YAML config where each field is complemented with a comment already allows us to use the generated config as documentation. But the config becomes even more useful if we define examples for the fields that are not populated and insert them as valid, commented-out YAML blocks in the appropriate positions in the config.

So let’s consider the following structure:

// KubeletConfig represents the kubelet config values.
type KubeletConfig struct {
	//   description: |
	//     The `image` field is an optional reference to an alternative kubelet image.
	//   examples:
	//     - value: '"docker.io/<org>/kubelet:latest"'
	KubeletImage string `yaml:"image,omitempty"`
	//   description: |
	//     The `extraArgs` field is used to provide additional flags to the kubelet.
	//   examples:
	//     - name: Description for this example
	//       value: >
	//         map[string]string{
	//           "key": "value",
	//         }
	KubeletExtraArgs map[string]string `yaml:"extraArgs,omitempty"`
	//   description: |
	//     The `extraMounts` field is used to add additional mounts to the kubelet container.
	//   examples:
	//     - value: kubeletExtraMountsExample
	KubeletExtraMounts []specs.Mount `yaml:"extraMounts,omitempty"`
}

// and the definition
...
  k := &KubeletConfig{
	  KubeletImage: "docker.io/autonomy/kubelet:v1.19.3",
  }
...

If we use the standard YAML marshaler, it will output the following result:

kubelet:
    image: docker.io/autonomy/kubelet:v1.19.3

But with the changes we made, it is actually possible to produce the following output:

kubelet:
    image: docker.io/autonomy/kubelet:v1.19.3 # The `image` field is an optional reference to an alternative kubelet image.

    # # Description for this example
    # # The `extraArgs` field is used to provide additional flags to the kubelet.
    # extraArgs:
    #     key: value

    # # The `extraMounts` field is used to add additional mounts to the kubelet container.
    # extraMounts:
    #     - destination: /var/lib/example
    #       type: bind
    #       source: /var/lib/example
    #       options:
    #         - rshared
    #         - ro

Uncommenting the example actually gives you valid YAML.

And the number of examples is not limited to a single one: you can have several different examples for various scenarios.

How does that work?

You may have noticed that there is YAML inside the doc comments for each field of the struct:

//   examples:
//     - value: kubeletExtraMountsExample

or:

//   examples:
//     - name: Description for this example
//       value: >
//         map[string]string{
//           "key": "value",
//         }

This value should be valid Go code, which docgen then uses to populate the examples, like this:

KubeletConfigDoc.Fields[1].AddExample("Description for this example", map[string]string{
	"key": "value",
})

While the second example uses inline Go code, kubeletExtraMountsExample shows another approach: you can define the example as a package-level variable and then only refer to it by name in the comment.
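Such a variable could be defined roughly like this (a sketch matching the generated output above; the exact definition lives in the Talos sources):

// package-level example value, referenced by name from the doc comment
var kubeletExtraMountsExample = []specs.Mount{
	{
		Destination: "/var/lib/example",
		Type:        "bind",
		Source:      "/var/lib/example",
		Options:     []string{"rshared", "ro"},
	},
}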

Having examples defined as Go code is great, because if an example gets outdated, your code won’t even compile.
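As for how an example ends up as a commented-out block in the output: one way to do it (a simplified illustration rather than the exact Talos implementation) is to marshal the example value on its own, comment out every line, and attach the result as a comment on the enclosing node:

// render the example value as standalone YAML
rendered, err := yaml.Marshal(map[string]interface{}{"extraArgs": example})
if err != nil {
	return err
}

// prepend the example name as a comment line, then comment out every line,
// so that uncommenting the block later yields valid YAML
block := "# Description for this example\n" + string(rendered)

lines := strings.Split(strings.TrimRight(block, "\n"), "\n")
for i, line := range lines {
	lines[i] = "# " + line
}

// nextKeyNode is hypothetical here: it stands for the node the block should be
// attached to, for example as the head comment of the following field
nextKeyNode.HeadComment = strings.Join(lines, "\n")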

Markdown

The Talos CLI has a talosctl docs command, which can generate markdown files for the CLI and now also for the configs. Since all documentation now lives in code, we can build our API reference right at runtime.

Doc objects are defined separately for each type, so it may be hard to get docs for the whole package.

It is not really a good idea to refer to each Doc object directly, so docgen also generates a single function that returns the collection of Doc objects for a package. That’s what the last docgen argument, the unique id, is used for.
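The generated collector could look roughly like this (the function name and exact shape are illustrative, not the real generated API):

// a hypothetical collector named after the "Configuration" id passed to docgen
func GetConfigurationDoc() []*encoder.Doc {
	return []*encoder.Doc{
		Config{}.GetDoc(),
		MachineConfig{}.GetDoc(),
		ClusterConfig{}.GetDoc(),
		// ... one entry per documented type in the package
	}
}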

And then the CLI code just uses this function to get all the documentation entries for the package.

Conclusion

Even though we ended up re-implementing a significant chunk of the YAML marshaler internals, it gave us a lot of benefits.

The generated config really looks like it was written by hand. The documentation is defined in a single place and kept as close to the code as possible, so there is far less chance of forgetting to update it.

The module definitely has a lot of room for improvement: it can be made more generic and maybe some parts of it can even make it to the upstream repo. And we can finally improve the examples, making them closer to real life and providing more options.

Generated markdown templates can be viewed at https://www.talos.dev/docs/v0.7/reference/configuration/.

This change ships with the upcoming v0.7 Talos release and is, like the rest of Talos OS, FOSS. If you want to play around with talosctl gen config, you can check out the latest master and build Talos yourself.

Build instructions can be viewed by typing make help in the root of the repo.

If you have more questions, hop on our Slack and join the community!

https://taloscommunity.slack.com/
