Reading Go's AST

2016-12-30

Writing "Hello, World!" in Go is beautifully simple:

package main

import "fmt"

func main() {
	fmt.Println("Hello, World!")
}

Thanks to its excellent standard library, Go makes it easy to visually inspect a program's abstract syntax tree. In fact, it's worth taking a moment to familiarize oneself with all the interesting things here within the go package, the source of many useful AST-related tools.

Let's use several of the sub-packages within the go package to print out a representation of the hello world program above. When working with ASTs in Go, there are two helpful functions worth knowing about: parser.ParseDir and parser.ParseFile. As the names suggest, one parses a directory of Go code. The other parses a single file. For now, we will focus on working with a single file and so will use parser.ParseFile.

Within a directory somewhere on our $GOPATH, we will start by putting the hello world program above into a file called hello.go. Next, we will begin writing our own main.go file. Note that our main.go will also define a main function, which means that Go will complain if we try to compile a binary. For the sake of simplicity, let's not worry about this problem for the moment and run our own main.go file by using the following command:

$ go run main.go

We will start from the signature of ParseFile and work from there. First, we'll need a pointer to a token.FileSet, Go's representation of a collection of files. Again, since we're concerned with only a single file, we can simply use the token.NewFileSet function to obtain a token.FileSet pointer. From there, we'll pass in the name of our hello world program, example.go.

Our program now looks like this:

package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"log"
)

func main() {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", nil, 0)
	if err != nil {
		log.Fatal(err)
	}

	ast.Print(fset, f)
}

The final line of our program reveals yet another helpful function when learning about Go's AST, namely ast.Print. The function will output a representation of the AST. Let's run our program and observe the result.

Assuming Go could find example.go and parser.ParseFile did not return an error, we should see just over 80 lines of output. At the top level, we will see an *ast.File:

 0  *ast.File {
 1  .  Package: example.go:1:1
 2  .  Name: *ast.Ident {
 3  .  .  NamePos: example.go:1:9
 4  .  .  Name: "main"
 5  .  }
 6  .  Decls: []ast.Decl (len = 2) { /* omitted */ }
72  .  Scope: *ast.Scope { /* omitted */ }
77  .  Imports: []*ast.ImportSpec (len = 1) { /* omitted * / }
80  .  Unresolved: []*ast.Ident (len = 1) { /* omitted */ }
83  }

In spite of the amount of text generated for our simple hello world program, the AST stores our program in a predictable manner. Within the ast.File properties, we have a Name identifier on line 2 which indicates the package's name. Skipping over Decls for a moment, we see a Scope property, an Imports property, and an Unresolved property.

What is interesting about these last three properties is they contain pointers to elsewhere in the AST.

12  .  .  .  .  0: *ast.ImportSpec {
13  .  .  .  .  .  Path: *ast.BasicLit {
14  .  .  .  .  .  .  ValuePos: example.go:3:8
15  .  .  .  .  .  .  Kind: STRING
16  .  .  .  .  .  .  Value: "\"fmt\""
17  .  .  .  .  .  }
18  .  .  .  .  .  EndPos: -
19  .  .  .  .  }
/* omitted */
77  .  Imports: []*ast.ImportSpec (len = 1) {
78  .  .  0: *(obj @ 12)
79  .  }

For example, within the Imports property on line 78, there is a pointer to an object on line 12 in the form *(obj @ 12). Returning to line 12 above, we will find an *ast.ImportSpec which describes the fmt import statement. The type also lines up with the slice declared on line 77, []*ast.ImportSpec.

The majority of the lines in the AST representation are devoted to describing the Decls property on line 6. Let's return to it now and take a look at its internals.

 6  .  Decls: []ast.Decl (len = 2) {
 7  .  .  0: *ast.GenDecl {
 8  .  .  .  TokPos: example.go:3:1
 9  .  .  .  Tok: import
10  .  .  .  Lparen: -
11  .  .  .  Specs: []ast.Spec (len = 1) {
12  .  .  .  .  0: *ast.ImportSpec {
13  .  .  .  .  .  Path: *ast.BasicLit {
14  .  .  .  .  .  .  ValuePos: example.go:3:8
15  .  .  .  .  .  .  Kind: STRING
16  .  .  .  .  .  .  Value: "\"fmt\""
17  .  .  .  .  .  }
18  .  .  .  .  .  EndPos: -
19  .  .  .  .  }
20  .  .  .  }
21  .  .  .  Rparen: -
22  .  .  }
23  .  .  1: *ast.FuncDecl { /* omitted */ }

The first thing worth noting is that Decls, short for "declarations", is a slice of declarations. Within that slice, we see two members, a "generic declaration" ast.GenDecl (line 7) and a "function declaration" ast.FuncDecl (line 23). Sorting out the significance of each implementer of the ast.Decl interface is an exercise of reading through the docs -- there are only a few and each implementation has the Decl suffix.

The ast.GenDecl covers the declaration of the "fmt" import statement. Note how the string "fmt" is represented, a ast.BasicLit or basic literal whose Value property itself includes quotes (line 16).

Compared to ast.GenDecl, the use of ast.FuncDecl (line 23) is much longer. At its top level, the function declaration has a number of properties:

23  .  .  1: *ast.FuncDecl {
24  .  .  .  Name: *ast.Ident { /* omitted */ }
33  .  .  .  Type: *ast.FuncType { /* omitted */ }
40  .  .  .  Body: *ast.BlockStmt { /* omitted */ }
70  .  .  }

The *ast.FuncDecl has a Name property where the function's name is stored. The Type field, an *ast.FuncType, stores information about Params and Results. Because our main function has neither, let's skip to the Body property.

40  .  .  .  Body: *ast.BlockStmt {
41  .  .  .  .  Lbrace: example.go:5:13
42  .  .  .  .  List: []ast.Stmt (len = 1) {
43  .  .  .  .  .  0: *ast.ExprStmt { /* omitted */ }
68  .  .  .  .  Rbrace: example.go:7:1
69  .  .  .  }

The Body property is an ast.BlockStmt pointer and holds information about the body of our main function. In particular, it is the List property of the block statement that is most interesting. We see that List is a slice of ast.Stmt. Like ast.Decl, ast.Stmt is an interface and there are many types in the ast package which implement this interface. For example, there is an ast.ForStmt, an ast.AssignStmt, and an ast.ReturnStmt to name but a few.

Before we look at the rest of the AST, recall that the body of our hello world program looks like this:

fmt.Println("Hello, World!")

We invoke a method on package name with a string argument. Let's see how that translates to the AST.

43  .  .  .  .  .  0: *ast.ExprStmt {
44  .  .  .  .  .  .  X: *ast.CallExpr {
45  .  .  .  .  .  .  .  Fun: *ast.SelectorExpr {
46  .  .  .  .  .  .  .  .  X: *ast.Ident {
47  .  .  .  .  .  .  .  .  .  NamePos: example.go:6:2
48  .  .  .  .  .  .  .  .  .  Name: "fmt"
49  .  .  .  .  .  .  .  .  }
50  .  .  .  .  .  .  .  .  Sel: *ast.Ident {
51  .  .  .  .  .  .  .  .  .  NamePos: example.go:6:6
52  .  .  .  .  .  .  .  .  .  Name: "Println"
53  .  .  .  .  .  .  .  .  }
54  .  .  .  .  .  .  .  }
55  .  .  .  .  .  .  .  Lparen: example.go:6:13
56  .  .  .  .  .  .  .  Args: []ast.Expr (len = 1) {
57  .  .  .  .  .  .  .  .  0: *ast.BasicLit {
58  .  .  .  .  .  .  .  .  .  ValuePos: example.go:6:14
59  .  .  .  .  .  .  .  .  .  Kind: STRING
60  .  .  .  .  .  .  .  .  .  Value: "\"Hello, World!\""
61  .  .  .  .  .  .  .  .  }
62  .  .  .  .  .  .  .  }
63  .  .  .  .  .  .  .  Ellipsis: -
64  .  .  .  .  .  .  .  Rparen: example.go:6:29
65  .  .  .  .  .  .  }
66  .  .  .  .  .  }

Since we have only one statement in our main function, we likewise have only one ast.ExprStmt pointer in our slice of statements. The ast.ExprStmt has one property: X, or "target" presumably, which is of type Expr, another interface which all expressions implement. Examples of implementers of the Expr interface are ast.CallExpr, ast.StarExpr (i.e., *, indicating a pointer), and ast.UnaryExpr.

The target of our ast.ExprStmt is an ast.CallExpr, and indeed we are calling a function. Aside from the Lparen and Rparen properties which indicate where in the parsed files the left and right parens appear, an ast.CallExpr has a Fun and Args property. The Args property is a representation of our string "Hello, World!" and consists of a slice of ast.Expr, a type we have seen already. The single argument is an ast.BasicLit or "basic literal" type, which has both a Kind property, which indicates the kind of the token in question and a Value property. Note a list of all tokens in Go can be found here. As we noted above raw strings are stored with double quotes.

Finally, we get to the Fun property, an ast.SelectorExpr type, which represents a method call against a certain target.

45  .  .  .  .  .  .  .  Fun: *ast.SelectorExpr {
46  .  .  .  .  .  .  .  .  X: *ast.Ident {
47  .  .  .  .  .  .  .  .  .  NamePos: example.go:6:2
48  .  .  .  .  .  .  .  .  .  Name: "fmt"
49  .  .  .  .  .  .  .  .  }
50  .  .  .  .  .  .  .  .  Sel: *ast.Ident {
51  .  .  .  .  .  .  .  .  .  NamePos: example.go:6:6
52  .  .  .  .  .  .  .  .  .  Name: "Println"
53  .  .  .  .  .  .  .  .  }
54  .  .  .  .  .  .  .  }

The ast.SelectorExpr names the "fmt" identifier as its X property. The Sel property names the method name, "Println".

And with that, we have a full AST to represent our hello world program.

In and of itself, reading Go's AST reveals all sorts of interesting design decisions. All the more, though, with an understanding of how Go represents its programs in AST, one can start to write programs to manipulate the AST and write the result back out to disk. Along the way, I have often wondered how Go represents a particular construct in its AST. Answering that question is simply a matter of doing what we have done here, parsing the program's AST using the Go standard library, and then printing the result.