Extracts documentation from JavaScript source files.

Documentation Syntax

Documentation is extracted using JavaDoc-like annotations inside /** … */ comments.

The annotations are defined in {@link com.google.caja.ancillary.jsdoc.AnnotationHandlers}. Below is the default set, but additional handlers can be added.

Annotation   Signature                                        Description
@author      email | name '<' email '>' | email '(' name ')'  The name of the author of the API element or file.
{@code …}    source                                           Embedded and syntax-highlighted source code
TODO: finish

Usage

To generate JSON, run

  java -jar jsdoc.jar file1.js file2.js file3.js foo/package.html
and JSDoc will write a JSON file describing all the API elements and files extracted.

This file can be plugged into the AJAX doc viewer, or you can generate a static HTML tree by doing

  java -jar jsdoc.jar -o out_dir file1.js foo/package.html
which will output both a jsdoc.json file and an HTML file tree under out_dir:
out_dir
  index.html
  jsdoc.json

Types in @param, @return, etc.

In keeping with existing static-analysis based jsdoc tools, JSDoc supports most of the JavaDoc annotations plus a few that attempt to bolt an EcmaScript 4 style type system onto EcmaScript 3.

JavaDoc has @param and @return annotations. JSDoc supports the Java syntax but adds an optional syntax in which the first element is a type in curly brackets.

/**
 * @param {string} html the HTML to translate
 * @return {string} the plain text equivalent of the input HTML
 */
function htmlToText(html) { … }

The new @type annotation describes the type of the declaration.

/**
 * The ratio of a circle's circumference to its diameter.
 * @type {number}
 */
var PI = 3.141592654;

JSDoc does not do type checking, so it is agnostic to the type syntax and to whether, for function declarations, @type specifies the return type of the function or the type of the function as a whole. Since existing type checking efforts have taken conflicting sides on this, it's best to use @param, @return, and @this to describe functions.

JSDoc will warn you if identifiers that appear in types are not valid symbols in the current scope. For example, in

/**
 * @type {Array.<foo.Bar>}
 */
JSDoc will look for the symbols Array and foo.Bar in the scope in which the comment appears. If those values are undefined, or evaluating them raises an exception, then JSDoc issues a warning.
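
The symbol check described above can be sketched roughly as follows. This is only an illustration: identifiersInType, isDefinedIn, and undefinedTypeSymbols are hypothetical helper names, not functions from jsdoc.js, and the scope is modelled as a plain object rather than a real lexical scope.

```javascript
// Hypothetical helper: collect the dotted identifiers in a type expression.
function identifiersInType(typeExpr) {
  // Matches names such as Array or foo.Bar. The "." in "Array.<" is not
  // followed by an identifier character, so it is left out of the match.
  return typeExpr.match(/[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*/g) || [];
}

// Hypothetical helper: look a dotted name up in a scope object.
// Returns false if the value is undefined or if evaluating it throws.
function isDefinedIn(scope, dottedName) {
  try {
    var value = scope;
    var parts = dottedName.split('.');
    for (var i = 0; i < parts.length; i++) {
      value = value[parts[i]];
    }
    return value !== undefined;
  } catch (e) {
    return false;
  }
}

// Names used in the type that would deserve a warning.
function undefinedTypeSymbols(scope, typeExpr) {
  return identifiersInType(typeExpr).filter(function (name) {
    return !isDefinedIn(scope, name);
  });
}
```

With a scope in which Array and foo exist but foo.Bar does not, undefinedTypeSymbols(scope, 'Array.<foo.Bar>') reports only foo.Bar.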

Why Dynamic Analysis?

Many languages have a clear way of separating a module's public API from its private implementation details. In JavaScript, that's done by using lexical scopes to bundle together elements that are connected, as in:

var myModule = (function () {  // function creates a hidden scope
  // Define private variables
  var serialNumber = 0;

  // Define public stuff
  function getSerialNumber() { return serialNumber++; }

  // Export the public API
  return {
    getSerialNumber: getSerialNumber
  };
})();

There are many variations on this theme, but they all do a good job of making life hard for programs that try to figure out a JavaScript program's API just by looking at its source code. Such programs either miss large parts of the API, or force their users to write JavaScript in a style that doesn't take advantage of JavaScript's strengths, such as closures and scoping for information hiding.

Specifically, static analyzers tend to have trouble with

  1. JavaScript-style information hiding
  2. Monkey patching
  3. Class definitions that involve mixins

JSDoc is more accurate because it skips static analysis and instead just runs the program and looks at what it produces. By looking at the program after it's run, JSDoc sees exactly what the programmer who wants to use the module would.
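
As a small illustration of the point, the sketch below combines a hidden scope, a mixin, and a monkey patch. The names shapes, withArea, and Square are made up for this example. A tool reading the source has to understand all three idioms to find the methods of Square, whereas after execution they can simply be enumerated.

```javascript
var shapes = (function () {
  // A mixin: adds a method to whatever constructor it is given.
  function withArea(ctor) {
    ctor.prototype.describe = function () {
      return 'area=' + this.area();
    };
  }

  function Square(side) { this.side = side; }
  Square.prototype.area = function () { return this.side * this.side; };
  withArea(Square);

  // Only Square escapes the hidden scope; withArea stays private.
  return { Square: Square };
})();

// Monkey patching: a method added to an existing constructor after the fact.
shapes.Square.prototype.perimeter = function () { return 4 * this.side; };

// After running the code, the resulting API can be read off directly.
var squareMethods = Object.keys(shapes.Square.prototype).sort();
```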

Tests as Documentation

JSDoc adds a new annotation @updoc to allow tests to be embedded in code.

Frequently, Java and Python code uses private members as a way to hide implementation details. Testing frameworks such as PyUnit and JUnitX each work around the inaccessibility of private members so that those API elements can be tested.

But in JavaScript, there is no private API, so careful developers hide implementation details by using lexical scopes which are impenetrable to unit testing (modulo {@code eval} extensions in Firefox <= 3.0).

JSDoc finds @updoc tests in source code, runs them, and includes test results in the documentation.

See {@link com.google.caja.ancillary.jsdoc.Updoc} for examples and usage.

Design

Goals

Extract JavaDoc style documentation from JavaScript source code without restricting the way that JavaScript developers structure their code.

The documentation for a module should reflect the API elements added or modified by that module. It is a non-goal to document the private implementation details of a module.

Produce a per-module view of the API that other tools, such as IDE auto-completion and name suggestion, can use.

It is a non-goal to document any private or protected API or hidden implementation details.

Overview

  --------+
 /        |        +-----------+        -----+      +-----------+        -----+
|   --------+      |           |       /     |      |           |       /     |
|  /        |  =>  | Extractor |  =>  | JSON |  =>  | Formatter |  =>  | HTML |
+-| JS File |      |           |      |      |      |           |      |      |
  |         |      +-----------+      +------+      +-----------+      +------+
  +---------+

JSDoc takes in a group of JavaScript files. An extractor rewrites those files to attach comments to declarations and definitions, and then uses Rhino to execute the rewritten JavaScript.

The executed JavaScript is run in the context of jsdoc.js, and the extractDocs function builds up a JSON structure representing the elements added to the API. It does this by comparing the API present before the module was executed with that after, so the extraction algorithm looks like:

  1. original_api := snapshot(global_namespace)
  2. execute_rewritten_module()
  3. modified_api := snapshot(global_namespace)
  4. module_api := modified_api - original_api
  5. return module_api
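
The five steps can be sketched as below. This is a deliberately shallow version under stated assumptions: the namespace is a plain object, snapshot only copies top-level properties, and snapshot, subtract, and extractModuleApi are illustrative names rather than the functions in jsdoc.js.

```javascript
// Shallow snapshot: record each own property name and value of the namespace.
function snapshot(namespace) {
  var copy = {};
  Object.keys(namespace).forEach(function (key) {
    copy[key] = namespace[key];
  });
  return copy;
}

// Keep the entries of `after` that are new or changed relative to `before`.
function subtract(after, before) {
  var diff = {};
  Object.keys(after).forEach(function (key) {
    var known = Object.prototype.hasOwnProperty.call(before, key);
    if (!known || before[key] !== after[key]) {
      diff[key] = after[key];
    }
  });
  return diff;
}

function extractModuleApi(globalNamespace, executeModule) {
  var originalApi = snapshot(globalNamespace);  // step 1
  executeModule(globalNamespace);               // step 2
  var modifiedApi = snapshot(globalNamespace);  // step 3
  return subtract(modifiedApi, originalApi);    // steps 4 and 5
}
```

For a namespace that starts as { existing: 1 } and a module that adds a single function named added, the result contains only the added entry.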

Snapshotting the API involves recursively walking everything reachable from the global object. As we walk each object, we attach a name to it: if we reach an object via the bar property of an object that we reached via the foo property of the global object, then we know the object can be referenced as foo.bar, and so we assign it the name foo.bar. We walk the graph breadth-first so as to assign the shortest possible name to each API element. If we reach an object by more than one path, we mark the second and subsequent names as "aliases" and don't recurse into their properties. The snapshotting process looks like this:

  1. Walk the object graph assigning names to nodes. Be sure not to miss intrinsics like Array that are in the global object, but that are missed by for (k in global).
  2. Resolve "promises". In
    /** @see foo */
    the documentation depends on the name by which foo is reachable from the global scope, not the local scope. Since names are derived after execution, the code that attaches documentation delays linking by putting the result in an anonymous function:
    jsdoc___.document(..., { '@see': function () { return nameOf(foo); } })
  3. Build a documentation tree like
    {  // Documentation for the global scope
      "foo": {  // A member of the global scope
        "@see": "myPackage.bar"  // Corresponds to a doc annotation
      }
    }
  4. Remove names and other book-keeping properties added by the walk.
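
Step 1, the breadth-first naming walk, might look something like the sketch below. It is an assumption-laden illustration: nameGraph is a hypothetical name, and it records names in a Map instead of attaching book-keeping properties to the walked objects, so there is nothing to clean up afterwards. It uses Object.getOwnPropertyNames so that non-enumerable properties are not skipped.

```javascript
// Assign the shortest dotted name to each reachable object. Later paths to
// an already named object are recorded as aliases and are not walked again.
function nameGraph(root) {
  var names = new Map();  // object -> primary (shortest) name
  var aliases = [];       // [aliasName, primaryName] pairs
  var queue = [{ value: root, name: '' }];
  names.set(root, '');
  while (queue.length > 0) {
    var item = queue.shift();
    Object.getOwnPropertyNames(item.value).forEach(function (key) {
      var child;
      try {
        child = item.value[key];
      } catch (e) {
        return;  // Some properties throw when read; skip them.
      }
      var isObject = child !== null &&
          (typeof child === 'object' || typeof child === 'function');
      if (!isObject) { return; }
      var childName = item.name ? item.name + '.' + key : key;
      if (names.has(child)) {
        aliases.push([childName, names.get(child)]);
      } else {
        names.set(child, childName);
        queue.push({ value: child, name: childName });
      }
    });
  }
  return { names: names, aliases: aliases };
}
```

If the same object is reachable as both foo.bar and baz, the walk names it baz, because that path is shorter, and records foo.bar as an alias.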

Diffing the APIs involves comparing two JSON trees. Primitive values are considered different if they differ by !==. Objects and Arrays are considered different if they have a different set of property names or if their properties' values are recursively different. Identical internal nodes (Arrays and objects) are removed. This means that there will be no entry for Array unless some change was made to its API but if a module defines Array.addAll then there will be an Array entry in the resulting JSON.
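
A minimal sketch of such a diff, assuming both inputs are plain JSON trees. diffTrees is a hypothetical name, and how removed properties should be reported is not specified above, so this version only notes that the parent changed.

```javascript
// Returns the parts of `after` that differ from `before`, or undefined
// when the two trees are identical.
function diffTrees(before, after) {
  var isNode = function (v) { return v !== null && typeof v === 'object'; };
  if (!isNode(before) || !isNode(after)) {
    // Primitives (or a primitive versus a node) differ when they differ by !==.
    return before !== after ? after : undefined;
  }
  // Changed array entries keep their original index in the result.
  var result = Array.isArray(after) ? [] : {};
  var changed = false;
  Object.keys(after).forEach(function (key) {
    if (!Object.prototype.hasOwnProperty.call(before, key)) {
      result[key] = after[key];  // A new property.
      changed = true;
    } else {
      var sub = diffTrees(before[key], after[key]);
      if (sub !== undefined) {
        result[key] = sub;       // A recursively different property.
        changed = true;
      }
    }
  });
  Object.keys(before).forEach(function (key) {
    if (!Object.prototype.hasOwnProperty.call(after, key)) {
      changed = true;            // A removed property changes the name set.
    }
  });
  return changed ? result : undefined;
}
```

Diffing a tree against one in which only Array gained an addAll entry yields a result with an Array entry containing addAll and nothing else.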

Similarities to existing tools

                  JavaDoc                Pydoc                            JSDoc                                                                   Reason
Doc Strings       In /** … */ comments   In the first string of the body  As JavaDoc                                                              JS's comments are the same as Java's (modulo unicode escapes)
Structure         Comments contain @foo  None                             As JavaDoc with different annotations                                   Most JavaDoc annotations plus a few JS specific ones.
Extraction Style  Static analysis        Code evaluation                  As Pydoc but with a rewriting stage to turn comments into doc-strings.  JS has first-class constructors and methods, so static-analysis won't work.

Annotation Extraction

Since JavaScript is a dynamic language, it's hard to tell statically from a declaration site which annotations are appropriate in any comment. E.g.

/**
 * @param {number} x the x-coordinate
 */
var setX;
looks like it defines a function, but only during execution can we check that it is one and that it has a formal parameter named "x", and introduce blank documentation for any missing parameters.

This example raises another issue. How do we attach documentation to something that doesn't exist yet? For a documented variable x, we have two options:

  1. Identify all sites that assign to x and attach the documentation to all values assigned to x.
  2. Document the value in x when x leaves scope.

The first would lead us to attach documentation to any number of objects and doesn't deal well with a variable being used as a temporary until the code converges on the real value. The second is more in keeping with the principle that each documentation comment corresponds to one API element, but deals poorly with a variable that is multiply declared by someone attempting block scoping in JavaScript. We choose 2 since we can fix the block scoping defect by computing apparent scopes instead of block level scopes if necessary.

We implement this "value on leaving scope" rewriting quite simply. The original code

/** @param {number} x */
var setX;
...
setX = function (x) { return -x; };

is rewritten to

try {
  var setX;
  ...
  setX = function (x) { return -x; };
} finally {
  jsdoc___.document(  // Defined in jsdoc.js.  Attaches documentation to a value
      setX,           // The API element being documented.
      {               // A record containing the annotations from the comment.
        param: [
            function (apiElementName, apiElement) {  // The promise envelope.
              return (checkParam('x', apiElement),   // A runtime sanity check.
                      {                              // Decomposed documentation
                        name: 'x',
                        type: 'number'
                      });
            }
        ]
      });
}
This rewriting illustrates two concepts: promises and checks. A promise is the function envelope, which delays evaluation of the documentation until the entire name graph has been computed. We do this because, by waiting until we know the name of an API element, we can issue a better error message should setX not actually have a formal parameter x. A check is any logic that might issue an error if an annotation is not applicable in context or is otherwise malformed.
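
To make the two concepts concrete, here is a rough sketch of how promises might be stored and later resolved. documentLater, checkHasParam, and resolveAll are hypothetical stand-ins, not the actual jsdoc___ API, and the parameter check only understands the text of simple function expressions.

```javascript
var warnings = [];
var pending = [];

// Stand-in for jsdoc___.document: just remembers the value and its promises.
function documentLater(value, annotations) {
  pending.push({ value: value, annotations: annotations });
}

// A check: warns when fn has no formal parameter called paramName.
function checkHasParam(paramName, fn, fnName) {
  var src = Function.prototype.toString.call(fn);
  var formals = src.slice(src.indexOf('(') + 1, src.indexOf(')'))
      .split(',')
      .map(function (s) { return s.trim(); });
  if (formals.indexOf(paramName) < 0) {
    warnings.push(fnName + ' has no formal parameter ' + paramName);
  }
}

// Runs every promise once the name of each documented value is known.
function resolveAll(nameOf) {
  return pending.map(function (entry) {
    var name = nameOf(entry.value);
    var docs = {};
    Object.keys(entry.annotations).forEach(function (key) {
      docs[key] = entry.annotations[key].map(function (promise) {
        return promise(name, entry.value);
      });
    });
    return { name: name, docs: docs };
  });
}

var setX = function (x) { return -x; };
documentLater(setX, {
  param: [function (apiElementName, apiElement) {
    checkHasParam('y', apiElement, apiElementName);  // Deliberately wrong name.
    return { name: 'y', type: 'number' };
  }]
});
var resolved = resolveAll(function () { return 'geometry.setX'; });
```

Because the check runs only inside resolveAll, the warning can mention the full name geometry.setX, which is not known when documentLater is called.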

There is one case where dynamic analysis doesn't give us all the information we need. For the simple class definition

function Point(x, y) {
  this.x = x;
  this.y = y;
}
Point.prototype.getTheta = function () {
  return Math.atan2(this.y, this.x);
};
the Point class has two members, x and y, but walking the API doesn't find this information, since loading a module quite likely does not create an instance of every class it defines. There are a few approaches we can take to try to fix this:
  1. statically determine the set of members of {@code this} referenced in the constructor
  2. do the same at runtime by extracting the list of members of this by inspecting myClass.prototype[methodName].
We attach the list of directly referenced members of this to each function's documentation. The current documentation formatter only uses the members of functions which it determines are constructors, but it can be changed to look at all the methods attached to a class's prototype.
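
One way the second approach could work is to scan the source text of a function at runtime. The sketch below is an assumption about how this might be done, with membersOfThis as a made-up name. It is purely textual, so it would also match this.foo inside a string or comment and would miss accesses such as this['foo'].

```javascript
// List the properties of `this` that a function's source text refers to
// directly, in order of first appearance.
function membersOfThis(fn) {
  var source = Function.prototype.toString.call(fn);
  var pattern = /\bthis\.([A-Za-z_$][\w$]*)/g;
  var members = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    if (members.indexOf(match[1]) < 0) {
      members.push(match[1]);
    }
  }
  return members;
}

function Point(x, y) {
  this.x = x;
  this.y = y;
}
Point.prototype.getTheta = function () {
  return Math.atan2(this.y, this.x);
};

var constructorMembers = membersOfThis(Point);
var methodMembers = membersOfThis(Point.prototype.getTheta);
```

Here constructorMembers lists x and y without any Point instance ever being created.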

JSON format

TODO

Rewriting Rules

TODO