LDM-493: Data Management Documentation Architecture

  • Jonathan Sick

Latest Revision: 2016-10-26

Note

This draft is not under LSST change control.

1   Purpose and Scope

The Data Management Documentation Architecture defines LSST Data Management’s technical process for authoring, maintaining, and publishing documentation. This includes design documentation, technical implementation documentation, and operational documents such as user guides. By specifying management patterns and technologies, the goal of the Data Management Documentation Architecture is to ensure that documentation is available to the DM team and end-users when it is needed, in formats that are useful and appropriate.

The Data Management Documentation Architecture covers:

  • A taxonomy of documentation classes describing how each type of document fulfills specific roles.
  • Platforms for publishing all DM documentation (LSST the Docs), making documentation discoverable (LSST DocHub), and ensuring documentation is citeable.
  • A set of documentation formats that maintain a consistent reading and discovery experience, while also promoting developer efficiency.
  • Policies for organizing the production and maintenance of each class of documentation.

This Architecture does not cover ad-hoc written communication channels. The following are examples of subjects beyond the scope of this document:

2   Documentation Taxonomy

Data Management uses and produces multiple classes of documentation. Each class fulfills a specific role, with a consistent management and technical process.

2.1   Document classes

The classes of DM documentation are:

Requirements and interface control documentation (LPM, LSE)
These documents specify functionality that the Data Management System must deliver to the Project. Similarly, interface control documents are agreements between subsystems on functionality that crosses subsystem boundaries. The production of this documentation is described in LPM-19: Change Control Process and its management is described in LPM-51: Document Management Plan.
Change-controlled DM design documentation (LDM)
These documents define the scope and budget of Data Management work that meets the requirements and interface control documentation. DM work is expected to be consistent with these design documents. LDM document authors are responsible for updating design documentation whenever necessitated by revised project requirements, or by evolving implementation choices. Design documents must be approved by the Change Control Board (LPM-19: Change Control Process) before becoming the new baseline. These documents are also governed by LPM-51: Document Management Plan. See 7   Change-Controlled Design Documents for how change-controlled documents are handled by the Data Management Documentation Architecture.
Technical notes (DMTN, SQR)
Technical notes are a standardized, durable document series allowing useful information to be published on demand by any Data Management team member wishing to capture and share knowledge. Common uses of technical notes are descriptions of experiments with code or data, reviews of literature or technologies, usage notes, or technical documents that are too limited to trigger the change-control process. The primary audiences of technical notes are DM team members who build, maintain, or interface with a system and the science community. See 8   Technical Notes for details.
User Guides (DMG)
These documentation products describe usage of DM software, platforms, and data products to end-users. Typically, end-users are astronomers in the scientific community. Some user guides may instead be considered internal, such as operational guides or developer guides. The most important aspect of user guides is that they are written intentionally for their intended audience. See 9   User Guides for details.
Publications
Publications—namely journal articles, and conference presentations and proceedings—speak directly to the science community. With respect to the Data Management Documentation Architecture, publications should be a synthesis of DM documentation products (Design Documents, Implementation Technical Notes, Technical Notes, and User Guides). Publications are described in LPM-162: Project Publication Policy.

2.2   Information flow down

DM’s documentation taxonomy facilitates a flow of information from research and design, to implementation, and finally to operations.

Figure 1 Idealized information flow across documentation classes.

As Figure 1 illustrates, the scope and functionality of the Data Management System is specified by Requirements Documents. Design documents translate requirements into actionable designs and documentation of system implementations. Designs are reflected in change controlled design documents (LDM), though details can be deferred to technical notes (DMTN, SQR). User guides are written for end users using a combination of information from the design documentation and the implemented software itself. Verification documentation is written as a consequence of testing activities. Finally, scientific publications are written as a holistic synthesis of the entire Data Management System for the community.

Note that this is an idealized linear information flow. Software development work will spur new technical notes that in turn revise design documentation. However, Figure 1 shows the role of each document class in supporting the Data Management System in reporting research, documenting designs, and documenting for users.

3   Who Writes the Docs

Technical documentation (see 2   Documentation Taxonomy) is a responsibility shared across all of Data Management. Every engineer, manager and scientist ensures that their ideas and creations are written down so that other team members can understand and contribute to them. Managers budget time for engineers to create documentation, not just implementations.

Docs or it didn’t happen.

3.1   Role of the DM Documentation Engineer

DM members are not alone when documenting their work. The DM Documentation Engineer is responsible for leading the culture, establishing the practices, and building the infrastructure for technical documentation. The Documentation Engineer is part of the Data Management Science Quality and Reliability Engineering (SQuaRE) team under WBS 02C10.5.3.

While the Documentation Engineer writes documentation (for their own work, and when needed for important documentation on behalf of other DM teams), the Documentation Engineer does not by default write all DM documentation content. Instead, the default is for DM subject matter experts (engineers, managers and scientists) to document within their areas of expertise. This policy acknowledges DM as an agile organization where a technical writing team simply cannot scale with the engineering effort. It is more efficient for an engineer to write about what they make while they make it, and for developers to be allowed time to nurture the important skill of producing well-documented software. Since documentation is continuously delivered (see 5   Publishing Platforms), there is opportunity to refine the quality of documentation content. The role of the Documentation Engineer is to empower the DM team to produce quality documentation as efficiently as possible.

The Documentation Engineer fosters DM documentation best practices in four ways:

  • By creating and maintaining platforms for publishing and discovering documentation, and ensuring that DM documentation is citeable by the scientific community (see 5   Publishing Platforms).
  • By creating and maintaining the tools that convert content into consistently presented websites and artifacts (see 6   Documentation Formats and Generators).
  • By helping authors create content that fits into an effective DM-wide information architecture through templates and content strategy.
  • By helping authors create documentation with a consistent voice through a style guide and developer education.

3.2   Documentation Working Groups

Some documents, and document types, have special working groups that oversee their production. The organization of these working groups is discussed in the individual sections that describe document classes in the DM Documentation Architecture. Whether explicitly stated or not, the Documentation Engineer is available to support all DM documentation projects.

4   Content Licensing

Like Data Management source code, DM technical documentation is free and open and we welcome contributions to it from both inside and outside the LSST Project.

All DM documentation content is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This license enables our users to freely share and adapt documentation content. (For example, a professor can adapt DM documentation into course material.) The ‘attribution’ clause ensures that LSST is acknowledged.

All DM metadata is licensed under a Creative Commons Zero (CC0) license. This enables bibliographic services to list DM documentation. (Archive and bibliographic services assume this. See the Zenodo terms of use, for example.)

Source code for DM documentation infrastructure is licensed under an appropriate open source license consistent with the DM open-source licensing policies.

5   Publishing Platforms

Three platforms facilitate publication and discovery of all DM documentation: LSST the Docs, DocuShare, and LSST DocHub. Publishing all documentation in this system ensures consistency in presentation and discovery. This section summarizes the functionality of these platforms and the policies for their use.

5.1   LSST the Docs

LSST the Docs (LTD) is a platform for continuously publishing versioned documentation to the web. LTD is more fully described in SQR-006; please see that technote for details beyond the scope of this summary.

All DM documentation projects maintained in a version control system (specifically, Git with GitHub) are published with LTD. Other documents are published with DocuShare (see below).

5.1.1   LSST the Docs’s key features

The following LTD features are key to the DM Documentation Architecture:

  • LTD is a service that hosts static websites, that is, websites that do not require server-side rendering logic. Supporting only static websites ensures that LTD is reliable and scalable. This design decision also coerces documentation into a format that can be readily archived, which is important to DM’s scientific legacy.
  • Each documentation project hosted by LTD is published from a unique subdomain of the lsst.io domain. This helps to make human-friendly URLs. Audiences should see the lsst.io domain as a brand synonymous with LSST documentation.
  • LTD publishes multiple versions of a document, each under a different URL path prefix. This feature integrates well with DM’s Git-based development workflow where new documentation can be drafted on a branch. The DM team can review the rendered draft documentation without interfering with the production edition of the documentation. This feature also allows multiple versions of a document to be published to users, for example, to support multiple releases of a software product.
  • LTD integrates well with Git repositories and continuous integration (CI) services. CI allows the final documentation product to be rendered automatically by software from relatively simple markup. As well, documentation generation can be coupled with science pipelines to enable automated and reproducible reporting in CI. For example, a Jupyter notebook can be run and published to LTD in the same CI job.
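As a rough illustration of the URL scheme described in the list above, the following Python sketch maps a project and a Git branch to a published URL. The lsst.io domain comes from the text. The function name, the "/v/<slug>/" path prefix, serving the master edition from the subdomain root, and the slug rule are all assumptions made for illustration; SQR-006 remains the authoritative description.

```python
import re

LTD_DOMAIN = "lsst.io"  # from the text: each project has its own subdomain


def edition_url(project, branch="master"):
    """Return a hypothetical URL for the edition built from a Git branch."""
    base = f"https://{project}.{LTD_DOMAIN}/"
    if branch == "master":
        # Assumption: the production edition is served from the subdomain root.
        return base
    # Assumption: other editions live under a "/v/<slug>/" path prefix, where
    # the slug is the branch name with URL-unfriendly characters replaced.
    slug = re.sub(r"[^A-Za-z0-9.-]+", "-", branch).strip("-")
    return f"{base}v/{slug}/"
```

Under these assumptions, a draft on a ticket branch gets its own URL and never overwrites the production edition, which is the property the review workflow relies on.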

5.1.2   Requirements for compatibility with LSST the Docs

LTD imposes a few implementation constraints on DM documentation projects:

  • The source for the published documentation must be hosted in a version controlled repository; typically this is a Git repository hosted by GitHub. Documentation not hosted in GitHub is published directly from DocuShare (see below).
  • Each documentation project must be configured with a CI environment that builds and submits documentation whenever the underlying document source changes. Similarly, each documentation project requires software to transform documentation source into a static website. The SQuaRE team provides CI infrastructure and software to publish documentation in common source formats through LTD (see 6   Documentation Formats and Generators).

5.2   LSST Project DocuShare

DocuShare is the LSST Project’s official document repository (see LPM-51: Document Management Plan).

All change controlled design documents (with LDM handles) are deposited in DocuShare once approved by the Change Control Board (CCB; see LPM-19: Change Control Process), even if they are otherwise published through LSST the Docs. The LSST Project considers the version in DocuShare as the official version of a document that reflects a technical, schedule and budget baseline. LSST DocHub links to a document published in both LSST the Docs and DocuShare, with DocuShare being denoted as the baselined version.

Although DocuShare will be used in conjunction with LSST the Docs, some documentation formats are not managed in a version control system (Git) and thus are not publishable through LSST the Docs. These include documents from ‘office’ suites, such as Apple Pages, Google Docs, Dropbox Paper, Microsoft Word and Excel. For such documents, DocuShare is the only publishing platform: a document only in Google Docs, for example, is not yet considered delivered.

5.3   LSST DocHub

LSST DocHub is a platform for discovering LSST documentation and other digital resources. DocHub is both a metadata database and API, and also a set of user-facing web applications for searching and browsing LSST documentation. All DM documentation is available through DocHub.

Note

DocHub’s design is ongoing and will be presented in upcoming technotes.

5.3.1   Requirements for compatibility with DocHub

To be listed in LSST DocHub, projects must be registered with DocHub. Documentation projects hosted on GitHub have a DocHub-compatible metadata file residing in the project’s Git repository. Metadata is mirrored between DocHub’s database and the metadata file stored in Git. Documentation projects published exclusively through DocuShare are also registered in DocHub, though their metadata is editable through a web interface instead of a file.

Note

The DocHub metadata format will be specified in an upcoming technote.

5.4   Citeable documentation

LSST Data Management documentation is considered on par with scientific literature. Being close to the implementation, continuously tested, and written by the collective team, DM’s technical documentation is the most accurate and scientifically useful reference for detailed aspects of the Data Management System. To integrate with scientific literature, DM technical documentation is citeable according to the expectations of the astronomy community. This section describes how DM documentation is made citeable through Digital Object Identifiers and registration with the NASA Astrophysics Data System.

5.4.1   Digital Object Identifiers

Digital Object Identifiers (DOIs) are a standard for identifying digital artifacts. A DOI is a universal identifier that can be resolved into a document’s URL. The resolved URL can even be changed if the resource’s home on the web changes. Thus a DOI acts as a permanent link to the artifact (in this case, a document). In science, DataCite is a common DOI provider. Although LSST could become a DataCite member and provision DOIs through DataCite’s API, institutions like LSST typically cannot guarantee the data longevity that is expected for DOIs. Instead, science archives can permanently archive a copy of a document and provision a DOI through DataCite to that archived copy. Zenodo is an example of such an archive operated by CERN for the science community.

As part of the continuous delivery process, DM documentation platforms automatically submit new or revised documents to a data archive and receive a DOI. For multi-page documentation websites, each webpage is individually archived and given a DOI to prevent ambiguity in citations. Published documents display this DOI as part of their citation instructions to readers.

Note that DOIs provisioned this way resolve to the data archive’s landing page rather than the website published on LSST the Docs. While this does ensure the long term integrity of LSST documentation in scientific literature, it does compromise the present-day usability of DOI-cited LSST documents. To work around this, metadata published on the data archive landing page includes a pointer to the live document published on LSST the Docs.
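The relationship between a DOI, the archive landing page, and the live document can be sketched in Python as follows. This is only an illustration: the class name and its fields are hypothetical, and the DOI and URL values are placeholders rather than real identifiers. The one external fact relied upon is that a DOI can be resolved through the doi.org proxy.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class ArchivedDocument:
    """Hypothetical record pairing an archived copy with its live edition."""

    doi: str       # DOI provisioned by the archive for the deposited copy
    live_url: str  # pointer to the live document on LSST the Docs

    @property
    def doi_url(self):
        # The doi.org proxy redirects to wherever the DOI currently points,
        # which in this workflow is the archive's landing page.
        return "https://doi.org/" + quote(self.doi, safe="/")


# Placeholder values, not real identifiers.
doc = ArchivedDocument(
    doi="10.5281/zenodo.0000000",
    live_url="https://example.lsst.io/",
)
```

A citation would carry `doc.doi_url`, while the landing page metadata would carry `doc.live_url` so that readers can reach the current edition.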

Archives, like Zenodo, provide discovery services in addition to storing resources and provisioning DOIs. While this is a nice feature, the DM Documentation Architecture does not rely upon the discovery tools of specific archives. Instead, DocHub is our in-house, fully fledged document discovery platform for LSST DM. DocHub affords DM flexibility and specialization in organizing and presenting documentation, and also insulates LSST from a specific archive. Through DOIs, DocHub points to documents in archives, in addition to LSST the Docs.

5.4.2   NASA/SAO Astrophysics Data System

ADS is how the astronomical community discovers literature. ADS is not a document hosting service, but rather a metadata and search service. ADS lists LSST technical documentation with record pages that include bibliographic information and links pointing to the published documentation on LSST the Docs. The DM documentation platforms automatically submit new and updated DM documentation to ADS as part of the regular continuous delivery process. Specifically, the documentation platforms cross-walk metadata already available through LSST DocHub into the ADS submission schema (ADS Tagged Format).
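A metadata cross-walk of this kind might look like the Python sketch below. It is an assumption-laden illustration: the DocHub metadata format is not yet specified, so the dictionary keys are hypothetical, and the tag letters (%T, %A, %D, %B, %U) reflect a reading of the ADS tagged format that should be checked against ADS's own submission documentation before use.

```python
def to_ads_tagged(meta):
    """Render a metadata dict as an ADS tagged-format style record.

    The keys of ``meta`` are hypothetical; the tag letters are assumed.
    """
    lines = [
        "%T " + meta["title"],               # title
        "%A " + "; ".join(meta["authors"]),  # author list
        "%D " + meta["date"],                # publication date
        "%B " + meta["abstract"],            # abstract
        "%U " + meta["url"],                 # link to the published document
    ]
    return "\n".join(lines)


# Placeholder metadata, not a real document.
record = to_ads_tagged({
    "title": "Example Technote",
    "authors": ["Lastname, F."],
    "date": "10/2016",
    "abstract": "Placeholder abstract.",
    "url": "https://example.lsst.io/",
})
```

Keeping the cross-walk as a pure function of the metadata means the same record can be regenerated on every delivery without manual editing.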

6   Documentation Formats and Generators

This section describes the tools used to build static documentation sites that are published with LSST the Docs and DocuShare.

6.1   Sphinx

Sphinx is the first-class documentation generator for LSST Data Management. All user guides are produced with Sphinx. Although not required, other documentation classes should preferably be produced with Sphinx as well.

Some of the reasons Sphinx was chosen as DM’s documentation generator include:

  • Sphinx and reStructuredText are implemented in Python, which matches the DM technology stack.
  • Since reStructuredText is plain text, Sphinx projects integrate well with Data Management’s Git-based development workflow.
  • Through Python APIs, both Sphinx and the reStructuredText markup language are thoroughly extensible, giving the Documentation Engineer opportunity to build solutions that both improve developer efficiency, and improve the quality of documentation. For example, Sphinx is able to introspect Python code to build API reference documentation.
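To make the last point concrete, a minimal Sphinx configuration that enables docstring introspection might look like the sketch below. The sphinx.ext.autodoc and sphinx.ext.napoleon extensions ship with Sphinx; the project name and author are placeholders, and nothing here should be read as DM's actual configuration.

```python
# conf.py: minimal Sphinx configuration (illustrative sketch only)

project = "example-package"  # hypothetical project name
author = "Example Author"    # placeholder

extensions = [
    "sphinx.ext.autodoc",   # builds API reference content from docstrings
    "sphinx.ext.napoleon",  # parses NumPy- and Google-style docstrings
]

master_doc = "index"  # the root reStructuredText document
```

With such a configuration, a reStructuredText page can use the automodule directive to pull a module's docstrings into the rendered site, so the API reference stays in step with the code.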

The DM Documentation Architecture uses two types of Sphinx projects, depending on the document class: single-page projects for narrative documents and multi-page projects for user guides.

6.1.1   Single-page Sphinx projects

6.1.1.1   Narrative-based information architecture

The single-page website is an excellent format for narratives that coherently explore a single topic or idea. By adopting the form of an academic article organized linearly by section headers, a single-page website is readily citeable (a single URL, and thus a single DOI) and archivable (a page can be transformed into a self-contained artifact like a PDF). At the same time, a single page website is a website and benefits from native hyperlinks, exposure to search engines, and a visual presentation that adapts to the reader’s context (responsive design).

6.1.1.2   Implementation

Single-page Sphinx projects implement the idea of a single-page website. Authors create new single-page Sphinx projects from a configurable template. Each single-page Sphinx project is maintained in its own Git repository and configured to publish automatically to LSST the Docs.

Much of the design and logic for the format is centrally managed, separately from content, in Git repositories maintained by the Documentation Engineer. This makes single-page Sphinx projects manageable and consistent in form and function. This technology stack is also shared with multi-page Sphinx projects, which are employed by user guide projects.

The single-page Sphinx format is recommended for technotes and change-controlled design documentation. Alternatively, these document classes can be published through the landing page framework. Authoring and publishing through the single-page Sphinx format, however, creates a better reading experience due to features like deep hyperlinking to individual content objects and responsive design.

6.1.1.3   Archival and citation workflow

In relation to the LSST Document Management Plan, single-page Sphinx projects are DM’s equivalent to the Project’s Document-9224 document template. LSST the Docs continuously publishes DM single-page Sphinx projects as websites. When a single-page Sphinx project is delivered (by merging to the master Git branch), the documentation infrastructure creates a PDF version of the document that matches the form of Document-9224 as much as is feasible. This PDF is deposited in DocuShare per LPM-51. Simultaneously, the PDF is also delivered to a science data archive to obtain a citeable DOI.

6.1.2   Sphinx for user guides

6.1.2.1   Topic-based information architecture

The purpose of a user guide is to introduce users to a product, teach users how to use a product, and be a reliable reference for every relevant feature and behavior in a product. As such, user guides are a constellation of marketing material, tutorials, conceptual guides, and references, as appropriate. This type of documentation is markedly different from the narrative documentation that is supported by the single-page Sphinx format (and the landing page framework). User guides must be implemented as multi-page websites, where each page covers a different topic type.

Every Page is Page One [1] is the guiding information architecture that DM implements in its user documentation. In an Every Page is Page One (EPPO) architecture, every page of documentation is a self-contained topic. Topics link to each other based on subject affinities to form a bottom-up information architecture (as opposed to a strictly top-down hierarchy that is established by narratives like single-page Sphinx projects and other report-like documents). The EPPO architecture acknowledges that users will create their own curriculum for learning a product, and that a linear hierarchy is not well-suited for this.

EPPO also benefits DM documentation development and maintenance. Each documentation page is self-contained, making documentation work easier to plan and schedule. Interlinked, self-contained pages also naturally reduce content duplication and ease maintenance.

[1]Baker, Mark (2013). Every Page is Page One: Topic-Based Writing for Technical Communication and the Web. Laguna Hills: XML Press.

6.1.2.2   Implementation

Documentation in the EPPO-type information architecture exists natively on the web. The multi-page Sphinx format is how DM implements all user documentation, without exception. Each user guide project is embedded in the code repository of the product it documents. In conjunction with LSST the Docs’s continuous, versioned documentation delivery, this arrangement ensures that documentation is always versioned in step with the product. Indeed, API reference documentation is typically extracted from the code itself. Keeping documentation close to the code also improves the workflow of engineers who contribute documentation.

All multi-page Sphinx projects share common infrastructure to maintain consistency in form and function. This infrastructure is also shared with single-page Sphinx projects.

6.1.2.3   Runnable content

Examples and tutorials in user guides are engineered to be tested as part of the product’s continuous integration. This ensures that documentation and implementation are kept in sync. Tutorials are integrated in a way that allows the user to easily run and remix example code. This may be done with technologies like Jupyter notebooks and the LSST science user interface itself.

6.1.2.4   Citeable content

Since user documentation is the most detailed documentation of implemented DM products (thanks to its proximity to the code), user documentation is likely the most useful scientific reference. As described above, user guides are implemented as assemblies of self-contained topics. The individual topic (a page at a single URL) is therefore the most precise citeable entity. Citations to a user guide, in general, do not help a reader find the relevant information.

To facilitate topic-level citation, individual pages of multi-page Sphinx sites are archived independently. Each page is rendered into a self-contained PDF (as is done for single-page Sphinx sites) and deposited in a science data archive to be granted a DOI. Each page, as published on LSST the Docs, displays its DOI with citation instructions for researchers.

DM documentation infrastructure automates the workflow described above. Since it is an expensive workflow, a multi-page Sphinx site is only archived as part of a merge to the documentation’s master branch (and designated maintenance branches for releases).

6.2   Landing Pages for Alternative Formats

Although Sphinx is the preferred DM documentation format, not all Git-backed documentation is produced as a Sphinx project. Some documents are written in LaTeX for legacy reasons or to be compatible with scientific publishers. Jupyter notebooks are also being used for producing documents that are tightly integrated with code and data.

Since they are managed in Git, these document formats are eligible for being published as static websites with LSST the Docs. However, LaTeX documents, Jupyter notebooks, and similar formats, do not necessarily create polished websites that have the look and feel of LSST documentation. Thus the DM Documentation Architecture shims these formats through a landing page framework.

6.2.1   The Landing Page framework

Landing pages are static websites published with LSST the Docs, and indexed by DocHub. Irrespective of the original authoring tool, landing pages provide a consistent experience for consuming documentation.

Each landing page presents metadata to the reader, like title, authorship, summary, and links back to DocHub and related publications. Alongside this metadata, the landing page presents the document either as a list of links to other pages or files, or the document itself as an on-page iframe to a PDF.[2]

[2]The concept of displaying a PDF in an iframe alongside metadata on a static site is based on the gh-publisher project by Ewan Mellor.

Each landing page is hosted in a GitHub repository that contains and versions the document’s content and metadata. Similar to Sphinx-based documents, a continuous integration service, like Travis or Jenkins, publishes the landing page to LSST the Docs whenever the Git repository is updated. Automations also make provisioning landing pages efficient.

The landing page generator, page design, and automations are provided by the SQuaRE team.

6.2.2   Workflows for specific formats

This section describes workflows for publishing common document formats through the landing page framework.

Note

This section will be moved to a documentation user guide; likely in https://developer.lsst.io.

6.2.2.1   LaTeX documents

LaTeX documents, being plain text, are hosted and authored entirely on GitHub. This GitHub repository is named after the document’s handle, and also hosts DocHub metadata and continuous integration configuration.

The continuous integration service renders the LaTeX source into a PDF that is displayed on the landing page.

6.2.2.2   Jupyter notebooks

Being JSON-based, Jupyter notebooks are natively hosted in a GitHub repository. This repository is named after the document’s handle, and also hosts DocHub metadata and continuous integration configuration.

The continuous integration service runs the notebooks themselves. This ensures that the notebooks are reproducible, and not tied to an individual developer’s environment.

The landing page contains metadata about the notebooks, along with a summary description, and a table of contents linking to individual notebooks. If there is only a single notebook, that notebook can be displayed on the landing page itself.

6.3   Formats not Managed in Git

All formats described previously in this section are published with LSST the Docs, as they are managed in a version control system (specifically, Git with GitHub). This section describes policies for formats not publishable with LSST the Docs.

6.3.1   Office documents

Office documents are those produced by office and word processing suites, either in native applications (such as Microsoft Word and Excel, and Apple Pages) or in the cloud (such as Google Docs and Dropbox Paper). These formats may be used for change-controlled (LDM) documents. Note that such documents are only published through DocuShare. In DocuShare, both a PDF rendering and an editable version are included in the document’s stack.

Authors can register new office documents with DocHub so that their existence is known to the DM team, even before being officially delivered to DocuShare. DocHub can link to the document’s read-only preview if available from the cloud application. However, note that such draft documents are not stored by the LSST Project, and thus are not considered to be delivered. For example, a JIRA ticket may not be closed if it merely links to a Google Docs page, or an attached Word file, rather than a DocuShare deposition.

6.3.2   Confluence pages

Some authors may choose to draft documents in LSST’s Confluence wiki to take advantage of its commenting features and online editing. A Confluence page is not considered a delivered document, however. The authors must convert the wiki page into a format accepted for the document’s class. A technical note must be converted into a single-page Sphinx project, and change controlled documents may be converted into either single-page Sphinx projects or office documents. Once converted, authors must delete the original wiki page.

While the document is being drafted, authors can register the document with DocHub to make it discoverable by the DM team. The document is only considered delivered, however, once the Confluence page has been converted and published in either DocuShare or LSST the Docs. A JIRA ticket, for example, may not be closed with a link to a Confluence page as evidence of documentation.
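The delivery rule stated in this section (only DocuShare and LSST the Docs count) could be checked mechanically, for instance when a ticket is closed. The Python sketch below is illustrative only: the function name is hypothetical, and the DocuShare hostname is an assumption, while the lsst.io domain is taken from the description of LSST the Docs.

```python
from urllib.parse import urlparse

# Assumed hostname for the Project's DocuShare instance (illustration only).
DOCUSHARE_HOSTS = {"docushare.lsst.org"}
LTD_DOMAIN = "lsst.io"  # LSST the Docs publishes on subdomains of lsst.io


def is_delivered(url):
    """Return True if the URL points at a platform that counts as delivery."""
    host = (urlparse(url).hostname or "").lower()
    if host in DOCUSHARE_HOSTS:
        return True
    # Any subdomain of lsst.io is treated as LSST the Docs.
    return host.endswith("." + LTD_DOMAIN)
```

This check is deliberately coarse: it does not distinguish a draft edition on LSST the Docs from the edition built from the master branch, which a real implementation would need to do.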

7   Change-Controlled Design Documents

This section describes LSST Project change-controlled design documentation in the context of the Data Management Documentation Architecture.

7.1   Background

Change-controlled design documentation defines the scope and budget of the Data Management System. DM’s change-controlled design documents carry “LDM” handles. New and revised LDM documents are reviewed by the DM Technical Control Team (TCT), as described in LDM-294: Data Management Organization and Charter, and submitted to the LSST Project Change Control Board for approval. Overall, the role and policy surrounding change-controlled documentation is described in LPM-19: Change Control Process.

7.2   Maintenance

Design documents must always be consistent with the system’s implementation since they formally define the Data Management System’s scope and budget. If an implementation exceeds the envelope of the design document, a ticket should be raised for the project management team to review and reconcile the design and implementation.

7.3   Technical Implementation

Per LPM-51, all change-controlled documentation is deposited in DocuShare upon acceptance by the Change Control Board.

Some change-controlled documents are only available through DocuShare, such as word-processing files and spreadsheets, and all classified (non-public) documents. In these cases, DocHub indexes and links to the document from DocuShare. These documents are not mirrored on LSST the Docs.

DM members are encouraged to author documents in formats that can be version-controlled in Git and published by LSST the Docs for improved collaboration. Single-page Sphinx projects are preferred, and LaTeX documents published with the Landing Page Framework are allowed secondarily. The status of a Git-hosted document in the change control process is reflected in its Git branches (and hence its editions on LSST the Docs). The master branch (and main LSST the Docs edition) is reserved for change-controlled versions of a document. Ticket and integration branches allow intermediate drafts of the document to be shared. When a document is accepted by the Change Control Board, that version is simultaneously merged to master and archived as a PDF to DocuShare.
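The branch convention above can be summarized in a few lines. This is a minimal sketch, not project tooling: the function name and the status labels are hypothetical, and it assumes that every branch other than master is a ticket or integration branch carrying a draft.

```python
def edition_status(branch: str) -> str:
    """Classify a Git branch by its role in the change control process."""
    if branch == "master":
        # Reserved for versions accepted by the Change Control Board.
        return "change-controlled"
    # Ticket and integration branches share intermediate drafts.
    return "draft"
```

For example, `edition_status("master")` gives `"change-controlled"`, while a ticket branch (the name `tickets/DM-1234` is a made-up example) gives `"draft"`.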

To facilitate citation in scientific literature, the accepted versions of unclassified documents are also deposited in a science data archive (such as Zenodo) to be granted a DOI.

LSST DocHub is aware of each version of a document (as published on LSST the Docs, on DocuShare, and Zenodo) and usefully directs a user to the relevant version and citation information.

8   Technical Notes

This section describes technical notes (technotes), which are flexible, self-contained documents.

Note

This section supersedes SQR-000: The LSST DM Technical Note Platform.

8.1   Role

Technotes are containers for ideas, rather than documentation for products. This distinguishes technotes from user guides.

Technotes can be used to describe results from an experiment (like a scientific paper). They can also be used to investigate technologies or design decisions and suggest a plan. Technotes are primarily a product for DM, not a product by DM. Through technotes, DM captures its research effort to make better implementation choices. In Figure 1, technotes are shown as inputs to design documentation in DM’s idealized information flow.

Another use for technotes is to provide high level overviews of shipped products, pitched primarily to external audiences. These narratives are like the blog posts that tech companies write about their products, and are opportunities to explain the philosophy and design reasoning of DM products. Often the tone of such narratives does not fit in user documentation, but works well in a technote article.

Overall, technotes are catch-all containers for narratives that are neither design documentation nor user guides.

Technotes are created and published on-demand by DM team members. There is no approval process for technotes other than the standard DM code review for ticketed work. Through this lack of administrative process, and infrastructural automations (see below), the bar for creating a technote is intentionally low. This is important since capturing more information in a standardized technote format increases the proportion of DM knowledge that is accessible through DocHub.

8.2   Technical Implementation

Single-page Sphinx projects are the recommended format for technotes. This format publishes natively to the web, making these documents highly usable in browsing and information hunting contexts. The single-page format is also straightforward to reduce into static, archived documents (like PDFs) that can be cited in scientific literature. The single-page format also emulates scientific papers, and is appropriate for a narrative format.

Where it is more convenient to the author, technotes can also be published through alternative formats (such as LaTeX articles and Jupyter Notebook collections) using the Landing Page Framework. Nonetheless, Sphinx projects are recommended over LaTeX projects for a better online reading experience.

8.3   Authorship

Technotes have explicit author lists, similar to academic papers. This property raises the profile of individual intellectual contributions, which is important for DM team members who are writing fewer academic papers, yet want to maintain a CV. Associating individuals with technote authorship also builds trust in the content: technote authors are subject matter experts sharing knowledge with DM.

Because technote authors are intellectually responsible for their content, DM team members should always involve the original authors when making a pull request against a technote that is not theirs.

8.4   Maintenance

Technotes do not need to be updated. Akin to a traditional paper publication, a technote implicitly represents a state of knowledge at the time of publication. Creating a technote does not oblige the authors to continually re-assess and update the document.

When significant changes occur, authors can add metadata and other notices. For example, if content from a technote has been migrated into a different document (upstream to a design document, or downstream to a user guide), the technote can be marked as deprecated.

A technote can be updated freely by the authors. If new information needs to be documented that fits an existing technote’s scope, that existing technote can be updated rather than creating an entirely new document. This approach reduces the amount of legacy documentation. Archival tools and the LSST the Docs publishing platform maintain older versions of technotes for historical interest.

9   User Guides

This section describes what DM user guides are, and how they are produced by user guide working groups.

9.1   Role

User guides empower the people who use DM’s software, platforms, and data products. A user guide can contain reference information, conceptual narratives, task guidance, tutorials, and even marketing information. In many cases, a user guide can be the primary web-facing manifestation of a DM product.

User guides can be aimed at internal or external audiences, or both simultaneously. The DM Developer Guide, and back-end operations guides, are examples of internal guides. User guides can, of course, also be public-facing: documentation of software products, web platforms, and data products. Public documentation also has tremendous value to the team itself. A good example of this is documentation for open source software where the information needs of a consumer are nearly the same as those of a developer or contributor.

Unlike design documentation, user guides are tightly integrated with a DM product’s implementation and delivery. Consequently, user guides are the most definitive references for what DM products are and how they work. Their content will be used by scientists to create science. These aspects make DM guides ideal primary references for scientific literature. We enable this usage by making individual pages of guides citeable entities (see 6.1.2.4   Citeable content).

9.2   Technical Implementation

All DM user guides are created as multi-page Sphinx projects. This format facilitates a topic-based information architecture needed for comprehensive documentation projects.

Given that astronomers will use multiple DM products simultaneously (for example: Pipeline software, DataSpace, and data release), mandating the multi-page Sphinx format unifies the user experience across LSST’s astronomer-facing documentation.

9.3   Organization

Every DM software, platform and data product has a corresponding user guide.

Rather than a monolithic DM documentation site, user documentation is partitioned according to the codebase or data product that each guide documents. This approach keeps documentation close to the source so that documentation is versioned and delivered in step with the product itself.

The organization of user guides reflects, and even shapes, the perception of a DM product. A project may have multiple user guides if the user audience is highly disjoint (for example a project may have an internal operations guide that is separate from a guide for consumers). Multiple audiences can share a common user guide if their interests are similar. This is common for open source APIs, like pipelines.lsst.io. Finally, multiple source repositories may share a common user guide project. For example, the Science Pipelines consist of many Git repositories, yet are commonly documented at pipelines.lsst.io since we want to portray the Pipelines as a coherent entity.

Because user guides are websites, links allow independently produced user guides to still act like a consistent, monolithic resource. Instead of duplicating or rephrasing content from a different guide, user guides always link to the most authoritative page.

9.3.1   Relation between user guides and design documentation

Design documentation and user guides both refer to the same thing: a DM product. Depending on the user guide’s intended audience, design documentation may flow and be transformed into user guide content. For example, the Science Pipelines user guide is an open source software project where users are also developers. Successfully using and extending the LSST Science Pipelines may require documentation of implementation details originally described in design documentation. Rather than maintain a parallel set of overlapping design and user documentation, in this case the design documents should be deprecated in favor of the user guide.

9.4   User Guide Working Groups

User guides are DM products, much like the software, platforms and datasets they accompany. To oversee their delivery, each user guide project is managed by a corresponding working group.

User guide working groups are a mixture of subject matter experts (SMEs; usually the engineers, scientists, or managers directly involved in building a product) and the Documentation Engineer, who provides expertise in infrastructure, technical writing, and information architecture. These working groups are structured to solve key management challenges:

  • User guides need strong cohesion and editorial curation, especially in their early development. A working group can provide consistent vision to a user guide project.
  • DM products are highly technical, and their guides cannot be effectively produced by a separate technical writing team (see 3   Who Writes the Docs). Subject matter experts (SMEs) must be involved in producing documentation for the products they build. SMEs in the working group ensure that the user guide is technically appropriate and complete. SMEs in the working group can also liaise with the rest of the engineering team to manage documentation contributions.
  • Individual user guides exist within a larger environment of LSST documentation. The Documentation Engineer embedded in the working group develops the common infrastructure necessary for producing and publishing content. The Documentation Engineer also contributes knowledge in areas of information architecture and technical writing. Being shared across user guide working groups, the Documentation Engineer provides consistency and quality to the LSST documentation experience.

Each user guide working group’s composition will be unique and tuned to the project’s needs. However, the following roles should be filled, possibly by the same or multiple people:

  • Subject matter expert who leads the curriculum development of the user guide.
  • T/CAM who is able to schedule effort for all engineers that may need to contribute documentation content.
  • The Documentation Engineer who provides documentation infrastructure and provides advice on content (technical writing) and organization (information architecture).

Again, user guide working groups play a leadership role in documentation delivery. The engineering and scientific teams who build a project will be responsible for producing most of a user guide’s content, especially reference content. The Documentation Engineer will also contribute critical (highly used) and complex content pieces.

9.5   Maintenance

User guides are continuously delivered in step with product development.

As APIs change or are added, software developers must update the corresponding reference documentation. This process is convenient for developers since reference documentation is typically extracted from source code itself. Reference documentation writing is expected to be part of all software development tickets.
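As an illustration of reference documentation that lives in the source code, the sketch below attaches a docstring to a function and retrieves it with the standard library. The function, its parameter, and the docstring layout are hypothetical. This document does not name the extraction tool or the docstring convention; `inspect.getdoc` merely stands in for whatever tool builds the reference pages.

```python
import inspect


def count_sources(catalog):
    """Count the sources in a catalog.

    Parameters
    ----------
    catalog : list
        Hypothetical catalog, represented here as a list of records.

    Returns
    -------
    int
        Number of records in the catalog.
    """
    return len(catalog)


# A documentation build reads the same text that developers edit in the code.
reference_text = inspect.getdoc(count_sources)
```

Because the reference text sits next to the implementation, an API change and its documentation update can be reviewed in the same ticket.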

Tutorial and conceptual documentation is more expensive to produce than reference documentation, and is typically written in tickets separate from software development. API changes may break conceptual or tutorial documentation. Where possible, the software development ticket’s scope should include fixing incompatibilities in the documentation. Where the changes are too numerous, the outdated documentation should still be identified and excluded from documentation builds, and a follow-up documentation ticket should be created and scheduled.
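One way to exclude an outdated page from a Sphinx build is the standard `exclude_patterns` option in the project’s conf.py. The fragment below is a sketch under assumptions: the file path is hypothetical, and this document does not prescribe this particular mechanism.

```python
# conf.py fragment for a multi-page Sphinx user guide (a sketch).
# exclude_patterns is a standard Sphinx option: glob patterns, relative to
# the source directory, for files the build should ignore.
exclude_patterns = [
    "_build",
    # Hypothetical tutorial broken by an API change; restore it once the
    # follow-up documentation ticket is done.
    "tutorials/outdated-tutorial.rst",
]
```

Any table-of-contents entry that points at an excluded page would likely need to be removed as well, so that the build stays free of warnings.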

9.5.1   User Guides and Community.lsst.org

Community.lsst.org is DM’s primary long-form communication venue, both internally and with end-users. Through conversation, original knowledge is naturally published on Community.lsst.org. Thanks to its open nature and search capabilities, Community.lsst.org can serve as an emergent knowledge base for LSST.

However, Community.lsst.org should not surpass any user guide as a primary source of information. User guide working groups should monitor Community forum conversations. When a question on the Community forum cannot be answered by the user guide, the working group should seek to distill the conversation’s information into the user guide. Once the user guide is updated, the working group should post a reply to the Community topic that links to the new content in the user guide. This helps future readers find user guide content through the Community forum.