There is no doubt that GitHub is a great resource, and I use it to host sample DITA files that I reference from It is a great place for storing code, and can include storing content written in DITA. One thing GitHub is not is a component content management system, nor is it a good substitute for one. I would argue that it is far from being the best repository for storing DITA XML content.


To Effectively Reuse Content You First Need to Find It

One of the chief advantages of DITA from an efficiency standpoint is the ability to reuse content. This can include reusing whole maps or topics, or be done at a more granular level using conrefs and keys. One of the key things (pardon the pun) of being able to reuse content effectively is first being able to find the content to reuse.

GitHub’s native search feature is abysmal from this perspective. It is good for finding filenames that are similar to what you are looking for, but that’s about it. From a process perspective that means you are forcing your writers to store all content locally to be able to effectively find content for reuse, and hoping for the best. The only effective process for a technical writer in this situation is to search for content locally. The larger the amount of content stored on GitHub, the less useful it is.

A good CCMS provides extensive search facilities for finding content for reuse. Not only can you search specifically for content selectively within maps, topics and images, you can refine searches for content within the active map you are working on, search on specific DITA tags or attributes, and so much more.

If your writers can’t find content, they can’t reuse it.

GitHub Does not Inherently Understand XML

The one thing that GitHub does well is store and version files. However, it does not have any additional “smarts” when it comes to working with or manipulating XML. There is no validation inherent to GitHub that ensures that any content submitted to it is well-formed, meaning that both good, valid DITA and malformed content are all the same to GitHub.

A good CCMS will not let you save invalid XML material to it, ensuring that it is well-formed and problem-free before a writer can submit it. While it is true that a good XML editor will point out invalid code, it will still let the technical writer save the content locally, on the assumption that what they have is a draft that will be fixed later. There’s nothing stopping that writer from uploading that content to GitHub. Things begin to get interesting the first time someone else downloads that “draft” and tries to use it in their output (which will fail).

The issues extend beyond writers trying to submit poorly-formed XML. If you do any sort of localization you want to ensure that the material coming back from the localization service provider is valid DITA. GitHub can’t check that for you, but a good CCMS can. An XML-aware repository like TEXTML (the foundation for MadCap IXIA CCMS) enables such things as validation checks, the ability for deep search for content to be effective, and the rapid manipulation of XML transforms to convert and produce output. Also, workflow.


GitHub + an XML Editor Has no Workflow

A keystone to any mature documentation process is workflow. This is the process of moving content from a draft status to review, then to done. In between you can shuttle topics to subject matter experts (SMEs) to review, to other writers to copyedit, and to gain approvals for accuracy and completeness prior to publication. Workflow is a standard feature built in to any good CCMS.

This is wholly absent from GitHub along with any XML editor.

While it is possible to jury-rig something akin to a workflow using a series of third-party programs working with GitHub, this involves programming work and customization to get what you need. Why re-invent the wheel when workflows already exist within a CCMS?


A Lack of Associated Metadata

A good DITA CCMS not only keeps track of topics, but also contains other associated metadata that is handy for better understanding how, when, and who authored the content. A good CCMS, for example, will include information on associated metadata fields covering everything from when a topic was created, to who last worked with it, whether it is actively being worked on or not, its version, associated keywords, and more. While some XML editors can include this type of information, it does not do so out-of-the-box and without significant configuration beforehand.

This metadata is critical not only to having an effective workflow, but also to search. For example, the metadata in the MadCap IXIA CCMS ensures that you do not have two or more writers working on the same content at once, with built-in workflow notifying other writers when a topic is “locked” and being worked on (but can still be viewed in its pre-edited state).

In terms of search, you can seek topics that were created on a certain date, or by a specific writer, or for an individual project. This metadata is also crucial for deriving production metrics, so that you can get an idea as to how many topics were created or modified over a period of time, to mention only one example.

Again, none of this comes with GitHub out-of-the-box.


What is GitHub good for from a documentation perspective?

What GitHub does best is what it was designed for: storing and versioning software content. When I first started using a version control system for holding my desktop publishing content (many, many moons ago), it was great to be able to check in updated content for other writers to work with, and to have the ability to revert to a previous version if things went askew in a later edit. Much of this has been replicated in the slick GitHub interface, and it is great for storing, versioning and, in my case, sharing files publicly.

I currently have some DITA projects I am sharing on GitHub via my DITAWriter account, and one of the ongoing issues I have with it is having to coordinate changes somewhat awkwardly via email with the people I am collaborating with. There have been cases where someone has over-written content that I have worked on, and vice versa. In short, there have been plenty of times when I wish I had been using a CCMS along with my collaborators. While the process of using an XML editor to resolve the differences between conflicting versions of a topic has gotten better, it is not a match for a CCMS that inherently knows how to handle branching and merging operations, especially when multiple products lines and versions of them are involved.

While GitHub is good at storing code, and XML editors are great at what they do, a solid CCMS that is XML-savvy provides a better overall authoring and publishing experience.