This guest blog post was written by Neil Perlin, an internationally known consultant, strategist, trainer, and developer for online content in all forms from traditional online help to mobile. 

I originally wrote this post in 2013 for Beyond the Bleeding Edge, my column in the Society for Technical Communication’s Intercom magazine. On reviewing it eight years later, I’m struck by how much of it has stood the test of time and technical change.

Is your information architecture effort starting from scratch? Count your blessings. Too often, we have to re-architect (aka “clean up”) legacy content to make it work with:

  • New outputs, like converting PDF to a browser-based online help format.
  • New environments, like converting tables in browser-based online help to display attractively on different types and sizes of mobile devices.
  • New standards, like converting HTML, with its barely enforced syntax rules, to XHTML, with “real” rules.
  • New tools, which may not support some of the output formats of your older tool or trip over some of the codes from the older tool.

These new issues may not have been considered when the legacy content was created. Some of them may not have even existed. So the legacy content isn’t in suitable shape for going forward; in other words, it’s a mess. But you need it. What do you do?

In this post, I’ll discuss eleven tasks common to every legacy content re-architecting project I’ve worked on, ranging from those with 1,000 topics to one with over 150,000. Some of them are common sense. Others may be unfamiliar if you’ve never upgraded legacy content. Much of the work means identifying and fixing not only errors but their sources in your processes and tools. Are these tasks on the bleeding edge? No. But they are the basis for getting your content to work on that bleeding edge, and for future-proofing it. With that, to the tasks…

Identify Pain Points in the Documentation Process

You’ll learn about problems that may not be obvious from looking at the material. For example, one company had an in-house tool that supported conditionality, and the conditional tags were easy to see. What wasn’t easy to see was that the tool required authors to insert a condition’s start and end tags separately. Authors could easily insert a start tag but forget the end tag and wind up with broken conditions. So what appeared to be a coding problem was actually a tool problem, easily fixed by adopting a better authoring tool. But only the authors could see the real cause.
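Until the tool is replaced, a small script can at least show how widespread the damage is. Here is a minimal Python sketch of that kind of check. The marker syntax (`<!--cond:start NAME-->` and `<!--cond:end-->`) is invented for illustration, since the client's in-house format isn't described here; you would swap in whatever your own tool writes.

```python
import re

# Hypothetical marker syntax, invented for this sketch; real tools differ.
MARKER = re.compile(r"<!--\s*cond:(start|end)\b[^>]*-->")

def find_broken_conditions(text):
    """Return (line_number, problem) pairs for unbalanced condition markers."""
    problems = []
    open_lines = []  # line numbers of start markers that are still open
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in MARKER.finditer(line):
            if match.group(1) == "start":
                open_lines.append(line_number)
            elif open_lines:
                open_lines.pop()  # this end marker closes the latest start
            else:
                problems.append((line_number, "end marker without a start"))
    # Anything left on the stack was opened but never closed.
    problems.extend((n, "start marker never closed") for n in open_lines)
    return problems

sample = """<p>Shared text.</p>
<!--cond:start print-->
<p>Print-only text.</p>
<p>More shared text.</p>
<!--cond:start online-->
<p>Online-only text.</p>
<!--cond:end-->
"""
print(find_broken_conditions(sample))
```

Because an end marker is matched to the most recent open start marker, the script reports the start marker on line 2 as the one that was never closed. A human still has to decide where the missing end tag belongs.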

Determine the History of Legacy Content

This can help you understand why the content is in the state it’s in and the problems to expect. For example, if the legacy content was created in RoboHelp 7 or earlier, you may find problems with lists due to differences in how lists are coded in HTML (used in RoboHelp 7 and earlier) and XHTML (in RoboHelp 8 onwards).

Use One Authoring Tool to Eliminate File Incompatibilities

Modern authoring solutions like MadCap Flare support XHTML, so a file created using tool A should import cleanly into tool B. The problem is that different tools also add proprietary features that other tools may not recognize. The best outcome, in this case, is that tool B will ignore any proprietary features from tool A. Second best is that tool B will ignore the proprietary features from tool A but display their code in the topic. The worst case is that tool B will trip over a tool A feature, forcing you to go into the code to remove that feature.
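Before moving files between tools, it can help to take an inventory of what the old tool left behind. Proprietary features in XHTML often show up as namespace-prefixed elements and attributes, so a rough Python sketch like the one below can list them. The `toolA` prefix and the sample markup are made up for illustration, and the regular expression is a quick scan rather than a real XML parser, so treat its output as a starting point only.

```python
import re
from collections import Counter

# Finds prefix:name tokens that start an element ("<toolA:thing") or an
# attribute (" toolA:thing="). Closing tags are skipped to avoid double counts.
PREFIXED_NAME = re.compile(r"[<\s]([A-Za-z][\w.-]*:[A-Za-z][\w.-]*)(?=[\s=>/])")

def inventory_prefixed_names(markup, ignore=("xml", "xmlns")):
    """Count namespace-prefixed element and attribute names in markup."""
    counts = Counter()
    for name in PREFIXED_NAME.findall(markup):
        prefix = name.split(":")[0].lower()
        if prefix not in ignore:  # standard XML prefixes are not proprietary
            counts[name] += 1
    return counts

# Hypothetical sample; "toolA" stands in for whatever prefix your tool uses.
sample = (
    '<html xmlns:toolA="http://example.com/toolA"><body>'
    '<p toolA:conditions="print">Text</p>'
    "<toolA:dropDown><p>More</p></toolA:dropDown>"
    "</body></html>"
)
for name, count in sorted(inventory_prefixed_names(sample).items()):
    print(name, count)
```

Running this over a whole project folder gives you a list of features to test in tool B before you commit to the move.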

Re-examine the Structure of the Project to See if What Made Sense in the Past Still Does

This may help you simplify a project. For example, one client set up a project with a structure that was so granular that a field definition might constitute a topic. The idea had merit but the project grew to over 150,000 topics. The result was hard to manage and slowed some authoring tool processes to a crawl, which reduced authoring efficiency.

Get Rid of Extraneous Content

Can you get rid of material that users should already know in order to use your product? At one time, hardware and software manuals included a section on how to use the mouse. That section is long gone because users (probably) know how to use a mouse before arriving at your product. Is there some equivalent in your content? A few years ago, I had a client that made accounting software. The help explained how to use the software and the underlying concepts in about 2,000 pages. Users complained that it was impossible to find specific details in that mountain of content. The client decided to remove the conceptual material because “if users don’t know this stuff, they shouldn’t be using our software at all.” The result was a much shorter help system that was easier to use and maintain.

Replace or Eliminate Non-standard Code

Non-standard or bad code is ugly, but it can also cause problems when using new authoring or conversion tools. One client had created content using a proprietary authoring tool that didn’t terminate IMG tags and that used an unusual character string as a processing marker. When I imported the topics into Flare and tried to open them in the WYSIWYG editor, the syntax was so non-standard that the topics wouldn’t open at all. I had to edit the topics in the code to fix the most egregious problems in order to open the topics in WYSIWYG. You’ll have to fix these problems sooner or later. Re-architecting legacy content is a good time.
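When there are thousands of topics, fixing this by hand isn't realistic. As one illustration, here is a minimal Python sketch that self-closes unterminated IMG tags so they satisfy XHTML rules. This is not the script I used on that project, just an example of the approach. It assumes that no attribute value contains a `>` character, and it doesn't touch the processing marker because that string was specific to the client's tool.

```python
import re

# Matches any <img ...> tag, whether or not it already ends in "/>".
# Assumption: attribute values never contain a ">" character.
IMG_TAG = re.compile(r"<img\b([^>]*?)\s*/?>", re.IGNORECASE)

def close_img_tags(html):
    """Rewrite every img tag as a lowercase, self-closed XHTML tag."""
    return IMG_TAG.sub(lambda m: "<img" + m.group(1) + " />", html)

before = '<p><img src="a.png"><IMG SRC="b/c.png" alt="x" /></p>'
print(close_img_tags(before))
```

Tags that are already self-closed come out unchanged apart from the tag name being lowercased, so the script is safe to run more than once. Attribute names are left as they were, so uppercase attributes still need a separate pass.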

Eliminate Special Effects

Eliminate those special effects that are no longer cool and that are hard to maintain now that the original developer is long gone, unless there’s a really good reason to keep them.

Stay Between the Lines

Use authoring tools correctly. Every authoring tool and markup language that I know of has “hacks” that you can use to make the tool or language do something that it wasn’t intended to do. Don’t use them, and get rid of any you find. In every case in which they’ve been used, by me or other authors, they’ve blown back on the author at tool upgrade or tool change time.

Add Templates to Your Authoring Interface

The more templates you create, the more structure and consistency you add to the projects. For example, we often create master topics to use as models when multiple developers create task description topics. The problem with this approach is that we name the file “task topic template” and then have to remember to make a copy of the template file and change its file name to the real topic’s name. Invariably, we’ll forget and overwrite the template. Most modern tools offer some sort of template manager that lets you create a template and make it available through the tool’s own interface, eliminating the overwrite risk.
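If your tool has no template manager and you're stuck with template files in a folder, the same protection can be approximated with a few lines of script. The Python sketch below is an assumption about how a team might do that, not a feature of any tool, and the file names are hypothetical. It always copies the template first and refuses to write over a file that already exists.

```python
import shutil
import tempfile
from pathlib import Path

def new_topic_from_template(template: Path, new_topic: Path) -> Path:
    """Copy the template to new_topic, refusing to overwrite anything."""
    if new_topic.exists():
        # Also covers the case where new_topic is the template itself.
        raise FileExistsError(f"{new_topic} already exists")
    shutil.copyfile(template, new_topic)
    return new_topic

# Demonstration in a throwaway folder; file names are hypothetical.
with tempfile.TemporaryDirectory() as folder:
    template = Path(folder) / "task_topic_template.htm"
    template.write_text("<h1>Task title</h1>\n<ol><li>Step</li></ol>\n")
    topic = new_topic_from_template(template, Path(folder) / "install_widget.htm")
    print(topic.name)
```

Authors then edit the copy, and the template stays as it was.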

Look for New Ways to Fix Problems

This statement isn’t as self-evident as it sounds. Many modern tools have features that can fix your problems but you have to find those features. In Flare, for example, if you need to create both online and print outputs, you’d like your online hyperlinks to convert to a page reference format in print output – e.g. changing “click for information about widgets” to “see page 34 for information about widgets”. Trying to calculate the page numbers manually when generating print output is a terrible job, but the innocuously named cross-reference feature can do this automatically. But you have to find the feature. Check your tool vendor’s community for more ideas like this.

Think Flexibly and Extensibly

Inertia often plays a major role in project work. We’ve used a tool for years but still size our text in points out of habit. Or we use heading styles but still apply local formatting for things like bolding and italicizing. Both practices work but may be shortsighted because they don’t take change into account. What if management suddenly dictated that all the help systems have to be ported to smartphones within the next quarter? Adapting the text styles for the little displays manually is a hard job. The solution is to think flexibly by replacing points with % or em units for sizing and getting rid of local formatting by making fuller use of CSS and letting the browser do more of the work for you.
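The points-to-em change is mechanical enough to script. Here is a rough Python sketch that rewrites `font-size` values in a stylesheet. It assumes a base size of 12pt equals 1em, which matches the common browser default of 16px but is an assumption you should check against your own design. Because em values are relative to the parent element's size, they compound when elements nest, so review the result rather than trusting it blindly.

```python
import re

# Matches declarations such as "font-size: 18pt" or "font-size:10.5pt".
PT_SIZE = re.compile(r"(font-size\s*:\s*)(\d+(?:\.\d+)?)pt\b", re.IGNORECASE)

def points_to_em(css, base_pt=12.0):
    """Convert point-based font sizes to em units, relative to base_pt."""
    def convert(match):
        em = float(match.group(2)) / base_pt
        # Three significant digits keep the stylesheet readable.
        return f"{match.group(1)}{em:.3g}em"
    return PT_SIZE.sub(convert, css)

css = "h1 { font-size: 18pt; } p { font-size:12pt; } td { font-size: 10pt }"
print(points_to_em(css))
# h1 { font-size: 1.5em; } p { font-size:1em; } td { font-size: 0.833em }
```

Only `font-size` declarations are changed. Margins, padding, and line heights given in points are left alone and need their own decisions.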


We rarely get a chance to go back and do it right. Re-architecting legacy content is one of those chances. It will let us prepare for the new formats and outputs to come, and increase the programmatic aspect of tech comm and thus increase the range of opportunities available to us.