Why Master Documents corrupt

Article contributed by John McGhie

Update: Word MVP Steve Hudson has written an article describing how to use master documents safely.

The complete explanation would be a book in itself. For now, it is enough to know that a Word document is a great big list of objects. An object can be anything you can put in a Word document. Each of these objects has many, many properties that determine how it appears and how it behaves.

The properties are all contained in several giant tables inside the file. The connection between any given object (say, a paragraph) and its properties is made with an amazingly complex latticework of pointers. These pointers are large binary numbers that cause Word to look at an exact byte location in the file to see what shape, size, or colour this object should be. Most objects have more than one pointer. Some pointers go to collections of properties (for example, a List Template that describes all the formatting for a numbered list) and some go simply to a single entry (for example, the language, which is just a single name).

Whenever we experience a Word document corruption, what has actually happened is that the pointer, or the entry in the table it points to, has become corrupted. The information found there is either nonsense, or it does not apply to the object in question. For example, a paragraph is trying to inherit page margins: a paragraph cannot have page margins, so Word gets terribly confused.
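To make the idea concrete, here is a toy model in Python. It is only a sketch: the table layout, the offsets, and the names PROPERTY_TABLE, DocObject and resolve are all invented for illustration and have nothing to do with Word's real file format. It simply shows the two failure modes described above: a pointer that leads to nonsense, and a pointer that leads to an entry that does not apply to the object.

```python
from dataclasses import dataclass

# Hypothetical property table: byte offset -> (kind of entry, properties).
# The offsets and contents are made up; Word's real tables are far larger.
PROPERTY_TABLE = {
    0: ("paragraph", {"alignment": "left", "language": "English"}),
    64: ("section", {"left_margin_cm": 2.5, "right_margin_cm": 2.5}),
}


@dataclass
class DocObject:
    kind: str     # what the object is, for example "paragraph"
    pointer: int  # offset of its properties in PROPERTY_TABLE


def resolve(obj):
    """Follow the pointer and check that the entry suits the object."""
    entry = PROPERTY_TABLE.get(obj.pointer)
    if entry is None:
        # Nothing meaningful lives at that location.
        return "corrupt: pointer leads to nonsense"
    entry_kind, _properties = entry
    if entry_kind != obj.kind:
        # The entry exists, but it describes a different kind of object.
        return f"corrupt: {obj.kind} is trying to inherit {entry_kind} properties"
    return "ok"


healthy = DocObject("paragraph", 0)       # points at paragraph properties
wrong_entry = DocObject("paragraph", 64)  # points at page margins
dangling = DocObject("paragraph", 1000)   # points at nothing at all

for obj in (healthy, wrong_entry, dangling):
    print(obj.pointer, resolve(obj))
```

In this sketch the second object is the paragraph trying to inherit page margins: its pointer is perfectly readable, but what it finds there makes no sense for a paragraph.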

All these property tables are stored in Section Breaks. A Section Break is not just a page break: it is a binary container that stores several hundred properties in multiple tables. The largest Section Break is the Default Section Break. You will never see one. The Default Section Break hides in the very last paragraph mark of a document. Because it is absolutely essential to the document (without it, the file is just a stream of bytes, not a document), Word maintains the contents itself and hides it from you and me.

The reason that Master Documents cause so much trouble is that you are asking Word to merge together many hundreds of different settings, some of which conflict, some of which apply only to one or a few paragraphs. A typical master document may contain 20 subdocuments. This means there are 21 Default Section Breaks (one for each subdocument, plus one for the master document itself), each containing potentially conflicting properties. Each subdocument can also contain multiple user Section Breaks. These may or may not override or conflict with the settings in one or more of the Default Section Breaks.

If a property is specified, does it apply to this document? Some of this document? Several of these documents? And is the document that stores it open? Is it active? Read-only or editable? The number of possibilities expands geometrically until the structure simply becomes too complex. Word loses track of what it is trying to do and takes a guess. The guess overwrites something, and bingo! You lose your master document.
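A little arithmetic shows how quickly this gets out of hand. The Python sketch below is not a description of what Word actually computes; the function names and the way of counting are assumptions made purely for illustration. It counts the Default Section Breaks in a master document, the pairs of them that could disagree with each other, and the number of different groups of files that a single property might be meant to apply to.

```python
def default_section_breaks(subdocuments):
    """One Default Section Break per subdocument, plus one for the master."""
    return subdocuments + 1


def conflicting_pairs(breaks):
    """How many pairs of Default Section Breaks could disagree."""
    return breaks * (breaks - 1) // 2


def possible_scopes(breaks):
    """How many non-empty groups of files one property could apply to."""
    return 2 ** breaks - 1


for subdocs in (1, 5, 20):
    breaks = default_section_breaks(subdocs)
    print(
        f"{subdocs} subdocuments: {breaks} default breaks, "
        f"{conflicting_pairs(breaks)} pairs, "
        f"{possible_scopes(breaks)} possible scopes"
    )
```

For the typical 20 subdocuments, that is 21 Default Section Breaks, 210 pairs, and 2,097,151 possible scopes for one property alone, before you even consider user Section Breaks or whether each file is open, active, or read-only.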

When we say you lose your master document, this loss can take many forms. You wouldn't be reading this at all if you had not so far experienced one of the lesser forms. You can still read some of your text, right? Trust me, it can get worse! The ultimate master document corruption results in some or all of the text paragraphs disappearing. Once this happens, there is no way to get them back: they are no longer in the file. This can be very disconcerting if the corruption happened several weeks ago and, because you were not looking at that part of the document, you didn't find out about it until you came to print the whole thing. By that time you had long since overwritten your backup!

A master document has only two possible states: corrupt, or just about to be corrupt. And that is why we say that the only possible fix for a master document is: don't use it!

For information on how to recover a Master Document, please see the article How to recover Master Documents.