Exploring 2000 -
Masters without slaves - Jan 1999
John Savill delves deep into the workings of multi-master replication. This article is based on NT 5.0, Beta 2.

In this article I will be looking at how the multi-master model works.

Making changes


With Windows NT, every change is passed to the domain’s single primary domain controller. If the primary domain controller is unavailable, clients can continue to log on through a backup domain controller, but no changes can be made. If the primary is going to be unavailable for some time, a backup domain controller can be promoted to take over its role. Each backup domain controller holds a copy of the primary domain controller’s database, which enables it to authenticate client requests.

The backup domain controller databases are kept up to date through replication that occurs on a regular schedule, with only limited tuning available. In a Windows 2000 domain all domain controllers are equal: changes can be made on any domain controller, and each server’s complete copy of the domain directory is kept up to date with the others through multi-master replication.

Implementation


Each Windows 2000 domain has one or more domain controllers, and each of these holds a complete copy of the domain’s Active Directory. When a change is made to the Active Directory, the client application does not need to know which machine is a domain controller; it uses the supplied interface to make the change, which is transparently completed at the nearest domain controller. The domain controllers merely act as interfaces into the Active Directory. Each time a change is made, the Update Sequence Number (USN) of the server where the change is implemented is incremented by one, and this USN is stored along with the change to the modified property of the object.

These changes have to be replicated to all domain controllers in the domain and the Update Sequence Number provides the key to the multi-master replication.

Each server has a single Update Sequence Number that is incremented with each modification. This applies not just to originating changes, where the change is first made on the current domain controller, but also to changes replicated from other domain controllers.

These Update Sequence Number increments are atomic, which means that the increment to the USN and the actual change succeed or fail together. If one part fails, the whole change fails, so it is not possible for a change to be made without the USN being incremented. As a result, no committed change can go unnumbered and be missed by replication. Each domain controller keeps track of the highest USNs of the other domain controllers it replicates with, so it can calculate the changes it needs to replicate on each replication cycle. The time between replication cycles is set using the Active Directory Sites and Services MMC snap-in. Replication can be set on a per-hour basis throughout the week to occur once, twice, four times, or not at all.
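The all-or-nothing behaviour described above can be sketched in a few lines of Python. This is only a toy model, not how Active Directory is implemented: the `DomainController` class, its `write` method and the use of a lock are assumptions made for illustration.

```python
import threading


class DomainController:
    """Toy model of one domain controller's directory copy (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.usn = 0            # the server's single Update Sequence Number
        self.directory = {}     # (object, property) -> (value, usn of the change)
        self._lock = threading.Lock()

    def write(self, obj, prop, value):
        """Apply a change and increment the USN as one indivisible step."""
        if value is None:
            # Reject the change before anything is modified, so a failed
            # change leaves both the directory and the USN untouched.
            raise ValueError("a property value is required")
        with self._lock:
            next_usn = self.usn + 1
            # The USN is stored alongside the changed property...
            self.directory[(obj, prop)] = (value, next_usn)
            # ...and the server-wide counter moves forward with it.
            self.usn = next_usn
            return next_usn
```

A rejected write leaves the counter where it was, so every USN that exists corresponds to exactly one applied change.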

Update sequence number tables


At the start of the replication cycle each server checks its Update Sequence Number table and then queries the domain controllers it replicates with for their latest USNs. For example, the table below represents the USN table for server A.

Domain Controller B    Domain Controller C    Domain Controller D
34                     54                     39

Server A now queries Server B, C and D and receives the following values:

Domain Controller B    Domain Controller C    Domain Controller D
36                     57                     39

From this, Domain Controller A can calculate the changes it is missing from each controller, namely:

Domain Controller B    Domain Controller C    Domain Controller D
35 and 36              55, 56 and 57          Up-to-date
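The calculation in the tables above is simple enough to sketch in Python. The function name `missing_changes` and the dictionary layout are assumptions made for this illustration; the numbers are the ones from the tables.

```python
def missing_changes(known, reported):
    """Return, per controller, the USNs not yet replicated.

    known    -- highest USN this server has recorded for each partner
    reported -- latest USN each partner reports when queried
    """
    result = {}
    for controller, latest in reported.items():
        last_seen = known.get(controller, 0)
        # Everything after the last recorded USN, up to and including
        # the partner's latest USN, still has to be fetched.
        result[controller] = list(range(last_seen + 1, latest + 1))
    return result


# Server A's USN table, and the values returned by B, C and D.
known = {"B": 34, "C": 54, "D": 39}
reported = {"B": 36, "C": 57, "D": 39}

print(missing_changes(known, reported))
# {'B': [35, 36], 'C': [55, 56, 57], 'D': []}
```

An empty list corresponds to the "Up-to-date" entry for Domain Controller D.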

It then asks each server for the changes it is missing. This method means complex time stamps are not needed. However, domain controllers still need their clocks to be in sync with each other. This is accomplished using a time synchronisation service based on SNTP that is installed as part of the DCPROMO process, which converts a server to a domain controller. It is possible for multiple changes to be made to the same property of an object. Such collisions are detected because each property of every object has a Property Version Number (PVN), which works like a USN. Each time a property is modified, the PVN is incremented by one.

When the same property of the same object is modified on more than one controller, the change with the highest PVN takes precedence. If the PVNs are the same, a collision has occurred and the time stamp is used to resolve the conflict. Each change is time stamped, which is why the domain controllers’ clocks need to be in sync with each other. In the highly unlikely event that both the PVNs and the time stamps match, a binary buffer comparison is carried out, with the change that has the larger buffer size taking precedence. Property Version Numbers are only incremented on original writes and not on replication writes (unlike USNs). Also, they are not server specific but rather travel with the property throughout its life.
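The three-stage tie-break described above can be sketched as follows. The `PropertyChange` type and `resolve` function are hypothetical names. Two points are assumptions rather than statements from the text: that the later time stamp wins, and that "larger buffer" means the longer value, with a byte-wise comparison as a final fallback so the result is always deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyChange:
    value: bytes       # the new property value as raw bytes
    pvn: int           # Property Version Number after the write
    timestamp: float   # time stamp of the originating write


def resolve(a, b):
    """Pick the winning change for one property of one object."""
    # 1. The higher PVN takes precedence.
    if a.pvn != b.pvn:
        return a if a.pvn > b.pvn else b
    # 2. PVNs match, so this is a collision: use the time stamp
    #    (assumed here: the later write wins).
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # 3. Still tied: compare the buffers, larger one first
    #    (assumed here: longer value, then byte-wise order).
    return max(a, b, key=lambda change: (len(change.value), change.value))
```

Because every controller applies the same rules to the same pair of changes, they all settle on the same winner without having to talk to each other.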

Reducing traffic

To cut down on unnecessary network traffic the replication system employs a propagation-dampening scheme. This is needed because each domain controller may have multiple replication partners, so changes could otherwise go round in a loop. Each server also keeps a table of up-to-date vectors; these record the highest originating write received from each controller. An originating write is the original update made to an object property, as opposed to one caused by replication, so the value reflects the number of original writes that each server has performed. An up-to-date vector is in the following form:

<change, e.g. password of john modified>,<domain controller making the original change, e.g. controller A>,<USN of the change>

Example: the user updates their password on domain controller A. This change then gets replicated to domain controller B. Controller B currently has the following table:

                     Controller A    Controller C
USN                  23              45
Up-to-date vector    12              3

Controller A gives the password change its next USN, which is 24. Controller B writes the change to its copy of the Active Directory, increments its own USN and updates its table entry for controller A with the latest USN and up-to-date vector.

                     Controller A    Controller C
USN                  24              45
Up-to-date vector    13              3

The change also gets replicated to controller C. Controller C writes it to its copy of the Active Directory and updates its USN from 45 to 46, but does not update its up-to-date vector as this is not an original change. The next time controller C sends out changes it advertises USN 46; controller B has USN 45 recorded for controller C, so it requests change 46. Controller C also sends its up-to-date vector list, which includes the change it received from controller A, in the format:

<password change>, <Controller A>, <USN 24>

Because B already has this change, having received USN 24 from controller A, C does not send the password update. On controller B the USN for C is still updated to 46 as the replication cycle is complete. This multi-master replication means there is no single point of failure. If a server fails, then when it is brought back online all changes made since will be copied to it, thanks to its record of Update Sequence Numbers. Of course, if it stays offline, any changes made on it that were not replicated will be lost. This may cause some confusion, although with luck the changes that would otherwise have "mysteriously" been lost will already have been replicated to another server, which will in turn replicate them to the other controllers.
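The propagation-dampening check in the password example can be sketched like this. It is a simplification under stated assumptions: the `Change` type and `changes_to_send` function are hypothetical, and the destination is assumed to remember, for each originating controller, the highest originating USN it already holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    description: str
    origin: str        # controller where the originating write was made
    origin_usn: int    # USN the originating controller gave the change
    local_usn: int     # USN on the controller now offering the change


def changes_to_send(offered, last_local_usn, seen_from_origin):
    """Filter out changes the destination does not need."""
    to_send = []
    for change in offered:
        if change.local_usn <= last_local_usn:
            continue  # already collected in an earlier replication cycle
        if change.origin_usn <= seen_from_origin.get(change.origin, 0):
            continue  # destination already received it from the originator
        to_send.append(change)
    return to_send


# Controller C offers its USN 46, which is A's password change (A's USN 24).
offered_by_c = [Change("password of john modified", "A", 24, 46)]

# Controller B last saw USN 45 from C and already holds A's changes up to 24.
print(changes_to_send(offered_by_c, 45, {"A": 24}))
# []
```

Nothing is sent, yet B can still record USN 46 for C, which is what stops the same change from circulating round the replication partners.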

I hope this has clarified the multi-master replication algorithm. The protections provided by USNs and PVNs make this new method far better than the old single-point-of-failure approach.