Guru Corner
Configure your virtual Domain Controllers and avoid simple mistakes with resulting big problems
So you went ahead and used virtualized Domain Controllers for your Active Directory domain, congratulations! As long as you have a decent virtualization environment, I am sure you will be happy with the decision: it gives you peace of mind, faster recovery and cheaper redundancy.

There are, however, some special considerations when you use virtual Domain Controllers. And please, with sugar on top, do NOT P2V/convert your physical Domain Controllers to virtual ones.

What areas do we need to consider on a virtual DC?

  • Time synchronization
  • Disk cache
  • Suspend/pausing virtual machine
  • Snapshots and System State backups
  • Performance

Personally I much prefer virtual Domain Controllers to having a lot of physical ones, but there are some considerations to be made: whether to leave some DCs physical, which features to use on the virtual ones and which settings to apply. This article attempts to cover some of the points to consider specifically for virtual DCs. The list is in no way exhaustive; it mostly covers the things I have personally seen forgotten in environments I have encountered. Add your own preferences and research to this, and you should be well on your way to living happily ever after with your virtual DCs.

Let's begin with Time Synchronization of Virtual Domain Controllers

Time in an Active Directory environment is paramount to all authentication and secure communication, for Domain Controllers, servers and clients alike. In an Active Directory environment, Kerberos is used to issue a ticket at logon. By default this ticket is valid for 10 hours, which avoids constant authentication against the Domain Controllers every time a user accesses a resource; instead, the Kerberos ticket is presented and verified throughout the forest. However, securing the exchange between the client and the Domain Controller issuing the ticket relies on keys derived from the user's password and on setting up a secure channel. To prevent anyone from capturing authentication packets on the network and replaying them later, the packets include a timestamp. If the timestamp from the client differs from the Domain Controller's time by more than 5 minutes (the default), the Domain Controller discards the packet as fake.

The default maximum time difference allowed is only 5 minutes and is set in the “Maximum tolerance for computer clock synchronization” Group Policy setting for the domain.
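The check itself is simple arithmetic. The Python sketch below models it under a simplifying assumption: the Domain Controller just compares the client's timestamp with its own clock against the 5 minute default. The names `MAX_SKEW` and `timestamp_acceptable` are illustrative only and are not part of any Windows API.

```python
from datetime import datetime, timedelta, timezone

# Assumed default of "Maximum tolerance for computer clock synchronization".
MAX_SKEW = timedelta(minutes=5)


def timestamp_acceptable(client_time: datetime, dc_time: datetime,
                         max_skew: timedelta = MAX_SKEW) -> bool:
    """Simplified model: accept the client's timestamp only if it is within
    max_skew of the DC's clock, whether the client is ahead or behind."""
    return abs(client_time - dc_time) <= max_skew


dc_now = datetime(2012, 8, 4, 16, 43, tzinfo=timezone.utc)
print(timestamp_acceptable(dc_now + timedelta(minutes=4), dc_now))  # True
print(timestamp_acceptable(dc_now - timedelta(minutes=6), dc_now))  # False
```

Note that the comparison is symmetric: a client clock that is 6 minutes fast is rejected just as surely as one that is 6 minutes slow.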

Because of this, time synchronization between Windows clients in an Active Directory environment is extremely important. If the clocks of a Domain Controller and a client differ by more than the allowed tolerance, logon and access to network resources can fail.

All Domain Controllers run the time service (w32time) by default. It functions both as a client for the DC itself and as an NTP server for domain servers and workstations to synchronize with. In a domain, all DCs automatically synchronize their time with the Domain Controller that holds the PDC FSMO role. The Domain Controller with the PDC role should then be manually configured to sync its time with a good NTP source.

Why time synchronization fails when the Domain Controllers are virtual

Virtual machines by default have varying resources, CPU clock rates, etc. On a busy system they may even be denied resources for short periods, for example during high workloads, VMotion or backups, or they may receive more CPU resources than the operating system is aware it could get. For example, the operating system believes it has one 2.4 GHz CPU, while in reality it is running on a VMware server with eight 2.4 GHz cores.

This results in what is usually referred to as time drift: the clock and the “ticks” it uses to keep time sometimes run faster or slower. Personally I have seen misconfigured virtual servers whose clocks were off by several hours. Most time synchronization clients have a limit on how much the local time may differ from the NTP source and still be corrected. Some systems are set to no more than 15 minutes, 1 hour or 15 hours, while others synchronize no matter how large the difference is; where such a limit applies, a clock that has drifted too far since the last sync may not be synchronized at all.

The time service (w32time) running on Windows Servers and Domain Controllers is perfectly capable of keeping time very accurately on a physical machine, with default syncs every 45 minutes until 3 successful syncs, then every 8 hours. The time service on Domain Controllers also functions as the time server for all clients in the domain, so do not just disable this service if you do not need the client functionality!
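To make the default schedule concrete, here is a small Python generator that yields the wait before each sync, following the schedule described above. It assumes every sync succeeds and ignores w32time's own internal adjustments, so treat it as an illustration of the schedule rather than of the service's actual polling logic; the function name is hypothetical.

```python
from itertools import islice


def default_poll_intervals_minutes():
    """Yield the wait (minutes) before each successive sync, per the default
    schedule described above: every 45 minutes until three successful syncs,
    then every 8 hours. Assumption: every sync attempt succeeds."""
    successful = 0
    while True:
        yield 45 if successful < 3 else 8 * 60
        successful += 1


print(list(islice(default_poll_intervals_minutes(), 5)))  # [45, 45, 45, 480, 480]
```

After the first 2 hours and 15 minutes, the clock is left on its own for 8 hours at a time, which is fine on physical hardware but is exactly the gap a drifting virtual machine can exploit.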

So basically we need to ensure that our virtual Domain Controllers, and especially the DC with the PDC FSMO role, are always synchronized perfectly; otherwise we risk problems with authentication throughout the domain.

VMware Tools time synchronization: important!

Before relying on VMware Tools for time synchronization, be aware that a VMware timekeeping document describes a serious limitation that might make this solution inappropriate:

However, at this writing, VMware Tools clock synchronization has a serious limitation: it cannot correct the guest clock if it gets ahead of real time.

This limitation applies only to periodic clock synchronization. VMware Tools does a one-shot correction of the virtual machine clock that may set it either backward or forward in two cases: when the VMware Tools daemon starts (normally while the guest operating system is booting), and when a user toggles the periodic clock synchronization feature from off to on.

If the clock on the guest falls behind the clock on the host, VMware Tools moves the clock on the guest forward to match the clock on the host. If the clock on the guest is ahead of that on the host, VMware Tools causes the clock on the guest to run more slowly until the clocks are synchronized.

Basically this means: before using VMware Tools to synchronize our virtual Domain Controllers, check that it is able to correct the time when the virtual machine's clock is ahead of real time!
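To see why the direction of the error matters, here is a toy Python model of the behaviour quoted above. It is not VMware's actual algorithm: `slow_fraction` is a made-up parameter for how much of a "guest ahead" offset the slowed-down clock removes per periodic pass, where 0.0 corresponds to the "cannot correct" limitation in the first quote.

```python
def simulate_offsets(offset_s: float, passes: int, slow_fraction: float) -> list[float]:
    """Guest clock offset from the host (positive = guest ahead) after each
    periodic VMware Tools pass, in a deliberately simplified model.

    - Guest behind: the clock is stepped forward to host time.
    - Guest ahead: the clock is never stepped back; at best it runs slower,
      removing `slow_fraction` (hypothetical) of the remaining offset per pass.
    """
    history = []
    for _ in range(passes):
        if offset_s < 0:
            offset_s = 0.0
        elif offset_s > 0:
            offset_s -= offset_s * slow_fraction
        history.append(offset_s)
    return history


print(simulate_offsets(-120, 3, 0.5))  # [0.0, 0.0, 0.0]
print(simulate_offsets(120, 3, 0.5))   # [60.0, 30.0, 15.0]
print(simulate_offsets(120, 3, 0.0))   # [120.0, 120.0, 120.0]
```

A guest that is behind is fixed on the first pass, while a guest that is ahead either converges slowly or, in the worst case, never converges at all.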

Solution with w32time set to NoSync and VMware Tools in VMware environments

  • Configure ALL VMware hosts to sync their time through NTP. This is important, since we will use their time to set the time for all virtual machines. Don't forget to set the NTP client on the VMware host to start automatically with the host. Choose the “Configuration” tab, under Software select “Time Configuration”, then select Properties in the top right corner and fill in the relevant information.

  • Configure the virtual Domain Controllers not to sync through the time service by setting it to NoSync, and let the service know that the server is an authoritative time source.

  • Install VMware Tools and configure it to synchronize time with the ESX/ESXi hosts.
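For the second step, a minimal Python sketch could build the commands for one virtual DC. It assumes the w32time registry values `Parameters\Type` (set to `NoSync`) and `Config\AnnounceFlags` (5 = always a time server plus always a reliable time server) as the way to express "NoSync" and "authoritative time"; check them against Microsoft's w32time documentation for your Windows version. The helper names `nosync_commands` and `apply` are hypothetical, and `apply` only prints unless `dry_run=False`.

```python
import subprocess

W32TIME = r"HKLM\SYSTEM\CurrentControlSet\Services\W32Time"


def nosync_commands() -> list[list[str]]:
    """Commands for a virtual DC whose clock is set by VMware Tools."""
    return [
        # Stop w32time from synchronizing with any time source (assumed: Type=NoSync).
        ["reg", "add", W32TIME + r"\Parameters", "/v", "Type",
         "/t", "REG_SZ", "/d", "NoSync", "/f"],
        # Assumed: AnnounceFlags 5 = always time server (0x1) + always reliable (0x4).
        ["reg", "add", W32TIME + r"\Config", "/v", "AnnounceFlags",
         "/t", "REG_DWORD", "/d", "5", "/f"],
        # Ask the running Windows Time service to re-read its configuration.
        ["w32tm", "/config", "/update"],
    ]


def apply(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


apply(nosync_commands())  # dry run: prints the commands only
```

Running in dry-run mode first makes it easy to review the exact commands before touching a production Domain Controller.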

This solution should be stable under both light and extreme loads. The biggest challenge is usually keeping strict control over the VMware hosts and ensuring that every host the DCs might run on (e.g. after being moved with VMotion) is always configured to use the same NTP source. The most likely problem is that someone adds a new ESX/ESXi host and forgets to configure its NTP servers.
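Because the weak spot is a forgotten host, a periodic audit helps. The Python sketch below flags hosts whose NTP configuration deviates from the expected servers or whose NTP client does not start with the host. How the inventory is collected (vSphere Client, an export script, etc.) is out of scope here; the `HostTimeConfig` type, host names and NTP server names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostTimeConfig:
    ntp_servers: list[str]   # NTP servers configured on the ESX/ESXi host
    ntp_autostart: bool      # NTP client set to start with the host


def misconfigured_hosts(hosts: dict[str, HostTimeConfig], expected: set[str]) -> list[str]:
    """Names of hosts whose NTP servers differ from `expected`, or whose
    NTP client does not start automatically with the host."""
    return sorted(
        name for name, cfg in hosts.items()
        if set(cfg.ntp_servers) != expected or not cfg.ntp_autostart
    )


expected = {"ntp1.example.local", "ntp2.example.local"}
inventory = {
    "esx01": HostTimeConfig(["ntp1.example.local", "ntp2.example.local"], True),
    "esx02": HostTimeConfig(["ntp1.example.local", "ntp2.example.local"], False),
    "esx03": HostTimeConfig([], True),  # newly added host, NTP never configured
}
print(misconfigured_hosts(inventory, expected))  # ['esx02', 'esx03']
```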

Using w32time for NTP sync on virtual Domain Controllers

Since even 5 minutes of drift can cause problems, and virtual machines, as described before, tend to drift in time, it is not enough to synchronize the time on the virtual Domain Controllers every 8 hours (the default after 3 syncs). So we need to configure the servers to synchronize more often, but putting a high load on internet NTP servers is usually frowned upon by the internet community.
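A quick back-of-the-envelope calculation shows why the interval matters. The sketch assumes a constant drift rate, which real virtual machines do not have, and uses a hypothetical 1% (10,000 ppm) drift for a heavily loaded VM; the function name is illustrative.

```python
def worst_case_offset_s(drift_ppm: float, interval_minutes: float) -> float:
    """Clock error (seconds) accumulated between two syncs, assuming a
    constant drift rate given in parts per million."""
    return drift_ppm * interval_minutes * 60 / 1_000_000


KERBEROS_LIMIT_S = 5 * 60  # default maximum clock skew, in seconds

# Hypothetical heavily loaded VM drifting 1% (10,000 ppm):
for minutes in (8 * 60, 60, 15):
    offset = worst_case_offset_s(10_000, minutes)
    print(f"sync every {minutes} min: up to {offset:.0f} s "
          f"({offset / KERBEROS_LIMIT_S:.0%} of the 5 minute limit)")
```

At that rate an 8 hour interval allows up to 288 seconds of error, uncomfortably close to the 300 second limit, while a 15 minute interval allows only 9 seconds.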

My recommended solution is to use at least 2 physical servers locally as local time servers. They in turn should synchronize their time from a trusted time source (a physical time source or NTP from the internet). Then have the virtual Domain Controller with the PDC FSMO role sync from those machines, and the other Domain Controllers from either the physical NTP servers or the PDC FSMO role holder, as preferred. If you have a physical Domain Controller, use it as the PDC FSMO role holder and configure it as the main time source for all other servers.
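One way to express this layout is with the standard `w32tm /config` options. The Python sketch below only builds the command lines: the PDC FSMO role holder gets a manual peer list pointing at the local NTP servers and is marked reliable, while the other DCs follow the domain hierarchy. The peer flag `0x9` (0x1 = use `SpecialPollInterval`, 0x8 = client mode), the role names and the server names are assumptions for illustration.

```python
def w32tm_config_argv(role: str, local_ntp_servers: list[str]) -> list[str]:
    """Build a `w32tm /config` command line for a DC (sketch).

    role: "pdc" for the PDC FSMO role holder; anything else for other DCs.
    """
    if role == "pdc":
        # Assumed peer flags: 0x1 (use SpecialPollInterval) + 0x8 (client mode).
        peers = " ".join(f"{server},0x9" for server in local_ntp_servers)
        return ["w32tm", "/config", f"/manualpeerlist:{peers}",
                "/syncfromflags:MANUAL", "/reliable:YES", "/update"]
    # Other DCs follow the domain hierarchy, i.e. ultimately the PDC.
    return ["w32tm", "/config", "/syncfromflags:DOMHIER", "/update"]


local_ntp = ["ntp1.example.local", "ntp2.example.local"]  # hypothetical names
print(w32tm_config_argv("pdc", local_ntp))
print(w32tm_config_argv("dc", local_ntp))
```

When typing the PDC command at a command prompt instead, put quotes around the peer list, as in `/manualpeerlist:"ntp1.example.local,0x9 ntp2.example.local,0x9"`.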

Make sure you increase the synchronization frequency on virtual Domain Controllers; I would suggest an interval somewhere between 15 minutes and an hour. Also ensure the servers are configured to sync time regardless of how large the difference between the server and the time source is.

Windows Time (w32time service) registry settings to configure

The Windows Time service (w32time) is configured through the registry. Its main parameters are set under this registry key:

  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Parameters
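As a sketch of how the recommendations above could translate into registry values for the PDC FSMO role holder, the following Python function writes the text of a `.reg` file. It assumes these w32time values: `Type` and `NtpServer` under `Parameters`, `SpecialPollInterval` (in seconds, used for peers flagged `0x1`) under `TimeProviders\NtpClient`, and `MaxPosPhaseCorrection`/`MaxNegPhaseCorrection` under `Config`, where `0xFFFFFFFF` means "correct the clock whatever the offset". Verify them against Microsoft's w32time documentation for your Windows version before importing; the function name and server names are placeholders.

```python
BASE = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time"


def w32time_reg(peers: list[str], poll_minutes: int) -> str:
    """Return .reg file text for a PDC syncing from `peers` every `poll_minutes`."""
    if not 15 <= poll_minutes <= 60:
        raise ValueError("poll interval outside the suggested 15-60 minute range")
    ntp_server = " ".join(f"{peer},0x9" for peer in peers)  # 0x1 special interval + 0x8 client
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{BASE}\\Parameters]",
        '"Type"="NTP"',
        f'"NtpServer"="{ntp_server}"',
        "",
        f"[{BASE}\\TimeProviders\\NtpClient]",
        f'"SpecialPollInterval"=dword:{poll_minutes * 60:08x}',  # seconds, as hex DWORD
        "",
        f"[{BASE}\\Config]",
        '"MaxPosPhaseCorrection"=dword:ffffffff',  # assumed: always correct forward
        '"MaxNegPhaseCorrection"=dword:ffffffff',  # assumed: always correct backward
        "",
    ]
    return "\r\n".join(lines)


print(w32time_reg(["ntp1.example.local", "ntp2.example.local"], 15))
```

After importing the file, run `w32tm /config /update` (or restart the Windows Time service) so the service picks up the changes.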

Disk Cache on virtual servers running Active Directory

Domain Controllers automatically disable disk write caching to ensure that database integrity is not lost due to a crash or power failure. This is not limited to Domain Controllers, but applies to all services using Extensible Storage Engine databases, including WINS, DHCP and the File Replication Service (FRS). Depending on the virtualization environment, this may be of no use if the disk emulation ignores the setting, or lets the operating system believe the disk is writing directly while it is actually caching.

If possible, use a SCSI emulation that supports forced unit access (FUA), so that it tells the system when data has actually been written. Ensure high availability and redundancy for storage systems, follow best practices, and use uninterruptible power supplies for both storage and virtual hosts.

Basically, the storage behind any database system should always be protected against corruption, on physical and virtual systems alike.

Why suspending or “pausing” a virtual Domain Controller is not wise

When you suspend an operating system and later resume it, the machine is not aware of what happened; as far as it is concerned, the elapsed time did not exist. Any connections to the machine are lost and are usually re-established automatically, and the clock has to be corrected, which usually also happens automatically.

If the Domain Controller has been offline for longer than the tombstone lifetime, it will still hold objects that were supposed to have been deleted by the tombstoning process. If this happens, the Domain Controller will stop replicating with its partners. You will see an event in the logs with ID 2042, Source NTDS Replication, Description: It has been too long since this machine last replicated with the named source machine. The time between replications with this source has exceeded the tombstone lifetime. Replication has been stopped with this source.
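The condition behind event 2042 boils down to comparing the time since the last replication with the tombstone lifetime. This Python sketch assumes a lifetime of 180 days, the default for forests created with Windows Server 2003 SP1 or later (older forests default to 60 days); the real value is stored in the `tombstoneLifetime` attribute of `CN=Directory Service,CN=Windows NT,CN=Services` in the configuration partition. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def past_tombstone_lifetime(last_replication: datetime, now: datetime,
                            tombstone_lifetime_days: int = 180) -> bool:
    """True if a DC that last replicated at `last_replication` has been out of
    contact longer than the tombstone lifetime (assumed default: 180 days)."""
    return now - last_replication > timedelta(days=tombstone_lifetime_days)


suspended_at = datetime(2012, 1, 2)
print(past_tombstone_lifetime(suspended_at, datetime(2012, 3, 1)))  # False
print(past_tombstone_lifetime(suspended_at, datetime(2012, 8, 4)))  # True
```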

It is better to shut down a Domain Controller properly than to leave it suspended. Even physical Domain Controllers should not be left turned off for too long, so there is not much difference there. But try to avoid suspending a Domain Controller; after all, it was never designed to be aware of this, and nasty things might happen.

Using Snapshots on a Domain Controller is worse than dropping Your iPhone in the toilet!

Really bad things happen if you revert a Domain Controller to an old snapshot. It is even worse than hot-cloning your Domain Controller: you are making almost 100% sure that you will break consistency in your Active Directory domain and lose data.

To explain it quickly and simply: all Domain Controllers keep track of what replication has been done with other Domain Controllers, and they even replicate this information by sharing the USN (update sequence number) values of other Domain Controllers. This helps the Domain Controllers know which other Domain Controllers may need updates and which ones not to bother updating, thereby saving bandwidth and time.

Let’s look at an example of this going terribly wrong:

  • DC1 and DC2 are completely in sync and agree that they have synchronized to “version A”. You take a snapshot of DC2.
  • A couple of users are added, some are deleted, computer accounts change passwords, contact details on a user are changed, a couple of machines are added to the domain, and so on.
  • DC1 and DC2 sync again and are completely in sync; they both agree that they are now on “version B”.
  • DC2 is rolled back to the previous snapshot and is now on “version A”.
  • The next time DC1 and DC2 talk to each other to sync, DC1 will not update DC2 with the changes, because it “knows” that DC2 already has this information. (It does not matter that DC2 is actually missing the information; it will still not get it.)
  • Some more changes are made; let's say a couple of users are added on DC1.
  • DC1 and DC2 start synchronizing. This time DC1 will give the new users to DC2, because this change was made AFTER “version B”, and they will now both agree on being on “version C”.

Note that DC2 will never get the changes made between “version A” and “version B”, and it will never be the wiser. That is not entirely true: it will figure it out at some point and start writing event log entries about it.
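The following Python toy model illustrates one way the revert silently loses data. It assumes a much simplified replication scheme (two DCs, one high-water-mark USN per partner, no up-to-dateness vector or invocation IDs) and is not how Active Directory is implemented. In this model the lost changes are ones that originated on the reverted DC2: after the revert, DC2 hands out USNs that DC1 has already recorded, so DC1 skips the new change.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DC:
    name: str
    usn: int = 0                                   # local update sequence number counter
    objects: dict = field(default_factory=dict)    # object -> (originating DC, originating USN)
    watermark: dict = field(default_factory=dict)  # partner -> highest partner USN already pulled

    def write(self, obj: str) -> None:
        """Originate a change locally, stamped with the next USN."""
        self.usn += 1
        self.objects[obj] = (self.name, self.usn)

    def pull_from(self, source: "DC") -> None:
        """Pull changes that originated on `source` above our watermark for it."""
        mark = self.watermark.get(source.name, 0)
        for obj, (origin, usn) in source.objects.items():
            if origin == source.name and usn > mark:
                self.objects[obj] = (origin, usn)
        self.watermark[source.name] = max(mark, source.usn)


dc1, dc2 = DC("DC1"), DC("DC2")
dc2.write("alice")
dc1.pull_from(dc2)             # "version A": DC1 has seen DC2 up to USN 1
snapshot = copy.deepcopy(dc2)  # snapshot of DC2 taken here

dc2.write("bob")
dc1.pull_from(dc2)             # "version B": DC1 has seen DC2 up to USN 2

dc2 = snapshot                 # revert DC2: its USN counter is back at 1
dc2.write("carol")             # reuses USN 2, which DC1 believes it already has
dc1.pull_from(dc2)

print(sorted(dc1.objects))  # ['alice', 'bob']   carol never arrives
print(sorted(dc2.objects))  # ['alice', 'carol'] bob is gone from DC2
```

Both DCs consider themselves healthy and in sync, yet they now hold different data.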

This is known as a USN rollback (in this case an unintended one). When it is done intentionally with supported methods, e.g. during a restore, rolling back changes made in an Active Directory domain can be a powerful tool. When it happens unintentionally like this, it can result in replication errors, lingering objects that should have been deleted, inconsistency between Domain Controllers, computers and accounts that can log on to certain DCs but not others, one password on one DC and a different password on another, and so on. These are all terrible results that should never be introduced to a production environment, and detecting and recovering from these problems can be almost impossible.

If this happens to you, the best action is to demote the affected Domain Controller immediately, since it will never have correct information again and may also receive changes that it will not replicate to other Domain Controllers. Do NOT fix replication faults on a “sick” DC: you will only allow the “sick” DC to replicate with your “healthy” DCs!

If needed, you can most likely promote this server to a DC again, since it will do a full replication upon promotion, but if possible, why not set up a new, clean DC? Also note that any changes made on the defective Domain Controller while it was up will be lost. If it was running for a long time and holds important information, consider exporting that information to the healthy Domain Controllers (time consuming).

Use System State for backups of Active Directory, NOT snapshots!

If you are considering using snapshots to restore an Active Directory server, reconsider; the example above shows why. The only situation in which a Domain Controller restored from a snapshot can be introduced is when it is the ONLY Domain Controller in existence. Instead, use Microsoft's built-in tools to take System State backups and, if needed, back up the files as well. This not only works, but is also fully supported by Microsoft. If you have a full System State backup of an Active Directory domain, it does not matter what other problems you have or how much trouble you are in: Microsoft will always be able to restore the information needed to set up the domain again.
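On Windows Server 2008 the built-in tool is Windows Server Backup, driven by `wbadmin`. The minimal Python sketch below builds the System State backup command and only runs it when asked. It assumes the Windows Server Backup feature is installed and that `E:` is a suitable backup target; the helper names are hypothetical.

```python
import subprocess


def system_state_backup_argv(target_volume: str) -> list[str]:
    """Build the wbadmin command for a System State backup to `target_volume`."""
    return ["wbadmin", "start", "systemstatebackup",
            f"-backupTarget:{target_volume}", "-quiet"]


def run_backup(target_volume: str, dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    argv = system_state_backup_argv(target_volume)
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


run_backup("E:")  # dry run: wbadmin start systemstatebackup -backupTarget:E: -quiet
```

Scheduling this regularly (for example with Task Scheduler) gives you restorable, supported backups instead of snapshots.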

Also make sure you read up on “authoritative restores” for Active Directory before restoring objects or entire domains.

Performance is an issue, but is there anything special we should consider for virtual Domain Controllers?

Most Domain Controllers run infrastructure services like AD, DNS, DHCP and WINS. Even on small physical hardware, they will happily serve loads of users and machines while using little of the machine's capabilities. Considering that we should ALWAYS have good redundancy for these services, it seems wasteful to set up a lot of power-draining physical machines to run them when we can set up multiple virtual machines to do the same and make better use of the hardware.

However, we should be aware that putting Domain Controllers on overloaded virtual environments, or giving them too few reserved resources, can result in bad performance and delays for clients accessing resources, logging on, etc.

So consider carefully how many resources you need and where to place these services.

  • Global Catalog servers hold information about your entire forest and can be heavily used by Exchange servers for lookups.
  • The PDC FSMO role holder receives more requests, both from clients and from other Domain Controllers, for password changes and when a password does not match.
  • The other FSMO roles in general add little extra load and are used infrequently; high availability is, however, important in some situations.
  • DNS/WINS servers serve clients with information from cache (read: memory), and speed is important to keep name lookups fast, so ensure enough memory.
  • All Domain Controllers require readily available disk, CPU and memory resources; ensure enough is available for good performance and consider doing regular checks.
  • High availability of Active Directory (and DNS) services is the highest priority for a functional infrastructure; without them, nothing else may work.

Ensure that the virtual environment is able to load and start properly when all virtual machines have been turned off. For example, if the vCenter service requires DNS to access its database and the ESX/ESXi hosts, it will not work if all its DNS servers are virtual and offline!
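Writing down start-up dependencies and checking them for loops is one way to catch this before an outage does. The Python sketch below uses the standard library's `graphlib` (Python 3.9+) to derive a start order and to detect a circular dependency; the VM names and the dependency map are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def boot_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return a start order in which every VM starts after its dependencies.
    Raises graphlib.CycleError if the dependencies form a loop."""
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical environment: each VM lists the VMs that must be running first.
depends_on = {
    "dc1": set(),                   # DC + DNS
    "dc2": set(),                   # DC + DNS
    "vcenter-db": {"dc1"},
    "vcenter": {"dc1", "vcenter-db"},
    "exchange01": {"dc1", "dc2"},
}
print(boot_order(depends_on))

# If a DC could only start once vCenter is up, no valid start order exists:
try:
    boot_order({**depends_on, "dc1": {"vcenter"}})
except CycleError as err:
    print("circular dependency:", err.args[1])
```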

Authored by: Guru Corner
Article Information
Article Number: 236
Created: 2012-08-04 4:43 PM
