So You went ahead and used virtualized Domain Controllers for Your
Active Directory domain, congratulations! I am sure You will be happy
with the decision. As long as You have a decent virtualization
environment, this will give You both peace of mind, faster recovery and
There are however some special considerations You must make when You
are using virtual Domain Controllers. And, pretty please with sugar
on top, do NOT P2V/convert Your physical Domain Controllers to virtual.
What areas do we need to consider on a virtual DC?
- Time synchronization
- Disk cache
- Suspend/pausing virtual machine
- Snapshots and System State backups
Personally I much prefer virtual Domain Controllers, having come from
a lot of physical ones, but there are some considerations to be made
about perhaps leaving some physical, which features to use on the
virtual ones and which settings to use as well. This article attempts
to cover some of the points to consider, specifically for virtual DCs.
The list is in no way meant to be exhaustive; it is mostly the things
that I personally have seen forgotten in environments I have
encountered. Add Your own preferences and research to this and You
should be well on Your way to living happily ever after with Your virtual
Let's begin with Time Synchronization of Virtual Domain Controllers
Correct time is paramount to all authentication and secure
communication in an Active Directory environment, for Domain
Controllers, servers and clients alike. Active Directory uses Kerberos
to issue a ticket during logon. This ticket is valid for 10 hours by
default, which prevents constant authentication against the Domain
Controllers every time a user accesses resources. Instead the Kerberos
ticket is presented and verified throughout the forest. However, the
encryption and security between the client and the Domain Controller
issuing the ticket relies on keys derived from the account password and
on setting up a secure channel. To prevent anyone from listening on the
network and replaying earlier authentication packets from the client,
all packets include a timestamp. If the timestamp coming from the client
differs by more than 5 minutes (the default) from the Domain
Controller's time, the Domain Controller discards the packet as fake.
The default maximum time difference allowed is only
5 minutes and is set in the “Maximum tolerance for computer clock synchronization” Group Policy setting for the domain.
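The tolerance check described above can be sketched in a few lines. This is only an illustration of the comparison, not how Kerberos is implemented; the function name `within_tolerance` and the sample timestamps are made up for the example, and the 5 minute value is the default mentioned in the text.

```python
from datetime import datetime, timedelta, timezone

# Default tolerance named in the text; adjust to match Your own policy.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def within_tolerance(client_time, dc_time, max_skew=MAX_CLOCK_SKEW):
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(client_time - dc_time) <= max_skew


# Hypothetical sample times, only for illustration.
dc_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
client_ok = dc_now + timedelta(minutes=4)    # 4 minutes fast
client_bad = dc_now - timedelta(minutes=6)   # 6 minutes slow

print(within_tolerance(client_ok, dc_now))   # inside the 5 minute window
print(within_tolerance(client_bad, dc_now))  # outside the 5 minute window
```

Note that the check is symmetric: a client that is too far ahead is rejected just like one that is too far behind.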
Because of this, time synchronization between Windows machines in an
Active Directory environment is extremely important. A Domain Controller
and a client whose clocks disagree can prevent logon and
access to network resources.
All Domain Controllers will by default have the time service
(w32time) running. It functions both as a time client for the DC
itself and as an NTP server for domain servers and workstations to
synchronize with. In a domain, all DCs will automatically synchronize
time with the Domain Controller that holds the PDC FSMO role. The
Domain Controller with the PDC role should then be manually configured
to sync its time with a good NTP source.
Why time synchronization fails when the Domain Controllers are virtual
Virtual machines will by default have varying resources, CPU
clocks, etc. On a busy system they may even be denied resources for
short periods of time, for example during high workloads, VMotion or
backups, or they may receive more CPU resources than the operating
system is even aware it could get. E.g. the operating system believes
it has 1 CPU of 2.4 GHz, while in reality it is running on a VMware
server with 8 cores of 2.4 GHz.
This results in something usually referred to as time drift: the
clock and the “ticks” it uses to keep time will sometimes run faster
or slower. Personally I have seen virtual servers misconfigured for time
synchronization that were off by several hours. Most time
synchronization clients have a limit on how much the time may
differ from the NTP source and still synchronize. Some systems are set
for no more than 15 minutes, 1 hour or 15 hours, and some synchronize
no matter the time difference. Where a limit applies, it can prevent
synchronization altogether once the time has drifted too far since the
last sync.
The time service (w32time) running on Windows Servers and Domain
Controllers is more than sufficient for keeping time very accurate on a
physical machine, with default syncs being done every 45 minutes until
3 syncs have succeeded, then every 8 hours.
The time service on Domain Controllers also functions as the time
server for all clients in the domain, so do not just disable this
service, even if You do not need the client functionality!
So basically we need to ensure that our virtual Domain Controllers and
especially our DC with the PDC FSMO role are always synchronized
perfectly, otherwise we risk problems with authentication throughout the
VMware tools time synchronization – important!
Be aware that a VMware timekeeping document describes a serious
problem that might make this solution very inappropriate:
However, at this writing, VMware Tools clock
synchronization has a serious limitation: it cannot correct the guest
clock if it gets ahead of real time.
This limitation applies only to periodic clock
synchronization. VMware Tools does a one-shot correction of the virtual
machine clock that may set it either backward or forward in two cases:
when the VMware Tools daemon starts (normally while the guest operating
system is booting), and when a user toggles the periodic clock
synchronization feature from off to on.
If the clock on the guest falls behind the clock on the
host, VMware Tools moves the clock on the guest forward to match the
clock on the host. If the clock on the guest is ahead of that on the
host, VMware Tools causes the clock on the guest to run more slowly
until the clocks are synchronized.
Basically this means: before using VMware Tools to synchronize our
virtual Domain Controllers, check that it is able to correct the time if
the clock on the virtual machine is ahead of real time!
Solution with w32time /NoSync and VMware Tools on VMware environments
- Configure ALL VMware hosts to sync their time through NTP. This is
important, since we will use their time to set the time for all virtual
machines. Don't forget to set the NTP client on the VMware host to start
up automatically or with the host. Choose the “Configuration” tab,
under Software select “Time Configuration”, in the top right corner
select Properties and fill out the relevant information.
- Configure the virtual Domain Controllers not to sync through the time
service by using the NoSync parameter, and let the service know that the
server has an authoritative time.
- Install VMware Tools and configure it to synchronize time with the ESX/ESXi hosts.
This solution should be stable under both light and extreme loads.
The largest problem here is usually keeping strict control over the
VMware hosts and ensuring that any host the DCs might run on (e.g.
after being moved with VMotion) is always configured to use NTP against
the same source. The most likely problem is that someone will add a new
ESX/ESXi host and forget to configure NTP servers.
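As a rough sketch of the NoSync step, the snippet below builds the `w32tm /config` call that stops the Windows Time service from syncing from any source and marks the server as a reliable time source. The helper names `nosync_command` and `apply` are made up for this example, and the default is a dry run that only prints the command. Treat it as a starting point to verify against Your own Windows version, not as the definitive configuration.

```python
import subprocess
import sys


def nosync_command():
    """Build the w32tm call: no time source, server announced as reliable."""
    return ["w32tm", "/config", "/syncfromflags:NO", "/reliable:YES", "/update"]


def apply(command, dry_run=True):
    """Print the command; only execute it on Windows when dry_run is False."""
    if dry_run or sys.platform != "win32":
        print(" ".join(command))
        return None
    return subprocess.run(command, check=True)


# Dry run: shows what would be executed on the virtual Domain Controller.
apply(nosync_command())
```

Running it with `dry_run=False` on the DC itself requires an elevated prompt. The VMware Tools side of the setup is configured separately.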
Using w32time for NTP sync on virtual Domain Controllers
Since even 5 minutes of drift can cause problems, and virtual machines
as described before have a tendency to drift in time, it is not enough
to synchronize the time on the virtual Domain Controllers every 8 hours
(the default after 3 syncs). So we need to configure the servers to
synchronize more often, but putting a high load on internet NTP servers
is usually frowned upon by the internet community.
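To get a feeling for how often is often enough, the arithmetic can be sketched as below. The drift rate and the error budget are invented numbers purely for illustration; measure the actual drift of Your own virtual machines before settling on an interval.

```python
def drift_between_syncs(drift_s_per_hour, sync_interval_hours):
    """Worst-case clock error (seconds) accumulated between two syncs."""
    return drift_s_per_hour * sync_interval_hours


def max_sync_interval_hours(drift_s_per_hour, error_budget_s):
    """Longest sync interval (hours) that keeps the error within the budget."""
    if drift_s_per_hour <= 0:
        raise ValueError("drift rate must be positive")
    return error_budget_s / drift_s_per_hour


# Hypothetical VM that drifts 45 seconds per hour.
drift = 45

# With the default 8 hour interval the error can reach 360 seconds,
# which is more than the 300 second (5 minute) tolerance.
print(drift_between_syncs(drift, 8))

# Interval (in hours) needed to stay within a 30 second error budget.
print(max_sync_interval_hours(drift, 30))
```

The error budget should be chosen well below the 5 minute tolerance, since the clients drift as well.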
My recommended solution would be to use at least 2 physical servers
locally that can run as local time servers. They in turn should
synchronize their time from a trusted time source (a physical clock or
NTP from the internet). Then have the virtual Domain Controller with the
PDC FSMO role sync from those machines, and in turn have the other
Domain Controllers sync from either the physical NTP servers or the PDC
FSMO role holder, as preferred. If You have a physical Domain
Controller, use it as the PDC FSMO role holder, and configure it as the
main time source for all other
Make sure You increase the number of synchronizations being made on
virtual Domain Controllers; I would suggest an interval somewhere
between 15 minutes and an hour. Also ensure the servers are configured
to sync time no matter how big a difference there is between the server
and the
Windows Time (w32time service) registry settings to configure
Some of the important registry settings to configure for Windows Time
service (w32time). The parameters are set under this registry key.
Disk Cache on virtual servers running Active Directory
Domain Controllers will automatically disable disk write caching to
ensure that database integrity is not lost in a crash or power failure.
This is not limited to Domain Controllers, but applies to all services
using Extensible Storage Engine databases, including WINS, DHCP and File
Replication Service (FRS). Depending on the virtualization environment,
this will be of no use if the disk emulation ignores the setting or lets
the operating system think the disk is writing directly but is actually
If possible use a SCSI emulator that is compatible with forced unit
access, so it tells the system when data is actually written. Ensure
high availability and redundancy for storage systems, follow best
practices and have an uninterruptible power supply for both storage and
Basically the storage of any database system should always be secured
to avoid database corruption. This includes both physical and virtual
Why suspending or “pausing” a virtual Domain Controller is not wise
When You suspend an operating system and later on resume it, the
machine will not be aware of what has happened. As far as it is
concerned, the elapsed time did not exist. Any connections to the
machine will be lost and will usually reconnect automatically, and the
time will be corrected, usually automatically.
If the Domain Controller has been offline for too long, it will hold
objects that were supposed to have been deleted by the tombstoning
process. If this happens the Domain Controller will stop replicating
with its partners. You will see an event in the logs with ID 2042,
Source NTDS Replication, Description: It has been too long since this
machine last replicated with the named source machine. The time between
replications with this source has exceeded the tombstone lifetime.
Replication has been stopped with this source.
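The condition behind that event boils down to a simple date comparison, sketched below. The function name and the dates are made up, and the 180 days is only an example value; the tombstone lifetime differs between forests, so look up the actual value in Your own environment.

```python
from datetime import datetime, timedelta


def exceeds_tombstone_lifetime(last_replication, now, tombstone_lifetime_days):
    """True if the DC has gone longer without replicating than the lifetime."""
    return now - last_replication > timedelta(days=tombstone_lifetime_days)


# Hypothetical example: a DC suspended in January and resumed in September.
suspended = datetime(2024, 1, 10)
resumed = datetime(2024, 9, 1)

# 180 days is only an example value for the tombstone lifetime.
print(exceeds_tombstone_lifetime(suspended, resumed, 180))
```

A check like this is worth running before resuming or powering on any Domain Controller that has been offline for an unknown period.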
It is better to shut down a Domain Controller properly than to leave
it suspended. Even physical Domain Controllers should not be left
turned off for too long, so there is not much difference there. But try
to avoid suspending the Domain Controller; after all, it was never
designed to be aware of this and nasty things might happen.
Using Snapshots on a Domain Controller is worse than dropping Your iPhone in the toilet!
Really bad things happen if You revert to an old snapshot of a Domain
Controller. It is even worse than hot-cloning Your Domain Controller;
in reality You are making almost 100% sure that You will break
consistency in Your Active Directory domain and lose data.
To explain it quickly and simply: all Domain Controllers are aware of
what replication has been done with other Domain Controllers, and they
even replicate this information by sharing USN (Update Sequence Number)
values from other Domain Controllers. This helps the Domain Controllers
to know which other Domain Controllers need updating and which not to
bother updating, thereby saving bandwidth and time.
Let’s look at an example of this going terribly wrong:
- DC1 and DC2 are completely in sync and agree that they have synchronized to “version A”. You take a snapshot of DC2.
- A couple of users are added, some are deleted, computer accounts
change passwords, contact details on a user are changed, a couple of
machines are added to the domain, and so on.
- DC1 and DC2 sync again and are completely in sync; they both agree that they are now on “version B”.
- DC2 is rolled back to the previous snapshot, and is now on “version A”.
- The next time DC1 and DC2 talk together to try and sync, DC1 will
not update DC2 with changes, because it “knows” that DC2 already has
this information. (It does not matter that DC2 knows the info is
missing, it will still not get it.)
- Some more changes are made, let's say a couple of users are added on DC1.
- DC1 and DC2 start synchronizing. This time DC1 will give the new
users to DC2, because this change was made AFTER “version B”. They will
now both agree on being on “version C”.
Note that DC2 will never get the changes made between “version A” and
“version B”, and it will never be the wiser – maybe not entirely
true, it will figure it out some time and start writing event log
entries about it.
This is also referred to as a USN rollback or, in this case, a failed
USN rollback. If done intentionally, i.e. during a restore, this can be a powerful tool to roll back changes made in an Active Directory domain,
but when done unintentionally like this, it can result in replication
errors, lingering objects that should have been deleted, inconsistency
between Domain Controllers, computers and accounts that can log on to
certain DCs but not others, one password on one DC and a second password
on another, and so on. These are all terrible results that should never
be introduced to a production environment. Detecting and recovering from
these problems can also be almost impossible.
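One way such an inconsistency arises can be shown with a toy model. The sketch below is a heavy simplification and not how Active Directory is implemented: each DC has a local USN counter, and a partner only pulls changes with a USN higher than the highest one it remembers having pulled. The class and method names are invented for the example, and only the direction "DC1 pulls from DC2" is modelled.

```python
import copy


class DomainController:
    """Toy DC: a USN counter, stored objects and per-partner watermarks."""

    def __init__(self, name):
        self.name = name
        self.usn = 0          # local update sequence number
        self.objects = {}     # object name -> local USN of the write
        self.watermarks = {}  # partner name -> highest USN already pulled

    def create(self, obj):
        self.usn += 1
        self.objects[obj] = self.usn

    def pull_from(self, source):
        """Pull only changes newer than the watermark kept for the source."""
        seen = self.watermarks.get(source.name, 0)
        received = []
        for obj, usn in sorted(source.objects.items(), key=lambda kv: kv[1]):
            if usn > seen:
                if obj not in self.objects:
                    self.create(obj)
                received.append(obj)
        self.watermarks[source.name] = max(seen, source.usn)
        return received


dc1, dc2 = DomainController("DC1"), DomainController("DC2")

dc2.create("alice")
dc1.pull_from(dc2)               # DC1 has now seen DC2 up to USN 1
snapshot = copy.deepcopy(dc2)    # snapshot of DC2 is taken here

dc2.create("bob")
dc1.pull_from(dc2)               # DC1 has now seen DC2 up to USN 2

dc2 = snapshot                   # DC2 is reverted, its USN is back at 1
dc2.create("carol")              # reuses USN 2 on DC2

print(dc1.pull_from(dc2))        # nothing is pulled: USN 2 was "already seen"
print(sorted(dc1.objects), sorted(dc2.objects))
```

In this model "carol" exists only on DC2 and never reaches DC1, which is the kind of account that can log on against one DC but not another.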
If this happens to You, the best action to take is to immediately
demote the affected Domain Controller, since this Domain Controller
will never again have correct information and may also receive
changes that it will not replicate to other Domain Controllers. Do NOT
fix replication faults on a “sick” DC. You will only allow the “sick”
DC to replicate with Your “healthy” DCs!
If needed You can most likely promote this DC again, since it will do
a full replication upon promotion, but if it is possible, why not set up
a new clean DC? Also, any changes made on the defective Domain
Controller while it was up will be lost. If it was running for a long
time and holds important information, consider exporting that
information to the healthy Domain Controllers (time consuming).
Use System State for backups of Active Directory, NOT snapshots!
If You are considering using snapshots to restore an Active Directory
server, reconsider. The information above clearly shows why. The only
case where a Domain Controller restored from a snapshot can be
introduced is if it is the ONLY Domain Controller in existence. Instead
use the Microsoft built-in tools to do a System State backup, and if
needed back up the files as well. This will not only work, but will also
be completely supported by Microsoft. If You have a full System State
backup of an Active Directory domain, it does not matter what other
problems You have or how much trouble You are in: Microsoft will always
be able to restore the information needed to set up the domain again.
Also make sure You read up on “authoritative restores” for Active Directory, when restoring objects or full domains.
Performance is an issue, but is there anything special we should consider for virtual Domain Controllers?
Most Domain Controllers running infrastructure services like AD, DNS,
DHCP and WINS on physical machines, even on small hardware, will
happily service loads of users and machines with little utilization of
the machines' capabilities. Considering that we should ALWAYS have good
redundancy on these services, it seems wasteful to set up a lot of
power-draining physical machines to run these services, when we can set
up multiple virtual machines to do the same and better utilize hardware
However we should be aware that putting Domain Controllers on
overloaded virtual environments, or giving them too few reserved
resources, can result in bad performance, delays for clients accessing resources,
So consider carefully how many resources You need and where to place these services.
- Global Catalog services have information about Your entire forest
and can be heavily utilized by Exchange servers for information.
- The PDC FSMO role will receive more requests, both from clients and
from other Domain Controllers, in case of a mismatch and on password changes.
- Other FSMO roles in general receive little extra load and are
usually used infrequently. High availability is however important in
- DNS/WINS servers service clients with information from cache (read:
memory), and speed is important to achieve little lag on name lookups,
so ensure enough memory.
- All Domain Controllers require readily available disk, CPU and
memory resources. Ensure enough is available for good performance and
consider doing regular checks.
- High availability of Active Directory (and DNS) services is the highest
priority for a functional infrastructure; without them nothing else
Ensure that the virtual environment is able to load and start
properly when all virtual machines have been turned off. For example,
if the virtual center service requires DNS to access databases and
ESX/ESXi hosts, it would not work if all its DNS servers were virtual