
 


Exploring 2000 - By design (Oct 1999)
Bob Walder investigates Windows 2000’s scalability, manageability and reliability.

This article is based on the latest builds of Windows 2000 Server & Professional RC 1.

In creating Windows 2000, the development team had a number of key architectural challenges to address:

  • Improved scalability – with a requirement to exploit microprocessor advances and symmetric multiprocessing (SMP) technology, exploit large amounts of memory, and exploit high-performance I/O
  • Higher availability – with a requirement to allow a higher proportion of routine maintenance tasks to be completed without down time, provision of network failure detection and automatic recovery, and improved system monitoring and management
  • Greater reliability – with a requirement once again for less configuration and maintenance down time, a reduction in the number of system crashes or ‘Blue Screen Of Death’ (BSOD) events, and improvements in device drivers and tools
  • Improved storage management – providing easier management of large amounts of storage, a reduction in down time when managing storage, support for alternative cost-effective media, and file system indexing

Scalability enhancements

Starting with scalability enhancements, it is apparent that advances in processor architecture and high-end server specifications provide a faster and more flexible platform on which to run. However, today’s operating systems will also have more and more demands placed upon them as a result of increases in the number of processors and amount of memory that these machines can support. New generations of I/O architecture will also place additional demands on the OS.

Windows 2000 has been designed from the ground up with these advances in mind, providing support for higher processor counts and being optimised for SMP. It now includes such features as a per-processor look-aside cache, improved memory allocation efficiency (providing a 5 per cent improvement in disk I/O), reduced resource contention, per-process completion ports (providing a 7 per cent increase in TPC-C throughput), a per processor thread pool, and the use of fibers. The latter are lightweight versions of threads, with a reduced memory cost that will ease porting of software from Unix environments.
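
By way of illustration, the listing below is a minimal sketch of the Win32 fiber calls mentioned above (not code from Windows 2000 itself): the main thread converts itself into a fiber, hands control to a worker fiber, and regains it when the worker cooperatively yields. There is no scheduler involved – switches happen only where the code asks for them, which is what keeps fibers so cheap.

    /* Minimal fiber sketch: cooperative switching between two fibers. */
    #define _WIN32_WINNT 0x0400
    #include <windows.h>
    #include <stdio.h>

    static LPVOID g_mainFiber;              /* fiber context of the original thread */

    static VOID CALLBACK WorkerFiber(PVOID param)
    {
        printf("Worker fiber says: %s\n", (char *)param);
        SwitchToFiber(g_mainFiber);         /* cooperative yield back to main */
    }

    int main(void)
    {
        LPVOID worker;

        g_mainFiber = ConvertThreadToFiber(NULL);       /* thread becomes a fiber */
        worker = CreateFiber(0, WorkerFiber, "hello");  /* default stack size */

        SwitchToFiber(worker);              /* run the worker until it yields */
        printf("Back in the main fiber\n");

        DeleteFiber(worker);
        return 0;
    }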

CPU and SMP optimisation is also improved by the use of Job Objects that manage groups of processes as a unit, keeping them separate from other groups. This provides finer control over processes running on the system and limits possible adverse effects. The file system cache has been increased from 512MB to 960MB.
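
As a rough sketch of the idea (the job name, child process and limit below are purely illustrative), a job object is created, given a limit, and then has a freshly launched process assigned to it – from that point on, the limit applies to that process and anything it spawns.

    /* Illustrative Job Object sketch: cap a group of processes at four members. */
    #define _WIN32_WINNT 0x0500
    #include <windows.h>

    int main(void)
    {
        STARTUPINFO si = { sizeof(si) };
        PROCESS_INFORMATION pi;
        JOBOBJECT_BASIC_LIMIT_INFORMATION limits;
        TCHAR cmd[] = TEXT("notepad.exe");               /* any child will do */
        HANDLE job = CreateJobObject(NULL, TEXT("DemoJob"));

        /* Allow no more than four active processes in this job at once. */
        ZeroMemory(&limits, sizeof(limits));
        limits.LimitFlags = JOB_OBJECT_LIMIT_ACTIVE_PROCESS;
        limits.ActiveProcessLimit = 4;
        SetInformationJobObject(job, JobObjectBasicLimitInformation,
                                &limits, sizeof(limits));

        /* Launch the child suspended, pull it into the job, then let it run. */
        if (CreateProcess(NULL, cmd, NULL, NULL, FALSE, CREATE_SUSPENDED,
                          NULL, NULL, &si, &pi))
        {
            AssignProcessToJobObject(job, pi.hProcess);
            ResumeThread(pi.hThread);
            CloseHandle(pi.hThread);
            CloseHandle(pi.hProcess);
        }

        CloseHandle(job);
        return 0;
    }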

Enterprise Memory Architecture


The new Enterprise Memory Architecture (EMA) exploits larger physical memory by increasing the maximum addressable memory from 4GB to 64GB on Intel and 32GB on Alpha platforms. On Intel systems, the limit is actually 32GB under Windows 2000 Advanced Server and 64GB under Datacentre Server. EMA will work on any Alpha platform, where the OS manages the memory, enabling more than one application to use the address space. Although the maximum memory size is greater on Intel systems, it is supported by Xeon processors only, and the address space is managed by the application, allowing only one application to use it at a time.
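
In practice, this application-managed model is exposed to developers through the Address Windowing Extensions (AWE) calls, and the sketch below shows the general shape of their use – allocate physical pages, reserve a window of virtual address space, and map the pages into it. It assumes the account has been granted the ‘Lock pages in memory’ privilege and trims all error handling; by remapping the same window onto different sets of physical pages, one application can work through far more memory than its 4GB virtual address space would otherwise allow.

    /* AWE sketch: application-managed physical memory above the 4GB line.
       Requires the 'Lock pages in memory' privilege; error checks omitted. */
    #define _WIN32_WINNT 0x0500
    #include <windows.h>

    int main(void)
    {
        SYSTEM_INFO si;
        ULONG_PTR   pageCount;
        ULONG_PTR  *pfnArray;
        PVOID       window;

        GetSystemInfo(&si);

        /* Ask the OS for 64MB worth of physical page frames. */
        pageCount = (64 * 1024 * 1024) / si.dwPageSize;
        pfnArray  = (ULONG_PTR *)HeapAlloc(GetProcessHeap(), 0,
                                           pageCount * sizeof(ULONG_PTR));
        AllocateUserPhysicalPages(GetCurrentProcess(), &pageCount, pfnArray);

        /* Reserve a 64MB window of virtual address space for AWE mappings... */
        window = VirtualAlloc(NULL, pageCount * si.dwPageSize,
                              MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);

        /* ...and map the physical pages into it. Remapping the window onto
           other page sets is how the application reaches memory beyond 4GB. */
        MapUserPhysicalPages(window, pageCount, pfnArray);

        /* Use the memory, then unmap and release everything. */
        MapUserPhysicalPages(window, pageCount, NULL);
        FreeUserPhysicalPages(GetCurrentProcess(), &pageCount, pfnArray);
        HeapFree(GetProcessHeap(), 0, pfnArray);
        return 0;
    }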

Because we are still effectively working with a 32 bit address space, the memory above 4GB is reached through large pages handled via the new Page Size Extension feature on the Xeon processors. This allows Windows 2000 to intermix 32 bit and 36 bit addresses easily, and reduces the effort needed to develop and support changes in the virtual memory subsystem. Managing memory above 4GB with large pages is more efficient, providing better performance, but many will not implement large memory servers until 64 bit Windows – with its more efficient linear address space – is finally available. It will be interesting to see whether Windows 2000 Datacentre Server will ever see the light of day, given that by the time it appears, 64 bit Windows will be on the horizon.

Final tweaks to scalability are in the area of enhanced I/O efficiency. Here, code path lengths in the I/O drivers have been shortened, handle management has been improved so that more handles can be used with less contention, context switching has been reduced in the NTFS file system, and contention on spin locks for SCSI devices has been cut.

Availability


Moving on from scalability to availability, Microsoft’s big offering in this area (when it finally appears) is clustering, which it intends to deliver in two phases. The current phase allows workload on two servers to fail over automatically – that is, to transfer from the primary to the secondary server in case of system failure – thus creating the beginnings of a high availability NT environment. The second phase, to be delivered with Windows 2000, will extend high availability clustering by adding support for large, multi-server clusters that share resources and behave like a single, logical ‘super server’.

Clients see a cluster as if it were a single, high-performance, highly reliable server, and services are cluster-wide with the ability to tolerate component failures. So, should any one server fail, its services are automatically handled by other members of the cluster. Components such as new servers or storage devices can be added to the cluster transparently to users, and existing client connections to clustered applications are unaffected. This provides the means to offer rolling upgrades whilst maintaining continuous service for the user, once again enhancing the availability of Windows 2000 systems.
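
For the curious, the sketch below shows roughly how a failover can be driven by hand through the Cluster API (clusapi.h, linked against clusapi.lib): it asks the cluster to move a resource group to whichever surviving node it judges best, which is the same path a genuine failure would trigger. The group name is invented for the example.

    /* Cluster API sketch: manually fail a resource group over to another node. */
    #include <windows.h>
    #include <clusapi.h>

    int main(void)
    {
        HCLUSTER cluster;
        HGROUP   group;

        cluster = OpenCluster(NULL);            /* NULL = the local cluster */
        if (cluster == NULL)
            return 1;

        group = OpenClusterGroup(cluster, L"SQL Group");   /* illustrative name */
        if (group != NULL)
        {
            /* A NULL destination lets the cluster choose the best node. */
            MoveClusterGroup(group, NULL);
            CloseClusterGroup(group);
        }

        CloseCluster(cluster);
        return 0;
    }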

The third item in our list of challenges is related closely to the idea of high availability – greater reliability. What everyone wants from Windows 2000 is fewer system crashes – the notorious BSOD syndrome – and fewer forced reboots following minor reconfigurations. Hardware and software configuration and maintenance should be a whole lot friendlier under Windows 2000 where, it is said, reboots will be required for only a few major changes. This is a laudable aim, but it has to be said that I must have hit all of them almost immediately in my testing of Beta 3 and Release Candidate 1, which does not bode well. In other words, there are still far too many occasions where a reboot is necessary when it probably should not be. Hopefully, things will improve further between now and release time.

On the positive side, a reinstall is no longer required when upgrading a server to be a Domain Controller, which is a huge step forward (though why it should ever have been necessary in the first place is anyone’s guess – probably a hangover from the bad old LAN Manager days). The number of forced reboots has been reduced by about 50 in areas such as volume management, configuring network protocols (this is the area where I still have the occasional problem), settings on PCI and other PnP hardware, and so on. In theory, the only reboots that will be required going forward are for major events such as machine name and domain changes, font changes (why?), and Service Pack installs.

Blue Screen of Death


In the past we have all become far more familiar than we would like to be with the infamous Blue Screen Of Death. The two biggest causes of this ailment are poor driver code and resource or memory leaks, the eventual BSOD resulting from a serious error detected by kernel mode code which finds it can do nothing to rectify the problem. Memory leaks cause vital resources to drain away slowly until performance slows to a crawl or the system hangs completely. This forces many administrators to perform regular ‘preventative reboots’ of the system to restore the missing memory. OK, so it’s not a system crash, but it still results in down time and an interruption to service for the end user. Memory leaks have probably done more than anything else to earn NT the ‘unreliable’ label that it seems to carry – small wonder, then, that eradicating memory leaks has been a top priority for this new release.

Most current memory leak problems are hard to identify and even harder to cure once they have a grip on the system, so new tools will be provided to help identify and fix leaks as they occur. Prevention is better than cure, of course, and the new job object allows the imposition of memory limits on a collection of processes. Work is also in hand to address the problem of bad drivers, with improved DDK driver samples and documentation, enhanced driver testing, driver signing, and the adoption of the new Windows Driver Model (WDM). Microsoft will also carry out regular testing of major third party anti-virus software, another regular cause of NT problems.
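
As a simple illustration of that kind of monitoring (rather than any tool Microsoft itself will ship), the sketch below polls a process’s memory counters through PSAPI; a pagefile usage figure that climbs steadily under a flat workload is the classic signature of a leak.

    /* Leak-watching sketch using PSAPI (link with psapi.lib). */
    #include <windows.h>
    #include <psapi.h>
    #include <stdio.h>

    int main(void)
    {
        PROCESS_MEMORY_COUNTERS pmc;
        int i;

        for (i = 0; i < 10; i++)
        {
            pmc.cb = sizeof(pmc);
            if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
            {
                /* Steady growth here, with no growth in workload, means a leak. */
                printf("working set: %lu KB, pagefile: %lu KB\n",
                       (unsigned long)(pmc.WorkingSetSize / 1024),
                       (unsigned long)(pmc.PagefileUsage / 1024));
            }
            Sleep(1000);                        /* sample once a second */
        }
        return 0;
    }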

Of course, no one is trying to pretend that there will never be another BSOD. If and when it does occur (hopefully far less frequently than under NT 4 and previous versions), crash dumps have been made very much quicker than at present and comprehensive crash dump analysis tools are being developed to help identify the cause. A Web-based ‘trouble-shooter’ will be available for most of the common blue screens, and application recovery techniques have been improved too.

Hopefully this will have given some insight into the effort that has gone into making Windows 2000 more scalable, manageable and, above all, reliable. No one is pretending that Windows 2000 will be without its problems. With an almost complete redesign and rewrite under the hood, and huge new additions such as Active Directory, it is inevitable that we will see the odd performance problem, reliability issue, and even the occasional BSOD in the early days following its release. However, the Windows 2000 architecture appears to offer a sound platform for the future, with the promise of a faster and more reliable OS somewhere down the road.
