Analyzing blue screens can save you repeated crashes or hours of reinstallation time
Windows 2000 indisputably brings a previously unknown level of
reliability to Windows. Microsoft's rewrite of the core OS code
to handle unusual situations, the company's enormous testing
effort, and the new Driver Verifier tool mean that blue screens
on Win2K systems are rare. However, many corporations still rely
heavily on Windows NT 4.0. And although device drivers that ship
with Win2K undergo comprehensive stress and correctness
validation before receiving the stamp of approval from
Microsoft's Windows Hardware Quality Labs (WHQL), undetected
bugs can still surface. Further, if you install applications
that contain nonhardware drivers, such as
virus scanners, quota-management utilities, or encryption
packages, your Win2K system might have drivers that haven't been
through WHQL testing, even if you set the system's
driver-signing policy to otherwise prevent untested drivers.
Thus, although blue screens will be fewer, you might still see
one from time to time, and having the information necessary to
analyze them can mean the difference between spending a few
minutes to uninstall one application and spending a few hours to
perform a full OS reinstall.
Many systems administrators forgo exploring Win2K's and NT
4.0's crash dump options in the belief that using them is too
difficult. Although Microsoft's debugger documentation has
improved in the past year, it's still oriented toward
device-driver developers. But even if just one crash dump in
five contains information that proves useful, you'll find it
worthwhile to learn at least a little about crash dump analysis.
This primer on crash dump analysis will ease the learning
curve. I start with the basics of configuring a system to save a
memory dump when the system crashes, describe where you can find
the tools you need to examine a crash dump, then give you tips
on gleaning information from a dump. Along the way, I introduce
you to a continually evolving automated dump analysis tool, the
Kernel Memory Space Analyzer (Kanalyze).
Enabling Crash Dumps
The first step in crash dump analysis is ensuring that when a
system crashes, it produces a memory dump. You access the NT 4.0
crash dump options through the Control Panel System applet's
Startup/Shutdown tab.
Figure 1 shows the Startup/Shutdown page, in which you
select the Write debugging information to check box and
enter the name of the file you want to write the dump to. Other
options on the page direct the system's behavior in response to
a crash and include writing an event to the System log, sending
an administrative alert, and automatically rebooting.
Because NT 4.0 crash dump files include a copy of the
contents of a computer's physical memory, you need to ensure
that your system has adequate
disk space to save and store a dump. First, configure a
paging file on the boot volume (the volume that contains the \winnt
directory). The paging file needs to be large enough to store
the system's memory plus 1MB. The volume that stores the dump
file (which by default is also the boot volume) must have
slightly more free space than the computer has physical memory.
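The sizing rules above reduce to simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and the 1MB margin that stands in for "slightly more" free space is an assumed figure, not one the OS documents.

```python
MB = 1024 * 1024


def min_paging_file_bytes(physical_memory_bytes):
    """Smallest boot-volume paging file that can hold a full NT 4.0 dump."""
    # Physical memory plus 1MB, as described in the text above.
    return physical_memory_bytes + 1 * MB


def dump_volume_has_room(free_bytes, physical_memory_bytes, margin_bytes=1 * MB):
    """True if the dump volume looks large enough for a full dump.

    margin_bytes models "slightly more" free space; 1MB is an assumption.
    """
    return free_bytes >= physical_memory_bytes + margin_bytes


if __name__ == "__main__":
    ram = 128 * MB
    print(min_paging_file_bytes(ram) // MB)     # 129
    print(dump_volume_has_room(200 * MB, ram))  # True
```

For a 128MB system, the boot-volume paging file would need to be at least 129MB.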
These requirements derive from the way the kernel implements
its crash dump facility. During the boot process, the OS checks
the
registry crash dump options in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl
subkey. If one or more options are enabled, the system generates
a map of the disk blocks that the boot volume's paging file
occupies and saves the map in memory. The system also determines
which disk device driver manages the boot volume and calculates
a checksum of the driver's in-memory image and the data
structures that must be intact for the driver to perform disk
I/O. When a crash occurs, the kernel verifies the integrity of
the paging file map, the disk driver, and the disk-driver
control structures. If these structures are intact, the kernel
invokes special disk-driver I/O functions that exist
specifically for dumping memory when the system crashes. These
I/O functions are self-contained and don't rely on any kernel
services, because crash dump-related code must make no
assumptions about which parts of the kernel or device drivers
the situation that led to the crash might have compromised. The
kernel writes the contents of memory directly to the disk sectors
listed in the paging file's sector map, which lets the kernel avoid
relying on file-system drivers.
The kernel verifies the integrity of every component involved in
the dump process before proceeding because writing directly to
sectors on the disk could shred a disk's data if those sectors
lie outside the paging file. A paging file must be 1MB larger
than physical memory because when the kernel writes the dump,
the kernel also writes a header that contains a crash dump
signature and the values of several key kernel variables.
Although the header is much smaller than 1MB, the system sizes a
paging file by megabytes.
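The "snapshot at boot, verify at crash, then write" pattern described above can be modeled in a few lines. This is a toy model, not the kernel's code. The class and method names are hypothetical, CRC32 stands in for whatever checksum the kernel actually uses, and each sector is assumed to hold one page of memory for simplicity.

```python
import zlib


class DumpGuard:
    """Toy model of the boot-time snapshot and crash-time integrity check."""

    def __init__(self, paging_file_sectors, driver_image):
        # "Boot time": record where the paging file lives and checksum
        # both the sector map and the disk driver's in-memory image.
        self.sector_map = list(paging_file_sectors)
        self.map_checksum = self._checksum_map(self.sector_map)
        self.driver_checksum = zlib.crc32(driver_image)

    @staticmethod
    def _checksum_map(sectors):
        return zlib.crc32(",".join(map(str, sectors)).encode("ascii"))

    def write_dump(self, memory_pages, driver_image, disk):
        """"Crash time": write pages only if every check passes."""
        if self._checksum_map(self.sector_map) != self.map_checksum:
            return False  # map corrupted; writing could hit arbitrary sectors
        if zlib.crc32(driver_image) != self.driver_checksum:
            return False  # disk driver changed since boot; don't trust it
        if len(memory_pages) > len(self.sector_map):
            return False  # paging file too small to hold the dump
        for sector, page in zip(self.sector_map, memory_pages):
            disk[sector] = page  # raw sector write, no file system involved
        return True
```

The key design point the model preserves is that any failed check leaves the disk untouched: losing the dump is preferable to writing outside the paging file.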
When a system boots, the Session Manager process (\winnt\system32\smss.exe)
initializes the system's paging files by using the native
NtCreatePagingFile function to create each file.
NtCreatePagingFile determines whether the paging file it's
initializing exists, and if so, whether the file has a dump
header. When a dump header is present, NtCreatePagingFile
returns a special code to Session Manager. As a result, when
Session Manager executes the logon manager (\winnt\system32\winlogon.exe)
to start the Winlogon process, Session Manager notifies Winlogon
that a crash dump exists. Winlogon then executes the SaveDump
application (\winnt\system32\savedump.exe), which examines the
dump header to decide what crash response actions to perform. If
the header indicates that a memory dump is present, SaveDump
copies the contents of the paging file to the crash dump file
you specified in the Startup/Shutdown dialog. While SaveDump
writes the dump file, the system doesn't use the part of the
paging file that contains the crash dump. During that time, the
amount of virtual memory available to the system and
applications shrinks by the size of the dump, and dialog-box
pop-ups might indicate that the system is low on virtual memory.
After SaveDump runs, it informs the memory manager that it has
finished saving the dump, and the memory manager makes available
for general use the portion of the paging file that contains the
dump. After saving a dump file, SaveDump performs other
specified crash options, such as sending an administrative alert
or writing an event to the System log.
The copy of the system's memory contents at the time of a
crash often contains information that isn't useful for analyzing
a crash dump. Because a crash results from a problem during
kernel-mode execution, user-mode application data isn't
generally relevant to crash diagnosis. Kernel-mode memory
includes all OS and driver data structures, as well as
executable code for device drivers and the kernel, so Win2K
introduces a crash dump option that has the system save only
kernel-mode memory. This option can significantly reduce the
size of a crash dump file, making the file quicker to generate
and copy and more practical to store and
exchange with support personnel. A typical system with 128MB
of memory might have only a 40MB kernel-memory dump.
Figure 2 shows the Win2K Startup and
Recovery crash-option dialog box, which you access by
clicking Startup and Recovery on the System applet's
Advanced tab.
Win2K also includes a minidump option. Minidumps, which the
Startup and Recovery dialog box's Write Debugging
Information drop-down list calls Small Memory Dumps, are 64KB
crash dumps that store a minimal set of potentially useful
information, such as the blue screen crash code, a list of
loaded drivers, information about the process and thread being
executed at crash time, and a snapshot of the crash point's
stack (i.e., a history of recently called functions). The
minidump data, which is essentially the same information that NT
4.0 displays on blue screens, sometimes contains sufficient
information to guess at the cause of a crash. Minidumps are
small and don't overwrite previous minidumps. A minidump's name
has the form minimmddyy-nn.dmp, where mm, dd, and
yy represent the month, day, and year, respectively, and
nn is a unique number that distinguishes minidumps
generated on the same day. By default, Win2K saves minidump
files in the %systemroot%\minidump directory. You analyze
minidumps the same way you analyze full and kernel-only dumps.
However, I recommend enabling kernel-memory dumps if you have
the necessary disk space.
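If minidumps accumulate, the naming convention makes them easy to sort by script. The following Python sketch parses the minimmddyy-nn.dmp form described above. The function and type names are hypothetical, and the pattern assumes that nn is always exactly two digits, as the name format suggests.

```python
import re
from typing import NamedTuple, Optional

# mini + mm + dd + yy + "-" + nn + ".dmp"; case is ignored.
_MINIDUMP_NAME = re.compile(
    r"^mini(\d{2})(\d{2})(\d{2})-(\d{2})\.dmp$", re.IGNORECASE
)


class MinidumpName(NamedTuple):
    month: int
    day: int
    year2: int     # two-digit year, exactly as written in the file name
    sequence: int  # distinguishes dumps generated on the same day


def parse_minidump_name(filename: str) -> Optional[MinidumpName]:
    """Return the date fields from a minidump file name, or None."""
    match = _MINIDUMP_NAME.match(filename)
    if match is None:
        return None
    month, day, year2, sequence = (int(g) for g in match.groups())
    return MinidumpName(month, day, year2, sequence)


def chronological_key(name: MinidumpName):
    """Sort key that orders dumps oldest first (within one century)."""
    return (name.year2, name.month, name.day, name.sequence)
```

For example, `parse_minidump_name("Mini011501-02.dmp")` yields month 1, day 15, two-digit year 1, and sequence 2, whereas a name such as `memory.dmp` yields `None`.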
Reasons Crash Dumps Fail
Systems might fail to save a crash dump for a number of reasons.
A system won't save a dump if the paging file on your boot
volume is too small or if the volume on which you want to save
the dump file doesn't have enough free space. In the latter
case, you'll find a SaveDump record in the System log indicating
that a dump wasn't saved.
More obscure reasons why a system might not save a dump
include the possibility that a misbehaving driver corrupted the
structures or code involved in saving the dump. In such cases,
either the code fails to execute altogether or checksums of the
disk device driver components identify changes and the kernel
avoids possible disk corruption by not writing the dump. In
addition, incompletely written disk drivers—which aren't
uncommon on NT 4.0 systems—don't implement the special dump I/O
routines that the dump code requires. (For more information, see
"Related Reading," page 70.) All drivers that Microsoft
digitally signs include crash dump support, so this problem
won't occur on Win2K systems that have only signed drivers.
To test a system's ability to generate a crash dump, download
the BSOD program from http://www.sysinternals.com/bluesave.htm
and run it after waiting until your system appears idle for at
least a minute. After you confirm that you want to crash your
system, BSOD installs a device driver that allocates some kernel
memory, frees it, then references the freed memory at a high
interrupt request level (IRQL). Referencing freed memory and
referencing memory at a high IRQL are illegal operations, so
BSOD virtually guarantees a crash.
Analysis Tools
After you've configured your system to generate crash dumps and
verified that it can do so successfully, you need to obtain
crash dump analysis tools and associated data files. Most
important, you must have available the symbol files for at least
the kernel's ntoskrnl.exe file. Symbol files identify the names
of internal functions and variables in the module to which they
correspond, which can provide helpful information during crash
dump analysis. If possible, you should obtain and install all
the symbol files. Symbol files are service pack-specific, so
make sure that the symbols you install are for your service-pack
level.
You can find symbol files for the English version of NT 4.0
in the \bussys\winnt\winnt-public\fixes\usa\nt40 directory of
Microsoft's anonymous
ftp server at ftp://ftp.microsoft.com. (Symbols for other
languages are in appropriate subdirectories under \bussys\winnt\winnt-public\fixes.)
Symbols for the initial release of Win2K are on the Win2K
Customer Support Diagnostics CD-ROM. When you insert this CD-ROM
into the drive, a Web page opens and links to the symbol-file
extraction tool. You can download Win2K Service Pack 1 (SP1)
symbols from http://www.microsoft.com/windows2000/downloads/recommended/sp1/debug/default.asp.
The standard symbol installation directory is \winnt\symbols,
but you can install symbols anywhere you want. To save yourself
work later when you run analysis tools, define the environment
variable _NT_SYMBOL_PATH to point to the top-level directory of
your symbol installation (e.g., if you installed to \winnt\symbols,
set the path to \winnt\symbols).
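If you launch analysis tools from a script rather than a command prompt, you can set the variable for just that tool instead of system-wide. The Python sketch below shows one way to do so. The helper name is hypothetical, and the path is the example directory from the text.

```python
import os


def debugger_environment(symbol_root, base_env=None):
    """Return a copy of the environment with _NT_SYMBOL_PATH set.

    symbol_root is the top-level directory of the symbol installation.
    The caller's own environment is left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["_NT_SYMBOL_PATH"] = symbol_root
    return env


if __name__ == "__main__":
    env = debugger_environment(r"\winnt\symbols")
    print(env["_NT_SYMBOL_PATH"])
```

You can pass the returned dictionary as the `env` argument of `subprocess.Popen` or `subprocess.run` when you start a tool.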
Next, you need to install the crash-analysis tools. Although
you can find these debugging tools on the NT 4.0 Setup CD-ROM
and the Win2K Customer Support Diagnostics CD-ROM, you should
download the version posted at http://www.microsoft.com/ddk/debugging/installx86.htm
because it reflects recent enhancements and bug fixes. I
recommend you install the tools to a directory, such as C:\debuggers,
that you can easily access from a command prompt.
Also download the OEM Support Tools from the Microsoft
article "OEM Support Tools Phase 3 Service Release 2
Availability" (http://support.microsoft.com/support/kb/articles/q253/0/66.asp).
These tools include useful add-ons to the basic debugging tools.
The download is a Zip file, and I recommend that you unzip the
tools to a different directory from the one you use for the
other debugging tools. To read the documentation available for
the OEM Support Tools, load the Install directory's
starthere.htm file in a Web browser. Periodically check the OEM
Support Tools and the debugging tools pages for updates.