www.deltaboy.dk a website by Torben J.
  
Blue screen
Crash Dump Analysis
Random crash
Data loss
 
   
   
Analyzing blue screens can save you repeated crashes or hours of reinstallation time

Windows 2000 indisputably brings a previously unknown level of reliability to Windows. Microsoft's rewrite of the core OS code to handle unusual situations, the company's enormous testing effort, and the new Driver Verifier tool mean that blue screens on Win2K systems are rare. However, many corporations still rely heavily on Windows NT 4.0. And although device drivers that ship with Win2K undergo comprehensive stress and correctness validation before receiving the stamp of approval from Microsoft's Windows Hardware Quality Labs (WHQL), undetected bugs can still surface. Further, if you install applications that contain nonhardware drivers, such as virus scanners, quota-management utilities, or encryption packages, your Win2K system might have drivers that haven't been through WHQL testing, even if you set the system's driver-signing policy to otherwise prevent untested drivers. Thus, although blue screens will be fewer, you might still see one from time to time, and having the information necessary to analyze them can mean the difference between spending a few minutes to uninstall one application and spending a few hours to perform a full OS reinstall.

Many systems administrators forgo exploring Win2K's and NT 4.0's crash dump options in the belief that using them is too difficult. Although Microsoft's debugger documentation has improved in the past year, it's still oriented toward device-driver developers. But even if just one crash dump in five contains information that proves useful, you'll find it worthwhile to learn at least a little about crash dump analysis.

This primer on crash dump analysis will ease the learning curve. I start with the basics of configuring a system to save a memory dump when the system crashes, describe where you can find the tools you need to examine a crash dump, then give you tips on gleaning information from a dump. Along the way, I introduce you to a continually evolving automated dump analysis tool, the Kernel Memory Space Analyzer (Kanalyze).

Enabling Crash Dumps
The first step in crash dump analysis is ensuring that when a system crashes, it produces a memory dump. You access the NT 4.0 crash dump options through the Control Panel System applet's Startup/Shutdown tab. Figure 1 shows the Startup/Shutdown page, in which you select the Write debugging information to check box and enter the name of the file you want to write the dump to. Other options on the page direct the system's behavior in response to a crash and include writing an event to the System log, sending an administrative alert, and automatically rebooting.

Because NT 4.0 crash dump files include a copy of the contents of a computer's physical memory, you need to ensure that your system has adequate disk space to save and store a dump. First, configure a paging file on the boot volume (the volume that contains the \winnt directory). The paging file needs to be large enough to store the system's memory plus 1MB. The volume that stores the dump file (which by default is also the boot volume) must have slightly more free space than the computer has physical memory.
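The two sizing rules above reduce to simple arithmetic. The following is a minimal sketch of that check, not a Microsoft tool: the function name and parameters are hypothetical, and "slightly more free space" is modeled as strictly more than physical memory, which is an assumption.

```python
MB = 1024 * 1024


def check_full_dump_space(physical_memory, boot_pagefile_size, dump_volume_free):
    """Return a list of problems that would prevent saving a full memory dump.

    All arguments are byte counts. An empty list means both rules are met.
    """
    problems = []

    # Rule 1: the boot-volume paging file must hold all of memory plus 1MB.
    required_pagefile = physical_memory + 1 * MB
    if boot_pagefile_size < required_pagefile:
        problems.append(
            "paging file is %dMB, needs at least %dMB"
            % (boot_pagefile_size // MB, required_pagefile // MB)
        )

    # Rule 2 (assumption: strictly more than RAM counts as "slightly more"):
    # the volume that receives the dump file needs more free space than RAM.
    if dump_volume_free <= physical_memory:
        problems.append(
            "dump volume has %dMB free, needs more than %dMB"
            % (dump_volume_free // MB, physical_memory // MB)
        )

    return problems
```

For a machine with 128MB of memory, a 129MB paging file on the boot volume and 200MB free on the dump volume pass both checks, whereas a 128MB paging file fails the first.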

These requirements derive from the way the kernel implements its crash dump facility. During the boot process, the OS checks the registry crash dump options in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl subkey. If one or more options are enabled, the system generates a map of the disk blocks that the boot volume's paging file occupies and saves the map in memory. The system also determines which disk device driver manages the boot volume and calculates a checksum of the driver's in-memory image and of the data structures that must be intact for the driver to perform disk I/O. When a crash occurs, the kernel verifies the integrity of the paging file map, the disk driver, and the disk-driver control structures. If these structures are intact, the kernel invokes special disk-driver I/O functions that exist specifically for dumping memory when the system crashes. These I/O functions are self-contained and don't rely on any kernel services, because crash dump-related code must make no assumptions about which parts of the kernel or device drivers the situation that led to the crash might have compromised. The kernel writes the contents of memory directly to the disk sectors recorded in the paging file map, which lets it avoid relying on file-system drivers. The kernel verifies the integrity of every component involved in the dump process before proceeding, because writing directly to sectors on the disk could shred the disk's data if those sectors lie outside the paging file. A paging file must be 1MB larger than physical memory because when the kernel writes the dump, it also writes a header that contains a crash dump signature and the values of several key kernel variables. Although the header is much smaller than 1MB, the system sizes paging files in whole megabytes.
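The record-then-verify pattern described above can be illustrated in a few lines. This is a sketch of the idea only: the article doesn't say which checksum algorithm the kernel uses, so CRC-32 stands in as an assumption, and the region names and function names are hypothetical.

```python
import zlib


def snapshot_checksums(regions):
    """Boot time: record a checksum for each named region of bytes.

    `regions` maps a label (for example "disk_driver_image") to its bytes.
    CRC-32 is an assumption; the real kernel algorithm is not specified here.
    """
    return {name: zlib.crc32(data) for name, data in regions.items()}


def safe_to_write_dump(regions, recorded):
    """Crash time: allow raw sector writes only if every region still matches."""
    for name, checksum in recorded.items():
        data = regions.get(name)
        if data is None or zlib.crc32(data) != checksum:
            # Something the dump path depends on has changed, so writing
            # directly to disk sectors could corrupt data. Skip the dump.
            return False
    return True
```

If any recorded region has changed or disappeared by crash time, the check fails and the dump is skipped, mirroring the kernel's choice to lose the dump rather than risk the disk.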

When a system boots, the Session Manager process (\winnt\system32\smss.exe) initializes the system's paging files by using the native NtCreatePagingFile function to create each file. NtCreatePagingFile determines whether the paging file it's initializing exists, and if so, whether the file has a dump header. When a dump header is present, NtCreatePagingFile returns a special code to Session Manager. As a result, when Session Manager executes the logon manager (\winnt\system32\winlogon.exe) to start the Winlogon process, Session Manager notifies Winlogon that a crash dump exists. Winlogon then executes the SaveDump application (\winnt\system32\savedump.exe), which examines the dump header to decide what crash response actions to perform. If the header indicates that a memory dump is present, SaveDump copies the contents of the paging file to the crash dump file you specified in the Startup/Shutdown dialog. While SaveDump writes the dump file, the system doesn't use the part of the paging file that contains the crash dump. During that time, the amount of virtual memory available to the system and applications is reduced by the size of the dump, and dialog-box pop-ups might indicate that the system is low on virtual memory. After SaveDump runs, it informs the memory manager that it has finished saving the dump, and the memory manager makes available for general use the portion of the paging file that contains the dump. After saving a dump file, SaveDump performs other specified crash options, such as sending an administrative alert or writing an event to the System log.
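The header test at the heart of this boot-time sequence can be sketched as a signature check on the first bytes of a file. The signature values below are an assumption drawn from the commonly documented crash dump file layout ("PAGEDUMP" for 32-bit dumps, "PAGEDU64" for 64-bit dumps), not from this article, and the function names are hypothetical.

```python
# Assumed signatures: "PAGEDUMP" (32-bit) and "PAGEDU64" (64-bit) at offset 0.
DUMP_SIGNATURES = (b"PAGEDUMP", b"PAGEDU64")


def has_dump_header(first_bytes):
    """Return True if the leading bytes look like a crash dump header."""
    return first_bytes[:8] in DUMP_SIGNATURES


def file_has_dump_header(path):
    """Read the first 8 bytes of a file and test them for a dump signature."""
    with open(path, "rb") as f:
        return has_dump_header(f.read(8))
```

A check like this is also a quick way to confirm that a file someone sends you is a memory dump before you load it into a debugger.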

The copy of the system's memory contents at the time of a crash often contains information that isn't useful for analyzing a crash dump. Because a crash results from a problem during kernel-mode execution, user-mode application data isn't generally relevant to crash diagnosis. Kernel-mode memory includes all OS and driver data structures, as well as executable code for device drivers and the kernel, so Win2K introduces a crash dump option that has the system save only kernel-mode memory. This option can significantly reduce the size of a crash dump file, making the file quicker to generate and copy and more practical to store and exchange with support personnel. A typical system with 128MB of memory might have only a 40MB kernel-memory dump. Figure 2 shows the Win2K Startup and Recovery crash-option dialog box, which you access by clicking Startup and Recovery on the System applet's Advanced tab.

Win2K also includes a minidump option. Minidumps, which the Startup and Recovery dialog box's Write Debugging Information drop-down list calls Small Memory Dumps, are 64KB crash dumps that store a minimal set of potentially useful information, such as the blue screen crash code, a list of loaded drivers, information about the process and thread being executed at crash time, and a snapshot of the crash point's stack (i.e., a history of recently called functions). The minidump data, which is essentially the same information that NT 4.0 displays on blue screens, sometimes contains sufficient information to guess at the cause of a crash. Minidumps are small and don't overwrite previous minidumps. A minidump's name has the form minimmddyy-nn.dmp, where mm, dd, and yy represent the month, day, and year, respectively, and nn is a unique number that distinguishes minidumps generated on the same day. By default, Win2K saves minidump files in the %systemroot%\minidump directory. You analyze minidumps the same way you analyze full and kernel-only dumps. However, I recommend enabling kernel-memory dumps if you have the necessary disk space.
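Because the minidump naming scheme is regular, a short script can sort a directory of minidumps by crash date. The sketch below parses the minimmddyy-nn.dmp form described above; the function name is hypothetical, and mapping the two-digit year to 2000 plus yy is an assumption.

```python
import re
from datetime import date

# mini + mm + dd + yy + "-" + nn + ".dmp", matched case-insensitively.
_MINIDUMP_NAME = re.compile(r"^mini(\d{2})(\d{2})(\d{2})-(\d{2})\.dmp$", re.IGNORECASE)


def parse_minidump_name(filename):
    """Return (crash_date, sequence_number), or None if the name doesn't fit."""
    match = _MINIDUMP_NAME.match(filename)
    if match is None:
        return None
    month, day, year, sequence = (int(group) for group in match.groups())
    try:
        # Assumption: two-digit years belong to the 2000s.
        crash_date = date(2000 + year, month, day)
    except ValueError:
        # Digits matched the pattern but don't form a real calendar date.
        return None
    return crash_date, sequence
```

For example, mini010501-02.dmp parses as the dump with sequence number 2 from January 5, 2001.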

Reasons Crash Dumps Fail
Systems might fail to save a crash dump for a number of reasons. A system won't save a dump if the paging file on your boot volume is too small or if the volume on which you want to save the dump file doesn't have enough free space. In the latter case, you'll find a SaveDump record in the System log indicating that a dump wasn't saved.

More obscure reasons why a system might not save a dump include the possibility that a misbehaving driver corrupted the structures or code involved in saving the dump. In such cases, either the code fails to execute altogether or checksums of the disk device driver components identify changes and the kernel avoids possible disk corruption by not writing the dump. In addition, incompletely written disk drivers, which aren't uncommon on NT 4.0 systems, don't implement the special dump I/O routines that the dump code requires. (For more information, see "Related Reading.") All drivers that Microsoft digitally signs include crash dump support, so this problem won't occur on Win2K systems that have only signed drivers.

To test a system's ability to generate a crash dump, download the BSOD program from http://www.sysinternals.com/bluesave.htm and run it after waiting until your system appears idle for at least a minute. After you confirm that you want to crash your system, BSOD installs a device driver that allocates some kernel memory, frees it, then references the freed memory at a high interrupt request level (IRQL). Referencing freed memory and referencing memory at a high IRQL are illegal operations, so BSOD virtually guarantees a crash.

Analysis Tools
After you've configured your system to generate crash dumps and verified that it can do so successfully, you need to obtain crash dump analysis tools and associated data files. Most important, you must have available the symbol files for at least the kernel's ntoskrnl.exe file. Symbol files identify the names of internal functions and variables in the module to which they correspond, which can provide helpful information during crash dump analysis. If possible, you should obtain and install all the symbol files. Symbol files are service pack-specific, so make sure that the symbols you install are for your service-pack level.

You can find symbol files for the English version of NT 4.0 in the \bussys\winnt\winnt-public\fixes\usa\nt40 directory of Microsoft's anonymous ftp server at ftp://ftp.microsoft.com. (Symbols for other languages are in appropriate subdirectories under \bussys\winnt\winnt-public\fixes.) Symbols for the initial release of Win2K are on the Win2K Customer Support Diagnostics CD-ROM. When you insert this CD-ROM into the drive, a Web page opens and links to the symbol-file extraction tool. You can download Win2K Service Pack 1 (SP1) symbols from http://www.microsoft.com/windows2000/downloads/recommended/sp1/debug/default.asp. The standard symbol installation directory is \winnt\symbols, but you can install symbols anywhere you want. To save your work later when you run analysis tools, define the environment variable _NT_SYMBOL_PATH to point to the top-level directory of your symbol installation (e.g., if you installed to \winnt\symbols, set the path to \winnt\symbols).
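If you launch analysis tools from a script, you can set _NT_SYMBOL_PATH for just that tool instead of for the whole system. The following is a minimal sketch of that approach; the function name and its parameters are hypothetical, and only the variable name _NT_SYMBOL_PATH comes from the text above.

```python
import os


def debugger_environment(symbol_root, base_env=None):
    """Return a copy of the environment with _NT_SYMBOL_PATH set.

    `symbol_root` is the top-level directory of the symbol installation,
    for example r"C:\winnt\symbols". The caller's environment is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["_NT_SYMBOL_PATH"] = symbol_root
    return env
```

You can pass the returned dictionary as the env argument of subprocess.run when you start whichever debugger you use, which keeps symbol paths for different service-pack levels from colliding.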

Next, you need to install the crash-analysis tools. Although you can find these debugging tools on the NT 4.0 Setup CD-ROM and the Win2K Customer Support Diagnostics CD-ROM, you should download the version posted at http://www.microsoft.com/ddk/debugging/installx86.htm because it reflects recent enhancements and bug fixes. I recommend you install the tools to a directory, such as C:\debuggers, that you can easily access from a command prompt.

Also download the OEM Support Tools from the Microsoft article "OEM Support Tools Phase 3 Service Release 2 Availability" (http://support.microsoft.com/support/kb/articles/q253/0/66.asp). These tools include useful add-ons to the basic debugging tools. The download is a Zip file, and I recommend that you unzip the tools to a different directory from the one you use for the other debugging tools. To read the documentation available for the OEM Support Tools, load the Install directory's starthere.htm file in a Web browser. Periodically check the OEM Support Tools and the debugging tools pages for updates.

