Generally, a bottleneck is any condition
that keeps a computer from performing at its best. The term
also applies to situations in which one resource prevents another
resource from performing optimally. For example, if a system doesn’t
have enough physical memory, it doesn’t matter whether it has a fast
processor or a slow processor. The system will still perform poorly
because it must rely heavily on the paging file, reading from and writing to disk
frequently.
Memory is usually the main bottleneck on both workstations and
servers. It is the resource you should examine first to try to
determine why a system isn’t performing as expected. But memory isn’t
the only bottleneck. The processor, disk subsystem, and networking
components are also sources of potential performance
bottlenecks.
Resolving memory bottlenecks
Windows applications use a lot of memory. If you install a
server with the minimum amount of memory required, it isn’t going to
perform at its optimal level. The reason for this is that a server’s
memory requirements depend on many factors, including
the services, components, and applications that are installed on the
server, as well as the server’s configuration.
Computers use both physical and virtual memory. Physical memory is represented by the
amount of random access memory (RAM) installed. Virtual memory is
memory written to a paging file on disk. Reading from and writing to
the paging file involve the disk subsystem, which is much slower
than accessing physical memory. Because of this, you don’t want a
system to have to use the paging file too frequently.
Before you set out to monitor memory usage, you should check
to ensure that the computer has the recommended amount of memory for
the operating system and the applications it is running. After
you’ve done this, you can determine how the system is using memory
and check for problems. Look closely at the amount of memory
available and the amount of virtual memory being used. If the server
has very little available memory, you might need to add memory to
the system. In general, you want the available memory to be no less
than 5 percent of the total physical memory on the server. If the
server has a high ratio of virtual memory being used to total
physical memory on the system, you might need to add physical memory
as well.
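The two checks described above are simple arithmetic once you have the readings. The following is a minimal sketch in Python, assuming the byte values have already been collected by whatever monitoring tool you use. The function names and sample values are hypothetical. The text gives no specific cutoff for the virtual-to-physical ratio, so the sketch only reports that ratio rather than judging it.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def available_fraction(available_bytes: int, total_physical_bytes: int) -> float:
    """Return available physical memory as a fraction of total physical memory."""
    if total_physical_bytes <= 0:
        raise ValueError("total_physical_bytes must be positive")
    return available_bytes / total_physical_bytes


def is_low_on_memory(available_bytes: int, total_physical_bytes: int,
                     min_fraction: float = 0.05) -> bool:
    """True when available memory is below the 5 percent guideline."""
    return available_fraction(available_bytes, total_physical_bytes) < min_fraction


def virtual_to_physical_ratio(virtual_bytes_in_use: int,
                              total_physical_bytes: int) -> float:
    """Ratio of virtual memory in use to total physical memory.

    No cutoff is applied here; compare the result against your own baselines.
    """
    if total_physical_bytes <= 0:
        raise ValueError("total_physical_bytes must be positive")
    return virtual_bytes_in_use / total_physical_bytes


if __name__ == "__main__":
    # Hypothetical readings for a server with 16 GiB of RAM.
    total = 16 * GIB
    available = 512 * MIB
    virtual_in_use = 24 * GIB

    print(f"available: {available_fraction(available, total):.1%}")
    print("low on memory:", is_low_on_memory(available, total))
    print(f"virtual/physical ratio: "
          f"{virtual_to_physical_ratio(virtual_in_use, total):.2f}")
```

Keeping the threshold as a parameter makes it easy to tighten or relax the guideline for a particular server role.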
Look at the way the system is using the paged pool and
nonpaged pool memory. The paged pool is an area of system memory for
objects that can be written to disk when they aren’t used. The
nonpaged pool is an area of system memory for objects that can’t be
written to disk. If the size of the paged pool is large relative to
the total amount of physical memory on the system, you might need to
add memory to the system. If the size of the nonpaged pool is large
relative to the total amount of virtual memory allocated to the server, you might want to
increase the virtual memory size.
Look at the way the system is using the paging file. A page fault occurs when a process
requests a page in memory and the system can’t find it at the
requested location. If the requested page is elsewhere in memory,
the fault is called a soft page fault. If the
requested page must be retrieved from the paging file on disk, the
fault is called a hard page fault. Most
processors can handle large numbers of soft faults. Hard faults,
however, can cause significant delays. If there is a high number of
hard page faults, you might need to increase the
amount of memory or reduce the size of the system cache.
Counters you can use to check for memory bottlenecks include the following:
- Memory\Available Bytes: Records the number of bytes of physical memory available to processes
running on the server. When there is less than 5 percent of
memory free, the system is low on memory and performance can suffer. The server might page
excessively to disk to try to keep up with resource demands.
Memory is critically short if there is 128 megabytes (MB) or
less of memory free; in this case, the system might page
excessively to disk and try to borrow memory from running
processes to keep up with resource demands. Very low available
memory could also point to a possible memory
leak.
- Memory\Committed Bytes: Records the number of bytes of committed virtual
memory. This represents virtual memory that processes have allocated and for which
space has been reserved in physical memory or the paging file. If a server is using too much virtual memory relative
to the total physical memory on the system, you might need to
add physical memory.
- Memory\Commit Limit: Shows the total physical and virtual memory
available. As the number of committed bytes grows, the paging
file is allowed to grow up to its maximum size, which can be
determined by subtracting the total physical memory on the
system from the commit limit. If you set the initial paging file
size too small, the system will repeatedly extend the paging
file, and this requires system resources. It is better to set
the initial paging file size as appropriate for typical usage or simply
use a fixed paging file size. For a fixed paging file, set the
size to at least two times the size of RAM.
- Memory\Page Faults/Sec: Records the average number of page faults per second. It
includes both hard and soft page faults. Soft faults result in memory
lookups. Hard faults require access to disk.
- Memory\Pages/Sec: Records the rate at which memory pages are read from disk or written to
disk to resolve hard page faults. It is the sum of Memory\Pages
Input/Sec and Memory\Pages Output/Sec.
- Memory\Pages Input/Sec: Records the rate at which pages are read from
disk to resolve hard page faults. Hard page faults occur when a
requested page isn’t in memory and the computer has to go to
disk to get it. Too many hard faults can cause significant
delays and hurt performance.
- Memory\Pages Output/Sec: Records the rate at which pages are written to
disk to free up space in physical memory. If the server has to
free up memory too often, this is an indicator that there isn’t
enough physical memory (RAM) on the system.
- Memory\Pool Paged Bytes: Represents the size in bytes of the paged pool. The paged pool is
an area of system memory for objects that can be written to disk
when they aren’t used. If the size of the paged pool is large
relative to the total amount of physical memory on the system,
you might need to add memory to the system. If this value slowly
increases in size over time, a kernel-mode process might have a
memory leak.
- Memory\Pool Nonpaged Bytes: Represents the size in bytes of the nonpaged
pool. The nonpaged pool is an area of system memory for objects
that can’t be written to disk. If the size of the nonpaged pool
is large relative to the total amount of virtual memory
allocated to the server, you might want to increase the virtual
memory size. If this value slowly increases in size over time, a
kernel-mode process might have a memory leak.
- Paging File\%Usage: Records the percentage of the paging file currently in use. If
this value approaches 100 percent for all instances, you should
consider either increasing the virtual memory size or adding
physical memory to the system. This will ensure that the server
has additional memory if it needs it, such as when the server
load grows.
- Paging File\%Usage Peak: Records the peak size of the paging file as a percentage of
the total paging file size available. A high value can mean that
the paging file isn’t large enough to handle increased load
conditions.
- Physical Disk\%Disk Time: Records the percentage of time that the selected
disk spent servicing read and write requests. Keep track of this value
for the physical disks that have paging files. If you see this
value increasing over several monitoring periods, you should
more closely monitor paging-file usage and you might consider
adding physical memory to the system.
- Physical Disk\Avg Disk Queue Length: Records the average number of read and write
requests that were waiting for the selected disk during the
sample interval. Keep track of this value for the physical disks
that have paging files. If you see this value increasing over
time and the Memory\Page Reads/Sec value is also increasing, the
system is having to perform a lot of paging-file
reads.
- Physical Disk\Avg Disk Sec/Transfer: Records the length in seconds of the average
disk transfer. Track this value for the physical
disks that have paging files in conjunction with Memory\Pages/Sec. Memory\Pages/Sec tracks the
number of reads and writes for the paging file. If you multiply
the Physical Disk\Avg Disk Sec/Transfer by the Memory\Pages/Sec
value, you have an excellent indicator of how much of the disk
access time is being used by paging. Use the result to help you
decide whether to move the paging files to faster disks or add
physical memory to the system.
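Several of the counter descriptions above involve small calculations: the maximum paging file size derived from the commit limit, the two-times-RAM guideline for a fixed paging file, and the product of Avg Disk Sec/Transfer and Pages/Sec. The following Python sketch works through them under stated assumptions. The function names and the sample readings are hypothetical, and the counter values are assumed to have been collected already.

```python
GIB = 1024 ** 3


def max_paging_file_bytes(commit_limit_bytes: int, total_physical_bytes: int) -> int:
    """Maximum paging file size: commit limit minus total physical memory."""
    return commit_limit_bytes - total_physical_bytes


def fixed_paging_file_bytes(ram_bytes: int, multiplier: int = 2) -> int:
    """Minimum fixed paging file size using the two-times-RAM guideline."""
    return multiplier * ram_bytes


def paging_share_of_disk_time(avg_disk_sec_per_transfer: float,
                              pages_per_sec: float) -> float:
    """Multiply Avg Disk Sec/Transfer by Pages/Sec.

    The result is the approximate fraction of each second that the disk
    spends on paging (0.2 means roughly 20 percent).
    """
    return avg_disk_sec_per_transfer * pages_per_sec


if __name__ == "__main__":
    # Hypothetical readings for a server with 16 GiB of RAM.
    ram = 16 * GIB
    commit_limit = 40 * GIB
    avg_sec_per_transfer = 0.005   # 5 ms per disk transfer
    pages_per_sec = 40.0

    print("max paging file (GiB):",
          max_paging_file_bytes(commit_limit, ram) / GIB)
    print("fixed paging file (GiB):", fixed_paging_file_bytes(ram) / GIB)
    share = paging_share_of_disk_time(avg_sec_per_transfer, pages_per_sec)
    print(f"share of disk time used by paging: {share:.0%}")
```

A rising share over several monitoring periods is the kind of trend that would inform the choice between moving the paging files to faster disks and adding physical memory.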
Resolving processor bottlenecks
After you’ve eliminated memory as a potential bottleneck, you
should examine the system’s processor usage to determine whether
there are any potential bottlenecks. Processor bottlenecks can occur if a process’s threads
need more processing time than is available. This, in turn, causes
the processor queue to grow because threads have to wait to get
processing time. As a result, the system response suffers and the
system appears sluggish or nonresponsive.
Excess interrupts are another common reason for processor
bottlenecks. Each time drivers or subsystem components, such as
hard disk drives or network components, generate an interrupt, the
processor has to stop what it is doing to handle the request because
requests from hardware take priority. However, poorly designed
drivers and components can generate false interrupts, which tie up the processor for no
reason. System boards or components that are failing can generate
false interrupts as well.
If a system’s processors are the performance bottleneck,
adding memory, drives, or network connections won’t overcome the
problem. Instead, you might need to upgrade the processors to faster
clock speeds or add processors to increase the server’s upper
capacity. You could also move processor-intensive applications, such
as Microsoft Exchange Server, to another server.
Counters you can use to check for processor bottlenecks include the following:
- System\Processor Queue Length: Records the number of threads waiting to be
executed. These threads are queued in an area shared by all
processors on the system. If this counter has a sustained value
of 10 or more threads, you might need to upgrade the processors
to faster clock speeds or add processors to increase the
server’s upper capacity.
- Processor\%Processor Time: Records the percentage of time the selected
processor is executing a nonidle thread. You should track this counter
separately for all processor instances on the server. If the
%Processor Time values for all instances are high (above 75
percent) while the network interface and disk input/output (I/O)
throughput rates are relatively low, you might need to upgrade
the processors to faster clock speeds or add processors to
increase the server’s upper capacity.
- Processor\%User Time: Records the percentage of time the selected
processor is executing a nonidle thread in User mode. User mode is a
processing mode for applications and user-level subsystems. A
high value for all processor instances might indicate that you
need to upgrade the processors to faster clock speeds or add
processors to increase the server’s upper capacity.
- Processor\%Privileged Time: Records the percentage of time the selected
processor is executing a nonidle thread in Privileged mode. Privileged
mode is a processing mode for operating system
components and services, allowing direct access to hardware and memory. A high value for all
processor instances might indicate that you need to upgrade the
processors to faster clock speeds or add processors to increase
the server’s upper capacity.
- Processor\Interrupts/Sec: Records the average rate, in incidents per
second, at which the processor received and serviced hardware interrupts. Compare this value to your
baselines. If this value changes substantially (by
thousands of interrupts) without a corresponding increase in
activity, the system might have a hardware problem. To resolve
this problem, you must identify the device or component that is
causing the problem. Start with devices that have drivers you’ve
updated recently.
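The processor guidelines above can be turned into simple checks over collected samples. The following Python sketch is one possible interpretation, not a definitive rule set. The function names are hypothetical. "Sustained" is assumed to mean that every sample in the monitoring window meets the threshold, and 1,000 is assumed as the smallest change that counts as "thousands" of interrupts.

```python
from typing import Mapping, Sequence


def queue_length_sustained(samples: Sequence[float], threshold: float = 10) -> bool:
    """True if every Processor Queue Length sample is at or above the threshold.

    Assumption: "sustained" means all samples in the window qualify.
    """
    return len(samples) > 0 and all(s >= threshold for s in samples)


def all_processors_busy(percent_by_instance: Mapping[str, float],
                        threshold: float = 75.0) -> bool:
    """True if %Processor Time is above the threshold for every instance."""
    return len(percent_by_instance) > 0 and all(
        value > threshold for value in percent_by_instance.values()
    )


def interrupt_rate_changed(current_per_sec: float, baseline_per_sec: float,
                           min_delta: float = 1000.0) -> bool:
    """True if Interrupts/Sec differs from the baseline by min_delta or more.

    Assumption: min_delta of 1000 stands in for "thousands of interrupts".
    """
    return abs(current_per_sec - baseline_per_sec) >= min_delta


if __name__ == "__main__":
    # Hypothetical samples from one monitoring window.
    print("queue sustained:", queue_length_sustained([12, 11, 15]))
    print("all CPUs busy:", all_processors_busy({"0": 80.0, "1": 91.5}))
    print("interrupt jump:", interrupt_rate_changed(5500.0, 2000.0))
```

As the text notes, a high %Processor Time only suggests a processor bottleneck when network and disk throughput are relatively low, and an interrupt jump only matters without a matching rise in activity, so these checks are a starting point rather than a verdict.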