Before
diving into the details, let's discuss a few concepts related to
performance. The first thing you should know is that achieving high
performance is difficult. Although making sure applications perform to
acceptable levels is important, advanced performance tuning requires
careful planning and should be included as a design goal only if
requirements drive you to believe that high performance is necessary.
For example, if you expect your application to be used by thousands of
concurrent users, then you may need to use caching and even
multithreading. On the other hand, certain high-performance techniques
can make code difficult to read and maintain, and in such cases
knowledge transfer may be difficult.
1. Chatty vs. Chunky
The encrypted network
connection to SQL Azure yields slower applications and may impact your
application design significantly. An application that opens a database
connection for every database call and performs a roundtrip for every
update (that is, a chatty
application) performs slower than an application that loads data for
multiple objects in a single call and sends changes in bulk (a chunky
application). LINQ to SQL and the Entity Framework are data access
layers that provide good control over the use of bulk operations (the SaveChanges method on the object context).
For example, if you
design a data access layer that contains a lot of business rules, your
code may perform many roundtrips to the database to load the data needed
to execute the business rules. If this is the case, you can implement
certain data-intensive business rules in stored procedures (close to the
data) and/or use caching to avoid unnecessary roundtrips.
2. Lazy Loading
On the other hand, although
it's good to have fewer roundtrips from a performance standpoint, you
should load only the data you need, for two reasons: the more data you
load, the more you pay for the SQL Azure service; and loading more data
than necessary can slow down your applications. So, you may want to
consider using a lazy loading
mechanism by which certain properties of your objects are loaded only
when necessary. LINQ to SQL and the Entity Framework 4.0 support lazy
loading (through the use of the DeferredLoadingEnabled property).
Although lazy loading
minimizes the amount of data loaded, it also creates a chattier
application by design. It's important to strike the right balance
between using bulk data transfers and minimizing the amount of data
needed to run an application function.
3. Caching
Another important technique used to minimize roundtrips is caching.
Your application (or service) may use caching to avoid unnecessary
roundtrips if some of your data doesn't change often. This may also
impact your database design choices. For example, if you have a table
that stores a list of states, the table will probably remain unchanged
for a long time, which makes it a great candidate for caching.
Caching can be performed in memory or on disk (in a local database, for example). You have a few options:
ASP.NET caching.
ASP.NET offers a cache object that provides good caching capabilities.
However, ASP.NET caching is tied to IIS. Restarting IIS clears the
ASP.NET cache unless you've taken the necessary steps to persist the
cache.
Windows Server AppFabric.
The AppFabric offers a next-generation distributed cache (previously
known as Velocity). This cache can run on multiple computers and is made
available through a .NET API.
Enterprise Library.
The Enterprise Library offers a collection of application blocks that
Microsoft makes available under public license. The Enterprise Library
contains a cache mechanism that doesn't depend on ASP.NET. This caching
mechanism is provided natively in .NET 4.0 and can be found under the System.Runtime.Caching namespace.
4. Asynchronous User Interface
Ultimately, performance
is a measure that impacts the user experience and can be controlled to a
certain degree by offering highly responsive user interfaces. A Windows
application that becomes unresponsive while loading data, or a web page
that doesn't load until all the data has been retrieved, is perceived
as slow. As a result, developing with multithreading techniques may
become more important to provide a better experience to your users.
For web
development, you should consider using asynchronous controls (such as
AJAX) that give you more control over partial page loading. For Windows
development, you may need to use a multithreaded user interface
development approach.
To implement a highly responsive application in WinForms, use the Invoke method, shown on line 3 of the following example, to refresh your user interface on the UI thread:
1) void OnPassCompleted()
2) {
3) this.Invoke(new EventHandler(UpdateProgressBar), null);
4) }
5)
6) private void UpdateProgressBar(object o, System.EventArgs e)
7) {
8) if (progressBarTest.Value < progressBarTest.Maximum)
9) {
10) progressBarTest.Value++;
11) }
12) }
In this example, OnPassCompleted is a custom event received by the main form, which then calls the Invoke method to refresh a progress bar. The call to InvokeOnPassCompleted event was raised. forces the execution of the progress bar refresh on the UI thread, which is different than the thread on which the
5. Parallel Processing
In addition to
asynchronous user interfaces, your code may need to execute on multiple
processors. Two primary scenarios can lead you to choose parallel
processing for your application:
Many calculations.
Your application is CPU intensive, especially if computations can be
independent from each other. Advanced graphics or complex mathematical
computations are examples of CPU-intensive operations.
Many waits.
Your application needs to wait between each call, and the cost of
creating parallel threads and aggregating results is insignificant.
Database shards are an example: calling five databases in parallel is
roughly five times faster than calling five databases serially.
Two choices are available to
write parallel processes. If you can, you should use the Task Parallel
Library (TPL), because it's easier:
Task Parallel Library.
The TPL is a newer library that Microsoft is providing as part of .NET
4.0. It allows you to take advantage of multiple CPUs quickly and
easily. You can find the TPL under System.Threading.Tasks.
Threads. Managing threads the old-fashioned way using the System.Threading namespace gives you the most flexibility.
6. Shards
Shards offer another mechanism
by which your code can read and write data against any number of
databases almost transparently. A horizontal partition implies that
all the databases have identical schema and that a given record can
written in any database that belongs to the shard. From a performance
standpoint, reading from multiple databases in parallel to search for
records yields greater performance; however, your code must keep
breadcrumbs if you need to perform updates back to the correct database.
Finally, using a shard requires parallel processing for optimum
performance.
7. Coding Strategies Summary
Table 1. Coding Strategies to Design for Performance
Technique | Comments |
---|
Bulk data loading/changing | Minimizes roundtrips by using a data access library that supports loading data in bulk, such as the Entity Framework. |
Lazy loading | Allows
you to create objects for which certain properties are loaded only when
first called, to minimize loading unnecessary data (and improve
performance). The Entity Framework 4.0 supports this. |
Caching | Lets
you to keep in memory certain objects that don't change frequently. The
caching application blocks provided by Microsoft offer this capability
as well as expiration and scavenging configuration settings. |
Asynchronous user interface | Not
technically a performance-improvement technique, but allows users to
use the application while your code is performing a long-running
transaction and thus provides a better user experience. |
Parallel processing | Allows
you to run code on multiple processors for optimum performance.
Although complex, this technique can provide significant performance
advantages. |
Shards | Lets you to store data in multiple databases to optimize reads and spread the load of queries over multiple database servers. |