Creating a Design Document
When you have completed the business analysis and planning process, you will need to translate the business needs into native SharePoint search functionality. Create a design document with sections that closely map to the SharePoint configuration settings. The design document is valuable because it not only describes what you need to do to build your solution, but also doubles as an administrator document that support staff can reference to better understand how search has been configured.
While the process of designing the functional requirements of a SharePoint search solution starts from the business-user perspective, configuring search runs the opposite course. When configuring, you should start with the back end and gradually work your way toward the user experience at the end of the process. First in the configuration sequence is the infrastructure, where you configure the necessary storage and processing resources. Next, you move into SharePoint Central Administration, where you configure the farm-wide settings and server topology. Then you go into the Search Service Application and configure the search settings. Finally, you move into the Site Collection, where you configure subsites, pages, Web Parts, and navigation. The design document should follow the same direction as the configuration, from the back end forward to the end-user experience.
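The back-end-to-front-end ordering can be enforced while the document is being drafted. The Python sketch below is only an illustration. The `Layer` enumeration, the `Section` class, and the section titles are hypothetical and are not part of any SharePoint API. Each design-document section is tagged with its configuration layer, and the sections are then sorted into configuration order.

```python
from dataclasses import dataclass
from enum import IntEnum


class Layer(IntEnum):
    """Configuration layers, numbered from the back end toward the user."""
    INFRASTRUCTURE = 1
    CENTRAL_ADMIN_FARM = 2
    SEARCH_SERVICE_APPLICATION = 3
    SITE_COLLECTION = 4


@dataclass
class Section:
    title: str
    layer: Layer


def order_sections(sections: list[Section]) -> list[Section]:
    """Return sections in configuration order.

    sorted() is stable, so sections in the same layer keep their drafting order.
    """
    return sorted(sections, key=lambda section: section.layer)


# Hypothetical section titles, deliberately drafted out of order.
draft = [
    Section("Search pages, Web Parts, and navigation", Layer.SITE_COLLECTION),
    Section("Content sources", Layer.SEARCH_SERVICE_APPLICATION),
    Section("Storage and processing resources", Layer.INFRASTRUCTURE),
    Section("Server topology", Layer.CENTRAL_ADMIN_FARM),
    Section("Crawl schedules", Layer.SEARCH_SERVICE_APPLICATION),
]

for section in order_sections(draft):
    print(f"{section.layer.name:<28} {section.title}")
```

Because the sort is stable, the relative order of sections within one layer (here, content sources before crawl schedules) is preserved.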
To effectively design and document a search solution, you must be very familiar with the configuration steps and have an in-depth understanding of the business requirements. The design document merges these knowledge areas into one cohesive artifact, which is why it is a critical component of the overall approach. Table 2 through Table 6 provide a starting point for creating a SharePoint search design document. These tables resemble the actual configuration screens within SharePoint, which makes the process of configuring search much easier. By documenting the configuration upfront, you can race through the configuration while making only minor adjustments to the document as you go. The alternatives are worse: writing the document from scratch while configuring at the same time, or trying to backfill the documentation later.
Table 2. How Unique Content Sources May Be Configured
Content Sources
| Title | Type | Start Address |
|---|---|---|
| Local Office SharePoint Server Sites | SharePoint sites | https://centraladmin.domain.com/ |
| People | SharePoint sites | https://mysite.domain.com/, sps3s://mysite.domain.com/ |
| Corporate Portal | SharePoint sites | https://inside.domain.com/ |
| Extranet Portal | SharePoint sites | https://outside.domain.com/ |
| Search Portal | SharePoint sites | https://search.domain.com/ |
| Public Web Site | Web sites | http://www.domain.com/ |
Table 3. Crawl Schedule for Each Content Source
Crawl Schedules
| Content Source | Full Crawl | Incremental Crawl |
|---|---|---|
| Local Office SharePoint Server Sites | Not Scheduled | Not Scheduled |
| People | At 2:00 AM on 15th of every month | Every 12 hour(s) from 6:00 AM for 12 hour(s) every day |
| Corporate Portal | At 2:00 AM on 15th of every month | Every 5 minute(s) from 6:00 AM for 12 hour(s) every day |
| Extranet Portal | At 2:00 AM on 15th of every month | Every 5 minute(s) from 6:00 AM for 12 hour(s) every day |
| Search Portal | Not Scheduled | Not Scheduled |
| Public Web Site | At 4:00 AM on 15th of every month | Every 12 hour(s) from 6:00 AM for 12 hour(s) every day |
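The incremental schedules in Table 3 are easier to sanity-check once they are expanded into actual start times. This is a minimal Python sketch and does not model the SharePoint timer service. `crawl_start_times` is a hypothetical helper. Whether a crawl that falls exactly at the end of the repeat window actually runs is an assumption, and the `inclusive_end` flag lets you try both readings.

```python
from datetime import datetime, timedelta


def crawl_start_times(start: str, every: timedelta, window: timedelta,
                      inclusive_end: bool = False) -> list[str]:
    """Expand one day's repeating incremental schedule into start times.

    inclusive_end controls whether a crawl due exactly at start + window is
    counted; SharePoint's real boundary behavior is an assumption here.
    """
    if every <= timedelta(0):
        raise ValueError("repeat interval must be positive")
    current = datetime.strptime(start, "%I:%M %p")
    end = current + window
    times = []
    while current < end or (inclusive_end and current == end):
        times.append(current.strftime("%I:%M %p"))
        current += every
    return times


# Corporate and Extranet portals: every 5 minutes from 6:00 AM for 12 hours.
portal = crawl_start_times("6:00 AM", timedelta(minutes=5), timedelta(hours=12))
print(len(portal), portal[0], portal[-1])  # 144 06:00 AM 05:55 PM

# People and Public Web Site: every 12 hours from 6:00 AM for 12 hours.
print(crawl_start_times("6:00 AM", timedelta(hours=12), timedelta(hours=12)))
# ['06:00 AM']
print(crawl_start_times("6:00 AM", timedelta(hours=12), timedelta(hours=12),
                        inclusive_end=True))
# ['06:00 AM', '06:00 PM']
```

Under the half-open reading, "Every 12 hour(s) from 6:00 AM for 12 hour(s)" produces a single daily crawl. The inclusive reading produces two. Recording the intended number of runs next to the screen settings removes that ambiguity.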
Table 4. Document Your Service Account Credentials
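Crawl Rules
| URL | Include or Exclude | Service Account |
|---|---|---|
| *://centraladmin.domain.com/* | Exclude | Refer to Service Account Documentation |
| *://search.domain.com/* | Exclude | Refer to Service Account Documentation |
| *://*/*brokensites.aspx | Exclude | Refer to Service Account Documentation |
| *://*/*rejectedsites.aspx | Exclude | Refer to Service Account Documentation |
| *://*allitems.aspx* | Exclude | Refer to Service Account Documentation |
| *://*allforms.aspx* | Exclude | Refer to Service Account Documentation |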
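Crawl rules use wildcard URL patterns. SharePoint evaluates them in list order and applies the first rule that matches, so rule order matters. The Python sketch below uses the standard `fnmatch` module to approximate that behavior and to test candidate URLs against the rules in Table 4. Several simplifications are assumptions: the comparison is case-insensitive, the wildcards are shell-style (so `*` also matches `/`), and any URL that matches no rule is crawled.

```python
from fnmatch import fnmatchcase

# Table 4 crawl rules in evaluation order; the first matching rule decides.
CRAWL_RULES = [
    ("*://centraladmin.domain.com/*", "Exclude"),
    ("*://search.domain.com/*", "Exclude"),
    ("*://*/*brokensites.aspx", "Exclude"),
    ("*://*/*rejectedsites.aspx", "Exclude"),
    ("*://*allitems.aspx*", "Exclude"),
    ("*://*allforms.aspx*", "Exclude"),
]

# Start addresses from Table 2.
START_ADDRESSES = {
    "Local Office SharePoint Server Sites": "https://centraladmin.domain.com/",
    "People": "https://mysite.domain.com/",
    "People (profiles)": "sps3s://mysite.domain.com/",
    "Corporate Portal": "https://inside.domain.com/",
    "Extranet Portal": "https://outside.domain.com/",
    "Search Portal": "https://search.domain.com/",
    "Public Web Site": "http://www.domain.com/",
}


def is_crawled(url: str, rules=CRAWL_RULES) -> bool:
    """Return False if the first rule matching the URL is an Exclude rule.

    Assumptions: matching is case-insensitive, '*' matches any run of
    characters including '/', and URLs that match no rule are crawled.
    """
    candidate = url.lower()
    for pattern, action in rules:
        if fnmatchcase(candidate, pattern.lower()):
            return action == "Include"
    return True


for title, url in START_ADDRESSES.items():
    print(f"{title:<38} {'crawled' if is_crawled(url) else 'EXCLUDED'}")
```

Running the Table 2 start addresses through the same check provides a quick consistency test. The Central Administration and Search Portal addresses are excluded, which agrees with their "Not Scheduled" entries in Table 3.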
Table 5. Metadata Property Mapping Illustrates How Managed Properties Are Configured to Map to One or More Crawled Properties
Metadata Property Mappings
| Managed Property | Type | Use in Scopes | Include Values from | Crawled Property | Include in Index |
|---|---|---|---|---|---|
| CustomIsDocument | Integer | Yes | Single | SharePoint:isdocument (Integer) | Yes |
| | | | | SharePoint:isdocument (Integer) | Yes |
| | | | | Basic:22 (Integer) | Yes |
| Created | Date and Time | No | Single | Basic:15 (Date and Time) | Yes |
| | | | | Office:12 (Date and Time) | Yes |
| Created By | Text | Yes | Single | Ows_Created_x0020_By (Text) | Yes |
| | | | | Office:4 (Text) | Yes |
| | | | | Mail:6 (Text) | Yes |
| FileExtension | Text | Yes | Single | FileExtension (Text) | Yes |
| | | | | Ows_FileType (Text) | Yes |
| | | | | Ows_File_x0020_Type (Text) | Yes |
| Filename | Text | Yes | Single | Basic:10 (Text) | Yes |
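When "Include Values from" is set to Single, the managed property takes its value from a single crawled property, chosen according to the order in which the mappings are listed. The minimal Python sketch below shows that precedence for the Table 5 mappings. The dictionary layout and the `resolve` helper are hypothetical. Treating empty values as missing is also an assumption. CustomIsDocument is omitted for brevity.

```python
from typing import Optional

# Table 5 mappings, with crawled properties listed in their configured order.
MAPPINGS: dict[str, list[str]] = {
    "Created": ["Basic:15", "Office:12"],
    "Created By": ["Ows_Created_x0020_By", "Office:4", "Mail:6"],
    "FileExtension": ["FileExtension", "Ows_FileType", "Ows_File_x0020_Type"],
    "Filename": ["Basic:10"],
}


def resolve(managed: str, crawled: dict[str, str]) -> Optional[str]:
    """Return the value of the first mapped crawled property that has one.

    Models "Include Values from: Single"; skipping empty strings is an assumption.
    """
    for name in MAPPINGS[managed]:
        value = crawled.get(name)
        if value:
            return value
    return None


# Hypothetical crawled properties for one item.
item = {"Office:4": "Jane Doe", "Mail:6": "jdoe", "Basic:10": "report.docx"}
print(resolve("Created By", item))  # Jane Doe (Office:4 outranks Mail:6)
print(resolve("Filename", item))    # report.docx
print(resolve("Created", item))     # None
```

The sketch shows why mapping order belongs in the design document. Reordering the crawled properties can change which value users see for the same item.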
Table 6. You May Require Numerous Scopes, Each Having Several Rules, So It Is Important to Capture These Settings in a Document
Scopes
| Scope Name | Excel |
|---|---|
| Target Results Page | Default |

Rules
| Scope Rule Type | Value | Behavior |
|---|---|---|
| Property Query | FileExtension = xls | Include |
| Property Query | FileExtension = xlsx | Include |
| Property Query | FileExtension = xlt | Include |
| Property Query | FileExtension = xlsm | Include |
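Scope rules are combined according to their Behavior setting. The Python sketch below takes a simplified reading of the three behaviors: an item belongs to the scope if it matches at least one Include rule, every Require rule, and no Exclude rule. Under that reading, the Excel scope defines one Include rule per file extension because the rules act as alternatives. The `ScopeRule` class, the case-insensitive comparison, and the `Site` property in the Require example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeRule:
    prop: str      # managed property tested by a Property Query rule
    value: str
    behavior: str  # "Include", "Require", or "Exclude"

    def matches(self, item: dict[str, str]) -> bool:
        # Assumption: values are compared case-insensitively.
        return item.get(self.prop, "").lower() == self.value.lower()


def in_scope(item: dict[str, str], rules: list[ScopeRule]) -> bool:
    """Simplified membership: any Include, every Require, and no Exclude must hold."""
    includes = [r for r in rules if r.behavior == "Include"]
    requires = [r for r in rules if r.behavior == "Require"]
    excludes = [r for r in rules if r.behavior == "Exclude"]
    return (any(r.matches(item) for r in includes)
            and all(r.matches(item) for r in requires)
            and not any(r.matches(item) for r in excludes))


# The Excel scope from Table 6: one Include rule per extension, so the rules act as OR.
EXCEL = [ScopeRule("FileExtension", ext, "Include") for ext in ("xls", "xlsx", "xlt", "xlsm")]

print(in_scope({"FileExtension": "xlsx"}, EXCEL))  # True
print(in_scope({"FileExtension": "docx"}, EXCEL))  # False

# Hypothetical Require rule that narrows the scope to one site.
EXCEL_INSIDE = EXCEL + [ScopeRule("Site", "inside.domain.com", "Require")]
print(in_scope({"FileExtension": "xls", "Site": "outside.domain.com"}, EXCEL_INSIDE))  # False
```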
Planning
To understand what is required to configure search, it helps to first break this topic into manageable parts. Search configuration occurs in the following areas:
Server/infrastructure
Central Administration, Farm-wide Settings
Central Administration, Search Service Application
Site Collection
Following the business analysis activities, you should have a good idea of which content sources need to be included in the search solution. Use the content source information to plan for capacity. Investigate each content source and calculate the total size of its content. Track the current size as well as estimated sizes at future milestone dates. When calculating the size of content stored within SharePoint Web applications, keep in mind that SQL Server content databases consume more space than the content itself, because the database files (.mdf) and log files (.ldf) add overhead. With this information you can begin the technical analysis and design of the search infrastructure, including defining the following requirements:
Accessibility requirements
Capacity and storage requirements
System performance requirements (remember to monitor the systems being crawled)
Hardware requirements (servers, storage, processor, memory, network)
Service Level Agreements (SLAs) and availability requirements
Disaster recovery requirements
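Content databases are larger than the content they hold, so it can help to convert each milestone estimate into a database footprint. The Python sketch below uses the Corporate Portal figures from Table 7. The `data_overhead` and `log_ratio` inputs are hypothetical placeholders chosen for illustration. They are not recommended values or SQL Server defaults, and should be replaced with figures measured in, or researched for, your own farm.

```python
def database_footprint_gb(content_gb: float, data_overhead: float, log_ratio: float) -> float:
    """Estimate combined .mdf and .ldf space needed for a given amount of content.

    data_overhead: extra fraction the data file (.mdf) needs beyond the raw content.
    log_ratio: log file (.ldf) size as a fraction of the data file.
    Both are hypothetical planning inputs, not SharePoint or SQL Server defaults.
    """
    if data_overhead < 0 or log_ratio < 0:
        raise ValueError("overhead ratios must be non-negative")
    mdf = content_gb * (1 + data_overhead)
    ldf = mdf * log_ratio
    return mdf + ldf


# Corporate Portal content sizes from Table 7 at each milestone.
milestones = {"Current": 2.0, "Now +1 Year": 4.0, "Now +3 Years": 6.0}

# Illustrative ratios only: 20% data-file overhead, log file at 25% of the data file.
for label, content_gb in milestones.items():
    print(f"{label:<13} {database_footprint_gb(content_gb, 0.20, 0.25):.1f} GB")
```

Keeping the ratios as explicit parameters makes the assumption visible in the design document, so it can be revised once real database sizes are observed.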
SharePoint search is a resource-intensive service, so hardware needs to be sized accordingly. Storage requirements can vary greatly depending on the size of the SharePoint farm, the amount of content being crawled, the nature of the content, and the use of metadata properties. Consult online resources such as Microsoft TechNet (http://technet.microsoft.com) or MSDN Blogs (http://blogs.msdn.com) for credible guidance on estimating infrastructure components. Table 7 provides some detail on capacity planning requirements.
Table 7. Storage Estimates for Capacity Planning
Content Storage Estimates
| Title | Current Size | Estimated Size Now +1 Year | Estimated Size Now +3 Years |
|---|---|---|---|
| Local Office SharePoint Server Sites | 100 MB | 100 MB | 100 MB |
| People | 500 MB | 500 MB | 500 MB |
| Corporate Portal | 2 GB | 4 GB | 6 GB |
| Extranet Portal | 4 GB | 5 GB | 6 GB |
| Search Portal | 100 MB | 100 MB | 100 MB |
| Public Web Site | 500 MB | 500 MB | 500 MB |
| Space Required for Additional Content Source TBD | 5 GB | 10 GB | 20 GB |
| Total Corpus | ~12 GB | ~20 GB | ~33 GB |
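Keeping the estimates in a small script makes the Total Corpus row easy to recompute whenever a figure changes. The Python sketch below sums each milestone column of Table 7. It stores sizes as integer megabytes to avoid rounding drift and assumes 1 GB = 1,000 MB, which matches the table's round numbers.

```python
# Table 7 figures in megabytes (assumption: 1 GB = 1,000 MB).
ESTIMATES_MB = {
    "Local Office SharePoint Server Sites": (100, 100, 100),
    "People": (500, 500, 500),
    "Corporate Portal": (2_000, 4_000, 6_000),
    "Extranet Portal": (4_000, 5_000, 6_000),
    "Search Portal": (100, 100, 100),
    "Public Web Site": (500, 500, 500),
    "Additional content source (TBD)": (5_000, 10_000, 20_000),
}
MILESTONES = ("Current", "Now +1 Year", "Now +3 Years")


def corpus_totals_mb(estimates: dict[str, tuple[int, int, int]]) -> tuple[int, ...]:
    """Sum each milestone column across all content sources."""
    return tuple(sum(column) for column in zip(*estimates.values()))


for label, total_mb in zip(MILESTONES, corpus_totals_mb(ESTIMATES_MB)):
    print(f"{label:<13} {total_mb / 1_000:.1f} GB")
```

For these figures, the column sums are 12.2 GB, 20.2 GB, and 33.2 GB.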
Beyond the crawled content, you must plan to allocate storage for the search service application components. These include the following:
Inverted index files (located on index servers)
Propagated index files (located on query servers)
Search Service Application Crawl Store database (located on the database servers)
Search Service Application database
Search Service Application Property Store database
SharePoint WSS Search database
SQL Server tempdb database
Several components make up the SharePoint search topology, including an Administration Component, a Crawl Component, the index, the search databases, and a Query Component. Within a server farm, these components may be configured to run together on one server or spread across separate servers. The topology is defined in the Modify Topology panel of SharePoint Central Administration and must be configured before search can be used.
The optimal topology configuration depends on many
variables. Hardware resources such as storage, CPU, disk I/O, memory,
and network play an important role in planning the topology. Other
factors might include available hardware, budget for future hardware,
number of users, geographical characteristics, performance
requirements, amount of content being crawled, nature of the systems
being crawled, operational responsibility assignments, high availability requirements, and disaster recovery requirements.
The topology can adapt and scale as needs change. For example, when scaling out from a single-server topology, a general rule of thumb is to move the SQL Server role onto its own server first. Next, move the indexing engine, because it is processor-intensive. This leaves only the Web Front End and Query Server roles on the original server. After that, you can add another SharePoint Web front end and optionally dedicate it to crawling, so that users do not compete with the crawler for system resources. The environment can continue to scale by adding hardware nodes and reconfiguring the topology. As with capacity and performance planning, infrastructure planning requires a detailed analysis of the environment, because every situation is different.
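The scale-out sequence can be recorded as a series of topology snapshots, which also makes a useful appendix for the design document. In this Python sketch, the server names (SRV1, SQL1, IDX1, WFE2) and the `move_role` and `add_role` helpers are hypothetical, and roles are plain labels rather than SharePoint objects. Each step returns a new snapshot, so earlier stages are left unmodified.

```python
Topology = dict[str, set[str]]


def move_role(topology: Topology, role: str, server: str) -> Topology:
    """Return a copy of the topology with `role` moved onto `server`."""
    updated = {name: roles - {role} for name, roles in topology.items()}
    updated.setdefault(server, set()).add(role)
    # Drop servers that no longer host any role.
    return {name: roles for name, roles in updated.items() if roles}


def add_role(topology: Topology, role: str, server: str) -> Topology:
    """Return a copy of the topology with `role` added to `server`."""
    updated = {name: set(roles) for name, roles in topology.items()}
    updated.setdefault(server, set()).add(role)
    return updated


# Hypothetical server names; roles are plain labels, not SharePoint objects.
stages = {"1. Single server": {"SRV1": {"SQL Server", "Index", "Query", "Web Front End"}}}
stages["2. Separate SQL Server"] = move_role(stages["1. Single server"], "SQL Server", "SQL1")
stages["3. Separate indexing"] = move_role(stages["2. Separate SQL Server"], "Index", "IDX1")
stages["4. Add crawl-dedicated WFE"] = add_role(
    stages["3. Separate indexing"], "Web Front End (dedicated to crawling)", "WFE2")

for stage, topology in stages.items():
    layout = "; ".join(f"{name}: {', '.join(sorted(roles))}" for name, roles in topology.items())
    print(f"{stage:<28} {layout}")
```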