Optimize SharePoint Storage
Ensure Governance, Performance and Scalability

Microsoft SharePoint Server 2010, the latest release of Microsoft’s best-selling server product, is poised to revolutionize how organizations worldwide connect their people, processes and information. Its release marks SharePoint’s evolution from a server application into a full-fledged platform—the world’s first “virtual ecosystem” for enterprise-class collaboration, development and delivery.

When adopting SharePoint, most organizations are seeking to deploy a single platform upon which to unify the presentation, management and findability of their digital assets and work processes. To meet this objective, however, key stakeholders must have a good understanding of SharePoint’s storage architecture and how that architecture can affect governance plans and the bottom line. They must develop strategies and adopt tools that enable them to take advantage of all of SharePoint’s collaborative and information management capabilities, while addressing the financial and logistical challenges SharePoint’s architecture might give rise to. This article is the first step in that process. Its goal is to review SharePoint’s basic storage architecture and content management capabilities, expose potential pitfalls an organization might face as a result, and introduce strategies and tools to help overcome these challenges.

SharePoint Storage Essentials
SharePoint uses a unified storage infrastructure built on a Microsoft SQL Server database. While SQL Server is a robust relational database technology, its use as SharePoint’s backend can pose unique challenges for organizations looking to centralize the presentation and management of terabytes’ worth of data.

Why? Well, first off, SQL is a relatively expensive tier 1 storage medium compared to file- and cloud-based storage. It can be quite costly—both financially and logistically—to migrate legacy content into SQL so that it can be managed and presented via SharePoint.

Second, SQL’s performance can be compromised when burdened with the unstructured, non-relational data that organizations typically want to upload to SharePoint, such as Word documents, Excel files, PDFs, PowerPoint slides, audio files, video clips, etc. (These unstructured data are known as binary large objects, or BLOBs.) This creates a catch-22 of sorts: On the one hand, you want to keep all digital assets in SQL, so they may be collaborated upon, presented, managed, searched for and governed via SharePoint; on the other hand, placing them in SQL degrades SharePoint’s performance and costs a lot of money (both to get them there and to keep them there).

To address the issue systematically, let’s start by breaking out the various types of content you want to manage via SharePoint into three categories. Afterwards, we can discuss how each can be dealt with efficiently.

Unstructured data: This data type represents 95% of the content a typical organization uploads into SharePoint. All of the Word files, Excel files, CAD files, PDFs, etc. that end-users collaborate upon every day are unstructured data. As we mentioned, a relational database like SQL is not particularly good at handling this unstructured data.

Legacy data: This data type represents all of the digital content an organization has maintained in other systems, whether legacy databases, cloud stores or disparate file systems. If an organization wants to present and manage its legacy content via SharePoint, it needs to understand the pros and cons of its options, including migration of the data to SQL, or—for legacy structured data—using SharePoint 2010’s native Business Connectivity Services (BCS) to surface and manage the data without migration. The BCS only deals with structured data, however, and doesn’t allow for the management of content residing in places like file shares, network drives or file-based cloud storage. For this type of legacy data, another solution must be found.

Inactive data: This represents all of the SharePoint data—be it a document, a site collection, a list, anything—that was once actively leveraged by end-users but is no longer in use, yet must be maintained for compliance, legal, search or retention policy purposes.

Unstructured Content—Where should all those BLOBs be stored?
Considering what we have already discussed about SQL and BLOBs, the fact that more than 95% of the data a typical organization uploads into SharePoint are BLOBs makes it clear that this content does SQL database performance no favors. BLOBs result in longer index times, slower response for SharePoint end-users and overall platform performance degradation. Microsoft tried to address this issue by publishing the External BLOB Store (EBS) Provider API with Microsoft Office SharePoint Server (MOSS) 2007 SP2, and by introducing the Remote BLOB Store (RBS) Provider API with SQL Server 2008, both of which enable organizations to leverage a stub-based system to extend SharePoint storage to other media. However, both the EBS and RBS Providers are simply APIs and require significant coding to be used effectively.

The good news is that—leveraging one of these APIs—an organization can offload all BLOBs from its SQL database without affecting end-user experience or the findability of content at all. All of SharePoint’s BLOBs can reside on disk-based storage, yet remain fully accessible via SharePoint—thereby easing the burden on SQL, optimizing platform performance, enabling the organization to utilize existing storage infrastructure and delivering instant ROI (through reduced SQL costs).

Legacy Content
Many organizations have volumes of legacy content stored on myriad file shares, legacy databases and other storage devices. For governance and productivity, organizations would like to unify the management and presentation of this legacy data within SharePoint. However, many companies do not want to migrate this data into their SharePoint environments, often for good financial, compliance or logistical reasons. First, getting this content into SQL can be costly and time-consuming, even with a good migration tool and proper planning. Even more importantly, this content adds a substantial load to SQL, resulting in the same issues we encountered with BLOBs. (Indeed, most of this legacy content will likely be BLOBs, too.)

For structured legacy data (i.e., content residing in legacy databases), Microsoft introduced the Business Data Catalog (BDC) with its release of MOSS 2007. The BDC provided IT administrators with a way to surface business data from back-end database content, such as from SAP or Siebel, within MOSS without writing any code. With SharePoint Server 2010, Microsoft revamped the BDC and gave it a new name: Business Connectivity Services (BCS). BCS expands on the BDC’s functionality by enabling read and write access to external systems—Web services, databases and Microsoft .NET Framework assemblies—from within SharePoint 2010 and Microsoft Office 2010 applications. However, while developers can now use SharePoint Designer 2010 and Visual Studio 2010 to access external data via the BCS, the BCS cannot deliver the same functionality for legacy unstructured content, such as content residing in file shares, network drives, FTP sites and cloud stores. If the goal is to leverage SharePoint as the single management and presentation layer for all enterprise content, then surfacing and managing content residing in file shares—without having to migrate it to SQL—is a critical capability.

Inactive Content
Now let’s discuss inactive, or dormant, content. What we classify as “dormant” content is anything the company and its end-users are no longer actively using, but that must remain accessible for compliance, retention and legal purposes. The challenge here is that dormant content accumulates quite quickly within SharePoint (imagine how many team sites were built for projects long since completed, how many documents are no longer in active use, or how often anyone touches the first 50 versions of a PowerPoint presentation that now has 100 versions).

An appropriate strategy would be to leverage the APIs we discussed earlier to offload this dormant content to tiered storage resources, while keeping the process seamless to end-users, so they can still search for and collaborate upon the content if need be. An intelligent strategy would automate this process according to pre-established business rules, whereby content that has not been modified or viewed for a certain period of time is moved to progressively lower tiers of storage. This enables the organization to further optimize its SQL resources, ensure unfettered access to archived content, take full advantage of existing storage architecture, implement intelligent content lifecycle management on all SharePoint content and save money from day one. Unfortunately, SharePoint Server 2010 has no native archiving or content lifecycle management capability to help with this. While the platform’s latest release offers a greatly enhanced records center, the key point is that when any SharePoint content is declared a record, the content does not move to an archive. Rather, it remains in SQL, taking up valuable space, reducing platform performance and preventing the utilization of existing tiered storage investments.
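The rule-based tiering described above can be pictured with a small Python sketch. The tier names, age thresholds and function are hypothetical examples for illustration, not SharePoint settings or any vendor’s API: content untouched for longer than a threshold is demoted to a progressively cheaper storage tier.

```python
from datetime import datetime, timedelta

# Illustrative business rules, checked from oldest to newest: content
# untouched longer than the threshold is demoted to a cheaper tier.
# Tier names and cut-offs are hypothetical, not SharePoint settings.
TIER_RULES = [
    (timedelta(days=365), "tier3-archive"),     # dormant for a year or more
    (timedelta(days=90),  "tier2-file-share"),  # cooling off
    (timedelta(days=0),   "tier1-sql"),         # actively used content
]

def assign_tier(last_accessed: datetime, now: datetime) -> str:
    """Return the storage tier for an item based on its last access time."""
    age = now - last_accessed
    for threshold, tier in TIER_RULES:
        if age >= threshold:
            return tier
    return "tier1-sql"
```

A scheduled job applying such a function to item metadata, then moving demoted items via a stub-based externalization layer, is the essence of the automated lifecycle management strategy sketched here.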
