Content Services

Content Governance and Lifecycle Management

Content governance and lifecycle management are the policies, practices, and procedures that ensure that an organization’s available content is timely, accurate, and relevant. Content governance helps create trust in the information so that employees can use it with confidence.

Business Value

A strong and well-managed program for content governance and content lifecycle management is key to ensuring that content shared across the organization is valuable and relevant to end users.

Deliverables

Iknow’s primary deliverables for Content Governance and Lifecycle Management include:

  • Content governance strategy, which includes detailed descriptions of how each major content type is handled, from creation through expiration. This description, or map, shows who creates each content type, how it is approved for use, and who its intended consumers are.
  • The roles and resources required to implement content governance policies and practices.
  • A risk mitigation plan that addresses security, privacy and confidentiality, and other concerns that relate to the safeguarding of content.
  • Implementation assistance.
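
A lifecycle map like the one described in the governance strategy can also be kept as structured data, so that its rules can be checked rather than only documented. The following is a minimal Python sketch under stated assumptions: the `ContentTypePolicy` and `ContentItem` classes, the state names, and the allowed transitions are all hypothetical illustrations, not an Iknow template or any product's API.

```python
from dataclasses import dataclass, field

# Hypothetical lifecycle states and the moves allowed between them.
ALLOWED_TRANSITIONS = {
    "draft": {"in_review"},
    "in_review": {"draft", "published"},
    "published": {"in_review", "expired"},
    "expired": {"archived", "deleted"},
    "archived": set(),
    "deleted": set(),
}


@dataclass
class ContentTypePolicy:
    """One row of a hypothetical lifecycle map for a single content type."""
    content_type: str
    creator_role: str      # who creates this type of content
    approver_role: str     # who approves it for use
    consumers: list[str]   # intended content consumers
    review_months: int     # how often published content is re-reviewed


@dataclass
class ContentItem:
    title: str
    policy: ContentTypePolicy
    state: str = "draft"
    history: list[str] = field(default_factory=list)

    def transition(self, new_state: str, actor_role: str) -> None:
        """Move to new_state if the map allows it; only the approver may publish."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        if new_state == "published" and actor_role != self.policy.approver_role:
            raise PermissionError(f"{actor_role} cannot approve {self.policy.content_type}")
        self.history.append(f"{self.state}->{new_state} by {actor_role}")
        self.state = new_state


spec = ContentTypePolicy("Product specification", "Engineer", "Product manager",
                         ["Sales", "Support"], review_months=12)
doc = ContentItem("Widget spec v2", spec)
doc.transition("in_review", "Engineer")
doc.transition("published", "Product manager")
print(doc.state)  # published
```

Encoding the map this way makes approval rules enforceable: in the sketch, an attempt by anyone other than the designated approver to publish raises an error, and every state change is recorded in the item's history.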

External Content Strategy, Sourcing, and Rights Management

Iknow offers a full set of services to help organizations evaluate, acquire, and manage their external content sources.

Business Value

Making sure you have the right flow of external information into your organization is critical to success. Too much and you become paralyzed (and waste resources); too little and you make uninformed decisions.

Deliverables

These services cover external content acquisition, rationalization, and maintenance, and include:

  • External content strategy.
  • Source identification—finding and evaluating new sources of content.
  • Source rationalization—helping organizations optimize their spending on the “right” content sources.
  • Contract negotiation—helping you get the best prices and contract terms.
  • Business process design, redesign, and optimization, to ensure that the content reaches decision makers and is used by them.
  • Implementation assistance.

Content Strategy Development

Because much of an organization’s knowledge is codified in the form of content, the content strategy is a core element of any good knowledge management program. The business goal of a content strategy is to put the right content into the hands of the right people to help them make better decisions and work more effectively.

Business Value

A content strategy is essential for successfully managing an organization’s content. Without a clearly defined content strategy, the organization will likely end up with content silos, ineffective search, and limited knowledge sharing.

Deliverables

Iknow’s primary deliverable is a content strategy document. Depending on the project scope, it can include a detailed roadmap for content-related improvements and preliminary requirements for items such as content review/approval processes, classification, archiving, and security.

Iknow also offers assistance for implementing a content strategy.

Content Enrichment and Search Enhancement

Adding Metadata and Tuning the Search Engine to Improve Information Findability

The Wildland Fire Lessons Learned Center (WFLLC), headquartered in Tucson, Arizona, serves the wildland fire community by providing a single reference repository for knowledge about how best to fight wildfires. According to the U.S. Fire Administration, the WFLLC supports more than one million firefighters nationally.

Approach

The primary objectives of this project were:

  • Design a comprehensive taxonomy and a set of WFLLC-specific content metadata tags.
  • Store the new taxonomy and metatags in Smartlogic Semaphore. Semaphore is an enterprise semantic platform that augments traditional information management systems (such as search, content management systems, and business workflow engines) by adding advanced content classification, metadata, and navigation capabilities to deliver a more complete enterprise information management experience.
  • Develop a new content repository in Microsoft SharePoint, a content management system (CMS).
  • Migrate the WFLLC’s content from the current websites to SharePoint, remove old or dated content, assign the new metatags to the WFLLC’s content, and store both the articles and their associated metadata together in the new CMS.
  • Configure the Coveo search engine to incorporate the new taxonomy, metadata, and content in the SharePoint CMS and implement the new functionality on the existing WFLLC websites.

Iknow performed the assignment in two major work streams: (1) Content Enrichment and (2) Search Engine Enhancement. The key steps were:

  • Content Enrichment Work Stream
    1. Project Preparation and Initiation
    2. Analyze the Existing Content and Perform Initial Processing
    3. Develop the WFLLC Taxonomy and Metadata
    4. Design the Content Tagging Process
    5. Implement the Content Tagging Process
  • Search Engine Enhancement Work Stream
    1. Validate the Coveo Enterprise Search (CES) Installation
    2. Design the CES Interfaces
    3. Implement the New UI Designs and Search Tuning

Commercial software products from three vendors were used to create the overall solution. The products and brief descriptions of their functions follow.

  • Smartlogic Semaphore software was used for ontology model development and management, ontology-driven classification, and browsing search results. The specific software products purchased from Smartlogic were:
    • Ontology Manager: designed to allow multiple users to create, enhance, and browse all types of semantic models, whether they are lists, controlled vocabularies, taxonomies, thesauri, or ontologies. The software covers the full lifecycle of taxonomy development and maintenance. The license includes unlimited semantic visualization (SV) web clients (i.e., unlimited use on the WFLLC’s websites).
    • Semaphore’s Semantic Enhancement Server (SES): a high-speed XML-based index that allows developers to query an ontology or taxonomy in real time and to create and deliver topic maps, faceted search, visualization, topic pages, related content, and other user interface components.
    • Text Miner: used to automatically extract nouns, noun phrases, and other entity types from unstructured text content. The Advanced Language Pack adds entity extraction for a specific language; the English-language pack was licensed for use by the WFLLC.
    • Semaphore Content Classification and Text Mining Server. Content classification is the process of analyzing a document and adding metadata “tags” that describe it. The tags are drawn from a taxonomy or another form of controlled vocabulary. The server’s modules include:
      • Classification Server. The enterprise-scalable classification and text analysis processing engine.
      • Rule and Template Editor. A client tool for generating rulebase templates and building custom rules.
      • Rulebase Generator. The processing stream that generates the rulebases from the Semaphore model.
    • Semaphore for SharePoint is a comprehensive integrated solution for Microsoft Office SharePoint Server 2007 or SharePoint 2010. This connector extends SharePoint by tightly integrating Semaphore’s automatic classification and taxonomy governance capabilities with SharePoint’s content management functionality. 
  • Two Microsoft products were purchased and used in the Content Enrichment solution.
    • Microsoft SharePoint is a web application platform that provides a central location for storing content such as files and documents. This content can be accessed and modified in a web browser or through a client application (typically Microsoft Office) on a desktop or smartphone. SharePoint 2010 supports concurrent editing with Office 2010.
    • Microsoft SQL Server is a relational database server. Its primary function is to store and retrieve data as requested by other software applications, whether they run on the same computer or on another computer across a network (including the Internet).
  • Coveo Enterprise Search was used on the WFLLC websites, and the Search Enhancement portion of this project improved its search functionality. Two Coveo products were used in the solution:
    • Coveo Enterprise Search, version 6.5. Coveo Enterprise Search is a modular and scalable enterprise search platform that indexes information stored in various repositories throughout the enterprise.
    • SharePoint Connector. Coveo’s SharePoint Connector integrates information stored in SharePoint with the other information in Coveo’s search index. Coveo supports multiple SharePoint versions, including SharePoint 2010.
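
The content classification step described above (analyzing a document and adding metadata tags drawn from a controlled vocabulary) can be illustrated with a simple rule-based tagger. This is a minimal sketch, not Semaphore's rule language or API: it assumes a hypothetical taxonomy in which each preferred term lists its trigger phrases, and it assigns a term whenever those phrases occur at least `min_hits` times (a hypothetical threshold).

```python
import re
from collections import Counter

# Hypothetical taxonomy fragment: preferred term -> trigger phrases (synonyms).
TAXONOMY = {
    "Prescribed Fire": ["prescribed fire", "prescribed burn", "controlled burn"],
    "Fire Shelter": ["fire shelter", "shelter deployment"],
    "Entrapment": ["entrapment", "entrapped", "burnover"],
}


def classify(text: str, taxonomy: dict[str, list[str]], min_hits: int = 1) -> list[str]:
    """Return taxonomy terms whose trigger phrases occur at least min_hits times."""
    normalized = text.lower()
    hits = Counter()
    for term, phrases in taxonomy.items():
        for phrase in phrases:
            # Match on word boundaries so a phrase never matches inside a longer word.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            hits[term] += len(re.findall(pattern, normalized))
    # Strongest evidence first; ties broken alphabetically for stable output.
    return sorted((t for t, n in hits.items() if n >= min_hits), key=lambda t: (-hits[t], t))


report = ("The crew was entrapped during a prescribed burn; after the burnover, "
          "two firefighters deployed a fire shelter.")
print(classify(report, TAXONOMY))  # ['Entrapment', 'Fire Shelter', 'Prescribed Fire']
```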

The solution was developed on Amazon Web Services and on Server Intellect, a third-party hosting provider. The overall technical architecture of the Content Enrichment and Search Enhancement Project is illustrated in the exhibit below, which shows the three types of functionality integrated in the overall solution:

  • Taxonomy and Classification, provided by Smartlogic
  • Content Management and User Interface, provided by Microsoft
  • Search, provided by Coveo

Exhibit: Content Enrichment/Search Enhancement Solution

The four arcs highlight the integration between these products:

  1. The Semaphore taxonomy model and classification rules are used to tag the content in the SharePoint content repository.
  2. The Coveo SharePoint connector accesses the SharePoint content repository during content indexing.
  3. The Semaphore SharePoint connector provides enhanced search and browse functionality.
  4. The Coveo SharePoint connector provides enhanced search and facet functionality.

Results

The WFLLC received a new taxonomy for the wildland fire domain, and all of its content was richly tagged with appropriate metadata. The Coveo search engine was reconfigured to incorporate the new taxonomy and metadata. Several new search options were implemented, including keyword search, faceted search, advanced search, and browse options. The search-result ranking algorithms were tuned to return more accurate results.
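
Faceted search, one of the new options listed above, works by counting, for each metadata field, how many documents in the current result set carry each value, and then letting the user narrow the results by selecting values. The sketch below illustrates that idea over a hypothetical in-memory list of tagged documents; it is not Coveo's implementation, and the field names and documents are invented for illustration.

```python
from collections import Counter

# Hypothetical documents carrying taxonomy-derived metadata tags.
DOCS = [
    {"id": 1, "title": "Burnover review", "topic": ["Entrapment"], "doc_type": "Report"},
    {"id": 2, "title": "Shelter training notes", "topic": ["Fire Shelter"], "doc_type": "Lesson"},
    {"id": 3, "title": "Prescribed burn escape", "topic": ["Prescribed Fire", "Entrapment"],
     "doc_type": "Report"},
]


def facet_counts(docs: list[dict], field: str) -> dict[str, int]:
    """Count how many documents carry each value of a single- or multi-valued field."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if isinstance(values, str):
            values = [values]
        counts.update(set(values))  # each document counts once per value
    return dict(counts)


def apply_filters(docs: list[dict], filters: dict[str, str]) -> list[dict]:
    """Keep documents that match every selected facet value (AND across fields)."""
    def matches(doc: dict, field: str, value: str) -> bool:
        values = doc.get(field, [])
        return value == values if isinstance(values, str) else value in values
    return [d for d in docs if all(matches(d, f, v) for f, v in filters.items())]


results = apply_filters(DOCS, {"topic": "Entrapment"})
print([d["id"] for d in results])         # [1, 3]
print(facet_counts(results, "doc_type"))  # {'Report': 2}
```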

Project Summary No. 126

Analyzing Scientific and Technical Documents for Enterprise Search

Applying Auto-Categorization Functionality at an International Biotechnology Company

One of the world’s leading biotechnology companies was evaluating commercial enterprise search products. One of its key requirements was auto-categorization, but the company lacked the in-house expertise to evaluate the available products. It wanted to compare the auto-categorization results of the search engines under consideration with those of a leading text analysis product. The company also wanted to reuse the auto-categorization methodology to classify new documents in the future.

Approach

Iknow was selected because of its deep technical knowledge of, and experience with, many of the leading enterprise search and text analysis products. To analyze the company’s content, Iknow used the SAP BusinessObjects Text Analysis Suite, with its strong entity extraction and categorization capabilities.

Iknow was given more than 50,000 scientific and technical documents drawn from three sources—the Defense Technical Information Center (DTIC), the U.S. Department of Energy (DOE), and the U.S. National Library of Medicine’s PubMed database. Each of the three data sets included a source-specific taxonomy, full text documents, and the document abstracts. The size of the input data exceeded 75 GB.

The processing and analysis were performed using SAP BusinessObjects Text Analysis XI 3.0, with its embedded Oracle XE database and the Categorizer Workbench and ThingFinder Workbench tools. The Categorizer Workbench provides an editorial environment for creating and maintaining taxonomies and contains both a learn-by-example (LBE) algorithm and a rules-based engine. The ThingFinder Workbench provides advanced text analysis that automatically identifies and extracts entities from text data sources.

Fifteen separate analyses, including various taxonomy creation and auto-categorization tasks, were performed on the 50,000-plus-document dataset. The Categorizer Workbench classified the content into the PubMed taxonomy and into a proprietary taxonomy with more than 95 percent accuracy. The Categorizer LBE algorithm automatically generated a reusable categorization rule set, which met the company’s requirement for a reusable methodology.
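
In a learn-by-example approach, the categorizer is trained on documents already assigned to taxonomy categories and then assigns each new document to the best-matching category; accuracy is the share of held-out, pre-labeled documents whose predicted category matches the assigned one. The sketch below is a deliberately simple stand-in for that process (a bag-of-words centroid classifier scored with cosine similarity), not the Categorizer Workbench's LBE algorithm, and the training and test snippets are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts over lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(examples: list[tuple[str, str]]) -> dict[str, Counter]:
    """Learn one centroid per category by summing the term counts of its examples."""
    centroids: dict[str, Counter] = {}
    for text, category in examples:
        centroids.setdefault(category, Counter()).update(vectorize(text))
    return centroids


def predict(centroids: dict[str, Counter], text: str) -> str:
    """Assign the category whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(centroids, key=lambda c: cosine(centroids[c], vec))


def accuracy(centroids: dict[str, Counter], labeled: list[tuple[str, str]]) -> float:
    """Share of pre-labeled documents whose predicted category matches the label."""
    correct = sum(predict(centroids, text) == category for text, category in labeled)
    return correct / len(labeled)


# Hypothetical labeled snippets; a real evaluation would hold out a sample of the corpus.
TRAIN = [
    ("gene expression in tumor cells", "Oncology"),
    ("tumor growth and chemotherapy response", "Oncology"),
    ("reactor fuel and neutron flux", "Nuclear Energy"),
    ("neutron shielding for reactor vessels", "Nuclear Energy"),
]
TEST = [
    ("chemotherapy slows tumor growth", "Oncology"),
    ("neutron flux in the reactor core", "Nuclear Energy"),
]

model = train(TRAIN)
print(accuracy(model, TEST))  # 1.0
```

In this sketch the learned centroids play the role of a reusable rule set: they can be kept and applied to new documents without retraining.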

Results

Iknow provided all of the information the biotechnology company requested, and the company made an informed purchase of a new enterprise search product.

Iknow also recommended that the company purchase a text analysis product and integrate it with the enterprise search tool to create an end-to-end automated content acquisition, tagging, and indexing process. The text analysis software would enhance the enterprise search tool by providing entity extraction, automated summarization, and auto-categorization functionality.

Project Summary No. 107

Content Audit

Iknow uses an approach called a “Content Audit” to uncover the full breadth of an organization’s content: where it is stored, how it is organized, and what role and value it has in business decision making. We typically cast a wide net, looking for content in file stores and central repositories, content management systems, and digital asset management systems; on personal hard drives; in a variety of paper formats in employees’ offices; and in centralized corporate libraries.

Business Value

A Content Audit is usually the first step in addressing an organization’s desire to “get its arms around” the explosion of business content. Companies cannot make intelligent decisions about how to manage their content without a thorough inventory of that content and an assessment of its value.

The information in a Content Audit can serve as the basis for strategic decisions about enterprise knowledge stewardship and for operational decisions such as content cleansing and content enrichment. The Content Audit is also the foundation for mitigating content-related risk. For example, using out-of-date content can result in costly mistakes, so dated content should be purged. On the other hand, some legacy content may need to be retained to comply with government regulations, so its disposition must be carefully supervised and controlled.
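
The purge-versus-retain reasoning above can be applied to a file-share inventory as a first pass. The sketch below, written under stated assumptions, walks a directory tree, records each file's relative path, size, and age since last modification, and attaches a recommendation. The three-year staleness threshold and the folder-name retention rule are hypothetical placeholders for an organization's own retention schedule, and the code only recommends; it deletes nothing, because disposition must stay under human supervision.

```python
import os
import time
from dataclasses import dataclass
from typing import Optional

STALE_AFTER_DAYS = 3 * 365  # hypothetical staleness threshold
# Hypothetical retention rule: any path segment (folder or file name) starting with
# one of these prefixes marks records that must be kept for regulatory reasons.
RETENTION_PREFIXES = ("regulatory", "legal_hold")


@dataclass
class InventoryRecord:
    path: str
    size_bytes: int
    age_days: float
    recommendation: str


def recommend(rel_path: str, age_days: float) -> str:
    """First-pass disposition recommendation; a person makes the final call."""
    parts = rel_path.replace("\\", "/").lower().split("/")
    if any(part.startswith(RETENTION_PREFIXES) for part in parts):
        return "retain (regulatory hold)"
    if age_days > STALE_AFTER_DAYS:
        return "review for purge"
    return "keep"


def audit(root: str, now: Optional[float] = None) -> list[InventoryRecord]:
    """Walk a file share and build an inventory with recommendations; deletes nothing."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            age_days = (now - info.st_mtime) / 86400  # seconds per day
            rel = os.path.relpath(full, root)
            records.append(InventoryRecord(rel, info.st_size, age_days, recommend(rel, age_days)))
    return records
```

Calling `audit()` on the root of a share returns one record per file; sorting the records by `age_days` gives a quick view of the oldest material awaiting review.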

Deliverables

Iknow’s primary deliverable is a Content Audit report. This report often includes:

  • Current-state content assessment
  • Projections of future enterprise content growth rates and a future-state content management systems architecture
  • Content management improvement recommendations.