Sunday, December 20, 2009

Introduction to SharePoint Managed Metadata

I have the day off and as I ate my brown sugar Mini-Wheats, I naturally came to the conclusion that today would be the day that I would finally begin to write a series of blog posts about the biggest new SharePoint 2010 feature: Enterprise Managed Metadata (a.k.a. EMM, Managed Metadata Service (MMS) or SharePoint 2010 taxonomy).

Update: These are the posts currently in this series:

SharePoint Taxonomy Part One – Introduction to SharePoint Managed Metadata
SharePoint Taxonomy Part Two – End-User Experience
SharePoint Taxonomy Part Three – Administrator Experience
(including Using SharePoint Term Stores and SharePoint Taxonomy Hierarchy)
SharePoint Taxonomy Part Four – Developer Experience
(including SharePoint 2010 Visual Web Parts, SharePoint 2010 Taxonomy Web Part Development Screencast, SharePoint 2010 Taxonomy Reference Issues, and SharePoint Game of Life web part on CodePlex.)

Back in 2005, I’m pretty sure I was one of the SharePoint Program Managers most disappointed when we learned that taxonomy wasn’t going to make the cut for Microsoft Office SharePoint Server 2007 (MOSS). I didn’t work on the specs, but I believed quite strongly that it needed to be built. It wasn’t a huge surprise that it got cut though. The Enterprise Content Management (ECM) team had plenty of work to do—mainly focused on providing Microsoft Content Management Server (MCMS) features in SharePoint. However, with the MOSS release out the door and MOSS for Internet sites popping up all over the Internet, it was time to tackle taxonomy. I know it was a Herculean task, so I tip my cap to my old friends on the SharePoint team who have made it a reality in SharePoint 2010.

What’s all the fuss about?

Managed Metadata definition from Microsoft TechNet: “Managed metadata is a hierarchical collection of centrally managed terms that you can define, and then use as attributes for items in Microsoft SharePoint Server 2010.”

Why is taxonomy arguably the most important new feature in SharePoint 2010? Hasn’t SharePoint always had metadata? There are many useful tagging features in the new release, but another important aspect is the potential unlocked by the new infrastructure. As a Microsoft PM, I heard a clear refrain from customers and partners. I’ll paraphrase it in this way: “Sure, we’d like you to do everything, but we know you can’t. Please just give us the foundation so we can do what’s necessary for our business needs.” [Pardon the SharePoint Foundation pun.] Now that the managed metadata service plumbing has been added to SharePoint, customers and partners have the opportunity to use and extend the taxonomy system.

Before the 2010 improvements, organizations using SharePoint had to either build their own taxonomy solution or rely upon business rules for how their users should add metadata to SharePoint items. In practice, this meant that different users could tag items with slightly different terms, which eliminated most of the value of taxonomy. There were also technical constraints that got in the way of building a custom solution. For example, the logical place to store terms would be in a list, but lists couldn’t be shared across the site collection boundary, so the same content would have to be duplicated if the terms were to be shared. There was also no concept of hierarchical metadata and there was no way to share a collection of terms with delegated permissions. But if all that isn’t enough for you, MVP Chris O’Brien wrote a whole post about why SharePoint 2010 taxonomy is important and here’s a list of Benefits of using managed metadata from TechNet:

“More consistent use of terminology
Managed metadata facilitates more consistent use of terms, as well as more consistent use of the managed keywords that are added to SharePoint Server items. You can pre-define terms, and allow only authorized users to add new terms. You can also prohibit users from adding their own managed keywords to items, and require them to use existing ones. Managed metadata also provides greater accuracy by presenting only a list of correct terms from which users can select values. Because managed keywords are also a type of managed metadata, even the managed keywords that users apply to items can be more consistent.

Because metadata is used more consistently, you can have a higher degree of confidence that it is correct. When you use metadata to automate business processes—for example, placing documents in different files in the record center based on the value of their department attribute—you can be confident that the metadata was created by authorized users, and that the value of the department attribute is always one of the valid values.

Better search results
A simple search can provide more relevant results if items have consistent attributes.

As users apply managed terms and keywords to items, they are guided to terms that have already been used. In some cases, users might not even be able to enter a new value. Because users are focused on a specific set of terms, those terms—and not synonyms—are more likely to be applied to items. Searching for a managed term or a managed keyword is therefore likely to retrieve more relevant results.

Dynamic
In previous versions of SharePoint Server, to restrict the value of an attribute to being one of a set of values, you would have created a column whose type is "choice", and then provided a list of valid values. When you needed to add a new value to set of choices, you would have to modify every column that used the same set of values.

By using managed metadata in SharePoint Server 2010, you can separate the set of valid values from the columns whose value must be one of the set of valid values. When you need to add a new value, you add a term to the term set, and all columns that map to that term set would use the updated set of choices.

Using terms can help you keep SharePoint Server items in sync with the business as the business changes. For example, assume your company's new product had a code name early in its development, and was given an official name shortly before the product launched. You included a term for the code name in the "product" term set, and users have been identifying all documents related to the product by using the term. When the product name changed, you could edit the term and change its name to the product's official name. The term is still applied to the same items, but its name is now updated.”
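
To make the “Dynamic” point a little more concrete, here is a minimal sketch of my own (it is not from TechNet). It shows a column that points at a term set instead of carrying its own list of choices, followed by the code-name rename from the example above. It assumes a server-side project that references Microsoft.SharePoint.dll and Microsoft.SharePoint.Taxonomy.dll; the class name, the method names and the "Product" display name are made up for illustration.

```csharp
using Microsoft.SharePoint;
using Microsoft.SharePoint.Taxonomy;

public static class ProductColumnSketch
{
    // Adds a Managed Metadata column to a list and maps it to a term set.
    // "Product" is a hypothetical display name.
    public static void AddProductColumn(SPList list, TermSet productTermSet)
    {
        TaxonomyField field =
            (TaxonomyField)list.Fields.CreateNewField("TaxonomyFieldType", "Product");

        // The column only stores which term store and term set to read from,
        // so new terms show up without touching the column again.
        field.SspId = productTermSet.TermStore.Id;
        field.TermSetId = productTermSet.Id;

        list.Fields.Add(field);
    }

    // Renames a term in place, e.g. from a code name to the official name.
    // Items that were tagged with the term keep pointing at the same term.
    public static void RenameTerm(Term term, string officialName)
    {
        term.Name = officialName;
        term.TermStore.CommitAll();
    }
}
```

The design point is that the set of valid values lives in the term store, so any number of columns can map to the same term set.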

image - Adding keywords to a document in SharePoint 2010 Beta 2

What’s in this release?

To borrow a joke from Dan Kogan’s SharePoint Conference 2009 (#SPC09) talk, when we talk about terms, we need to first define our terms. From MSDN’s definitions of SharePoint Managed Metadata:

term
A word or phrase that can be associated with an item in SharePoint Server 2010.
term set
A collection of related terms.
managed term
A term that can be created by users only with the appropriate permissions and often organized into a hierarchy. Managed terms are usually predefined.
managed keyword
A word or phrase that has been added to SharePoint Server 2010 items. All managed keywords are part of a single, non-hierarchical term set called the keyword set.
term store
A database that stores both managed terms and managed keywords.

Also, one definition that isn’t in the list:
group
In the term store, all term sets are created within groups. In other words, group is the parent container for term sets.
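
For the developers in the audience, the definitions above map fairly directly onto classes in the Microsoft.SharePoint.Taxonomy namespace. The following is a minimal sketch, not an official sample, that walks from term stores down to terms and prints what it finds. It assumes it runs on a SharePoint 2010 server with references to Microsoft.SharePoint.dll and Microsoft.SharePoint.Taxonomy.dll; TermStoreDump and its method names are hypothetical.

```csharp
using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Taxonomy;

public static class TermStoreDump
{
    // Prints every term store, group, term set and term visible from the site.
    public static void Print(string siteUrl)
    {
        using (SPSite site = new SPSite(siteUrl))
        {
            TaxonomySession session = new TaxonomySession(site);

            foreach (TermStore termStore in session.TermStores)
            {
                Console.WriteLine("Term store: " + termStore.Name);

                foreach (Group group in termStore.Groups)
                {
                    Console.WriteLine("  Group: " + group.Name);

                    foreach (TermSet termSet in group.TermSets)
                    {
                        Console.WriteLine("    Term set: " + termSet.Name);

                        foreach (Term term in termSet.Terms)
                        {
                            PrintTerm(term, 3);
                        }
                    }
                }
            }
        }
    }

    // Terms can contain child terms, so recurse to print the whole branch.
    private static void PrintTerm(Term term, int depth)
    {
        Console.WriteLine(new string(' ', depth * 2) + "Term: " + term.Name);

        foreach (Term child in term.Terms)
        {
            PrintTerm(child, depth + 1);
        }
    }
}
```

Calling TermStoreDump.Print("http://localhost/") from a console application on the server would list whatever the farm currently contains.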

image - The Term Store Management Tool

These are the elements within the SharePoint 2010 managed metadata functionality, but the pieces aren’t the only consideration—it’s also important how they fit together. According to information management experts Earley & Associates, there are three different types of relationships in taxonomies:

Equivalent (Synonyms: "LOL = Laughing out loud")
Hierarchical (Parent/Child : "Sports Equipment => ball")
Associative (Concept/Concept: "Bouncy things - ball")

SharePoint 2010 will provide SharePoint users, administrators and developers with the UI and API required for the first two. This means that features such as centrally managed terms, folksonomy and tag clouds (social tagging) are enabled. The third type, associative relationships (ontologies), is one that SharePoint 2010 will not be offering. Here’s a quick discussion of each type.

Equivalent Terms

SharePoint taxonomy will allow synonyms and preferred terms. Synonyms allow a central understanding that LOL is the same as “laughing out loud,” and preferred terms specify which of the two should be used.

The other side of the equivalence coin is dealing with words with more than one meaning. To help disambiguate terms, SharePoint shows term descriptions in a tooltip so that users can differentiate between G-Force (the recent movie featuring a specially trained squad of guinea pigs) and G-Force (the team from Battle of the Planets, my favourite childhood cartoon).
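
In API terms, synonyms are extra labels on a term and the tooltip text is the term's description. Here is a small sketch of the LOL example under the same assumptions as the other code in this post (server-side code with a reference to Microsoft.SharePoint.Taxonomy.dll). The class name, the method name and the description text are mine, and which of the two labels you treat as preferred is a business decision, not something the API decides for you.

```csharp
using Microsoft.SharePoint.Taxonomy;

public static class SynonymSketch
{
    private const int English = 1033; // LCID for English (United States)

    // Creates "Laughing out loud" as the preferred (default) label and
    // registers "LOL" as a synonym on the same term.
    public static Term AddLaughingOutLoud(TermSet termSet)
    {
        // The name passed to CreateTerm becomes the default label.
        Term term = termSet.CreateTerm("Laughing out loud", English);

        // false = not the default label, i.e. a synonym.
        term.CreateLabel("LOL", English, false);

        // The description is what helps users disambiguate the term.
        term.SetDescription("Internet slang for finding something funny.", English);

        // Nothing is saved until the term store is committed.
        termSet.TermStore.CommitAll();
        return term;
    }
}
```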

Hierarchical Terms

A central repository of terms enables consistency across users. Providing a hierarchy allows for information architecture and organization. In the SharePoint Term Store Management Tool, users with sufficient permissions will be able to perform many operations on terms in the hierarchy. These include: copying, reusing, moving, duplicating (for polyhierarchy), deprecating, and merging. The hierarchy is broken down into a term store at the top, then a group, term sets, and finally, managed terms.

image - Example of a taxonomy hierarchy (image courtesy Microsoft)

Note: managed keywords (or just keywords) will be stored in a separate single database. Keywords will be used for social tagging such as tag clouds and folksonomy, and keywords can also be promoted to managed terms.
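
Several of the term operations mentioned in this section are also exposed through the API. The sketch below shows deprecating, moving, copying and reusing; I've left merging out to keep it short. It is an illustration only: the class, method and parameter names are hypothetical, it assumes all of the terms live in the same term store, and it needs a reference to Microsoft.SharePoint.Taxonomy.dll.

```csharp
using Microsoft.SharePoint.Taxonomy;

public static class TermOperationsSketch
{
    // Pass in terms that have already been looked up; all of them are
    // assumed to belong to the same term store.
    public static void Reorganize(
        Term obsolete,
        Term misplaced,
        Term newParent,
        Term template,
        Term shared,
        TermSet otherTermSet)
    {
        // Deprecating: the term stays in the store but is retired from use.
        obsolete.Deprecate(true);

        // Moving: re-parents the term under a different term.
        misplaced.Move(newParent);

        // Copying: creates a separate copy; true also copies child terms.
        template.Copy(true);

        // Reusing: makes the same term appear in another term set
        // (polyhierarchy); false leaves its child terms behind.
        otherTermSet.ReuseTerm(shared, false);

        // As with the other taxonomy calls, changes are pending until committed.
        obsolete.TermStore.CommitAll();
    }
}
```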

Associative Terms

An ontology is a means of classifying data based on an associative relationship. There are endless possibilities for these types of relationships. For example, I could have a hierarchy of terms in SharePoint 2010 that includes the terms “ball” and “bat” as children of the term “sports equipment.” An ontology would allow me to also create a relationship between “Bouncy things” and “ball” because they are conceptually related. Why didn’t the SharePoint team add ontologies? That’s a reasonable question, but the fact is that it simply may not have been worth the effort to tackle such a specialized function when they were already trying to build an ambitious feature. Also, many people wonder if anyone but a library scientist or a taxonomist will complain.

How will SharePoint taxonomy be used?

Obviously, the most popular end-user uses of EMM will be business-driven taxonomy and social tagging. Many content types will ship with a Managed Metadata data-type column and users will be able to tag their list items, documents, etc., with shared terms. This end-user associated metadata will then be used to classify, organize, find and share information within SharePoint. By tagging external pages, users have a way to add links to their favourite browser’s bookmarks.
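
Most tagging will happen through the UI, but the same thing can be done in code. Here is a minimal sketch of applying a term to a list item through a Managed Metadata column. It assumes the item's list already has such a column, that the term comes from the term set the column is mapped to, and that the project references Microsoft.SharePoint.dll and Microsoft.SharePoint.Taxonomy.dll. TaggingSketch and TagItem are hypothetical names.

```csharp
using Microsoft.SharePoint;
using Microsoft.SharePoint.Taxonomy;

public static class TaggingSketch
{
    // Tags a list item with a managed term.
    // columnName is the display name of a Managed Metadata column on the list.
    public static void TagItem(SPListItem item, string columnName, Term term)
    {
        // Managed Metadata columns are TaxonomyField instances.
        TaxonomyField field = (TaxonomyField)item.Fields[columnName];

        // Writes the term into the field for this item.
        field.SetFieldValue(item, term);

        // Persist the change to the list.
        item.Update();
    }
}
```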

However, another aspect of the new managed metadata functionality is how it could be used for enhanced navigation and search. For example, terms can be used to enable more advanced parametric search features, targeted search and possibly even lemmatisation in FAST search. I’m not a search expert, though, so I’d have to do some more research to find out what’s happening on the search side. One thing is for sure: customers and partners will find interesting ways to use the taxonomy framework.

In terms of navigation, the ability to alter the way you navigate your data based on tags is also referred to as faceted navigation. When I was working on SharePoint navigation, we nicknamed faceted navigation “navigation goggles.” The idea was that you could choose different types of navigation the same way you can shift between song view, albums or artists on many MP3 players.

For developers, SharePoint 2010 EMM also includes the Taxonomy APIs. Most of the EMM classes are found in the Microsoft.SharePoint.Taxonomy namespace.

• TaxonomySession class
• TermStore class
• Group class
• TermSet class
• Term class
• CommitAll method
• IsAvailable property
• Name property
• CreateLabel method
• SetDescription method

This block of sample code (courtesy of Microsoft) shows how the taxonomy API can be used.

// Assumes: using Microsoft.SharePoint; using Microsoft.SharePoint.Taxonomy;
using (SPSite site = new SPSite("http://localhost/"))
{
    // Instantiates a new TaxonomySession for the current site.
    TaxonomySession session = new TaxonomySession(site);

    // Instantiates the connection named "Managed Metadata Service
    // Connection" for the current session.
    TermStore termStore = session.TermStores["Managed Metadata Service Connection"];

    // Creates and commits a Group object named Group1, a TermSet object
    // named TermSet1, and several Term objects. Term1, Term2, and Term3 are
    // members of TermSet1. Term1a and Term1b are children of Term1.
    // 1033 is the LCID for English (United States).
    Group group1 = termStore.CreateGroup("Group1");
    TermSet termSet1 = group1.CreateTermSet("TermSet1");
    Term term1 = termSet1.CreateTerm("Term1", 1033);
    Term term2 = termSet1.CreateTerm("Term2", 1033);
    Term term3 = termSet1.CreateTerm("Term3", 1033);
    Term term1a = term1.CreateTerm("Term1a", 1033);
    Term term1b = term1.CreateTerm("Term1b", 1033);
    termStore.CommitAll();

    // Sets a description and some alternate (non-default) labels for term1
    // and commits the changes to termStore.
    term1.SetDescription("This is term1", 1033);
    term1.CreateLabel("TermOne", 1033, false);
    term1.CreateLabel("FirstTerm", 1033, false);
    termStore.CommitAll();

    // Deletes an unnecessary term, term3, from termStore and commits changes.
    term3.Delete();
    termStore.CommitAll();
}
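
The Microsoft sample hard-codes the service connection name and assumes every call succeeds. As a hedged variation of my own (not part of the Microsoft sample), the sketch below asks the session for the site collection's default term store, checks that it is online, and rolls back pending changes if something goes wrong. It needs the same references as the sample above; the class and method names are hypothetical.

```csharp
using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Taxonomy;

public static class SafeGroupCreationSketch
{
    // Creates a group in the default term store for the given site,
    // undoing any pending changes if the operation fails.
    public static void CreateGroup(string siteUrl, string groupName)
    {
        using (SPSite site = new SPSite(siteUrl))
        {
            TaxonomySession session = new TaxonomySession(site);

            // May be null if the site has no default term store configured.
            TermStore termStore = session.DefaultSiteCollectionTermStore;
            if (termStore == null || !termStore.IsOnline)
            {
                throw new InvalidOperationException(
                    "No online term store is available for " + siteUrl);
            }

            try
            {
                termStore.CreateGroup(groupName);
                termStore.CommitAll();
            }
            catch
            {
                // Discard uncommitted changes, then let the caller see the error.
                termStore.RollbackAll();
                throw;
            }
        }
    }
}
```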

Multilingual Taxonomy In SharePoint 2010

The Enterprise Metadata Management documentation states that managed terms could be used when metadata "Can be applied in one language, but might be viewed in other languages."

This works because managed terms can be assigned multiple labels. When someone types in any of the labels (which could be in different languages), they will be applying the same term. This creates a multilingual term system.

Here is the documentation page about Multilingual term sets (SharePoint Server 2010)

Note: Labels are different from descriptions. There can only be one description on a term and it's generally used for disambiguation (e.g., "this is Dallas the city, not Dallas the TV show").
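
Here is a rough sketch of what a multilingual label could look like in code. It adds a French label to an existing term and then looks the term up by that label. The assumptions are mine: French (LCID 1036) has already been enabled as a language on the term store, the term is something like the "Sports Equipment" example from earlier, and the lookup is expected to match labels regardless of language. The class and method names are hypothetical.

```csharp
using System;
using Microsoft.SharePoint.Taxonomy;

public static class MultilingualLabelSketch
{
    private const int French = 1036; // LCID for French (France)

    // Adds a French default label to a term that already has an English one.
    public static void AddFrenchLabel(Term term, string frenchLabel)
    {
        TermStore termStore = term.TermStore;

        // Assumption: the language must be enabled on the term store first.
        if (!termStore.Languages.Contains(French))
        {
            throw new InvalidOperationException(
                "French (1036) is not enabled on this term store.");
        }

        // true = this is the default label for French.
        term.CreateLabel(frenchLabel, French, true);
        termStore.CommitAll();
    }

    // Finds terms in a term set by label; false means unavailable terms
    // are not trimmed from the results.
    public static TermCollection FindByLabel(TermSet termSet, string label)
    {
        return termSet.GetTerms(label, false);
    }
}
```

For example, after calling AddFrenchLabel(term, "Équipement sportif"), a user typing either the English or the French label would be applying the same term.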

Conclusion

The new SharePoint 2010 Managed Metadata functionality is exciting and provides a framework to build more taxonomy features. Through managed metadata, SharePoint users gain access to functionality such as folksonomy, social tagging (tag clouds) and more powerful search options. EMM also provides a way to centrally manage bookmarks.

The Term Store Management Tool available in Central Administration (and Site Settings) enables administrators to manage a central vocabulary of terms for the whole farm. Operations that administrators can perform on the term hierarchy include copying, reusing, moving, duplicating, deprecating, and merging. Furthermore, having a managed repository enforces consistency across users.

Enterprise Metadata Management is a huge topic. In fact, how SharePoint 2010 exposes the taxonomy features (e.g., the new tag cloud web part) is worthy of its own post, so I’m not going to try and sum it all up in one.

Managed Metadata Best Practices from Microsoft:

Plan managed metadata (SharePoint Server 2010)
Managed metadata overview (SharePoint Server 2010)
Managed metadata service application overview (SharePoint Server 2010)
Managed metadata roles (SharePoint Server 2010)
Plan terms and term sets (SharePoint Server 2010)
Plan to import managed metadata (SharePoint Server 2010)
Plan to share terminology and content types (SharePoint Server 2010)
SharePoint Enterprise Content Management

[Disclaimer: This information is based on SharePoint 2010 Beta 2 and may differ from the RTM build.]

9 comments:

George knox said...

Really good post which explains the relevance of Metadata in SP2010.
However, I am a bit confused about where the likes of the Schemalogic (www.schemalogic.com) Metapoint Server and Desktop component fit into the MS strategy. My understanding was that they were part of the TAP program and an SP2010 rollout partner.
Do you see them as complementary or redundant now that SP2010 has the Term store functionality? I can see where they fit in with 2007, but I'm unclear on the MS strategy going forward.

Thanks

harry said...

Hi george,

This article is very useful for me. I am a SharePoint developer and always looking to learn something new. I would like to introduce another good SharePoint blog. Have a look.

http://SharePointBank.com
Harry

SharePointFrank said...

Good intro. Knowledge Management and Social Networking seem to be unleashed, for real this time, with the upcoming new version of Microsoft SharePoint 2010.

Now you can make a “semantic jump start” with pre-defined taxonomy metadata for SharePoint 2010. You can download complete enterprise taxonomies, ready to import into the Term Store here:

http://www.layer2.de/en/products/Pages/SharePoint-2010-Taxonomy-Metadata.aspx

Michael Greth said...

Great post
have you ever tried to import multilingual metadata sets from a csv file? I didn't succeed

Stephen Cawood said...

Michael, I haven't tried an import like that yet. I have, however, been meaning to write a post about the CSV import option, so maybe I can cover it.

Stephen Cawood said...

From one of the Microsoft EMM devs:

"The simple csv import only allows for default labels to be imported. The API allows for synonyms (non-default labels), translations and custom properties."

ableiserson said...

We're just starting to use managed metadata and your article was most helpful to me.

Also, speaking as one of the last in that dying breed -- degreed library scientists (AMLS even) -- my hat's off to MS doing taxonomies and synonyms and not tackling ontologies.

Alexandr said...

Hello,

I would like to learn more about automatic metadata creation tools, such as Pingar Metadata Extractor. What are the pros and cons for using it?

Thanks