Microsoft has announced an interesting move: it will replace the authentication system for SharePoint Server. The company plans to make the collaboration platform one of the first of its marquee applications to rely on a new claims-based identity model.
The goal is to have SharePoint incorporate an authentication model that works with any corporate identity system, including Active Directory, LDAPv3-based directories, application-specific databases and new user-centric identity models, such as LiveID, OpenID and InfoCard systems, including Microsoft’s CardSpace and Novell’s Digital Me.
SharePoint will lose its rigid authentication system and replace it with a claims-based authentication solution. Claims are a set of statements that identify a user and provide specific information about that user, for instance age or group membership. They are passed along to obtain access to the SharePoint environment and to systems integrated with that environment. Systems use the claims to decide who gets access, who can retrieve content, or who can complete transactions.
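To make the idea of claims concrete, here is a minimal sketch in Python. It is not SharePoint's or Microsoft's API; the `Claim` class, the issuer name `corp-directory`, and the access policy are all hypothetical and only illustrate how a system might base an access decision on statements vouched for by a trusted issuer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One statement about a user, vouched for by an issuer (hypothetical model)."""
    claim_type: str  # e.g. "age" or "group"
    value: str
    issuer: str      # the identity system that vouches for the statement


def has_claim(claims, claim_type, value, trusted_issuers):
    """Return True if a trusted issuer asserts claim_type == value."""
    return any(
        c.claim_type == claim_type and c.value == value and c.issuer in trusted_issuers
        for c in claims
    )


def can_read_site(claims, trusted_issuers):
    """Hypothetical policy: adult members of the 'sales' group may read the site."""
    is_member = has_claim(claims, "group", "sales", trusted_issuers)
    is_adult = any(
        c.claim_type == "age" and c.issuer in trusted_issuers and int(c.value) >= 18
        for c in claims
    )
    return is_member and is_adult


alice = [
    Claim("group", "sales", issuer="corp-directory"),
    Claim("age", "34", issuer="corp-directory"),
]

# The same claims are accepted or rejected depending on which issuers are trusted.
print(can_read_site(alice, trusted_issuers={"corp-directory"}))
print(can_read_site(alice, trusted_issuers={"some-other-issuer"}))
```

The point of the sketch is that the application only looks at the claims and at who issued them. It does not care whether they originally came from Active Directory, an LDAP directory, or a user-centric identity system.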
“We don’t want to come up with another, or the next, authentication system for SharePoint,” says Venkey Veeraraghavan, senior program manager lead for Office SharePoint Server.
Veeraraghavan said Microsoft settled on a claims-based system because it is flexible and designed for heterogeneous identity environments. “It allowed us to invest in one place [SharePoint] and know that we can credibly say we work with multiple systems, especially as they are woven into what we’re calling a Metasystem. We want to continue to work on making SharePoint useful to our customers, not spend a lot of time integrating with each and every identity system one-by-one, or worse, not do it because of resource concerns.”
In its current release, SharePoint is fairly limited in its authentication mechanisms. You can use NTLM (ancient and inefficient), Basic (used with SSL, because clear-text passwords are SO not good), Kerberos (complex to configure, but better performance), or MS Single Sign-On. This new move sounds like a great way to open up the collaboration platform to third-party options, which are what most companies use.
Kim Cameron, Microsoft’s identity architect, believes an industry transformation to claims-based identity is 18 to 24 months away. Considering the normal product release cycle of the Office platform, that would place the implementation of the new claims-based identity model in the next major release of SharePoint Server, in 2009 or later.
© Copyright 2007, Tomas Elfving
Wednesday, October 17, 2007
Saturday, October 13, 2007
Security and identity in Mashup applications
The most interesting mashup apps will be creations powered by our personal and business information. So far, there hasn't been a single sign-on model that supports the mashup programming model. Most mashups we've seen so far don't require (or support) logins that would allow them to collect information from your private repositories. While initiatives like OpenID have the potential to resolve some of these issues, there is a lot of work to be done before the average user will trust a mashup with access to their private information. In coming posts, I'll dive into the OpenID initiative to find out whether it can fill this important technology gap, which fundamentally limits the real value that mashups can provide.
© Copyright 2007, Tomas Elfving
Wednesday, October 10, 2007
Problem seeing updates in AD groups in MOSS solved
There have been comments regarding a problem with synchronizing AD and SharePoint groups ("Audience Targeting and User Profiles in MOSS 2007"). Tim O finally figured out the solution, and here's his description:
"If experiencing problems seeing updates to the AD groups, it's possible the timer service has stopped and failed to restart. A few people at MS are aware of this problem. In Central Administration for SharePoint, select 'Operations', then 'Timer Job Status' to view the most recent timer job activity. Several jobs are visible here and some run infrequently.
On your SP server, in the services section, restart the following service:
Windows Sharepoint Services timer."
Thanks Tim O for sharing!
© Copyright 2007, Tomas Elfving
Labels:
AD groups,
Audience targeting,
MOSS,
Sharepoint
Tuesday, October 9, 2007
Components of an Enterprise Search system
In a previous posting, I described a custom-made enterprise search solution. This time I look into the characteristics of generic search systems that are available on the market. In the decision between developing a custom solution and buying a commercial system, an important aspect is the amount of work you need to put in to make the commercial product actually do what you want. This article aims to give you a better understanding of the factors that affect the feasibility of these products.
The components and the behaviour of an Enterprise Search System
Search systems act in two directions: toward the content and toward the searcher. Put simply, search engines index content to enable accurately sorted result sets, and they process queries. To accomplish these tasks, most search systems consist of three major, quite interdependent components:
1. Content acquisition
2. Indexing service
3. Query processing (including parsing, matching and post-processing)
If something goes wrong in any one of these subsystems, the problem can significantly degrade the performance and effectiveness of the search engine as a whole. If the content acquisition function is not able to identify and incorporate new content in the index, the index will soon become outdated. If a heavy workload overloads the query processing system, there may not be enough resources left to update the index, or the response time for displaying result sets to users may degrade to an unacceptable level.
Performance is an important issue in all parts of a search engine solution. Most of the processes in a search system are processor- and disk-intensive. In order to improve the overall performance, additional hardware, memory, storage, or bandwidth may be needed. To speed up response time in the query processing module, one might need to limit the number of users, place a ceiling on the number of results displayed per query, or eliminate certain resource-hungry indexing processes such as automated metatagging.
Content Acquisition
You need a plan for how to acquire content into your search solution. The following strategies may be used separately or mixed. There is no single strategy that always works for everyone:
1. The content acquisition subsystem gathers content to a common index database. Content must be identified, then copied from its location to a processing folder, and finally moved to a “to be indexed” folder. This is the approach most enterprise search systems use.
2. The second way is to set up the content servers to run a script that identifies changed or new content, copies that content to a new folder, and possibly pre-processes it before moving the processed files to the indexing subsystem. This approach is supported by enterprise systems such as FAST Search & Transfer, Verity, and others. It makes it possible to plan when index updates occur, so that performance-heavy indexing does not run while a lot of users are searching.
3. Finally, one can use so-called “spiders”. A spider is a script that visits servers, folders, or files on a scheduled basis. When a change or a new document is identified, the script copies the file to the index processing subsystem. This approach is supported by virtually all search systems today, but it remains a surprisingly complicated exercise. Spiders can, for instance, run into problems with session variables in URLs, JavaScript, Flash, and forms. When improperly configured, a spider can chase its own tail through a series of infinitely recursive links.
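The two spider pitfalls mentioned in the last item, session variables in URLs and infinitely recursive links, can be illustrated with a small sketch. It assumes an in-memory link table (`SITE`) instead of real HTTP requests, and the session parameter names in `SESSION_PARAMS` are made-up examples; a real spider would need much more (robots rules, error handling, politeness delays).

```python
from collections import deque
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of session parameters; real sites use many different ones.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def normalize(url):
    """Drop session parameters and fragments so one page maps to one key."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


def crawl(start_url, get_links, max_pages=1000):
    """Breadth-first crawl; get_links(url) returns the links found on that page."""
    start = normalize(start_url)
    seen = {start}          # every page is queued at most once, so loops end
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            key = normalize(link)
            if key not in seen:
                seen.add(key)
                queue.append(key)
    return visited


# Hypothetical site where pages a and b link to each other with changing session ids.
SITE = {
    "http://intranet/a": ["http://intranet/b?sid=1"],
    "http://intranet/b": ["http://intranet/a?sid=2", "http://intranet/c"],
    "http://intranet/c": [],
}

pages = crawl("http://intranet/a", lambda url: SITE.get(url, []))
print(pages)
```

Without the `seen` set and the URL normalization, the a/b loop above would look like an endless stream of new pages to the spider.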
These three content acquisition techniques can be used separately or together by the search system. When the enterprise search system offers application programming interfaces or toolkit modules, highly customized content acquisition systems can be developed. For example, mission-critical content can be acquired on a near-real-time basis and incorporated into the search system using scripts, whereas non-critical content may be acquired at a longer interval. No single content acquisition approach is appropriate in most organizations; hybrid content acquisition techniques are the norm.
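As an illustration of the script-based strategy (the second one in the list), here is a rough sketch of a script that finds new or changed files and copies them to a “to be indexed” folder. It is not how any particular product works; the folder names, the use of SHA-256 content hashes for change detection, and the `collect_changes` helper are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path):
    """Fingerprint a file by its content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def collect_changes(source_dir, staging_dir, known_digests):
    """Copy new or changed files to staging_dir and return their names."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    changed = []
    for path in sorted(source_dir.iterdir()):
        if not path.is_file():
            continue
        digest = file_digest(path)
        if known_digests.get(path.name) != digest:
            shutil.copy2(path, staging_dir / path.name)
            known_digests[path.name] = digest
            changed.append(path.name)
    return changed


# Demo in a temporary folder so the sketch does not touch real content.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "content"
    staging = Path(tmp) / "to_be_indexed"
    source.mkdir()
    (source / "report.txt").write_text("first draft")
    (source / "policy.txt").write_text("travel policy")

    digests = {}
    first_run = collect_changes(source, staging, digests)   # everything is new
    (source / "report.txt").write_text("second draft")
    second_run = collect_changes(source, staging, digests)  # only the edited file

print(first_run, second_run)
```

Running such a script from a scheduler is what lets you decide when the indexing load hits the servers, and running it more often for some folders than for others gives the hybrid setup described above.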
To improve content acquisition, a search system can integrate with a records management system, a document management system, or other enterprise information management system that holds information in a particular format or structure. This type of integration typically requires custom scripting and may require additional infrastructure modifications. Similarly, compliance with Basel II, Sarbanes-Oxley or other mandated guidelines will require customization of the content acquisition module as well as other parts of the search system to ensure that versions of documents are not made unfindable.
Indexing Service
The goal of storing content data in an index is to optimize the speed of finding relevant documents for a search query. No one really wants a full “byte-by-byte” disk scan for every search query. Such a scan may take minutes or hours to complete for a collection of, say, 10 000 documents (which is really nothing in most search solutions). With an index, we are talking about a few milliseconds for the same operation!
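The difference between scanning and looking up can be sketched with a tiny inverted index. This is a simplified illustration with three made-up documents, not the data structure of any specific search product: the scan reads every document for every query, while the index is built once and then answers a query with a single dictionary lookup.

```python
from collections import defaultdict

# Three made-up documents standing in for a content collection.
DOCS = {
    1: "claims based identity for sharepoint",
    2: "enterprise search index design",
    3: "search engines rank results",
}


def scan(docs, term):
    """Full scan: read every document for every query."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text.split())


def build_index(docs):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def lookup(index, term):
    """Answer a query with one dictionary lookup instead of a scan."""
    return sorted(index.get(term, set()))


INDEX = build_index(DOCS)  # built once, ahead of query time
print(scan(DOCS, "search"), lookup(INDEX, "search"))
```

Both calls return the same documents. The cost has simply been moved from query time to indexing time, which is why the indexing service needs its own share of resources.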
To provide a set of matching items quickly, a search engine will typically collect information, or metadata, about the group of items under consideration beforehand. For example, a library search engine may determine the author of each book automatically and add the author name to a description of each book. Users can then search for books by the author's name. Other metadata in this example might include the book title, the number of pages in the book, the date it was published, and so forth.
Index Design Factors
The challenge is to find an optimal index design, and it boils down to considering the following index design factors:
- Merge factors - How data enters the index, that is, how words or subject features are added to it. Do you have multiple indexes that need to be merged? Can the merge be done asynchronously? Search engine index merging is similar in concept to the SQL MERGE command and other merge algorithms.
- Storage techniques - How to store the index data. Should the information be compressed or filtered?
- Index size - How much computer storage is required to support the index.
- Lookup speed - What are your lookup speed requirements? How quickly can an entry in a data structure be found, versus how quickly can it be updated or removed?
- Maintenance - How the index is maintained over time.
- Fault tolerance - How important it is for the service to be reliable, how to deal with index corruption, whether bad data can be treated in isolation, and how to deal with bad hardware and partitioning schemes.
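The merge factor in the list above can be illustrated with a minimal sketch, assuming each index is a plain dictionary that maps a term to the set of document ids containing it. Real engines merge sorted on-disk segments, so treat `merge_indexes` and the sample data as a hypothetical simplification of the idea.

```python
def merge_indexes(*indexes):
    """Combine several term -> doc-id-set indexes into a new one.

    The inputs are left untouched, so a merge like this could be prepared
    in the background while queries are still served from the old index.
    """
    merged = {}
    for index in indexes:
        for term, doc_ids in index.items():
            merged.setdefault(term, set()).update(doc_ids)
    return merged


# Hypothetical main index plus a small "delta" index with fresh content.
main_index = {"search": {1, 2}, "index": {2}}
delta_index = {"search": {7}, "spider": {8}}

merged = merge_indexes(main_index, delta_index)
print(merged)
```

Keeping a small delta index for new content and merging it into the main index on a schedule is one way to trade lookup speed against how fresh the index is.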
Search engines provide an interface to a group of items that enables users to specify criteria about an item of interest and have the engine find the matching items within the group. Most commonly, items are documents or web pages and the criteria are words or concepts that the documents may contain.
There are several varieties of syntax in which a search engine user can express a query. Some methods are formalized and require a strict, logical and algebraic syntax. Other approaches are less strict and allow for a less defined query. One form of a less-restricted query syntax is referred to as Natural Language Search, which is a term typically used to describe web search engines that apply natural language processing of some form. For example, instead of searching for one or two words, a query could consist of an English sentence or paragraph. A natural language search engine will then parse the query into words and evaluate searches for these words. This places less burden on the search engine user to formulate a specific query using restrictive, and sometimes difficult to learn, syntax. A second definition of natural language search engines reflects how the search engine performs indexing, unrelated to the query syntax. This requires a semantic understanding of the query in order to disambiguate the text.
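The first definition, parsing a sentence into words and searching for those words, can be sketched in a few lines. This is a deliberately naive illustration: the stop-word list is a small made-up sample, and there is no semantic understanding or disambiguation involved.

```python
import re

# A small, made-up stop-word list; real engines use much longer ones.
STOPWORDS = {
    "a", "an", "the", "of", "for", "in", "on", "to",
    "is", "are", "what", "which", "how", "do", "i", "about",
}


def parse_query(text):
    """Turn a free-text question into a list of search terms."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


print(parse_query("What are the components of an enterprise search system?"))
```

The resulting terms can then be looked up in the index one by one, which spares the user from learning a strict query syntax.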
Ranking search result sets
A Boolean search for an item within a group of items will either return the exact matching item or nothing. This is a rather orthodox search method where the equality between the desired item and the actual item must be exact. In application, it is sometimes far more beneficial and useful to incorporate a more lax measure of similarity between the desired item(s) and the items that exist in the group being searched. For example, instead of finding only the exact book in a library, a library search engine may return a list of 'similar' books, with the exact book listed first.
The list of items that meet the criteria specified by the query is typically sorted, or ranked, so as to place the most 'relevant' items first. Placing the most relevant items first reduces the time users need to determine whether one or more of the resulting items are sufficiently similar to the query. It has become common knowledge through the use of Web search engines that the further down the list of matching items you browse, the less relevant the items become.
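The contrast between a strict Boolean match and a ranked result list can be sketched as follows. The scoring here is a plain count of how often the query terms occur in each document, chosen only because it is easy to follow; it is an assumption for the example and not the relevance model of any real engine.

```python
from collections import Counter

# Made-up documents for the example.
DOCS = {
    "a": "search engine index design",
    "b": "enterprise search search ranking",
    "c": "claims based identity",
}


def boolean_match(docs, terms):
    """Strict match: a document qualifies only if it contains every term."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if all(term in text.split() for term in terms)
    )


def rank(docs, terms):
    """Looser match: score by term occurrences, highest score first."""
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(text.split())
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


query = ["search", "ranking"]
print(boolean_match(DOCS, query))
print(rank(DOCS, query))
```

The Boolean search returns only the document that contains both terms, while the ranked search also returns the partial match, listed after the best one.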
© Copyright 2007, Tomas Elfving
Labels:
Content acquisition,
Enterprise Search,
Index database,
Indexing,
Query processing,
Spiders