Discovering Magic

Most of us create identities across the web without much conscious thought. We fill in profiles, upload photos, videos, reviews, and bookmarks. Although this information is often public, it’s fragmented into the silos of individual websites. Wouldn’t it be a little magical if, when you signed up for a new site, the site said something like, “We notice you have a profile photo on Flickr and Twitter, would you like to use one of those or upload a new one?”

Our footprints across the web

Social media sites encourage us to have more open and transparent conversations, and create opportunities for new people to participate in our lives. Most of these identities are tied to social media sites where we create content.

The semantic web and open data formats

The semantic web attempts to make information that is currently intelligible only to humans machine-readable. Beyond microformats, other open data formats such as RDF, RSS, and Atom contain rich data we can use.
Internal Site Search Analysis: Simple, Effective, Life Altering!

Understanding your site visitors’ intent is one of the most delightful parts of web data analysis. In this article, we’ll learn five ways to analyze your internal site-search data: data that’s easy to get, to understand, and to act on.

But let’s take a step back. Why should you care about this in the first place? Good question. In the good old days, people dutifully used site navigation at the left, right, or top of a website. Now when people show up at a website, many of them ignore our lovingly crafted navigational elements and jump straight to the site search box.

There’s also one more (really important) reason, just in case you need a bit more convincing. All the search and clickstream data you have (from Google Analytics, Omniture, WebTrends, etc.) is missing one key ingredient: customer intent. Your internal site-search data contains that missing ingredient: intent. Internal site-search data is easy to access and analyze, no matter which web analytics tool you use.

Basics first.
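To make the “easy to access and analyze” claim concrete, here is a minimal sketch, assuming a hypothetical CSV export (site_search.csv) with one row per search and made-up column names (session_id, query, results_clicked); real analytics tools will label these fields differently. It tallies the top queries, counts sessions that used search, and reports the share of searches with no click-through.

```python
# Minimal internal site-search analysis sketch.
# Assumes a hypothetical export "site_search.csv" with columns:
#   session_id, query, results_clicked  (names are illustrative only)
import csv
from collections import Counter

queries = Counter()
sessions_with_search = set()
searches_without_click = 0
total_searches = 0

with open("site_search.csv", newline="") as f:
    for row in csv.DictReader(f):
        total_searches += 1
        queries[row["query"].strip().lower()] += 1
        sessions_with_search.add(row["session_id"])
        if row["results_clicked"] == "0":
            searches_without_click += 1

print("Top 10 queries:")
for query, count in queries.most_common(10):
    print(f"  {count:6d}  {query}")

print(f"Sessions that used search: {len(sessions_with_search)}")
if total_searches:
    print(f"Searches with no click-through: "
          f"{searches_without_click / total_searches:.1%}")
```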
Design Patterns: Faceted Navigation

We are pleased to present an excerpt from Chapter 4 of Search Patterns by Peter Morville and Jeffery Callender (O’Reilly, 2010). —Ed.

Faceted Navigation

Also called guided navigation and faceted search, the faceted navigation model leverages metadata fields and values to provide users with visible options for clarifying and refining queries. Faceted navigation is arguably the most significant search innovation of the past decade.[2] It features an integrated, incremental search and browse experience that lets users begin with a classic keyword search and then scan a list of results. Figure 4-19 illustrates a successful implementation of faceted navigation as a model for interacting with the catalogs of several academic libraries.

[2] Marti Hearst and her Flamenco project collaborators at UC Berkeley deserve credit for their pioneering research in faceted navigation.

Fig. 4-19: Faceted navigation at the Triangle Research Libraries.
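To illustrate the mechanics behind the pattern, here is a minimal sketch (not from the book) of faceted filtering over an in-memory catalog: selected facet values narrow the result set, and the remaining values are counted so they can be offered as visible refinements. The catalog and facet names are invented for illustration.

```python
# Faceted navigation sketch: filter items by selected facet values and
# count remaining facet values for the current result set.
from collections import defaultdict

catalog = [
    {"title": "Search Patterns",        "format": "Book",  "subject": "Search", "year": 2010},
    {"title": "Ambient Findability",    "format": "Book",  "subject": "IA",     "year": 2005},
    {"title": "Faceted Search",         "format": "Book",  "subject": "Search", "year": 2009},
    {"title": "Flamenco Project Paper", "format": "Paper", "subject": "Search", "year": 2002},
]

def refine(items, selections):
    """Keep only items matching every selected facet value."""
    return [item for item in items
            if all(item.get(facet) == value for facet, value in selections.items())]

def facet_counts(items, facets):
    """Count the values of each facet within the current result set."""
    counts = {facet: defaultdict(int) for facet in facets}
    for item in items:
        for facet in facets:
            counts[facet][item[facet]] += 1
    return counts

results = refine(catalog, {"subject": "Search"})
print([item["title"] for item in results])
counts = facet_counts(results, ["format", "year"])
print({facet: dict(values) for facet, values in counts.items()})
```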
Findability and Exploration: the future of search

The majority of people visiting a news website don’t care about the front page. They might have reached your site from Google while searching for a very specific topic. They might just be wandering around. Or they’re visiting your site because they’re interested in one specific event that you cover. This is big. We need ambient findability. Pete Bell recently opined that search is the enemy of information architecture. First, we need to understand a bit more about search.

Full-text search is a last resort

Rack your brain for a minute. When somebody enters the query Tony Blair, they’re not looking for news articles that contain the words Tony Blair; they’re looking for news articles and assorted other information relating to Tony Blair. Let’s make a small but important distinction. A good search engine goes beyond word occurrence, by stemming and by being aware of synonyms. Google is amazing at this sort of thing. Some of these wishes are a bit too wild for current technology.
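As a rough illustration of going beyond literal word occurrence, here is a small sketch that expands a query with a hand-made synonym table and applies a crude suffix-stripping stemmer before matching. Both the synonym map and the stemmer are toy stand-ins, not what Google or any production engine actually does.

```python
# Toy search sketch: synonym expansion plus crude suffix stripping,
# then rank documents by overlap with the normalized query terms.
SYNONYMS = {"pm": "prime minister", "uk": "united kingdom"}

def stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def normalize(text):
    expanded = " ".join(SYNONYMS.get(w, w) for w in text.lower().split())
    return {stem(w) for w in expanded.split()}

def search(query, documents):
    query_terms = normalize(query)
    # Rank documents by how many normalized query terms they contain.
    scored = [(len(query_terms & normalize(doc)), doc) for doc in documents]
    return [doc for score, doc in sorted(scored, reverse=True) if score > 0]

docs = [
    "Tony Blair resigns as Prime Minister of the United Kingdom",
    "Elections held across the UK",
    "Gardening tips for the spring",
]
print(search("UK PM resigning", docs))  # matches the first two documents
```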
k-nearest neighbor algorithm

In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.
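A minimal sketch of k-NN classification as described above: Euclidean distance in the feature space and a majority vote among the k nearest training examples. The training data is illustrative.

```python
# k-NN classification sketch: distance sort, then majority vote over the
# labels of the k closest training examples.
from collections import Counter
import math

def knn_classify(query, training_data, k=3):
    """training_data: list of (feature_vector, label) pairs."""
    # Sort training examples by Euclidean distance to the query point.
    by_distance = sorted(
        training_data,
        key=lambda pair: math.dist(query, pair[0]),
    )
    # Majority vote among the labels of the k nearest neighbors.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

training = [
    ((1.0, 1.0), "red"), ((1.2, 0.8), "red"), ((0.9, 1.1), "red"),
    ((5.0, 5.0), "blue"), ((5.2, 4.8), "blue"), ((4.9, 5.3), "blue"),
]
print(knn_classify((1.1, 0.9), training, k=3))  # -> "red"
print(knn_classify((5.1, 5.0), training, k=1))  # -> "blue"
```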
Nearest neighbor search

Nearest neighbor search (NNS), also known as proximity search, similarity search, or closest point search, is an optimization problem for finding closest (or most similar) points. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values.

Formally, the nearest-neighbor (NN) search problem is defined as follows: given a set S of points in a space M and a query point q ∈ M, find the closest point in S to q. Donald Knuth, in vol. 3 of The Art of Computer Programming (1973), called it the post-office problem, referring to an application of assigning to a residence the nearest post office. A direct generalization of this problem is a k-NN search, where we need to find the k closest points. Most commonly, M is a metric space and dissimilarity is expressed as a distance metric, which is symmetric and satisfies the triangle inequality.

Methods

Various solutions to the NNS problem have been proposed.
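A brute-force sketch of the NN and k-NN search problems as defined above: a linear scan over S that returns the point (or points) minimizing the distance to q. This costs O(|S|) per query; space-partitioning and hashing methods trade preprocessing for faster queries.

```python
# Brute-force nearest-neighbor search: linear scan with Euclidean distance.
import math

def nearest_neighbor(S, q):
    """Return the point of S closest to the query point q."""
    return min(S, key=lambda p: math.dist(p, q))

def k_nearest_neighbors(S, q, k):
    """Direct generalization: the k closest points of S to q."""
    return sorted(S, key=lambda p: math.dist(p, q))[:k]

points = [(0, 0), (3, 4), (1, 1), (10, 2), (2, 2)]
print(nearest_neighbor(points, (2, 3)))        # -> (2, 2)
print(k_nearest_neighbors(points, (2, 3), 2))  # -> [(2, 2), (3, 4)]
```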
Recommender system

Recommender systems or recommendation systems (sometimes replacing "system" with a synonym such as platform or engine) are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item.[1][2] Recommender systems have become extremely common in recent years, and are applied in a variety of areas. The most popular ones are probably movies, music, news, books, research articles, search queries, social tags, and products in general.

Overview

The differences between collaborative and content-based filtering can be demonstrated by comparing two popular music recommender systems, Last.fm and Pandora Radio. Last.fm recommends songs by observing the bands and tracks a user listens to and comparing them against the listening behavior of other users (collaborative filtering), while Pandora uses the properties of a song or artist (a subset of the 400 attributes provided by the Music Genome Project) in order to seed a "station" that plays music with similar properties (content-based filtering). Each type of system has its own strengths and weaknesses.
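To make the collaborative filtering idea concrete, here is a minimal user-based sketch (not tied to Last.fm's actual implementation): a user's predicted rating for an item is a cosine-similarity-weighted average of other users' ratings for that item. The ratings data is invented.

```python
# User-based collaborative filtering sketch: cosine similarity over
# co-rated items, then a similarity-weighted average rating prediction.
import math

ratings = {
    "alice": {"Blade Runner": 5, "Amelie": 4, "Alien": 4},
    "bob":   {"Blade Runner": 5, "Alien": 5, "Heat": 2},
    "carol": {"Amelie": 5, "Heat": 4, "Alien": 1},
}

def cosine_sim(a, b):
    """Cosine similarity between two users, restricted to co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Predict user's rating for item from similar users who rated it."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_sim(ratings[user], their_ratings)
        num += sim * their_ratings[item]
        den += sim
    return num / den if den else None

print(predict("alice", "Heat"))  # weighted by alice's similarity to bob and carol
```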
Testing Search for Relevancy and Precision

Although site search often receives the most traffic, it’s also the place where the user experience designer has the least influence. Few tools exist to appraise the quality of the search experience, much less strategize ways to improve it. When it comes to site search, user experience designers are often sidelined like the single person at an old flame’s wedding: everything seems to be moving along without you, and if you slipped out halfway through, chances are no one would notice.

But relevancy testing and precision testing offer hope. These are two tools you can use to analyze and improve the search user experience.

You’ve already got everything you need

The search engine itself provides the critical resource you need to run these tests: the report of the most commonly submitted queries.

Fig. 1: Report of the most commonly submitted queries.
Fig. 2: Diagram of a Zipf curve showing unique search phrases.
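Here is a minimal sketch of the precision side of these tests, under the simple assumption that each top query has a list of returned URLs and a reviewer-judged set of relevant URLs: precision at N is the share of the top N results judged relevant. All queries, URLs, and judgments below are placeholders, not a prescribed test protocol.

```python
# Precision-testing sketch: precision@N for a handful of top queries.
def precision_at_n(results, relevant, n=10):
    """Fraction of the top n results that were judged relevant."""
    top = results[:n]
    return sum(1 for doc in top if doc in relevant) / len(top) if top else 0.0

# Hypothetical top queries, the URLs the search engine returned, and the
# set of URLs a reviewer judged relevant for each query.
tests = {
    "return policy": (
        ["/help/returns", "/blog/returns-announcement", "/help/shipping"],
        {"/help/returns"},
    ),
    "red shoes": (
        ["/products/red-sneaker", "/products/red-pump", "/blog/color-trends"],
        {"/products/red-sneaker", "/products/red-pump"},
    ),
}

for query, (results, relevant) in tests.items():
    print(f"{query!r}: precision@3 = {precision_at_n(results, relevant, n=3):.2f}")
```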
The Ultra Gleeper: a Recommendation Engine for Web Pages
by Leonard Richardson (leonardr at segfault dot org)
Paper revision 1: 02/06/2005

Introduction

Recommendation engines enjoyed a vogue in the mid-90s. They would solve the problem of information overload by matching user preferences against a large universe of data. The ultimate realization of this strategy would be a recommendation engine capable of mining that Northwest territory of data, the World Wide Web. Recommendation engines were built and ran into trouble. But over the years, as people built these web sites, they came up with models and tools for solving the basic problem of finding and tracking useful web sites.

The shoulders of giants

A web page recommendation engine is now possible because a lot of the work is done elsewhere on the web and exposed to the public, because web surfers track new types of information, and because new ideas have taken root.

Giving away the store

In this section I have collected all the little epiphanies I had that led to the Ultra Gleeper.