What percentage of Web pages contain broken links? Do you think you know the answers to these questions? You might just be surprised. This ATW feature reports the results of the third State of the Web Survey (SOWS), an automated Web survey that tracks page size and linkrot on the WWW.

Why Page Size / Linkrot Is Important

In the Seventh Georgia Tech GVU WWW survey, conducted in April of 1997, the top two reported problems in using the Web were speed (the perennial biggie) and broken links. The design implications are clear: readers want small, fast-loading pages, free of dead links. The GVU results raise but don't answer the more specific questions: "Just how big is too big?" "Just how many broken links is too many?" In other words, it's important to query the "State of the Web," in an effort to quantify specific thresholds for current user dissatisfaction.

The SOW Survey

On five different occasions in May of 1999, ATW's own automated harvester fetched pages at random from the WWW, in an attempt to describe the current state of the Web. Statistics gathered included total page size (text and graphics) and both incidence and prevalence of broken hyperlinks. (Incidence refers to the number of dead links as a percentage of the total number of links in the sample, and prevalence refers to the number of pages containing at least one dead link, as a percentage of the pages sampled.) A total of 200 pages were selected at random from the AltaVista search engine for inclusion into the Third SOW Survey. (Arcane details regarding sample selection are available for methodologically inclined readers.)

Results

What's Changed / What Hasn't

The bad news: page size hasn't changed, and there are still way too many dead links. The worse news: The "middle range" of moderately bloated pages seems to be on the increase. Linkrot is definitely on the rise, affecting substantially more pages than a year ago.
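
For methodologically inclined readers, the difference between the two linkrot measures defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `linkrot_stats`, the `(total_links, dead_links)` input format, and the four-page sample are hypothetical, and none of it is the survey's actual harvester or data.

```python
def linkrot_stats(pages):
    """Return (incidence, prevalence) as percentages.

    pages: list of (total_links, dead_links) tuples, one per sampled page.
    This input format is an assumption made for the sketch.
    """
    total_links = sum(total for total, _ in pages)
    dead_links = sum(dead for _, dead in pages)
    pages_with_dead = sum(1 for _, dead in pages if dead > 0)

    # Incidence: dead links as a share of all links in the sample.
    incidence = 100.0 * dead_links / total_links if total_links else 0.0
    # Prevalence: pages with at least one dead link as a share of all pages.
    prevalence = 100.0 * pages_with_dead / len(pages) if pages else 0.0
    return incidence, prevalence


# Hypothetical four-page sample: two clean pages, one page with a single
# dead link, and one page whose links are all dead.
sample = [(10, 0), (20, 1), (5, 5), (15, 0)]
incidence, prevalence = linkrot_stats(sample)
print(f"incidence={incidence:.1f}%  prevalence={prevalence:.1f}%")
```

Because dead links tend to pile up on a few pages, the two figures can move independently: a sample can show a low incidence while a large share of its pages still carries at least one dead link.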
The Numbers

As summarized in the table below, page size results from the third State of the Web Survey were remarkably similar to the results of the second Survey (conducted in May of 1998).
Average total size (text and graphics) of the pages surveyed was 60 KB.

The incidence of linkrot in this third Survey was still nearly 6%, essentially unchanged from the second Survey.

Linkrot now affects a whopping 28.5% of all pages sampled, up from 23% a year ago. (This increase almost certainly translates into greater user frustration, since it is likely the prevalence, rather than the incidence, of dead links that is the source of user dissatisfaction.)

In prior Surveys, AltaVista actually performed worse than the Web as a whole, returning 10-11% dead links. But AV's performance is markedly better this year -- a more Web-typical 7% of the URLs returned by AltaVista pointed to either a nonexistent page or a nonexistent server.

Digging Deeper

Although average page size remains essentially unchanged from last year, there is a noticeable upward "creep" in the overall distribution of page sizes. As shown in the chart below, nearly half of the pages in SOWS-II were under a comfortable 30 Kbytes in size, but in SOWS-III, pages 30K and under now represent slightly less than a third of the pages sampled. (Statistically inclined readers will be interested to know that median page size has grown from 32K in SOWS-II to 44K in SOWS-III.)

[Chart: Page Size]

Linkrot results also continue to "cluster." While the vast majority of survey pages (71.5%) contained no dead links, the two worst "offenders" offered their visitors nothing but dead links. (Indeed, 8.7% of those pages marred by linkrot presented their hapless visitors with a better-than-even chance of "following" a dead link.) These results provide moderate support for the general conclusions from the first two surveys. There are still plenty of serious-minded, conscientious Web designers creating small, well-optimized pages, and the majority of site administrators are making a conscious effort to keep their pages free of dead links.

Still...

Pundits in the trade press have decreed 1999 to be the Year of Bountiful Bandwidth.
(Of course, that's also what they said about 1998!) The upward "creep" in page size suggests that some myopic Web designers may have actually begun to believe such drivel. Typically, sites that offer suggested page size limits based on conventional wisdom in human factors research exhort authors to keep total page size under 30K. (The most obvious dissenting voice in this regard is ATW, which prefers a more modest 20K limit for its guidelines.) But the results of this third State of the Web Survey make it clear: nearly 30% of all pages present dead hyperlinks to their visitors, and the average page size remains, as it always has, at least twice the upper limit suggested by human factors research. Readers are still complaining . . . but it seems some folks have simply stopped listening.

Search and You May Find

Search is one of the most important user interface elements in any large website. As a rule of thumb, sites with more than about 200 pages should offer search. Guidelines for search include:
Our usability studies show that more than half of all users are search-dominant, about a fifth of the users are link-dominant, and the rest exhibit mixed behavior. The search-dominant users will usually go straight for the search button when they enter a website: they are not interested in looking around the site; they are task-focused and want to find specific information as fast as possible. In contrast, the link-dominant users prefer to follow the links around a site: even when they want to find specific information, they will initially try to get to it by following promising links from the home page. Only when they get hopelessly lost will link-dominant users admit defeat and use a search command. Mixed-behavior users switch between search and link-following, depending on what seems most promising to them at any given time, but do not have an inherent preference.

Despite the primacy of search, web design still needs to be grounded in a strong sense of structure and navigation support: all pages must make it clear where they fit in the larger scheme of the site. First, there is obviously a need to support those users who don't like search or who belong to the mixed-behavior group. Second, users who get to a page through search still need structure to understand the nature of the page relative to the rest of the site. They also need navigation to move around the site in the neighborhood of the page they found by searching: it is rare that a single page holds all the answers or even that the search found the most relevant page, so users need to see related pages.

Search should be easily available from every single page on the site. Search-dominant users will often click on a search button right on the home page, but other users may move around until they become lost. Once that happens, you don't want them to have to search for the search, so it should be right there on the page.
This means any page, since you can't predict when users will give up navigating and look for the search button.

Scoped Search

Sometimes special areas of a site are sufficiently coherent and distinct from the rest of the site that it makes sense to offer a scoped search: a search restricted to that subsite only (the search scope). In general, I warn against scoped search since our observations have shown that users often don't understand the structure of sites. It is quite common for users to believe that the answer is in the wrong subsite, meaning that they will never find it in a scoped search. Other times, users don't realize where they are or what the scope of their search is, so they may think that they are searching the entire site or a different subsite than the one they are actually in. In contemplating a scoped search option, designers should be strongly biased to avoid scoping. If the site in fact has subsites that necessitate scoped search, then all scoped search pages must do two things:
Boolean search should be avoided since all experience shows that users cannot use it correctly. We have studied many groups of users who have been given tasks like this:
Almost all users will enter the query

Unfortunately, most users have not been taught debugging, so they are very poor at query reformulation. This is why we recommend minimal use of scoped search and no use of boolean search in the primary search interface.

Advanced search is fine if offered on a different page than the simple search. The advanced search page can provide a variety of fancy options, including booleans, scopes, and various parametric searches (e.g., only find pages added or changed after a certain date). It is important to use an intimidating name like "advanced search" to scare off novice users from getting into the page and hurting themselves. Search is one of the few cases where we do recommend shaping the user's behavior by intimidation.

Search systems can be made more usable by incorporating spelling checks (both for user queries and for document terms), by offering synonym expansion, by explicitly recognizing the concept of quality in addition to relevance, and by presenting results relative to the structure of the site. For example, if the site has a FAQ about a query term, then the FAQ page should be listed at the top of the results page even if other pages have higher relevance scores. Also, hits on a series of pages that belong to the same area of the site should be collapsed into a single reference to that subsite.

On a final note, elementary schools should start teaching search skills. The future of user interfaces is almost certainly going to be dominated by various ways of searching immense information bases and gradually refining the retrieved set. Ideas like query reformulation, relevance feedback, and query-by-example are all important but do not come naturally to users. Search skills are likely to be more useful than most of the computer uses kids are currently taught.
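
The two result-presentation suggestions in this section (list a matching FAQ page first regardless of relevance score, and collapse hits from the same area of the site into one reference) could be sketched roughly as follows. This is a minimal illustration, not a description of any real search engine: the `Hit` record, its fields, the `present_results` function, and the sample URLs and scores are all hypothetical, and it assumes the search backend already tags each hit with its subsite and whether it is a FAQ page.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical result record; a real search backend would supply its own.
    url: str
    subsite: str
    score: float
    is_faq: bool = False


def present_results(hits):
    """Order hits for display: FAQ pages first, then one hit per subsite."""
    # FAQ pages go to the top even if other pages score higher.
    faq_hits = sorted((h for h in hits if h.is_faq), key=lambda h: -h.score)

    # Collapse the remaining hits: keep only the best-scoring page per subsite.
    best_per_subsite = {}
    for hit in hits:
        if hit.is_faq:
            continue
        current = best_per_subsite.get(hit.subsite)
        if current is None or hit.score > current.score:
            best_per_subsite[hit.subsite] = hit

    collapsed = sorted(best_per_subsite.values(), key=lambda h: -h.score)
    return faq_hits + collapsed


# Hypothetical hits for a single query.
hits = [
    Hit("/products/a", "products", 0.9),
    Hit("/products/b", "products", 0.7),
    Hit("/support/faq", "support", 0.4, is_faq=True),
    Hit("/news/x", "news", 0.8),
]
ordered_urls = [h.url for h in present_results(hits)]
print(ordered_urls)
```

In this sketch the FAQ page leads the list despite having the lowest score, and the two product pages are represented by the better-scoring one. A fuller version might link the collapsed entry to the subsite's own listing rather than to a single page.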
You can double the usability of your web site by following these guidelines: for two sample sites studied in Sun's Science Office, we improved measured usability by 159% and 124% by rewriting the content according to the guidelines. Writing for the Web is very different from writing for print: