GUIDELINES FOR MULTIMEDIA ON THE WEB
Multimedia is gaining popularity on the Web with several technologies to support use of animation,
video, and audio to supplement the traditional media of text and images. These new media
provide more design options but also require design discipline. Unconstrained use of
multimedia results in user interfaces that confuse users and make it harder for
them to understand the information. Not every webpage needs to bombard the user with the
equivalent of Times Square in impressions and movement.
Notes about this month's column:
This column is longer than usual and much longer than recommended for a web page. I am
doing this on request because many people have asked for advice on how to design for the
new dynamic web media. Some of the links in this column point to Javatized pages and will
not show anything interesting if your browser does not support the version of Java used on
the pages.
Animation
Moving images have an overpowering effect on the human peripheral
vision. This is a survival instinct from the time when it was of supreme importance to be
aware of any saber-toothed tigers before they could sneak up on you. These days,
tiger-avoidance is less of an issue, but anything that moves in your peripheral vision
still dominates your awareness: it is very hard to, say, concentrate on reading text in
the middle of a page if there is a spinning logo up in the corner. Never include a
permanently moving animation on a web page since it will make it very hard for your users
to concentrate on reading the text.
Animation is good for:
- Showing continuity in transitions. When something has two or more states, then changes
between states will be much easier for users to understand if the transitions are animated
instead of being instantaneous. An animated transition allows the user to track the
mapping between different subparts through the perceptual system instead of having to
involve the cognitive system to deduce the mappings. A great example is the winner of the
first Java programming contest: proving the Pythagorean theorem by animating the movement
of various squares and triangles as they move around to demonstrate that two areas are the
same size (unfortunately, this otherwise good page uses animated text inappropriately: the
text moves constantly and is hard to relate to the events in the main animation).
- Indicating dimensionality in transitions. Sometimes opposite animated transitions can be
used to indicate movement back and forth along some navigational dimension. For example,
paging through a series of objects can be shown by an animated sweep from the right to the
left for turning the page forward (if using a language where readers start on the left).
Turning back to a previous page can then be shown by the opposite animation (sweeping from
the left to the right). If users move orthogonally to the sequence of pages then other
animated effects can be used to visualize the transition. For example, following a
hypertext link to a footnote might be shown by a "down" animation and tunneling
through hyperspace to a different set of objects might be shown by an "iris
open" animation.
One example used in several user interfaces is the use of zooming to indicate that a new
object is "grown" from a previous one (e.g., a detailed view or property list
opened by clicking on an icon) or that an object is closed or minimized to a smaller
representation. Zooming out from the small object to the enlargement is a navigational
dimension and zooming in again as the enlargement is closed down is the opposite direction
along that dimension.
- Illustrating change over time. Since an animation is a time-varying display, it provides a
one-to-one mapping to phenomena that change over time.
- Multiplexing the display. Animation can be used to show multiple information objects in the
same space. A typical example is client-side imagemaps with explanations that pop up as
the user moves the cursor over the various hypertext anchors. It is also possible to
indicate the active areas by having them shimmer or by surrounding them with a marquee
of "marching ants". As always, objects should only move when appropriate (e.g.,
when the cursor is over the image).
- Enriching graphical representations. Some types of information are easier to visualize with
movement than with still pictures. Consider, for example, how to visualize the tool used
to remove pixels in a graphics application. The canonical icon is an eraser as shown on
the left in the following figure, but in user testing I have sometimes found that people
think that the icon is a tool for drawing three-dimensional boxes. Instead, one can use an
animated icon as shown on the right in the figure: when the icon animates, the eraser is
moved over the background and pixels are removed, clearly showing the functionality of the
tool.

In icon design, it is always easier to illustrate objects (a box) than operations (removing
pixels), but animation provides the perfect support for illustrating any kind of change
operation. In an experiment reported at the CHI'91 conference, Baecker, Small, and Mander
increased the comprehension of a set of icons from 62% to 100% by animating them. Of
course, an icon should only animate when the user indicates a special interest in it (for
example, by placing the mouse cursor over it or by looking at it for more than a second if
eye-tracking is available). Especially considering the preponderance of toolbars in
current applications it would be highly distracting if all icons were to animate at all
times.
- Visualizing three-dimensional structures. Since the computer screen is two-dimensional, users
can never get a full understanding of a three-dimensional structure by a single
illustration, no matter how well designed. Animation can be used to emphasize the
three-dimensional nature of objects and make it easier for users to visualize their
spatial structure. The animation need not necessarily spin the object in a full circle:
just slowly turning it back and forth a little will often be sufficient. The movement
should be slow to allow the user to focus on the structure of the object.
Three-dimensional objects may be moved under user control, but often it is better if the
designer determines in advance how to best animate a movement that provides optimal
understanding of the object: this pre-determined animation can then be activated by the
user by simply placing the cursor over the object, whereas user-controlled movements
require the user to understand how to manipulate the object (which is inherently difficult
with a two-dimensional control device like the mouse used with most computers - to be
honest, 3D is never going to make it big time in user interfaces until we get a true 3D
control device).
- Attracting attention. Finally, there are a few cases where the ability of animation to
dominate the user's visual awareness can be turned to an advantage in the interface. If
the goal is to draw the user's attention to a single element out of several or to alert
the user to updated information then an animated headline will do the trick. Animated text
should be drawn by a one-time animation (e.g., text sliding in from the right, growing
from the first character, or smoothly becoming larger) and never by a continuous animation
since moving text is much harder to read than static text. The user should be drawn to the
new text by the initial animation and then left in peace to read the text without further
distraction.
Video
Due to bandwidth constraints, use of video should currently be
minimized on the web. Eventually, video will be used more widely, but for the next few
years most videos will be short and will use very small viewing areas. Under these
constraints, video has to serve as a supplement to text and images more often than it will
provide the main content of a website.
Currently, video is good for:
- Promoting television shows, films, or other non-computer media that traditionally have used trailers
in their advertising.
- Giving users an impression of a speaker's personality. Unfortunately, most corporate executives
project a lot less personality than, say, Captain Janeway from Star Trek, so it is not
necessarily a good idea to show a talking head unless the video clip truly adds to the
user's experience.
- Showing things that move, for example a clip from a ballet. Product demos of physical products
(e.g., a coin counter) are also well suited for video, whereas software demos are often
better presented as a series of full-sized screendumps where the potential customer can
study the features at length.
A major problem with most videos on the web right now is that their production values are much too
low. User studies of CD-ROM productions have found that users expect broadcast-quality
production values and that users get very impatient with low-quality video.
A special consideration for video (and spoken audio) is that any narration may lead to
difficulty for international users as well as for users with a hearing disability. People
may be able to understand written text in a foreign language because they have time to
read it at their own speed and because they can look up any unknown words in a dictionary.
Spoken words are sometimes harder to understand, especially if the speaker is sloppy, has
a dialect, speaks over a distracting soundtrack, or simply speaks very fast. Poor audio
quality may contribute to the difficulty of understanding spoken text: it is recommended
to use professional quality audio equipment and/or lavaliere microphones when recording a
narrator. The classic solution to these problems is to use subtitles but as shown in the
following figure, subtitles require special attention on the web.

The figure shows a subtitled frame from Sun's Starfire video. The small subtitles (left image)
look good on the original video tape (JPEG, 197 K) but are virtually unreadable on the smaller image size
currently used for computerized videos. Using bigger subtitles that have been anti-aliased
for computer viewing (middle image) improves readability significantly, but the best
results are achieved by the letterbox format (right image). In this example, the subtitles
in the letterbox are constructed by enlarging the video area for the movie file with a
24-pixel-high black area. Doing so does not increase the file size proportionally since
the black area compresses very nicely. Even so, it would be better to transmit the
subtitles as ASCII (or Unicode) and have them rendered in the letterbox on the client
machine: a perfect job for an applet. It would even be possible to have the user select
the language for the subtitles through a preference setting or a pop-up menu (JPEG, 206 K).
Audio
The main benefit of audio is that it provides a channel that is separate from that of the
display. Speech can be used to offer commentary or help without obscuring information on
the screen. Audio can also be used to provide a sense of place or mood as done to
perfection in the game Myst. Mood-setting audio should employ very quiet
background sounds in order not to compete with the main information for the user's
attention.
Music is probably the most obvious use of sound. Whenever you need to inform the user about a
certain work of music, it makes much more sense to simply play it than to show the notes
or to try to describe it in words. For example, if you are out to sell seats to the La Scala opera in Milan, Italy, it is an obvious ploy
to allow users to hear a snippet of the opera: yes, Verdi
really could write a good tune (AU file, 1.4 MB), so maybe I will go and hear the
opera next time I am over there. In fact, the audio clip is superior to the video clip
from the same opera, which is too fidgety to impress the user and yet takes much too long to
download (QuickTime, 3.6 MB).
Voice recordings can be used instead of video to provide a sense of the speaker's personality (AU
file, 1.4 MB): the benefits are smaller files, easier production, and the fact that people
often sound good even if they would look dull on television. Speech is also perfect for
teaching users the pronunciation of words as done by the French wine site: it used to be
the case that you could buy good wine cheaply by going for chateaus that were hard to
pronounce (because nobody dared ask for them in shops or restaurants) -- no more in the
webbed world.
Non-speech sound effects can be used as an extra dimension in the user interface to inform users
about background events: for example, the arrival of new information could be signaled by
the sound of a newspaper dropping on the floor and the progress of a file download could
be indicated by the sound of water pouring into a glass that gradually fills up. These
kinds of background sounds have to be very quiet and nonintrusive. Also, there always
needs to be a user preference setting to turn them off.
Good-quality sound is known to enhance the user experience substantially so it is well worth
investing in professional quality sound production. The classic example is the video game
study where users claimed that the graphics were better when the sound was improved, even
though the exact same graphics were used for the poor-quality sound and the good-quality
sound experiments. Simple examples from web user interfaces are the use of a low-key
clicking sound to emphasize when users click a button and the use of opposing sounds
(cheeeek chooook) when moving in different directions through a navigation space.
Response Time
Many multimedia elements are big and take a long time to download
with the horribly low bandwidth available to most users. It is recommended that the file
format and size are indicated in parentheses after the link whenever you point to a file
that would take more than 15 seconds to download with the bandwidth available to most of
your users. If you don't know what bandwidth your users are using you should do a survey
to find out since this information is important for many other page design issues. At this
time, most home users have at most 28.8 kbps, meaning that files larger than 50 KB need a
size warning. Business users often have higher bandwidth, but you should probably still
mark files larger than about 200 KB.
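The 15-second rule can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, 1 KB is taken as 1000 bytes, and latency and protocol overhead are ignored, so real downloads will be somewhat slower than the estimate.

```python
def download_seconds(size_bytes: int, link_kbps: float) -> float:
    """Idealized transfer time: file size in bits divided by link speed in bits/s.

    Assumption: no latency, no protocol overhead, no congestion.
    """
    return (size_bytes * 8) / (link_kbps * 1000)


def needs_size_warning(size_bytes: int, link_kbps: float,
                       limit_seconds: float = 15.0) -> bool:
    """True if the file would take longer than the limit to download."""
    return download_seconds(size_bytes, link_kbps) > limit_seconds


def link_label(text: str, file_format: str, size_bytes: int,
               link_kbps: float) -> str:
    """Append '(format, size)' to the link text only when a warning is needed."""
    if needs_size_warning(size_bytes, link_kbps):
        return f"{text} ({file_format}, {size_bytes / 1000:.0f} KB)"
    return text
```

At 28.8 kbps the break-even point works out to about 54 KB, which is consistent with the rounded 50 KB figure above.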
The 15-second guideline in the previous paragraph was derived from the basic set of response
time values that have been known since around 1968. System response needs to happen within
about 10 seconds to keep the user's attention, so users should be warned before slower
operations. On the web, current users have been trained to endure so much suffering that
it may be acceptable to increase the limit value to 15 seconds. If we ever want the
general population to start treating the web as more than a novelty, we will have to
provide response times within the acceptable ranges, though.
Design of client-side multimedia effects has to consider the other two response time limits also:
- The feeling of directly manipulating objects on the screen requires 0.1 second
response times. Thus, the time from when the user types a key on the keyboard or moves the
mouse until the desired effect happens has to be faster than 0.1 seconds if the goal is to
let the user control a screen object (e.g., rotate a 3D figure or get pop-ups while moving
over an imagemap).
- If users do not need to feel a direct physical connection between their actions and the changes on
the screen, then response times of about 1.0 second become acceptable.
Any slower response and the user will start feeling that he or she is waiting for the
computer instead of operating freely on the data. So, for example, jumping to a new page
or recalculating a spreadsheet should happen within a second. When response times surpass
a second, users start changing their behavior to a more restricted use of the system (for
example, they won't try out as many options or go to as many pages).
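The three limits (0.1, 1.0, and 10 seconds) can be treated as thresholds when reviewing measured response times. The following is a minimal sketch; the function name and the category labels are my own shorthand for the effects described above, not established terminology.

```python
def classify_response(seconds: float) -> str:
    """Map a measured response time onto the three limits discussed above.

    The labels are hypothetical names chosen for this illustration.
    """
    if seconds <= 0.1:
        return "direct manipulation"   # user feels in direct control of the object
    if seconds <= 1.0:
        return "free navigation"       # user still feels free to explore the data
    if seconds <= 10.0:
        return "attention held"        # user waits but stays on the task
    return "warn user"                 # too slow: announce the delay beforehand
```

For example, a pop-up that appears 0.3 seconds after the cursor enters an imagemap area falls in the second category, so it would not feel like direct manipulation.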
Size Limits for Web Pages
The following table shows the maximum allowable page size in order to achieve desired response
times for various connection speeds. The numbers assume 0.5 s latency which is faster than
most Web connections these days, so for many realistic purposes, page sizes really need to
be even smaller than indicated in the table.
| | 1 second response time | 10 seconds response time |
|---|---|---|
| Modem | 2 K | 34 K |
| ISDN | 8 K | 150 K |
| T-1 | 100 K | 2 M |
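The table values can be approximately reproduced by subtracting the latency from the response time budget and multiplying what remains by the line speed. This is a sketch under stated assumptions: the 28.8 kbps modem speed comes from earlier in this column, but the 128 kbps ISDN and 1544 kbps T-1 figures are the usual nominal rates and are assumed here, as is the convention that 1 KB is 1000 bytes. The function name is hypothetical.

```python
def max_page_kb(link_kbps: float, response_seconds: float,
                latency_seconds: float = 0.5) -> float:
    """Largest page (in KB) that fits within the response time budget.

    Only the time left after latency is available for transferring data.
    """
    transfer_seconds = max(response_seconds - latency_seconds, 0.0)
    kilobits = link_kbps * transfer_seconds
    return kilobits / 8  # 8 bits per byte


# Assumed nominal line speeds in kbps (only the modem figure is from the text).
LINK_SPEEDS_KBPS = {"Modem": 28.8, "ISDN": 128.0, "T-1": 1544.0}

if __name__ == "__main__":
    for name, kbps in LINK_SPEEDS_KBPS.items():
        one_second = max_page_kb(kbps, 1.0)
        ten_seconds = max_page_kb(kbps, 10.0)
        print(f"{name}: {one_second:.1f} KB / {ten_seconds:.1f} KB")
```

The results are close to the table once rounded, which suggests the table was derived in a similar way; a longer latency shrinks every budget accordingly.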
The concept of "page size" is defined as the sum of the file sizes for all the
elements that make up a page, including the defining HTML file as well as all embedded
objects (e.g., image files with GIF and JPG pictures). As further discussed in my main
article on Web response times, it is sometimes possible to get away with page designs that
have larger page sizes as long as the HTML file is small and is coded to reduce the
browser's rendering time.
Note that the 1 sec. response time limit is required for
users to feel that they are moving freely through the information space. Staying below the
10 sec. limit is required for users to keep their attention on the task.
In mid-1997, a study found that the mean size of Web pages
was 44 kilobytes. This is more than five times too big for optimal response time for
ISDN users, so even when more people get mid-band connections, the Web will be much
too slow. Also note that 44 KB is 30% larger than even the most generous size limit for
modem users.
See Also: the Bandwidth Conservation Society