Critical Issues in Content Repurposing for Small Devices
Neil C. Rowe
U.S. Naval Postgraduate School
Abstract
Small handheld devices are increasingly popular, but it is difficult to display
images, audio, and video on them as users would like. The limitations of such
devices in display and processing for multimedia require significant planning
to overcome. We discuss several ideas that are being studied, including panning
and zooming, translation, substitution of links, and reformulation of data.
Many of these require some ranking of data content, so we also discuss methods
for that, as well as the process of redoing a data display. This topic is a
highly active area of research, and many innovations will be appearing soon.
This is a chapter in the Encyclopedia of Multimedia Technology and
Networking, ed. M. Pagani, Hershey, PA: The Idea Group, 2005.
Introduction
Content repurposing is the reorganizing of data for presentation on different
display hardware (Singh, 2004). It has been particularly important recently
with the growth of handheld devices such as "personal digital assistants"
(PDAs), sophisticated telephones, and other small specialized devices.
Unfortunately, such devices pose serious problems for multimedia delivery.
With their tiny screens (150 by 150 pixels for a basic Palm PDA or 240 by 320
for a more modern one, versus 640 by 480 for standard computer screens), one
cannot display much information (like most of a Web page); with their low
bandwidths, one cannot display video and audio transmissions from a server
("streaming") with much quality; and with their small storage capabilities,
large media files cannot be stored for later playback. Furthermore, new devices
and old ones with new characteristics have been appearing at a high rate, so
software vendors are having difficulty keeping pace. Some real-time,
systematic, and automated planning could therefore help in figuring out how to
show desired data, especially multimedia, on a broad range of devices.
Background
The World Wide Web is the de facto standard for providing easily accessible
information to people. So it is desirable to use it and its language HTML as a
basis for display for small handheld devices. This would enable people to look
up ratings of products while shopping, check routes while driving, and perform
knowledge-intensive jobs while walking. HTML is, in fact, device-independent:
It requires the display device and its Web-browser software to make decisions
about how to display its information within guidelines. But HTML does not
provide enough information to devices to ensure much user-friendliness of the
resulting display: It does not tell the browser where to break lines or which
graphics to keep colocated. Display problems are exacerbated when screen sizes,
screen shapes, audio capabilities, or video capabilities are significantly
different. "Microbrowser" markup languages like WML, S-HTML, and HDML, which
are based on HTML but designed to better serve the needs of small devices,
help, but they solve only some of the problems.
Content repurposing is a general term for reformatting information for
different displays. It occurs frequently with content management for an
organization's publications (Boiko, 2002) where "content" or information is
broken into pieces and entered in a "repository" to be used for different
publications. However, a repository is not cost-effective unless the
information is reused many times, something not generally true for Web pages.
Content repurposing for small devices also involves real-time decisions about
priorities. For these reasons, the repository approach is not often used with
small devices.
Content repurposing can be done either before or after a request for it.
Preprocessing can create separate pages for different devices, and the device
fetches the page appropriate to it. It can also involve conditional statements
in pages which cause different code to be executed for different devices; such
statements can be written in JavaScript or PHP embedded within HTML, or in more
complex server code using such facilities as Java Server Pages (JSP) and Active
Server Pages (ASP). It can also involve device-specific planning (Karadkar et
al, 2004). Many popular Web sites provide preprocessed pages for different
kinds of devices. Preprocessing is cost-effective for frequently needed
content, but requires setup time and can require considerable storage space if
there is a large amount of content and many ways to display it.
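
As a minimal sketch of the preprocessing approach just described, the following Python function picks one of several preprocessed page variants from a device's reported screen width. The variant file names, the width thresholds, and the function name `choose_variant` are assumptions made for illustration and do not come from any of the cited systems.

```python
# Hypothetical preprocessed variants, ordered by the minimum screen width
# (in pixels) for which each one was prepared.
VARIANTS = [
    (0, "page_tiny.html"),     # e.g. a basic PDA
    (240, "page_small.html"),  # e.g. a more modern PDA
    (640, "page_full.html"),   # a standard computer screen
]


def choose_variant(screen_width):
    """Return the largest preprocessed variant that fits the given width."""
    chosen = VARIANTS[0][1]
    for min_width, name in VARIANTS:
        if screen_width >= min_width:
            chosen = name
    return chosen
```

A server could call `choose_variant` once per request; the storage cost mentioned above grows with the number of entries in `VARIANTS` times the number of pages.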
Content repurposing can also be either client-side or server-side. Server-side
means a server supplies repurposed information for the client device;
client-side means the device itself decides what to display and how.
Server-side repurposing saves work for the device, which is important for
primitive devices, and can adjust to fluctuations in network bandwidth (Lyu et
al, 2003), but requires added complexity in the server and significant time
delays in getting information to the server. Devices can have designated
"proxy" servers for their needs. Client-side repurposing, on the other hand,
can respond quickly to changing user needs. Its disadvantages are the
additional processing burden on an already-slow device, and higher bandwidth
demands since information is not eliminated until after it reaches the device.
The limitations of small devices require most audio and video repurposing to
be server-side.
Methods of Content Repurposing
Repurposing Strategies
Content repurposing for small devices can be
accomplished by several methods, including panning, zooming, reformatting,
substitution of links, and modification of content.
A default repurposing method of the Internet Explorer and Netscape browser
software is to show a "window" on the full display when it is too large to fit
on the device screen. Then the user can manipulate slider bars on the bottom
and side of the window to view all the content ("pan" over it). Some systems
break content into overlapping "tiles" (Kasik, 2004), precomputed units of
display information, and users can pan only from tile to tile; this can prevent
splitting of key features like buttons and simplifies client-side processing,
but it only works for certain kinds of content. Panning may be unsatisfactory
for large displays like maps since considerable screen manipulation may be
required, and good understanding may require an overview. But it works fine for
most content.
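
The idea of overlapping tiles can be sketched as follows. This is only an illustrative layout computation, assuming fixed-size rectangular tiles and a fixed pixel overlap; the function names and the choice to place the last tile flush with the far edge are assumptions, not the partitioning method of the cited work.

```python
def tile_origins(content_size, tile_size, overlap):
    """Start offsets of tiles along one axis.

    Neighboring tiles share at least `overlap` pixels, and the last tile
    is placed flush with the far edge of the content.
    """
    if tile_size <= overlap:
        raise ValueError("tile_size must exceed overlap")
    if content_size <= tile_size:
        return [0]
    step = tile_size - overlap
    origins = list(range(0, content_size - tile_size, step))
    origins.append(content_size - tile_size)
    return origins


def tiles(width, height, tile_w, tile_h, overlap):
    """Top-left corners (x, y) of all tiles covering a width-by-height display."""
    return [(x, y)
            for y in tile_origins(height, tile_h, overlap)
            for x in tile_origins(width, tile_w, overlap)]
```

For a 640 by 480 page viewed through a 240 by 320 screen with a 40-pixel overlap, `tiles(640, 480, 240, 320, 40)` yields six tiles, so the user pans among six precomputed views rather than scrolling freely.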
Another idea is to change the scale of view, "zooming" in (closer) or out
(further). This can be either automatic or user-controlled. The MapQuest
city-map utility (www.mapquest.com) provides user-controlled zooming by
dynamically creating maps at several levels of detail, so the user can start
with a city and progressively narrow in on a neighborhood (as well as do
panning). A problem for zooming out is that some details like text and thin
lines cannot be shrunk beyond a certain minimum size and still remain legible.
Such details may be optional; for instance, MapQuest omits most street names
and many of the streets in its broadest view. But this may not be what the user
wants. Different details can be shrunk at different rates, so that lines one
pixel wide are not shrunk at all (Ma & Singh, 2003), but this requires
content-specific tailoring.
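
A rough sketch of shrinking different details at different rates is given below. The feature representation, the minimum sizes `MIN_TEXT_PX` and `MIN_LINE_PX`, and the rule for dropping optional text are all assumptions chosen for illustration; they are not the downscaling algorithm of Ma and Singh.

```python
MIN_TEXT_PX = 6  # assumed smallest legible text height, in pixels
MIN_LINE_PX = 1  # lines are never drawn thinner than one pixel


def zoom_out(features, scale):
    """Scale features, clamping lines and dropping illegible optional text.

    `features` is a list of dicts with keys 'kind' ('text' or 'line'),
    'size' (in pixels), and 'optional' (bool).
    """
    result = []
    for f in features:
        size = f["size"] * scale
        if f["kind"] == "line":
            size = max(size, MIN_LINE_PX)
        elif size < MIN_TEXT_PX:
            if f["optional"]:
                continue        # omit, as a broad map view omits street names
            size = MIN_TEXT_PX  # required text stays at the minimum size
        result.append({**f, "size": size})
    return result
```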
The formatting of the page can be modified to use equivalent constructs that
display better on a destination device (Government of Canada, 2004). For
instance, with HTML, the fonts can be made smaller or narrower (taking into
account viewability on the device) by "font" tags, line spacing can be reduced,
or blank space can be eliminated. Since tables take extra space, they can be
converted into text. Small images or video can substitute for large images or
video when their content permits. Text can be presented sequentially in the
same box in the screen to save display space (Wobbrock et al, 2002). For audio
and video, the sampling or frame rate can be decreased (one image per second is
fine for many applications provided the rate is steady). Visual clues can be
added to the display to indicate items just offscreen (Baudisch & Rosenholtz,
2003).
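
One of the reformatting steps above, converting a table into text, might look like the following sketch. It uses the standard-library `html.parser` module and assumes a simple, well-formed table without nesting; the class name `TableToText`, the function `table_to_text`, and the semicolon separator are illustrative choices.

```python
from html.parser import HTMLParser


class TableToText(HTMLParser):
    """Collect table cells row by row so each row can become a text line."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_text(html):
    """Return one line of text per table row, with cells separated by '; '."""
    parser = TableToText()
    parser.feed(html)
    parser.close()
    return "\n".join("; ".join(row) for row in parser.rows)
```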
Clickable links can point to blocks of less-important information, thereby
reducing the amount of content to be displayed at once. This is especially good
for media objects (which can require both bandwidth and screen size) but also
helps for paragraphs of details. Links can be thumbnail images, which is
helpful for pages familiar to the user. Links can also point to pages
containing additional links so the scheme can be hierarchical. Buyukkokten et
al (2002) in fact experimented with repurposing displays containing links
exclusively. But insertion of links requires rating the content of the page by
importance, a difficult problem in general (as discussed below), to decide what
content is converted into links. It also requires a careful wording of text
links since just something like "picture here" is unhelpful, but a too-long
link may be worse than no link at all. Complex link hierarchies may also cause
users to get lost.
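
The following sketch shows one simple way to decide which blocks become links once importance ratings are available. It assumes each block already has a short link label and a numeric importance, and that display space can be approximated by a character budget; these inputs and the greedy strategy are assumptions for illustration.

```python
def link_cost(label):
    """Characters used by a link rendered as '[label]'."""
    return len(label) + 2


def substitute_links(blocks, budget):
    """Show the most important blocks in full and the rest as short links.

    `blocks` is a list of (label, text, importance) tuples in page order;
    `budget` is the number of characters that fit on the screen.
    """
    # Start with every block as a link, then expand blocks greedily in
    # decreasing order of importance while the budget allows.
    used = sum(link_cost(label) for label, _, _ in blocks)
    expanded = set()
    for i in sorted(range(len(blocks)), key=lambda i: -blocks[i][2]):
        label, text, _ = blocks[i]
        extra = len(text) - link_cost(label)
        if used + extra <= budget:
            expanded.add(i)
            used += extra
    return [text if i in expanded else "[" + label + "]"
            for i, (label, text, _) in enumerate(blocks)]
```

A greedy pass like this does not guarantee the best use of the budget, and it leaves open the harder problems noted above of obtaining the ratings and wording the labels well.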
One can also modify the content of a display by just eliminating unimportant or
useless detail and rearranging the display (Gupta et al, 2003). For instance,
advertisements, acknowledgements, and horizontal bars can be removed, as well
as JavaScript code and Macromedia Flash (SWF) images since most are only
decorative. Removed content need not be contiguous, as with removal of a power
subsystem from a system diagram. In addition, forms and tables can lose their
associated graphics. The lines in block diagrams can often be shortened when
their lengths do not matter. Color images can be converted to black-and-white,
though one must be careful to maintain feature visibility, perhaps by
exaggerating the contrast. User assistance in deciding what to eliminate or
summarize is helpful as user judgment provides insights that cannot easily be
automated, as with selection of "highlights" for video (Pea et al, 2004). An
important special application is selection of information from a page for each
user in a set of users (Han, Perret, & Naghshineh, 2000). Appropriate
modification of the display for a mobile device can also be quite radical; for
instance, a good way to support route-following on a small device could be to
give spoken directions rather than a map (Kray et al, 2003).
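
The color-to-gray conversion with exaggerated contrast mentioned above can be sketched as follows. The sketch assumes pixels are given as (r, g, b) tuples, uses the common ITU-R BT.601 luma weights, and applies a simple linear contrast stretch; other weightings and contrast methods are equally possible.

```python
def to_grayscale_stretched(pixels):
    """Convert RGB pixels to gray levels 0-255 with contrast stretched.

    `pixels` is a list of (r, g, b) tuples with components in 0..255.
    """
    if not pixels:
        return []
    # Weighted sum of the color components (ITU-R BT.601 luma weights).
    gray = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    lo, hi = min(gray), max(gray)
    if hi == lo:
        return [round(lo)] * len(gray)  # uniform image: nothing to stretch
    # Map the darkest pixel to 0 and the brightest to 255.
    return [round(255 * (v - lo) / (hi - lo)) for v in gray]
```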
Content Rating by Importance
Several of the techniques mentioned above require judgment as to what is
important in the data to be displayed. The difficulty of automating this
judgment varies considerably with the type of data.
Many editing tools mark document components with additional information like
"style" tags, often in a form compatible with the XML language. This
information can assign additional categories to information beyond those of
HTML, like identifying text as an "introduction", "promotion", "abstract",
"author biography", "acknowledgements", "figure caption", "links menu", or
"reference list" (Karben, 1999). These categories can be rated in importance by
content-repurposing software, and only text of the top-rated categories shown
when display space is tight. Such categorization is especially helpful with
media objects (Obrenovic, Starcevic, & Selic, 2004), but their automatic
content analysis is difficult and it helps to persuade people to categorize
them at least partially.
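
As a small sketch of rating by category, the code below shows whole categories in order of rating until the next category would no longer fit. The numeric ratings in `CATEGORY_RATING` are invented for illustration (the text names the categories but not their ranks), and display space is again approximated by a character count.

```python
# Assumed importance ratings for the categories; higher means more important.
CATEGORY_RATING = {
    "abstract": 5, "introduction": 4, "figure caption": 3,
    "links menu": 2, "reference list": 2, "author biography": 1,
    "acknowledgements": 1, "promotion": 0,
}


def rating(category):
    """Rating of a category, with unknown categories rated lowest."""
    return CATEGORY_RATING.get(category, 0)


def select_by_category(items, max_chars):
    """Keep the top-rated categories that fit in `max_chars`, in page order.

    `items` is a list of (category, text) pairs.
    """
    levels = sorted({rating(c) for c, _ in items}, reverse=True)
    kept_levels, used = set(), 0
    for level in levels:
        size = sum(len(t) for c, t in items if rating(c) == level)
        if used + size > max_chars:
            break  # this and all lower-rated categories are left out
        kept_levels.add(level)
        used += size
    return [(c, t) for c, t in items if rating(c) in kept_levels]
```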
In the absence of explicit tagging, methods of automatic text summarization
from natural-language processing can be used. This technology, useful for
building digital libraries, can be adapted for the content repurposing problem
to display an inferred abstract of a page. One approach is to select sentences
from a body of text that are the most important as measured by various metrics
(McDonald & Chen, 2002; Alam et al, 2003) like titles and section headings,
first sentences of paragraphs, and distinctive keywords. Keywords alone may
suffice to summarize text when the words are sufficiently distinctive
(Buyukkokten et al, 2002). Distinctiveness can be measured by the classic
measure of TF-IDF, which is $K \log(N/n)$, where K is the number of occurrences
of the word in the "document" or text to be summarized, N is the number of
documents in a sample, and n is the number of documents in that sample having
the word at least once. Another useful input for text summarization is the
headings of pages linked to (Delort, Bouchon-Meunier, & Rifqi, 2003) since
neighbor pages provide content clues. Content can also be classified into
semantic units by aggregating clues or even by "parsing" the page display. For
instance, the "@" symbol suggests a paragraph of contact information.
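
A minimal sketch of keyword ranking by TF-IDF follows, assuming the usual form of the measure, K log(N/n), with K, N, and n as defined in the text. The tokenizer, the function names, and the treatment of words absent from the sample (counted as n = 1 to avoid division by zero) are assumptions for illustration.

```python
import math
import re
from collections import Counter


def words(text):
    """Lowercase alphabetic tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(document, sample, top=5):
    """Rank the words of `document` by K * log(N / n), best first.

    K = occurrences of the word in `document`; N = number of documents in
    `sample`; n = number of sample documents containing the word.
    """
    counts = Counter(words(document))
    doc_sets = [set(words(d)) for d in sample]
    total_docs = len(doc_sets)
    scores = {}
    for word, k in counts.items():
        n = max(1, sum(word in s for s in doc_sets))
        scores[word] = k * math.log(total_docs / n)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]
```

A word such as "the" that occurs in every sample document scores zero however often it appears, which is what makes the measure useful for picking distinctive keywords.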
Media objects pose more serious problems than text, however, since they can
require large bandwidths to download, and images can require considerable
display space. In many cases the media can be inferred to be decorative and can
be eliminated, as for many banners and sidebars on pages as well as background
sounds. Simple criteria can distinguish decorative graphics from photographs
(Rowe, 2002): size (photographs are larger), frequency of the most common color
(graphics have a higher frequency), number of different colors (photographs
have more), extremeness of the colors (graphics are more likely to have pure
colors), and average variation in color between adjacent pixels in the image
(photographs have less). Hu and Bagga (2004) extend this to classify images in
order of importance as "story", "preview", "host", "commercial", "icons and
logos", "headings", and "formatting". Images can be rated by these methods,
then only the top-rated images displayed until they fill the screen. Such
rating methods are rarely necessary for video and audio, which are almost
always accessed by explicit links. Planning can be done on the server for
efficient delivery (Chandra, Ellis, & Vahdat, 2000) and the most important
media objects can be delivered first.
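
The criteria listed above for telling graphics from photographs can be computed with a few lines of code. The sketch below only extracts the clues; how to weight or threshold them is left open. The pixel representation, the definition of a "pure" color as one whose components are all 0 or 255, and the use of horizontal neighbors only are assumptions made for illustration.

```python
from collections import Counter


def image_features(pixels, width):
    """Compute simple clues for telling graphics from photographs.

    `pixels` is a non-empty row-major list of (r, g, b) tuples and `width`
    is the number of pixels per row.
    """
    total = len(pixels)
    counts = Counter(pixels)
    # Pixels whose components are all at an extreme value (0 or 255).
    pure = sum(1 for p in pixels if all(c in (0, 255) for c in p))
    # Color differences between horizontally adjacent pixels in the same row.
    diffs = [sum(abs(a - b) for a, b in zip(pixels[i], pixels[i + 1]))
             for i in range(total - 1) if (i + 1) % width != 0]
    return {
        "size": total,
        "top_color_fraction": counts.most_common(1)[0][1] / total,
        "distinct_colors": len(counts),
        "pure_fraction": pure / total,
        "adjacent_variation": sum(diffs) / len(diffs) if diffs else 0.0,
    }
```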
In some cases, preprocessing can analyze the content of the media object and
extract the most representative parts. Video is a good example because it is
characterized by much frame-to-frame redundancy. A variety of techniques can
extract representative frames (say one per shot) that convey the gist of the
video and reduce the display to a "slide show". If an image is graphics
containing sub-objects, then the less-important sub-objects can be removed and
a smaller image constructed. An example is a block diagram where text outside
the boxes represents notes that can be deleted. Heuristics useful for finding
important sub-objects are nearby labels, objects at ends of long lines, and
adjacent blank areas (Kasik, 2004). Processing can also in some applications do
"visual abstraction" where, say, a rectangle is substituted for a complex part
of the diagram that is known to be a conceptual unit (Egyed, 2002).
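
One simple way to pick a representative frame per shot is sketched below. It assumes a shot boundary wherever the mean absolute difference between consecutive frames exceeds a threshold, and takes the middle frame of each shot as its representative; both choices, and the flat-list frame representation, are illustrative assumptions rather than a specific published technique.

```python
def key_frames(frames, threshold):
    """Return the index of one representative frame per shot.

    `frames` is a list of equal-length, non-empty sequences of pixel
    intensities. A new shot is assumed to start when the mean absolute
    difference from the previous frame exceeds `threshold`.
    """
    if not frames:
        return []
    shot_starts = [0]
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1]))
        if diff / len(frames[i]) > threshold:
            shot_starts.append(i)
    shot_ends = shot_starts[1:] + [len(frames)]
    # Use the middle frame of each shot as its representative.
    return [(start + end - 1) // 2 for start, end in zip(shot_starts, shot_ends)]
```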
Redrawing the Display
Many of the methods discussed require changing the layout of a page of
information. Thus content repurposing needs to use methods of efficient and
user-friendly display formatting (Kamada & Kawai, 1991; Tan, Ong, & Wong,
1993). This can be a difficult constraint optimization problem where the
primary constraints are those of keeping related information together as much
as possible in the display. Examples of what needs to be kept together are
section headings with their subsequent paragraphs, links with their describing
paragraphs, images with their captions, and images with their text references.
Some of the necessary constraints, including device-specific ones, can be
learned from observing users (Anderson, Domingos, & Weld, 2001). Even with good
page design, content search tools are helpful with large displays like maps to
enable users to find things quickly without needing to pan or zoom.
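
The constraint of keeping related information together can be illustrated with a very simple greedy layout. The sketch assumes that related items have already been grouped (for example, an image with its caption) and that each group has a known height in screen lines; it is a sequential heuristic, not a solution to the general constraint optimization problem.

```python
def paginate(groups, screen_lines):
    """Pack groups of related items into screens without splitting a group.

    `groups` is a list of (name, height_in_lines) pairs in reading order.
    A group taller than the screen gets a screen to itself, within which
    the user would still have to pan.
    """
    screens, current, used = [], [], 0
    for name, height in groups:
        # Start a new screen if this group would not fit on the current one.
        if current and used + height > screen_lines:
            screens.append(current)
            current, used = [], 0
        current.append(name)
        used += height
    if current:
        screens.append(current)
    return screens
```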
Future Work
Content repurposing is currently an active area of research and we are likely
to see a number of innovations in the near future in both academia and
industry. The large number of competing approaches will dwindle as consensus
standards are reached for some of the technology, much as de facto standards
have emerged in Web-page style. It is likely that manufacturers of small
devices will provide increasingly sophisticated repurposing in their software
to reduce the burden on servers. XML will increasingly be used to support
repurposing, as it has achieved widespread acceptance in a short time for many
other applications. XML will be used to provide standard descriptors for
information objects within organizations. But XML will not solve all problems,
and the issue of incompatible XML taxonomies could impede progress.
Conclusion
Content repurposing has recently become a key issue in management of small
wireless devices as people want to display the information they can display on
traditional screens and have discovered that it often looks bad on a small
device. So strategies are being devised to modify display information for these
devices. Simple strategies are effective for some content, but there are many
special cases of information which require more sophisticated methods due to
their size or organization.
References
Alam, H., Hartono, R., Kumar, A., Rahman, F., Tarnikov, Y., & Wilcox, C. (2003). Web page summarization for handheld devices: a natural language approach. Proceedings of 7th International Conference on Document Analysis and Recognition, 1153-1158.
Anderson, C., Domingos, P., & Weld, D. (2001, May). Personalizing Web sites for mobile users. Proceedings of 10th International Conference on the World Wide Web, Hong Kong, China, 565-575.
Baudisch, P., & Rosenholtz, R. (2003). Halo: a technique for visualizing off-screen objects. Proceedings of Conference on Human Factors in Computing Systems, Ft. Lauderdale, FL, 481-488.
Boiko, B. (2002). Content management bible. New York: Hungry Minds.
Buyukkokten, O., Kaljuvee, O., Garcia-Molina, H., Paepcke, A., & Winograd, T. (2002, January). Efficient Web browsing on handheld devices using page and form summarization. ACM Transactions on Information Systems, 20 (1), 82-115.
Chandra, S., Ellis, C., & Vahdat, A. (2000, December). Application-level differentiated multimedia Web services using quality aware transcoding. IEEE Journal on Selected Areas in Communications, 18 (12), 2544-2565.
Delort, J.-Y., Bouchon-Meunier, B., & Rifqi, M. (2003, August). Enhanced Web document summarization using hyperlinks. Proceedings of 14th ACM Conference on Hypertext and Hypermedia, Nottingham, UK, 208-215.
Egyed, A. (2002, October). Automatic abstraction of class diagrams. ACM Transactions on Software Engineering and Methodology, 11 (4), 449-491.
Government of Canada (2004). Tip sheets: Personal Digital Assistants (PDA). Retrieved May 5, 2004 from www.chin.gc.ca/English/Digital_Content/Tip_Sheets/Pda.
Gupta, S., Kaiser, G., Neistadt, D., & Grimm, P. (2003, May). DOM-based content extraction of HTML documents. Proceedings of 12th International Conference on the World Wide Web, Budapest, Hungary, 207-214.
Han, R., Perret, V., & Naghshineh, M. (2000, December). WebSplitter: A unified XML framework for multi-device collaborative Web browsing. Proceedings of ACM Conference on Computer Supported Cooperative Work, Philadelphia, PA, 221-230.
Hu, J., & Bagga, A. (2004, January-March). Categorizing images in Web documents. IEEE Multimedia, 11 (1), 22-30.
Jing, H., & McKeown, K. (2000). Cut and paste based text summarization. Proceedings of First Conference of North American Chapter of the Association for Computational Linguistics, Seattle, WA, 178-185.
Kamada, T., & Kawai, S. (1991, January). A general framework for visualizing abstract objects and relations. ACM Transactions on Graphics, 10 (1), 1-39.
Karadkar, U., Furuta, R., Ustun, S., Park, Y., Na, J.-C., Gupta, V., Ciftci, T., & Park, Y. (2004, August). Display-agnostic hypermedia. Proceedings of 15th ACM Conference on Hypertext and Hypermedia, Santa Cruz, CA, 58-67.
Karben, A. (1999, March). News you can reuse -- content repurposing at The Wall Street Journal Interactive Edition. Markup Languages: Theory & Practice, 1 (1), 33-45.
Kasik, D. (2004, January-March). Strategies for consistent image partitioning. IEEE Multimedia, 11 (1), 32-41.
Kray, C., Elting, C., Laakso, K., & Coors, V. (2003). Presenting route instructions on mobile devices. Proceedings of 8th International Conference on Intelligent User Interfaces, Miami, FL, 117-124.
Lyu, M., Yen, J., Yau, E., & Sze, S. (2003, November). A wireless handheld multi-modal digital video library client system. Proceedings of 5th ACM International Workshop on Multimedia Information Retrieval, Berkeley, CA, 231-238.
Ma, R.-H., & Singh, G. (2003). Effective and efficient infographic image downscaling for mobile devices. Proceedings of 4th International Workshop on Mobile Computing, Rostock, Germany.
McDonald, D., & Chen, H. (2002, July). Using sentence-selection heuristics to rank text in XTRACTOR. ACM-IEEE Joint Conference on Digital Libraries, Portland, OR, 28-35.
Obrenovic, Z., Starcevic, D., & Selic, B. (2004, January-March). A model-driven approach to content repurposing. IEEE Multimedia, 11 (1), 62-71.
Pea, R., Mills, M., Rosen, J., & Dauber, K. (2004, January-March). The DIVER project: interactive digital video repurposing. IEEE Multimedia, 11 (1), 54-61.
Rowe, N. (2002, July/August). MARIE-4: A high-recall, self-improving Web crawler that finds images using captions. IEEE Intelligent Systems, 17 (4), 8-14.
Singh, G. (2004, January-March). Content repurposing. IEEE Multimedia, 11 (1), 20-21.
Tan, K., Ong, G., & Wong, P. (1993, July). A heuristics approach to automatic data flow diagram layout. Proceedings of 6th International Workshop on Computer-Aided Software Engineering, Singapore, 314-323.
Wobbrock, J., Forlizzi, J., Hudson, S., & Myers, B. (2002, October). WebThumb: interaction techniques for small-screen browsers. Proceedings of 15th ACM Symp. on User Interface Software and Technology, Paris, France, 205-208.
Definitions of Terms
content management: Management of Web pages as assisted by software,
"Web page bureaucracy".
content repurposing: Reorganizing or modifying the content of a
graphical display to fit effectively on a different device than its original
target.
microbrowser: A Web browser designed for a small device.
key frames: Representative shots extracted from a video that illustrate
its main content.
pan: Move an image window with respect to the portion of the larger
image from which it is taken.
PDA: "Personal Digital Assistant", a small electronic device
that functions like a notepad.
streaming: Sending multimedia data to a client device at a rate that
enables it to be played without having to store it.
tag: HTML and XML markers that delimit semantically meaningful units in
their code.
XML: Extensible Markup Language, a general markup language for structuring
information exchanged on the Internet; like HTML, it is derived from SGML.
zoom: Change the fraction of an image being displayed when that image
is taken from a larger one.