MISTRAL - Measurable intelligent and secure semantic extraction and retrieval of multimedia data

Period

2005 — 2006

Funding

Österreichische Forschungsförderungsgesellschaft mbH, FFG

Partners

Hyperwave GmbH
Kompetenzzentrum für wissensbasierte Anwendungen und Systeme Forschungs- und Entwicklungs GmbH, Know-Center
SAIL LABS Technology AG

Research Areas

Speech Communication

Contact

Gernot Kubin

Multimedia data has a rich and complex structure in terms of inter- and intra-document references and can be an extremely valuable source of information. However, this potential remains severely limited until effective methods for semantic extraction and semantic-based cross-media exploration and retrieval are devised. Today's leading-edge techniques in this area work well for low-level feature extraction (e.g. colour histograms), but they focus on narrow aspects of isolated collections of multimedia data and deal with single media types only.

MISTRAL pursues the following lines of radically new research:

MISTRAL will extract a large variety of semantically relevant metadata from one media type and integrate it closely with semantic concepts derived from other media types. Eventually, the results of this cross-media semantic integration will also be fed back into the semantic extraction processes of the individual media types in order to improve the quality of their results.

MISTRAL will focus on highly innovative, semantic-based cross-media exploration and retrieval techniques that employ concepts at different semantic levels.

MISTRAL addresses the specifics of multimedia data in a global, networked context by employing semantic web technologies.

The MISTRAL results for semantic-based multimedia retrieval will contribute to a significant improvement of today's human-computer interaction in multimedia retrieval and exploration applications. New types of functionality include, but are not limited to:

(1) cross-media-based automatic detection of objects in multimedia data: For example, if a video contains an audio stream with barking together with a particular constellation of video features, the system can automatically interpret those video features as the object "dog".

(2) semantically enriched cross-media queries: A sample query could be "find all videos with a barking dog in the background and playing children in the foreground".

(3) cross-media synchronisation: The idea is to synchronise independent types of media according to the extracted semantic concepts. For example, if users see somebody walking in a video, they should also hear footsteps from an audio source.
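The object detection and query examples above can be illustrated with a minimal sketch. It is not the MISTRAL implementation. It assumes that each media type has already produced time-stamped concept annotations, and every name in it (the `Annotation` record, the `FUSION_RULES` table, the concept labels such as `four_legged_shape`, and the `fuse` and `find_videos` helpers) is hypothetical and chosen purely for illustration. An audio concept and a video concept that overlap in time are fused into an object label, and a query then matches videos on the enriched annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One time-stamped concept extracted from a single media stream (hypothetical record)."""
    video_id: str
    modality: str      # "audio", "video" or "cross-media"
    concept: str       # e.g. "barking", "four_legged_shape"
    start: float       # seconds
    end: float         # seconds
    layer: str = ""    # optional: "foreground" or "background"


# Assumed rule table: (audio concept, video concept) -> fused object label.
FUSION_RULES = {("barking", "four_legged_shape"): "dog"}


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if both annotations belong to the same video and share some time span."""
    return a.video_id == b.video_id and a.start < b.end and b.start < a.end


def fuse(annotations: list[Annotation]) -> list[Annotation]:
    """Derive cross-media object annotations from co-occurring audio and video cues."""
    audio = [a for a in annotations if a.modality == "audio"]
    video = [a for a in annotations if a.modality == "video"]
    fused = []
    for a in audio:
        for v in video:
            label = FUSION_RULES.get((a.concept, v.concept))
            if label and overlaps(a, v):
                # The fused object exists only where both cues are present.
                fused.append(Annotation(
                    v.video_id, "cross-media", label,
                    max(a.start, v.start), min(a.end, v.end), v.layer))
    return fused


def find_videos(annotations: list[Annotation], required: list[tuple[str, str]]) -> list[str]:
    """Return ids of videos that contain every (concept, layer) pair in `required`."""
    found: dict[str, set] = {}
    for a in annotations:
        found.setdefault(a.video_id, set()).add((a.concept, a.layer))
    return sorted(vid for vid, have in found.items() if set(required) <= have)


annotations = [
    Annotation("clip1", "audio", "barking", 2.0, 5.0),
    Annotation("clip1", "video", "four_legged_shape", 3.0, 8.0, "background"),
    Annotation("clip1", "video", "playing_children", 0.0, 10.0, "foreground"),
    Annotation("clip2", "audio", "barking", 1.0, 2.0),
    Annotation("clip2", "video", "four_legged_shape", 6.0, 9.0, "background"),
]
enriched = annotations + fuse(annotations)
matches = find_videos(
    enriched, [("dog", "background"), ("playing_children", "foreground")])
print(matches)
```

In this toy data only `clip1` matches, because in `clip2` the barking and the video features never overlap in time, so no "dog" object is derived there. A real system would replace the fixed rule table with learned or ontology-based associations; the table is used here only to keep the sketch short.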

ECONOMIC RELEVANCE

The amount of multimedia data available world-wide, and its network-based linkage, will continue to grow rapidly for the foreseeable future. According to a study conducted at the University of California, Berkeley, 800 megabytes of new data are created per capita per year. The same study estimates that, of a total of 5 exabytes of information available world-wide, about 92% exists in electronic form, with a 170 terabyte share available on the Internet. Image, video and audio data are becoming the predominant form of information making up this global asset. This deluge of multimedia data calls for new semantic extraction techniques and highly innovative retrieval and exploration techniques.

Awareness of the importance of semantic extraction and semantic-based retrieval is slowly beginning to enter mainstream business thinking, as evidenced by a recent Gartner study, which notes that “Contrary to many enterprises’ expectations, search technology hasn’t settled into a stable commodity sold by a few giant enterprises” and advises enterprises to “select products with robust functions to examine semantic structures in corpora”. Producing systems that live up to these high expectations still requires substantial and highly innovative research. Taken together, these two points mean that the first implementations of the envisaged system within organisations can be expected within the next 3 to 5 years (because the market will be ready by then and semantic technologies will not be ready before then), with rollouts on a larger scale in a 5 to 8 year time frame.