Talk: Information Extraction for Building Knowledge Bases

Date: Thursday, 8 March 2012, 4 pm
Room: 511RDC


So far, information extraction from the Web has mostly focused on mapping syntactic structures
found on Web pages onto predefined templates. We generalize this established model in two
ways. First, we show that for some applications (we have demonstrated this for life science databases),
the conceptual information is available on the Web page, and it is both possible and useful to extract
a conceptual and a factual representation of the underlying content at the same time. Second,
we generalize syntactic structures to perceptual structures. As a first step in this direction,
we have extended XPath to include spatial primitives in the query language. Initial experiments show
that such a query mechanism generalizes across different underlying data sources much better
than syntax-only mechanisms.

Prof. Staab is Professor of Databases and Information Systems at the University of Koblenz-Landau. He directs the Institute for “Web Science and Technologies” (WeST). He is Program Co-Chair of WWW 2012 and was General Chair of ISWC 2011 and WebScience 2011. He is also Editor-in-Chief of Elsevier's Journal of Web Semantics. His interests span many aspects of Web Science, such as the Semantic Web, the Social Web, the Multimedia Web, the Software Web, and the Interactive Web. He also coordinates the EU integrated project “Robust – Risk and Opportunities Management of Huge-Scale Business Community Cooperation”.