Seminar: Towards Summarizing and Making Sense of the Blogosphere

The extraction of structured information from text is a fast-improving subfield of Natural Language Processing and Information Retrieval that has been reinvigorated by the ever-increasing availability of user-generated textual content on the Web. One environment that stands out as a source of invaluable information is the blogosphere: the network of social media sites in which individuals express and discuss opinions, facts, events, and ideas pertaining to their lives, their community, their profession, or society at large. The automatic extraction of reliable information from the blogosphere promises a viable approach for discovering very rich social data. Considerable attention has been given to studying the social dynamics among the participants (i.e., authors) in shared environments like the blogosphere. In that line of work, the goal is to understand how the network of humans communicating in the blogosphere forms and evolves over time, and how it influences participants in the process. Our goal, on the other hand, is to extract the network of entities, facts, ideas, and opinions expressed in social media sites, as well as the relationships among them. Such structured data can be organized as one or more information networks, which in turn are powerful metaphors for the study and visualization of the conversations in the blogosphere. In this talk, I will cover our work on extracting and linking entities, and the relations among them, in the context of our ongoing SONEX project, an open relation extraction system based on a combination of text clustering and other unsupervised methods. I will also cover a large-scale experimental evaluation of open relation extraction, which reveals considerable room for improvement in the area.

Denilson Barbosa, U. of Alberta, Canada
Location: RDC511
Date: 15/8, Wednesday, 16:00-18:00