Graduate Program Seminar: “My Database User is a Large Language Model”

On the 19th, at 3 p.m., Professor Marco A. Casanova will give the seminar “My Database User is a Large Language Model”.

Seminar abstract: The leaderboards of familiar benchmarks indicate that the best text-to-SQL tools are based on Large Language Models (LLMs). However, when applied to real-world databases, the performance of LLM-based text-to-SQL tools is significantly lower than that reported for these benchmarks. A closer analysis reveals that one of the problems is that the relational schema is an inappropriate specification of the database from the LLM's point of view. In other words, the target user of the database specification is the LLM, rather than an end user or a database programmer. The talk therefore argues that the text-to-SQL task can be greatly facilitated by a database specification based on LLM-friendly views, which are close to the language of the users’ questions and eliminate frequently used joins, and on LLM-friendly descriptions of the database values. The talk first introduces a proof-of-concept implementation of three sets of LLM-friendly views over a relational database, whose design is inspired by a proprietary relational database, and a set of 100 Natural Language (NL) questions that mimic those posed by users. The talk then presents experiments that test a text-to-SQL prompt strategy implemented with LangChain, using GPT-3.5 and GPT-4, over the sets of LLM-friendly views, with data samples serving as the LLM-friendly data descriptions. The results suggest that specifying LLM-friendly views and using data samples, neither of which is too difficult to implement over a real-world relational database, are sufficient to considerably improve the accuracy of the prompt strategy. The talk concludes with a discussion of the results and suggests further approaches to simplifying the text-to-SQL task.
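
To make the idea in the abstract concrete, here is a minimal sketch of what an LLM-friendly view with data samples might look like. It is not the speaker's implementation: the tables `eqp` and `loc`, the view `equipment`, and the helper `describe_view` are all hypothetical names invented for this illustration, and the sketch uses Python's built-in `sqlite3` module rather than LangChain or a GPT model. The view renames terse columns to wording closer to a user's question and hides a frequently used join; the helper renders the view and a few sample rows as text that could be placed in a prompt.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a terse schema and a friendlier view.

    All table, column, and view names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE loc (loc_id INTEGER PRIMARY KEY, loc_nm TEXT);
        CREATE TABLE eqp (eqp_id INTEGER PRIMARY KEY, eqp_nm TEXT, loc_id INTEGER);
        INSERT INTO loc VALUES (1, 'Plant A'), (2, 'Plant B');
        INSERT INTO eqp VALUES
            (10, 'Pump P-101', 1), (11, 'Valve V-7', 2), (12, 'Pump P-102', 2);

        -- The view uses names close to the users' vocabulary and
        -- pre-computes the join between equipment and location.
        CREATE VIEW equipment AS
            SELECT e.eqp_id AS equipment_id,
                   e.eqp_nm AS equipment_name,
                   l.loc_nm AS location_name
            FROM eqp AS e
            JOIN loc AS l ON l.loc_id = e.loc_id;
        """
    )
    return conn


def describe_view(conn: sqlite3.Connection, view: str, n_samples: int = 2) -> str:
    """Render a view's columns and a few sample rows as prompt-ready text."""
    cur = conn.execute(f'SELECT * FROM "{view}" ORDER BY 1 LIMIT ?', (n_samples,))
    columns = [col[0] for col in cur.description]
    rows = cur.fetchall()
    lines = [f"View {view}({', '.join(columns)})", "Sample rows:"]
    lines += [f"  {row!r}" for row in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    conn = build_demo_db()
    print(describe_view(conn, "equipment"))
```

With such a view, a question like "Which equipment is located in Plant B?" maps to a single-table query, `SELECT equipment_name FROM equipment WHERE location_name = 'Plant B'`, with no join for the model to infer.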

About the speaker: Marco A. Casanova is a Full Professor at the Department of Informatics and Coordinator of the Central Planning and Evaluation Office of the Pontifical Catholic University of Rio de Janeiro (PUC-Rio). He graduated in Electronic Engineering from the Military Institute of Engineering (1974), and obtained an M.Sc. in Informatics from PUC-Rio (1976) and an M.Sc. (1977) and a Ph.D. (1979) in Applied Mathematics from Harvard University. He was Graduate Program Coordinator (2005-2007) and Director (2007-2011) of the Department of Informatics of PUC-Rio. His research interests concentrate on database conceptual modeling and the construction of database management systems. In July 2012, he received the Scientific Merit Award from the Brazilian Computer Society.

You can attend in person in room 511 of the RDC building or watch on YouTube at: https://youtube.com/live/Sx1_D5wW1q0