Master's Dissertation Defense of student Eduardo Roger Silva Nascimento.

Dissertation title: Querying Databases with Natural Language: The use of Large Language Models for Text-to-SQL tasks.

Abstract: Text-to-SQL is the task of generating an SQL query from a natural language question over a given relational database. While the leaderboards of well-known benchmarks indicate that Large Language Models (LLMs) excel at this task, those benchmarks evaluate the models on databases with relatively simple schemas. This dissertation investigates the performance of LLM-based text-to-SQL tools on a complex, openly available database (Mondial) with a larger schema, using a set of 100 Natural Language (NL) questions. With GPT-3.5 and GPT-4 as the underlying models, the results show that LLM-based tools perform significantly worse than reported in these benchmarks and struggle with schema linking and joins, suggesting that the relational schema may not be suitable for LLMs. The dissertation therefore proposes using LLM-friendly views and data descriptions to improve accuracy in the text-to-SQL task. In a second experiment, which uses the text-to-SQL tool with the best performance and cost from the first experiment and another set of 100 questions over a real-world database, the results show that LLM-friendly views and data samples, while not too difficult to implement, are sufficient to considerably improve the accuracy of the prompt strategy. The dissertation concludes with a discussion of the results and suggests further approaches to simplify the text-to-SQL task.

Advisor: Prof. Dr. Marco Antonio Casanova

Examination Committee: Prof. Dr. Vânia Maria Ponte Vidal | Prof. Dr. Melissa Lemos Cavaliére | Prof. Dr. Luiz André Portes Paes Leme

