Master's Dissertation Defense of student Ney Barchilon.
Dissertation title: Data Enrichment Based on Similarity Graph Statistics to Improve the Performance of Supervised ML Classification Models
Abstract: Optimizing the performance of supervised machine learning models is a persistent challenge, especially for datasets with high dimensionality or many correlated attributes. This study proposes a method for enriching tabular datasets using statistics derived from a graph built from the similarity between the instances of the dataset, with the aim of capturing structural correlations among the data. Instances are the vertices of the graph, and the edges between them reflect their similarity. The set of original features (FO) is enriched with the statistics extracted from the graph (FG) to improve the predictive power of machine learning models. The method was evaluated on ten datasets from different knowledge domains, in two distinct scenarios, with seven machine learning models, comparing predictions on the original dataset (FO) with predictions on the dataset enriched with the statistics extracted from its graph (FO+FG). The results showed significant improvements in accuracy, with an average gain of approximately 4.9%. Besides being flexible enough to be combined with other existing enrichment techniques, the method is a valuable alternative, particularly when the original datasets lack the characteristics required by traditional graph-based enrichment approaches.
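The abstract does not specify the similarity measure, the edge rule, or which graph statistics are used. The following is a minimal Python sketch under stated assumptions: cosine similarity, an edge whenever similarity reaches a hypothetical `threshold`, and three per-vertex statistics (degree, weighted degree, and local clustering coefficient) appended to each row, so that FO becomes FO+FG. All function names are illustrative, not the author's code.

```python
import math
from typing import Dict, List, Set, Tuple

Row = List[float]


def cosine(a: Row, b: Row) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def similarity_graph(X: List[Row], threshold: float) -> Tuple[List[Set[int]], Dict[Tuple[int, int], float]]:
    """Assumption: instances are vertices; link i-j when cosine(X[i], X[j]) >= threshold."""
    n = len(X)
    adj: List[Set[int]] = [set() for _ in range(n)]
    weight: Dict[Tuple[int, int], float] = {}
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(X[i], X[j])
            if s >= threshold:
                adj[i].add(j)
                adj[j].add(i)
                weight[(i, j)] = weight[(j, i)] = s
    return adj, weight


def graph_features(adj: List[Set[int]], weight: Dict[Tuple[int, int], float]) -> List[Row]:
    """Per-vertex statistics (FG): degree, weighted degree, local clustering coefficient."""
    feats: List[Row] = []
    for i, nbrs in enumerate(adj):
        deg = len(nbrs)
        strength = sum(weight[(i, j)] for j in nbrs)
        # Clustering: fraction of neighbour pairs that are themselves linked.
        nl = sorted(nbrs)
        links = sum(1 for a in range(len(nl)) for b in range(a + 1, len(nl)) if nl[b] in adj[nl[a]])
        possible = deg * (deg - 1) / 2
        clustering = links / possible if possible else 0.0
        feats.append([float(deg), strength, clustering])
    return feats


def enrich(X: List[Row], threshold: float = 0.9) -> List[Row]:
    """Return FO+FG: each original row followed by its graph statistics."""
    adj, weight = similarity_graph(X, threshold)
    fg = graph_features(adj, weight)
    return [list(row) + extra for row, extra in zip(X, fg)]
```

The pairwise loop is O(n²) and meant only for illustration. Because the graph is built from features alone, no labels are involved; whether test instances join the same graph as training instances is a design choice the abstract leaves open.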
Advisor: Prof. Dr. Helio Côrtes Vieira Lopes
Committee: Prof. Dr. Marcos Kalinowski | Prof. Dr. Jefry Sastre Pérez | Prof. Dr. Tatiana Escovedo
Watch the defense at: https://puc-rio.zoom.us/j/98183420840?pwd=WHdraWpTRHVOK2xUcnNzdWVCWUh4dz09
Author: Bruno Frederico Maciel Gutierrez
Advisor: Hélio Côrtes Vieira Lopes
Date and Time: April 12, 2024, at 14:00
Location: Videoconference
Author: Ney Barchilon
Advisor: Hélio Côrtes Vieira Lopes
Date and Time: April 11, 2024, at 08:00
Location: Videoconference
Master's Dissertation Defense of student Rodrigo Galdino Ximenes.
Dissertation title: Issues that lead to code technical debt in machine learning systems
Abstract: [Context] Technical debt (TD) in machine learning (ML) systems, much like its counterpart in software engineering (SE), can lead to future rework, posing risks to productivity, quality, and team morale. However, understanding the code-related issues that lead to TD in ML systems is still a largely unexplored field. [Objective] This paper aims to identify and discuss the relevance of code-related issues leading to TD in ML code throughout the ML life cycle. [Method] The study first compiled a list of potential issues that can lead to TD in ML code by analyzing the ML life cycle phases and their typical tasks. The list was then refined by assessing the prevalence and relevance of the issues leading to ML code TD through feedback collected from industry practitioners in two focus group sessions. [Results] The study compiled a list of 34 potential issues contributing to TD in the source code of ML systems. Through two focus group sessions with nine participants, this list was refined into 30 issues leading to ML code-related TD, 18 of which were considered highly relevant. The data pre-processing phase was the most critical, with nine issues considered highly relevant in potentially leading to severe ML code TD. Four issues were considered highly relevant in the phases of data collection, model creation, and training. The final list of issues is available to the community. [Conclusion] The list can help raise awareness of issues to be addressed throughout the ML life cycle to minimize accruing TD, helping to improve the maintainability of ML systems.
Advisor: Prof. Dr. Marcos Kalinowski
Committee: Prof. Dr. Tatiana Escovedo | Prof. Dr. Maria Teresa Baldassarre | Prof. Dr. Rodrigo Oliveira Spínola | Prof. Dr. Helio Côrtes Vieira Lopes
Watch the defense at: https://puc-rio.zoom.us/j/4666190940?pwd=eUdNaDNSbnhEY3VWWU1DMGF0SkRjZz09
Registration is open for the SPARK Program! A partnership between the ExACTa Laboratory and Eletrobras.
Innovation Grid's SPARK is the company's first program connecting university students with the technological and business challenges of the energy sector, to foster new solutions and develop talent. The most talented students will be selected to form agile development teams, learn about the company's challenges and business, and connect directly with Eletrobras's innovation teams.
Don't miss this opportunity!
Register at: https://www.exacta.inf.puc-rio.br/exactaspark/
Master's Dissertation Defense of student Eduardo Roger Silva Nascimento.
Dissertation title: Querying Databases with Natural Language: The use of Large Language Models for Text-to-SQL tasks
Abstract: Text-to-SQL involves generating an SQL query from a given relational database and a natural language question. Although the leaderboards of well-known benchmarks indicate that Large Language Models (LLMs) excel at this task, these benchmarks evaluate them on databases with simpler schemas. This dissertation investigates the performance of LLM-based text-to-SQL models on a complex, openly available database (Mondial) with a larger schema, using a set of 100 Natural Language (NL) questions. Running under GPT-3.5 and GPT-4, the results show that LLM-based tools perform significantly worse than reported in these benchmarks and struggle with schema linking and joins, suggesting that the relational schema may not be suitable for LLMs. The dissertation proposes using LLM-friendly views and data descriptions to improve accuracy in the text-to-SQL task. In a second experiment, which used the text-to-SQL tool with the best performance and cost from the previous experiment and another set of 100 questions over a real-world database, the results show that using LLM-friendly views and data samples, although relatively simple to implement, is sufficient to considerably improve the accuracy of the prompt strategy. The dissertation concludes with a discussion of the results and suggests further approaches to simplify the text-to-SQL task.
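As an illustration of what an "LLM-friendly" view might look like, the Python sketch below uses the standard `sqlite3` module on a hypothetical, simplified two-table schema with fictional countries and toy numbers (not Mondial, and not the views used in the dissertation). The view precomputes the country-to-capital join and gives its columns descriptive names; the hypothetical `build_prompt` helper packs the view definition and a few sample rows into a prompt. The LLM call itself is deliberately left out.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory toy database: fictional countries and toy numbers, not real data."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE country (name TEXT, code TEXT PRIMARY KEY, capital TEXT);
        CREATE TABLE city (name TEXT, country TEXT REFERENCES country(code), population INTEGER);
        INSERT INTO country VALUES ('Freedonia', 'FD', 'Fredville'), ('Sylvania', 'SY', 'Sylvan City');
        INSERT INTO city VALUES ('Fredville', 'FD', 1000), ('Portville', 'FD', 2500), ('Sylvan City', 'SY', 1800);
        CREATE VIEW country_capital_info AS
            SELECT co.name AS country_name,
                   ci.name AS capital_city_name,
                   ci.population AS capital_city_population
            FROM country AS co
            JOIN city AS ci ON ci.name = co.capital AND ci.country = co.code;
        """
    )
    # The view above is the hypothetical "LLM-friendly" layer: the join is
    # precomputed and every column name says what it contains.
    return con


def describe_view(con: sqlite3.Connection, view: str, n_samples: int = 2) -> str:
    """View definition plus a few sample rows, formatted as SQL comments."""
    row = con.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = ?", (view,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no such view: {view}")
    cur = con.execute(f'SELECT * FROM "{view}" LIMIT ?', (n_samples,))
    cols = [d[0] for d in cur.description]
    lines = [row[0] + ";", "-- sample rows:", "-- " + " | ".join(cols)]
    lines += ["-- " + " | ".join(str(v) for v in r) for r in cur.fetchall()]
    return "\n".join(lines)


def build_prompt(con: sqlite3.Connection, view: str, question: str) -> str:
    """Prompt text for a text-to-SQL model (the model call is not shown)."""
    return (
        "Using only the SQLite view below, write one SQL query that answers the question.\n\n"
        + describe_view(con, view)
        + f"\n\nQuestion: {question}\nSQL:"
    )
```

With the view in place, a question about a capital's population needs no join at all, which is the kind of schema-linking and join difficulty the abstract reports.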
Advisor: Prof. Dr. Marco Antonio Casanova
Committee: Prof. Dr. Vânia Maria Ponte Vidal | Prof. Dr. Melissa Lemos Cavaliére | Prof. Dr. Luiz André Portes Paes Leme
Watch the defense at: https://puc-rio.zoom.us/j/93760975741?pwd=YXVNcUQzTTlNa2ZlOVhyd1BhLzkwdz09
Author: Eduardo Roger Silva Nascimento
Advisor: Marco Antonio Casanova
Date and Time: April 4, 2024, at 14:00
Location: Videoconference
On the 25th, at 4 p.m., Professor Philip Wadler will give the seminar “The Two Cultures of Artificial Intelligence”.
Graduate Program Seminar: “The Two Cultures of Artificial Intelligence”
Seminar Abstract: Everyone is talking about new advances in Artificial Intelligence (AI): texts written by ChatGPT, images drawn by Midjourney, and self-driving cars from Tesla. When I was a sophomore, I learned the fundamentals of my subject from John McCarthy, founder of AI and a pioneer of programming. In the early days, AI debated the merits of two complementary methods: logic vs heuristics. Typical of the first is proving properties of programs, which became my research interest. Typical of the second is machine learning, the foundation of ChatGPT, Midjourney, and self-driving.
This talk will contrast the two approaches, discussing the benefits and risks of each, and how the first may curb shortcomings of the second.
Artists and writers are worried that AI will put them out of a job. One of the next professions on the list is programmers. Already, ChatGPT and related systems can do a credible job of generating simple programs, such as code for web pages. However, also already, such systems have demonstrated that they routinely write code containing known security bugs.
One possible scenario is that heuristic techniques will prove as adequate as humans—and far cheaper—at simple tasks, putting writers, artists, and programmers out of work. Bereft of new data to learn from, the machine learning applications will then fall into stagnation. They will be fine at producing articles, art, and code close to what has been produced before, but unable to produce anything original. And by then there may no longer be writers, artists, or programmers to hire, as who would study for a profession where no one can find work because they’ve been displaced by machines?
A different scenario is to pass laws to ensure that writers and artists are fairly recompensed when AI generates artifacts based on their work. Regarding code, the logical techniques have shown they can vastly improve reliability. Synthesising logical and heuristic techniques may lead to code that is both cheaper and more reliable. Programmers would shift from writing code to writing logical specifications, with AI helping to generate code proved to meet those specifications.
About the Speaker: Philip Wadler is Professor of Theoretical Computer Science at the University of Edinburgh and Senior Research Fellow at IOHK. He is a Fellow of the Royal Society, a Fellow of the Royal Society of Edinburgh, and an ACM Fellow. He is head of the steering committee for Proceedings of the ACM, past editor-in-chief of PACMPL and JFP, past chair of ACM SIGPLAN, past holder of a Royal Society-Wolfson Research Merit Fellowship, winner of the SIGPLAN Distinguished Service Award, and a winner of the POPL Most Influential Paper Award. He has an h-index of over 70 with more than 25,000 citations to his work, according to Google Scholar. He contributed to the designs of Java and XQuery, and is co-author of Introduction to Functional Programming (Prentice Hall, 1988), XQuery from the Experts (Addison Wesley, 2004), Generics and Collections in Java (O’Reilly, 2006), and Programming Language Foundations in Agda (2018). He is a principal designer of the Haskell programming language, contributing to its two main innovations, type classes and monads. The YouTube video of his Strange Loop talk “Propositions as Types” has over 100,000 views.
To watch the seminar, go to: https://youtube.com/live/FQI_9Yb6kik
On March 15, Professor Sérgio Lifschitz will give the seminar “Databases and Digital Social Networks”.
Graduate Program Seminar: “Databases and Digital Social Networks”
Seminar Abstract: Communication through so-called Digital (or Online) Social Networks (DSNs) is an important part of everyday life in our society. Observing and analyzing DSN data significantly reflects people's behavior and positions in their offline daily lives. Although DSNs are classic examples of Big Data systems, owing to their large data volumes and the speed at which that data spreads, database systems, whether relational or NoSQL, are rarely or never used by research groups. Motivated by this observation, in this talk I intend to show how the broad area of data (data engineering, data science, and databases) can contribute to relevant scientific and technological investigations. The emphasis will be on work already completed, or in progress, by the team of researchers (students and collaborators) of the BioBD Laboratory at DI PUC-Rio.
About the Speaker: Sérgio Lifschitz is a member of the permanent faculty of DI and coordinator of the BioBD Laboratory at PUC-Rio. He holds a PhD in Computer Science from ENST/Télécom Paris, France, and a master's and a bachelor's degree in Electrical Engineering, both from PUC-Rio. He researches data engineering, data science, and databases, with emphasis on automatic tuning, knowledge bases and knowledge graphs, and the management of digital social networks. He also works in bioinformatics, developing tools in partnership with Fiocruz and INCA. He is Vice-Dean for Internationalization of the Technical Scientific Center (CTC) and a member of the NDE (Structuring Teaching Core) of the Computer Engineering program, among other technical and administrative activities.
For more information about the content and how to attend, go to: https://youtube.com/live/apTTNpDXEBE
Master's Dissertation Defense of student Matheus Kerber.
Título da dissertação: Fast and Accurate Simulation of Deformable Solid Dynamics on Coarse Meshes
Abstract: This thesis introduces a novel hybrid simulator that combines a numerical Finite Element (FE) Partial Differential Equation solver with a Message Passing Neural Network (MPNN) to perform simulations of deformable solid dynamics on coarse meshes. Our work aims to provide accurate simulations, with an error comparable to that obtained with more refined meshes in FE discretizations, while maintaining computational efficiency by using an MPNN component that corrects the numerical errors associated with using a coarse mesh. We evaluate our model on accuracy, generalization capacity, and computational speed against a reference numerical solver that uses meshes 64 times more refined. We introduce a new dataset for this comparison, encompassing three numerical benchmark cases: (i) free deformation after an initial impulse, (ii) stretching, and (iii) torsion of deformable solids. Based on the simulation results, the study thoroughly discusses our method's strengths and weaknesses. The study shows that our method corrects an average of 95.9% of the numerical error associated with discretization while being up to 88 times faster than the reference solver. In addition, our model is fully differentiable and can be embedded into a neural network layer, allowing it to be easily extended by future work. Our contributions also include demonstrating that our method achieves better learning and generalization capacity than a purely data-oriented baseline simulator. Data and code are made available at <github link> for further investigation.
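The abstract describes a coarse-mesh FE solver whose output is corrected by an MPNN. The Python sketch below shows only that correction idea, under heavy simplification: the learned message and update networks are replaced by two hypothetical scalar weights (`w_msg`, `w_self`), the mesh is a plain edge list, and `coarse_u` stands in for the coarse solver's per-node output. It is a single linear message-passing layer, not the thesis architecture.

```python
from typing import List, Tuple

Vec = List[float]


def message_passing_correction(
    x: List[Vec], edges: List[Tuple[int, int]], w_self: float, w_msg: float
) -> List[Vec]:
    """One linear message-passing layer over an undirected mesh graph.

    Node i sums messages w_msg * (x_j - x_i) from each neighbour j and adds
    w_self * x_i. In a real MPNN, both terms would be learned networks.
    """
    dim = len(x[0]) if x else 0
    agg = [[0.0] * dim for _ in x]
    for i, j in edges:
        for k in range(dim):
            diff = x[j][k] - x[i][k]
            agg[i][k] += w_msg * diff  # message j -> i
            agg[j][k] -= w_msg * diff  # message i -> j (opposite sign)
    return [[w_self * xi[k] + ai[k] for k in range(dim)] for xi, ai in zip(x, agg)]


def hybrid_step(
    coarse_u: List[Vec], edges: List[Tuple[int, int]], w_self: float = 0.0, w_msg: float = 0.1
) -> List[Vec]:
    """Coarse-solver output plus the (here fixed, not learned) MPNN correction."""
    corr = message_passing_correction(coarse_u, edges, w_self, w_msg)
    return [[u + c for u, c in zip(ui, ci)] for ui, ci in zip(coarse_u, corr)]
```

In a trained model, these weights would be replaced by neural networks fitted so that the corrected coarse solution approaches the reference solution computed on the refined mesh.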
Advisor: Prof. Dr. Waldemar Celes Filho
Committee: Prof. Dr. Jose Alberto Rodrigues Pereira Sardinha | Prof. Dr. Ivan Fabio Mota de Menezes | Prof. Dr. Leonardo Seperuelo Duarte
Watch the defense at: https://puc-rio.zoom.us/j/92665440011?pwd=UnNTR3RwcUNFd1hpVUVoUDJKODdodz09#success