Master's Thesis - RAG-Powered LLM Chatbot
The UFZ
The Helmholtz Centre for Environmental Research (UFZ), with its 1,100 employees, has gained an excellent reputation as an international competence centre for environmental sciences.
We are part of the largest scientific organisation in Germany, the Helmholtz Association. Our mission: our research seeks to find a balance between social development and the long-term protection of our natural resources.
The job
Research groups typically consist of researchers with varying levels of expertise, technicians, and students. Research institutions encompass numerous such groups across different domains, supported by colleagues in human resources, resource management, and project oversight.
As these institutions grow, or as their focus needs to shift or adjust, strategic planning and the efficient allocation of resources and competencies become increasingly complex.
Questions such as "Who is working on what, with whom?" or "Who possesses which skills, and which methods, technologies, or devices are being used?" become central to strategic processes such as SWOT analyses.
Experience shows that obtaining a comprehensive, bird's-eye view of these elements is challenging, which makes it difficult to formulate effective strategies.
Project title: If we knew what we know
The mission of this project is to create an intuitive, data-driven platform that maps employees' skills, expertise, and work methodologies. It will enable our research unit, Chemicals in the Environment at the Helmholtz Centre for Environmental Research, to optimize strategic planning and resource allocation, foster cross-functional collaboration, and enhance knowledge sharing for greater productivity and innovation.
The overall vision of this project is to empower our research unit with intelligent insights into our workforce, fostering data-driven strategic planning, optimized collaboration, and innovation by unlocking the full potential of employee expertise and knowledge.
The Master's thesis position is limited to six months and will be supervised on site in Leipzig.
Your tasks
In this Master's thesis, a ChatBot Python package will be developed. It will use large language models (LLMs) grounded in a knowledge graph that holds information on the researchers of our research unit: their roles, their research, and the technologies and devices they use.
Retrieval-augmented generation (RAG) will supply external, domain-specific knowledge to mitigate the knowledge gaps, factual errors, and hallucinations of ungrounded LLMs.
The ChatBot will serve as a user-friendly, human-like interface for non-computer scientists involved in strategic processes, making it easier for them to access and understand these data.
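To illustrate the RAG idea described above, here is a minimal, dependency-free Python sketch of the retrieve-then-prompt loop over a knowledge graph. It is not the in-house prototype: the toy graph, the relation names (`WORKS_ON`, `HAS_SKILL`, `USES_DEVICE`), and the helper functions `retrieve` and `build_prompt` are hypothetical placeholders, and the naive keyword match stands in for a real query against the graph database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of a (hypothetical) knowledge graph: subject -predicate-> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical toy graph; names and relation types are invented placeholders.
KNOWLEDGE_GRAPH = [
    Triple("Researcher A", "WORKS_ON", "Project X"),
    Triple("Researcher A", "HAS_SKILL", "mass spectrometry"),
    Triple("Researcher B", "WORKS_ON", "Project X"),
    Triple("Researcher B", "USES_DEVICE", "HPLC"),
]

# In a real system, retrieval would query the graph database instead, e.g. with a
# parameterized Cypher query (schema assumed for illustration):
#   MATCH (p:Researcher)-[:WORKS_ON]->(:Project {name: $name}) RETURN p.name


def retrieve(question: str, graph: list[Triple]) -> list[Triple]:
    """Return triples whose subject or object is mentioned in the question.

    Naive case-insensitive substring matching stands in for entity linking.
    """
    q = question.lower()
    return [t for t in graph if t.subject.lower() in q or t.obj.lower() in q]


def build_prompt(question: str, facts: list[Triple]) -> str:
    """Ground the LLM by placing retrieved facts into the prompt."""
    context = "\n".join(f"- {t.subject} {t.predicate} {t.obj}" for t in facts)
    return (
        "Answer the question using only the facts below. "
        "If the facts are insufficient, say so.\n"
        f"Facts:\n{context or '- (none found)'}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    question = "Who is working on Project X?"
    facts = retrieve(question, KNOWLEDGE_GRAPH)
    # The resulting prompt would then be sent to an LLM API of choice (not shown).
    print(build_prompt(question, facts))
```

For the question "Who is working on Project X?", only the two `WORKS_ON` triples are retrieved, so the LLM is asked to answer from those facts alone. The instruction to admit insufficient facts is one simple prompt-engineering measure against hallucination.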
The tasks include:
- Develop a ChatBot Python package based on large language models, building on an existing in-house prototype
- Develop a retrieval-augmented generation (RAG) solution to ground this LLM in knowledge stored in a graph database
- Carefully engineer prompts so that the ChatBot answers project-relevant questions correctly
- Provide visuals for showcases
We offer
- Excellent supervision that supports your personal and professional development
- Exciting insights into the work of a leading research institute
- The chance to work in interdisciplinary, international teams and benefit from a wide range of perspectives
- The opportunity to contribute your own ideas and impulses and to actively shape the project right from the start
- Modern technical equipment and IT services to optimally support your work
Your profile
- Background in Computer Science
- Advanced understanding of large language model (LLM) APIs
- Solid programming skills in Python
- Experience with software development in an IDE (JetBrains PyCharm)
- Experience with collaborative software development using Git and with agile project management
- Experience with databases and database query languages, preferably graph databases and Cypher
- Fluent in spoken and written English