Master thesis

Visual Foundation Models for Model-based RL


The Machine Learning Research Lab (MLRL) is looking for a master's student (d/f/m) in the domain of machine learning, control and robotics in Munich. The MLRL is part of Volkswagen Group IT and conducts fundamental research in machine learning and optimal control. We develop new methods for generative time series modelling and control of dynamical systems. In a final step, these algorithms are tested on real systems, e.g. robot arms or mobile robots. For this purpose, a robot lab with a variety of robotic systems, motion capture systems and a diverse set of sensors is available.


In recent years, the pre-training and fine-tuning pipeline has become dominant in the NLP and CV fields. These models are typically pre-trained on massive amounts of data and are referred to as foundation models. They have proven to be extremely powerful for few-shot or zero-shot adaptation on downstream tasks. However, their role in control tasks has been less studied.

Recently, a few papers have demonstrated the effectiveness of visual foundation models for imitation learning and model-free reinforcement learning. However, their use in model-based reinforcement learning, a potentially more promising approach, has not yet been explored. Therefore, this thesis aims to explore how to leverage pre-trained visual models on visuomotor tasks with model-based reinforcement learning. Specifically, we seek to answer the following questions:

  • What are the benefits that visual foundation models provide for control tasks?
  • What is the best way to use visual foundation models for control tasks with model-based reinforcement learning?


  • Study the recent progress in visual foundation models and their role in control tasks.
  • Design and try out different ways of incorporating visual foundation models into the components of model-based RL.
  • Benchmark representative visual foundation models with the proposed methods.
  • Test the sim-to-real transfer ability of policies trained with foundation models.
  • Analyse the results to gain insights.


  • Master's student in a natural science or engineering discipline;
  • Interest in machine learning, computer vision, reinforcement learning and robotics. Familiarity with visual foundation models and model-based RL is a plus;
  • Very good knowledge of the Python programming language. Experience with PyTorch and foundation model libraries (e.g. timm, transformers) is a plus;
  • Very good knowledge of numerical optimisation, probability theory, information theory, calculus and linear algebra;
  • Good knowledge of tools such as git, LaTeX and any Unix shell;
  • Self-motivated working style;
  • Proficiency in English.

Please contact us at arg-min @ if you are interested. Mention "Master Thesis - VFM" in your email subject for this particular opening.

Announcement at Volkswagen Stellenbörse