Gallery

LUDVIG: Learning-free Uplifting of 2D Visual Features to Gaussian Splatting Scenes

Recently, I worked on transferring representations from DINOv2, SAM, and CLIP into 3D Gaussian Splatting scenes. This work shows that a simple aggregation of 2D features is highly effective, achieving competitive results on segmentation and detection tasks while providing significant speed-ups over prior methods that minimize a reprojection loss. For 3D segmentation, it introduces a graph diffusion mechanism that enriches 3D features, such as coarse segmentation masks, by leveraging 3D geometry and pairwise similarities induced by DINOv2.
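
To make the aggregation concrete, here is a minimal PyTorch sketch of the uplifting step. It assumes that, for each training view, the rasterizer exposes which Gaussians contribute to which pixels and with what alpha-blending weight; all field names and shapes below are illustrative, not the paper's exact interface.

```python
import torch

def uplift_features(views, n_gaussians, feat_dim):
    """Learning-free uplifting: each Gaussian's 3D feature is the
    alpha-weighted average of the 2D features of the pixels it
    contributes to, accumulated over all training views.

    Each view is a dict with (hypothetical) fields:
      'feats'    : (P, C) 2D feature map, flattened over P pixels
      'gauss_id' : (K,)   index of the contributing Gaussian
      'pixel_id' : (K,)   pixel receiving that contribution
      'alpha'    : (K,)   alpha-blending weight of the contribution
    """
    num = torch.zeros(n_gaussians, feat_dim)  # weighted feature sums
    den = torch.zeros(n_gaussians, 1)         # total weight per Gaussian
    for v in views:
        w = v['alpha'].unsqueeze(1)           # (K, 1)
        f = v['feats'][v['pixel_id']]         # (K, C)
        num.index_add_(0, v['gauss_id'], w * f)
        den.index_add_(0, v['gauss_id'], w)
    # Gaussians contributing to no pixel keep a zero feature.
    return num / den.clamp_min(1e-8)          # (G, C) 3D features
```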

Illustration of the inverse and forward rendering between 2D visual features (produced by DINOv2) and a 3D Gaussian Splatting scene. In the inverse rendering (or uplifting) phase, features are created for each 3D Gaussian by aggregating coarse 2D features over all viewing directions. For forward rendering, the 3D features are projected onto any given viewing direction, as in regular Gaussian Splatting.
3D graph diffusion for foreground/background segmentation. The 3D mask spreads to neighboring Gaussians with similar DINOv2 features.
3D DINOv2 features (left), CLIP features (middle), and CLIP relevancy with text prompts (right).
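
The graph diffusion illustrated above can be sketched as follows, assuming a k-nearest-neighbor graph over Gaussian centers with edge weights derived from the similarity of uplifted DINOv2 features; the paper's exact graph construction and update rule may differ.

```python
import torch

def diffuse_mask(pos, feats, seed_mask, k=16, n_steps=50, tau=0.1):
    """Spread a coarse foreground mask along a k-NN graph whose edges
    favor neighbors with similar DINOv2 features.

    pos       : (G, 3) Gaussian centers
    feats     : (G, C) uplifted DINOv2 features, L2-normalized
    seed_mask : (G,)   initial soft foreground scores in [0, 1]
    """
    # k nearest neighbors in 3D (O(G^2) memory; fine for a sketch).
    d2 = torch.cdist(pos, pos)
    knn = d2.topk(k + 1, largest=False).indices[:, 1:]   # (G, k), drop self

    # Edge weights from the cosine similarity of DINOv2 features.
    sim = (feats.unsqueeze(1) * feats[knn]).sum(-1)      # (G, k)
    w = torch.softmax(sim / tau, dim=1)                  # row-normalized

    m = seed_mask.clone()
    for _ in range(n_steps):
        # Propagate mass from neighbors; never drop below the seeds.
        m = torch.maximum((w * m[knn]).sum(1), seed_mask)
    return m
```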

On Good Practices for Task-Specific Distillation of Large Pretrained Visual Models

My second PhD project (TMLR 2024) delineates good practices for leveraging large pretrained visual models to train smaller models on specific tasks.

PCA of image features for 30 classes of the CUB Bird dataset. Distilling a large pretrained teacher (top, left) to train a small task-specific student model (top, right) results in a better clustering of the representations compared to simply finetuning the student on the task (bottom, right). Distillation can be improved by using a Mixup-inspired class-agnostic data augmentation based on Stable Diffusion (grey features in teacher plot).
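
As a rough illustration of the setup, one distillation training step can combine the task loss with a feature-matching term against the frozen teacher. This is a generic sketch, not the paper's exact recipe; `student.features`, `student.head`, and the projection `proj` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, proj, x, y, alpha=0.5):
    """Task loss + feature distillation against a frozen teacher.

    `proj` maps student features to the teacher's feature dimension.
    """
    with torch.no_grad():
        t_feats = teacher(x)              # frozen large pretrained model

    s_feats = student.features(x)         # student backbone features
    logits = student.head(s_feats)        # task-specific head

    task_loss = F.cross_entropy(logits, y)
    distill_loss = 1 - F.cosine_similarity(proj(s_feats), t_feats, dim=-1).mean()
    return task_loss + alpha * distill_loss
```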

SLACK: Stable Learning of Augmentations with Cold-start and KL regularization

My first PhD project (CVPR 2023) focused on automatically learning optimal data augmentation policies using bilevel optimization.

The three most likely and least likely augmentations for different domains, as estimated by SLACK.
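
The outer loop of such a bilevel scheme can be sketched as follows, treating the validation loss as a differentiable function of the augmentation probabilities for the sake of illustration; SLACK's actual gradient estimator and hyperparameters are not reproduced here.

```python
import torch

def policy_step(pi_logits, prior_logits, val_loss_fn, lr=0.1, kl_weight=0.01):
    """One outer step: update the augmentation policy to lower the
    validation loss, with a KL term anchoring it to a prior policy
    (the cold-start / re-initialized one) for stability.
    """
    pi = pi_logits.detach().requires_grad_(True)
    probs = torch.softmax(pi, dim=-1)             # current policy
    prior = torch.softmax(prior_logits, dim=-1)   # anchor policy

    kl = (probs * (probs / prior).log()).sum()    # KL(current || prior)
    loss = val_loss_fn(probs) + kl_weight * kl    # outer objective

    grad, = torch.autograd.grad(loss, pi)
    return (pi - lr * grad).detach()              # one gradient step
```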

Self-supervised learning from 3D medical images (Philips Research, Paris)

In 2021, I completed a 6-month internship at Philips Research (France), where I worked on self-supervised learning from 3D medical images, with evaluation on 3D ultrasound image segmentation and 3D CT scan classification.

Heart segmentation in a 3D ultrasound image using a pretrained model.

Phylogenetic tree reconstruction (Weill Cornell, New York)

In 2020, I completed a 6-month internship in the Landau Lab (Weill Cornell Medicine and the NYGC, New York), where I worked on reconstructing phylogenetic trees of cell divisions from mutations observed in microsatellite sequences obtained through single-cell RNA sequencing.

Inferred phylogenetic tree, colored by the three main cell types.

Shape optimization with Geometric Deep Learning (Neural Concept, Lausanne)

In 2019, I completed a 6-month internship at Neural Concept (a start-up at EPFL, Lausanne), where I worked on optimizing 3D shapes (e.g. with respect to the lift-to-drag ratio) using a Geometric Deep Learning model trained on the outputs of physical simulations.

Mesh representation of a drone optimized using Geometric Deep Learning.
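
A minimal sketch of this surrogate-based loop, assuming a differentiable GDL model `surrogate(verts, faces)` that predicts the lift-to-drag ratio from a mesh (a hypothetical signature; the actual pipeline is more involved):

```python
import torch

def optimize_shape(vertices, faces, surrogate, n_steps=100, lr=1e-3):
    """Gradient ascent on mesh vertices through a frozen, differentiable
    surrogate trained on physical simulations."""
    verts = vertices.clone().requires_grad_(True)
    opt = torch.optim.Adam([verts], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        lift_to_drag = surrogate(verts, faces)   # scalar prediction
        (-lift_to_drag).backward()               # maximize the objective
        opt.step()
    return verts.detach()
```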