This repository is the official implementation of VL-SAE, which helps users understand the vision-language alignment of VLMs via concepts. We present a demo of VL-SAE with OpenCLIP and LLaVA 1.5 ...
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).