@moufuyu 2023-09-29 [Paper Reading] BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models