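A list of papers from the CVPR 2024 Day 4 morning session that caught my interest.
It is meant as a memo so I don't forget them later.
I plan to read the ones I want to learn more about in detail at a later date.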
3D
3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surfaces
https://cvpr.thecvf.com/virtual/2024/poster/29885
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
https://cvpr.thecvf.com/virtual/2024/poster/29266
Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation
https://cvpr.thecvf.com/virtual/2024/poster/29444
MonoCD: Monocular 3D Object Detection with Complementary Depths
https://cvpr.thecvf.com/virtual/2024/poster/30921
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
https://cvpr.thecvf.com/virtual/2024/poster/29964
Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
https://cvpr.thecvf.com/virtual/2024/poster/31680
MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models
https://cvpr.thecvf.com/virtual/2024/poster/30683
LaneCPP: Continuous 3D Lane Detection using Physical Priors
https://cvpr.thecvf.com/virtual/2024/poster/30930
3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
https://cvpr.thecvf.com/virtual/2024/poster/30607
Gated Fields: Learning Scene Reconstruction from Gated Videos
https://cvpr.thecvf.com/virtual/2024/poster/29275
depth
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
https://cvpr.thecvf.com/virtual/2024/poster/31264
UniDepth: Universal Monocular Metric Depth Estimation
https://cvpr.thecvf.com/virtual/2024/poster/31417
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
https://cvpr.thecvf.com/virtual/2024/poster/30176
Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion
https://cvpr.thecvf.com/virtual/2024/poster/29435
multimodal
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
https://cvpr.thecvf.com/virtual/2024/poster/29250
LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation
https://cvpr.thecvf.com/virtual/2024/poster/31268
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
https://cvpr.thecvf.com/virtual/2024/poster/29580
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
https://cvpr.thecvf.com/virtual/2024/poster/31575
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
https://cvpr.thecvf.com/virtual/2024/poster/31877
GLaMM: Pixel Grounding Large Multimodal Model
https://cvpr.thecvf.com/virtual/2024/poster/31094
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
https://cvpr.thecvf.com/virtual/2024/poster/31492
Pixel-Aligned Language Model
https://cvpr.thecvf.com/virtual/2024/poster/31639
VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
https://cvpr.thecvf.com/virtual/2024/poster/29676
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
https://cvpr.thecvf.com/virtual/2024/poster/31270
See Say and Segment: Teaching LMMs to Overcome False Premises
https://cvpr.thecvf.com/virtual/2024/poster/31231
Segment and Caption Anything
https://cvpr.thecvf.com/virtual/2024/poster/29271
RegionGPT: Towards Region Understanding Vision Language Model
https://cvpr.thecvf.com/virtual/2024/poster/31126
LISA: Reasoning Segmentation via Large Language Model
https://cvpr.thecvf.com/virtual/2024/poster/30109
Taming Self-Training for Open-Vocabulary Object Detection
https://cvpr.thecvf.com/virtual/2024/poster/29999
Other
Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Multi-Scale Aggregation and Anthropic Prior Knowledge
https://cvpr.thecvf.com/virtual/2024/poster/30824
Long-Tailed Anomaly Detection with Learnable Class Names
https://cvpr.thecvf.com/virtual/2024/poster/31789