
[Ranking of arXiv Papers Published in October 2024] 2410.xxxxx

Last updated at 2025-04-04
Posted at 2024-12-11

AI paper explainer YouTube channel: AI時代の羅針盤 ("A Compass for the AI Era")

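This article ranks papers in the arXiv cs category published around October 2024 (IDs of the form 2410.xxxxx) by citation count. The ranking is updated periodically.
(Last updated: April 4, 2025)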

Citations	Title	Video
604	GPT-4o System Card	not yet
155	Movie Gen: A Cast of Media Foundation Models
134	GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
93	$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control	not yet
91	Video Instruction Tuning With Synthetic Data	not yet
82	Pixtral 12B	not yet
56	Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation	not yet
53	O1 Replication Journey: A Strategic Progress Report -- Part 1	not yet
53	RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation	not yet
52	Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
52	Moshi: a speech-text foundation model for real-time dialogue	not yet
46	Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs	not yet
46	Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models	not yet
46	MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion	not yet
45	YOLOv11: An Overview of the Key Architectural Enhancements	not yet
45	Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning	not yet
43	GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation	not yet
43	Pyramidal Flow Matching for Efficient Video Generative Modeling
41	LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning	not yet
40	LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
39	Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think	not yet
38	Aria: An Open Multimodal Native Mixture-of-Experts Model	not yet
37	OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models	not yet
36	SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference	not yet
36	OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data	not yet
35	Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge	not yet
35	How to Train Long-Context Language Models (Effectively)	not yet
34	Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
33	MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
32	Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models	not yet
30	LLaVA-Critic: Learning to Evaluate Multimodal Models	not yet
29	ALOHA Unleashed: A Simple Recipe for Robot Dexterity	not yet
29	DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
29	SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers	not yet
28	HART: Efficient Visual Generation with Hybrid Autoregressive Transformer	not yet
28	Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models	not yet
27	CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos	not yet
27	Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities	not yet
27	AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents	not yet
27	Differential Transformer
26	Orb: A Fast, Scalable Neural Network Potential	not yet
26	Data Scaling Laws in Imitation Learning for Robotic Manipulation	not yet
26	PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction	not yet
26	F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
25	RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
25	HelpSteer2-Preference: Complementing Ratings with Preferences	not yet
24	Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models	not yet
23	Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
23	Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations	not yet
23	Loong: Generating Minute-level Long Videos with Autoregressive Language Models	not yet
23	HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly	not yet
22	Generalizable Humanoid Manipulation with 3D Diffusion Policies	not yet
22	HSR-Enhanced Sparse Attention Acceleration	not yet
22	Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents	not yet
22	A Survey on Diffusion Models for Inverse Problems	not yet
21	Liger Kernel: Efficient Triton Kernels for LLM Training	not yet
21	Agent-as-a-Judge: Evaluate Agents with Agents
21	AFlow: Automating Agentic Workflow Generation	not yet
21	Looped ReLU MLPs May Be All You Need as Practical Programmable Computers	not yet
21	Baichuan-Omni Technical Report	not yet
20	DepthSplat: Connecting Gaussian Splatting and Depth	not yet
20	A Comparative Study on Reasoning Patterns of OpenAI's o1 Model	not yet
20	LightRAG: Simple and Fast Retrieval-Augmented Generation	not yet
20	ImageFolder: Autoregressive Image Generation with Folded Tokens	not yet
19	Improve Vision Language Model Chain-of-thought Reasoning	not yet
19	Allegro: Open the Black Box of Commercial-Level Video Generation Model
19	Generative Reward Models	not yet
19	JudgeBench: A Benchmark for Evaluating LLM-based Judges	not yet
19	VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents	not yet
19	Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training	not yet
19	VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks	not yet
19	Strong Model Collapse
19	CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL	not yet
18	No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images	not yet
18	ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference	not yet
18	MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark	not yet
18	Performance of the CMS high-level trigger during LHC Run 2	not yet
18	A Survey on Data Synthesis and Augmentation for Large Language Models	not yet
18	TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models	not yet
18	Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes	not yet
18	DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation	not yet
18	Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models
17	EMMA: End-to-End Multimodal Model for Autonomous Driving	not yet
17	Automatically Interpreting Millions of Features in Large Language Models	not yet
17	Latent Action Pretraining from Videos
17	Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix	not yet
17	When Attention Sink Emerges in Language Models: An Empirical View	not yet
17	Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis	not yet
17	ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery	not yet
17	SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?	not yet
17	AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark	not yet
17	ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI	not yet
16	VoiceBench: Benchmarking LLM-Based Voice Assistants	not yet
16	TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling	not yet
16	DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation	not yet
16	MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models	not yet
16	How to Construct Random Unitaries	not yet
16	Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation	not yet
16	Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making	not yet
16	Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG	not yet
16	Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise	not yet
15	Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
15	Jailbreaking and Mitigation of Vulnerabilities in Large Language Models	not yet
15	Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent	not yet
15	Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues	not yet
15	Impurities and polarons in bosonic quantum gases: a review on recent progress	not yet
15	Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow	not yet
15	T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design	not yet
15	Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification	not yet
15	AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs	not yet
15	ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection	not yet
15	Inference Scaling for Long-Context Retrieval Augmented Generation
15	LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
15	AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models	not yet
15	VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment	not yet
15	Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown	not yet
14	In-Context LoRA for Diffusion Transformers	not yet
14	Social Science Meets LLMs: How Reliable Are Large Language Models in Social Simulations?	not yet
14	OS-ATLAS: A Foundation Action Model for Generalist GUI Agents	not yet
14	HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots	not yet
14	Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages	not yet
14	Jailbreaking LLM-Controlled Robots	not yet
14	SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs	not yet
14	The Ingredients for Robotic Diffusion Transformers	not yet
14	ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback	not yet
14	Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation	not yet
14	Falcon Mamba: The First Competitive Attention-free 7B Language Model	not yet
14	TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens	not yet
14	CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs	not yet
14	Interpretable Contrastive Monte Carlo Tree Search Reasoning	not yet
13	SelfCodeAlign: Self-Alignment for Code Generation
13	MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision	not yet
13	Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	not yet
13	WorldSimBench: Towards Video Generation Models as World Simulators	not yet
13	The XLZD Design Book: Towards the Next-Generation Liquid Xenon Observatory for Dark Matter and Neutrino Physics	not yet
13	Thinking LLMs: General Instruction Following with Thought Generation
13	Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System	not yet
13	Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents	not yet
13	IC3M: In-Car Multimodal Multi-object Monitoring for Abnormal Status of Both Driver and Passengers	not yet
12	On Memorization of Large Language Models in Logical Reasoning	not yet
12	CaloChallenge 2022: A Community Challenge for Fast Calorimeter Simulation	not yet
12	Safety cases for frontier AI	not yet
12	MarDini: Masked Autoregressive Diffusion for Video Generation at Scale	not yet
12	SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models	not yet
12	Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data	not yet
12	Scaling Diffusion Language Models via Adaptation from Autoregressive Models	not yet
12	Self-Supervised Graph Neural Networks for Enhanced Feature Extraction in Heterogeneous Information Networks	not yet
12	Efficient and Aesthetic UI Design with a Deep Learning-Based Interface Generation Tree Algorithm	not yet
12	RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style	not yet
12	REEF: Representation Encoding Fingerprints for Large Language Models	not yet
12	DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control	not yet
12	Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats	not yet
12	SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation	not yet
12	G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks	not yet
12	LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory	not yet
12	IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation	not yet
12	Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models	not yet
12	Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning	not yet
12	Learning How Hard to Think: Input-Adaptive Allocation of LM Computation	not yet
12	Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models	not yet
12	FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models	not yet
12	Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems	not yet
12	Were RNNs All We Needed?
11	One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation	not yet
11	AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
11	Fast Best-of-N Decoding via Speculative Rejection	not yet
11	Pay Attention and Move Better: Harnessing Attention for Interactive Motion Generation and Training-free Editing	not yet
11	Why Does the Effective Context Length of LLMs Fall Short?	not yet
11	One-Step Diffusion Distillation through Score Implicit Matching	not yet
11	Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance	not yet
11	CamI2V: Camera-Controlled Image-to-Video Diffusion Model	not yet
11	A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining	not yet
11	From PINNs to PIKANs: Recent Advances in Physics-Informed Machine Learning	not yet
11	Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents	not yet
11	Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws	not yet
11	Mechanistic?	not yet
11	Losing dimensions: Geometric memorization in generative diffusion	not yet

※ Citation counts are taken from NASA ADS data as of the update date.
https://ui.adsabs.harvard.edu/
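The ranking procedure described above amounts to selecting papers whose arXiv ID starts with `2410.` and sorting them by citation count, highest first. Below is a minimal Python sketch of that step, assuming the citation counts have already been obtained (for example, exported from NASA ADS) as simple records. The `Paper` type, the `rank_month` and `to_rows` functions, and the sample entries are all hypothetical. The alphabetical tie-break is also an assumption: the article does not say how papers with equal citation counts are ordered.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    arxiv_id: str   # hypothetical IDs in the example below
    title: str
    citations: int  # citation count as reported by the data source


def rank_month(papers, prefix="2410."):
    """Keep papers whose arXiv ID starts with `prefix`, sorted by citations (descending)."""
    selected = [p for p in papers if p.arxiv_id.startswith(prefix)]
    # Assumed tie-break: alphabetical by title, so the output order is deterministic.
    return sorted(selected, key=lambda p: (-p.citations, p.title))


def to_rows(ranked):
    """Format ranked papers as tab-separated 'citations<TAB>title' rows, like the table above."""
    return [f"{p.citations}\t{p.title}" for p in ranked]


if __name__ == "__main__":
    # Hypothetical sample records, not real data.
    papers = [
        Paper("2410.00002", "Paper B", 53),
        Paper("2409.00001", "Paper from September", 900),
        Paper("2410.00001", "Paper A", 53),
        Paper("2410.00003", "Paper C", 604),
    ]
    for row in to_rows(rank_month(papers)):
        print(row)
```

In this sketch the September paper is excluded even though it has the most citations, because the ranking is scoped to a single month by ID prefix.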
