225
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

190 |
Gemma 3 Technical Report |
 |
142 |
Understanding R1-Zero-Like Training: A Critical Perspective |
 |
123 |
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models |
not yet |
120 |
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models |
not yet |
117 |
Visual-RFT: Visual Reinforcement Fine-Tuning |
not yet |
116 |
Wan: Open and Advanced Large-Scale Video Generative Models |
not yet |
113 |
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models |
not yet |
109 |
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild |
not yet |
107 |
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning |
not yet |
97 |
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning |
not yet |
91 |
MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning |
not yet |
89 |
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs |
 |
82 |
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization |
not yet |
77 |
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model |
not yet |
64 |
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization |
not yet |
64 |
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL |
not yet |
64 |
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning |
not yet |
61 |
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond |
not yet |
59 |
Qwen2.5-Omni Technical Report |
not yet |
58 |
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model |
not yet |
57 |
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs |
not yet |
54 |
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots |
not yet |
49 |
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models |
not yet |
47 |
Video-R1: Reinforcing Video Reasoning in MLLMs |
not yet |
45 |
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond |
not yet |
43 |
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning |
not yet |
42 |
A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well? |
not yet |
42 |
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning |
not yet |
38 |
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement |
not yet |
38 |
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement |
not yet |
36 |
Why Do Multi-Agent LLM Systems Fail? |
 |
36 |
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning |
not yet |
35 |
Gemini Robotics: Bringing AI into the Physical World |
not yet |
32 |
AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems |
not yet |
32 |
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching |
not yet |
32 |
An Empirical Study on Eliciting and Improving R1-like Reasoning Models |
not yet |
30 |
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions |
not yet |
30 |
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning |
not yet |
30 |
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models |
not yet |
29 |
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning |
not yet |
29 |
Large Language Model Agent: A Survey on Methodology, Applications and Challenges |
not yet |
29 |
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey |
not yet |
29 |
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach |
not yet |
27 |
Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning |
not yet |
26 |
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation |
 |
23 |
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation |
not yet |
22 |
ToRL: Scaling Tool-Integrated RL |
not yet |
22 |
What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret |
not yet |
21 |
Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations |
not yet |
20 |
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models |
not yet |
20 |
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad |
 |
20 |
A Linear Collider Vision for the Future of Particle Physics |
not yet |
20 |
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing |
not yet |
20 |
VACE: All-in-One Video Creation and Editing |
not yet |
20 |
Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning |
not yet |
19 |
Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains |
not yet |
19 |
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models |
 |
19 |
SoK: Security Analysis of Blockchain-based Cryptocurrency |
not yet |
19 |
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks |
not yet |
19 |
RWKV-7 "Goose" with Expressive Dynamic State Evolution |
not yet |
19 |
Efficient Test-Time Scaling via Self-Calibration |
not yet |
18 |
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't |
not yet |
18 |
Measuring AI Ability to Complete Long Tasks |
not yet |
17 |
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models |
not yet |
17 |
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning |
not yet |
17 |
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video |
not yet |
17 |
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model |
not yet |
17 |
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning |
not yet |
17 |
Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable |
not yet |
16 |
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning |
not yet |
16 |
What Makes a Reward Model a Good Teacher? An Optimization Perspective |
not yet |
16 |
LLM Agents for Education: Advances and Applications |
not yet |
16 |
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer |
not yet |
16 |
The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models |
not yet |
16 |
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens |
not yet |
15 |
Efficient Inference for Large Reasoning Models: A Survey |
not yet |
15 |
Transformers without Normalization |
 |
15 |
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding |
not yet |
15 |
DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning |
not yet |
14 |
Open Deep Search: Democratizing Search with Open-source Reasoning Agents |
 |
14 |
A Comprehensive Survey on Long Context Language Modeling |
not yet |
14 |
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning |
not yet |
14 |
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning |
not yet |
14 |
VGGT: Visual Geometry Grounded Transformer |
not yet |
14 |
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning |
not yet |
14 |
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model |
not yet |
14 |
START: Self-taught Reasoner with Tools |
not yet |
14 |
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test |
not yet |
13 |
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL |
not yet |
13 |
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking |
not yet |
13 |
1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training |
not yet |
13 |
XAttention: Block Sparse Attention with Antidiagonal Scoring |
not yet |
13 |
A Survey on Trustworthy LLM Agents: Threats and Countermeasures |
not yet |
13 |
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k |
not yet |
13 |
YuE: Scaling Open Foundation Models for Long-Form Music Generation |
not yet |
13 |
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models |
not yet |
13 |
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning |
not yet |
13 |
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities |
not yet |
13 |
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents |
not yet |
12 |
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness |
not yet |
12 |
Reasoning to Learn from Latent Thoughts |
 |
12 |
Defeating Prompt Injections by Design |
not yet |
12 |
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse |
not yet |
12 |
FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models |
not yet |
12 |
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability |
not yet |
12 |
Unified Reward Model for Multimodal Understanding and Generation |
not yet |
12 |
Personalized Generation In Large Model Era: A Survey |
not yet |
12 |
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models |
not yet |
11 |
SCORE: Story Coherence and Retrieval Enhancement for AI Narratives |
not yet |
11 |
Agentic Large Language Models, a survey |
not yet |
11 |
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving |
not yet |
11 |
Learning Multi-Level Features with Matryoshka Sparse Autoencoders |
not yet |
11 |
Survey on Evaluation of LLM-based Agents |
not yet |
11 |
Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation |
not yet |
11 |
Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning |
not yet |
11 |
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models |
not yet |
11 |
Long Context Tuning for Video Generation |
not yet |
11 |
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models |
 |
11 |
Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions |
not yet |
11 |
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful |
not yet |
11 |
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning |
not yet |
11 |
Remasking Discrete Diffusion Models with Inference-Time Scaling |
not yet |
10 |
Effectively Controlling Reasoning Models through Thinking Intervention |
not yet |
10 |
A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models |
not yet |
10 |
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation |
not yet |
10 |
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback |
 |
10 |
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks |
not yet |
10 |
Social Network User Profiling for Anomaly Detection Based on Graph Neural Networks |
not yet |
10 |
AgentRxiv: Towards Collaborative Autonomous Research |
not yet |
10 |
Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback |
not yet |
10 |
Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings |
not yet |
10 |
Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding |
not yet |
10 |
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation |
not yet |
10 |
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation |
not yet |
10 |
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning |
not yet |
10 |
SafeArena: Evaluating the Safety of Autonomous Web Agents |
 |
10 |
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction |
not yet |
10 |
A New $\sim 5\sigma$ Tension at Characteristic Redshift from DESI-DR1 BAO and DES-SN5YR Observations |
not yet |
9 |
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? |
not yet |
9 |
Large Language Models Pass the Turing Test |
not yet |
9 |
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning |
not yet |
9 |
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework |
not yet |
9 |
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models |
not yet |
9 |
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition |
 |
9 |
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging |
not yet |
9 |
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? |
not yet |
9 |
HALHF: a hybrid, asymmetric, linear Higgs factory using plasma- and RF-based acceleration |
not yet |
9 |
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation |
not yet |