| 187 | InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | not yet |
| 151 | Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | |
| 113 | VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | not yet |
| 69 | VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks | not yet |
| 66 | Kimi-VL Technical Report | not yet |
| 63 | Reinforcement Learning for Reasoning in Large Language Models with One Training Example | |
| 61 | $\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization | |
| 57 | Reasoning Models Can Be Effective Without Thinking | |
| 52 | Inference-Time Scaling for Generalist Reward Modeling | not yet |
| 50 | DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments | not yet |
| 49 | ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning | not yet |
| 47 | Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning | not yet |
| 46 | VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | not yet |
| 46 | Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems | not yet |
| 44 | TTRL: Test-Time Reinforcement Learning | |
| 40 | RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning | not yet |
| 40 | ReTool: Reinforcement Learning for Strategic Tool Use in LLMs | |
| 38 | DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | |
| 38 | SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models | not yet |
| 38 | GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | not yet |
| 36 | VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | not yet |
| 35 | A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment | not yet |
| 35 | ToolRL: Reward is All Tool Learning Needs | not yet |
| 34 | Phi-4-reasoning Technical Report | |
| 33 | The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search | |
| 31 | Step1X-Edit: A Practical Framework for General Image Editing | not yet |
| 30 | A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | not yet |
| 28 | Dynamic Early Exit in Reasoning Models | not yet |
| 28 | Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought | not yet |
| 28 | PaperBench: Evaluating AI's Ability to Replicate AI Research | not yet |
| 27 | A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility | not yet |
| 27 | Concise Reasoning via Reinforcement Learning | not yet |
| 27 | GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation | not yet |
| 27 | Command A: An Enterprise-Ready Large Language Model | |
| 26 | Kimi-Audio Technical Report | not yet |
| 26 | BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents | not yet |
| 25 | DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning | not yet |
| 24 | Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning | not yet |
| 24 | A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems | not yet |
| 24 | Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model | not yet |
| 23 | WebThinker: Empowering Large Reasoning Models with Deep Research Capability | not yet |
| 23 | AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset | not yet |
| 23 | Efficient Reasoning Models: A Survey | not yet |
| 23 | Transfer between Modalities with MetaQueries | not yet |
| 21 | DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | not yet |
| 21 | SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement | not yet |
| 21 | Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining | not yet |
| 21 | Rethinking Reflection in Pre-Training | not yet |
| 20 | Learning to Reason under Off-Policy Guidance | not yet |
| 20 | SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM | not yet |
| 20 | SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL | not yet |
| 20 | Perception-R1: Pioneering Perception Policy with Reinforcement Learning | not yet |
| 20 | Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving | not yet |
| 20 | LLM Social Simulations Are a Promising Research Method | not yet |
| 20 | OpenCodeReasoning: Advancing Data Distillation for Competitive Coding | not yet |
| 20 | Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | not yet |
| 19 | Acting Less is Reasoning More! Teaching Model to Act Efficiently | |
| 19 | Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | not yet |
| 19 | Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | not yet |
| 18 | ReasonIR: Training Retrievers for Reasoning Tasks | not yet |
| 18 | Building A Secure Agentic AI Application Leveraging A2A Protocol | |
| 18 | InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners | not yet |
| 18 | NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation | not yet |
| 18 | Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? | not yet |
| 18 | Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification | not yet |
| 18 | SmartBugBert: BERT-Enhanced Vulnerability Detection for Smart Contract Bytecode | not yet |
| 18 | GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning | not yet |
| 17 | Safety in Large Reasoning Models: A Survey | not yet |
| 17 | PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models | not yet |
| 17 | Enterprise-Grade Security for the Model Context Protocol (MCP): Frameworks and Mitigation Strategies | not yet |
| 17 | SmolVLM: Redefining small and efficient multimodal models | not yet |
| 17 | Enhancing Smart Contract Vulnerability Detection in DApps Leveraging Fine-Tuned LLM | not yet |
| 17 | Less-to-More Generalization: Unlocking More Controllability by In-Context Generation | not yet |
| 17 | Z1: Efficient Test-time Scaling with Code | not yet |
| 16 | The Leaderboard Illusion | |
| 16 | A Survey of AI Agent Protocols | not yet |
| 16 | Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning | not yet |
| 16 | Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning | not yet |
| 16 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | not yet |
| 16 | Seedream 3.0 Technical Report | not yet |
| 16 | GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models | not yet |
| 16 | TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning | not yet |
| 16 | SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning | not yet |
| 16 | On The Landscape of Spoken Language Models: A Comprehensive Survey | not yet |
| 16 | SEAL: Steerable Reasoning Calibration of Large Language Models for Free | not yet |
| 16 | ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use | not yet |
| 16 | Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | not yet |
| 16 | Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models | not yet |
| 16 | An Approach to Technical AGI Safety and Security | not yet |
| 16 | MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs | not yet |
| 16 | m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models | not yet |
| 15 | One-Minute Video Generation with Test-Time Training | |
| 15 | UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding | not yet |
| 15 | GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning | not yet |
| 14 | In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer | not yet |
| 14 | SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning | not yet |
| 14 | Optimized Path Planning for Logistics Robots Using Ant Colony Algorithm under Multiple Constraints | not yet |
| 14 | An Illusion of Progress? Assessing the Current State of Web Agents | not yet |
| 14 | WorldScore: A Unified Evaluation Benchmark for World Generation | not yet |
| 14 | JudgeLRM: Large Reasoning Models as a Judge | not yet |
| 13 | Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory | not yet |
| 13 | Fast-Slow Thinking for Large Vision-Language Model Reasoning | not yet |
| 13 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | not yet |
| 13 | Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | not yet |
| 13 | Packing Input Frame Context in Next-Frame Prediction Models for Video Generation | not yet |
| 13 | The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections | not yet |
| 13 | MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft | not yet |
| 13 | Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning | not yet |
| 13 | STAR-1: Safer Alignment of Reasoning LLMs with 1K Data | not yet |
| 12 | Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization | not yet |
| 12 | From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review | not yet |
| 12 | Malicious Code Detection in Smart Contracts via Opcode Vectorization | not yet |
| 12 | WORLDMEM: Long-term Consistent World Simulation with Memory | not yet |
| 12 | MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning | not yet |
| 12 | RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability | not yet |
| 12 | SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models | not yet |
| 12 | TxGemma: Efficient and Agentic LLMs for Therapeutics | not yet |
| 12 | Understanding Aha Moments: from External Observations to Internal Mechanisms | not yet |
| 12 | Why do LLMs attend to the first token? | not yet |
| 12 | SkyReels-A2: Compose Anything in Video Diffusion Transformers | not yet |
| 11 | SWE-smith: Scaling Data for Software Engineering Agents | |
| 11 | TesserAct: Learning 4D Embodied World Models | not yet |
| 11 | Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods | not yet |
| 11 | A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future | not yet |
| 11 | TextArena | not yet |
| 11 | VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | not yet |
| 11 | SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users | not yet |
| 11 | Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning | not yet |
| 11 | SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills | not yet |
| 11 | Leanabell-Prover: Posttraining Scaling in Formal Reasoning | not yet |
| 11 | Think When You Need: Self-Adaptive Chain-of-Thought Learning | not yet |
| 11 | Improved Visual-Spatial Reasoning via R1-Zero-Like Training | not yet |
| 10 | ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning | not yet |
| 10 | WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks | not yet |
| 10 | HalluLens: LLM Hallucination Benchmark | not yet |
| 10 | DreamO: A Unified Framework for Image Customization | not yet |
| 10 | Describe Anything: Detailed Localized Image and Video Captioning | |
| 10 | SConU: Selective Conformal Uncertainty in Large Language Models | not yet |
| 10 | IMAGGarment-1: Fine-Grained Garment Generation for Controllable Fashion Design | not yet |
| 10 | SkyReels-V2: Infinite-length Film Generative Model | not yet |
| 10 | Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time | not yet |
| 10 | REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers | not yet |
| 10 | Psychological Health Knowledge-Enhanced LLM-based Social Network Crisis Intervention Text Transfer Recognition Method | not yet |
| 10 | Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models | not yet |
| 10 | Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning | not yet |
| 10 | APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay | not yet |
| 10 | Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme | not yet |
| 10 | Cognitive Memory in Large Language Models | not yet |
| 10 | Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | |
| 9 | Multidimensional precipitation index prediction based on CNN-LSTM hybrid framework | not yet |
| 9 | Securing GenAI Multi-Agent Systems Against Tool Squatting: A Zero Trust Registry-Based Approach | not yet |
| 9 | BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese | not yet |
| 9 | Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks | not yet |
| 9 | Process Reward Models That Think | not yet |
| 9 | Tina: Tiny Reasoning Models via LoRA | not yet |