295 | Qwen3 Technical Report | not yet |
44 | Seed1.5-VL Technical Report | not yet |
39 | Llama-Nemotron: Efficient Reasoning Models | not yet |
32 | Absolute Zero: Reinforced Self-play Reasoning with Zero Data | |
29 | Reasoning Models Don't Always Say What They Think | |
29 | RM-R1: Reward Modeling as Reasoning | not yet |
26 | T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT | not yet |
24 | Emerging Properties in Unified Multimodal Pretraining | not yet |
22 | LLMs Get Lost In Multi-Turn Conversation | not yet |
19 | BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset | not yet |
18 | Learning to Reason without External Rewards | not yet |
18 | Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models | not yet |
17 | AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges | not yet |
17 | ZeroSearch: Incentivize the Search Capability of LLMs without Searching | |
16 | OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | not yet |
15 | Skywork Open Reasoner 1 Technical Report | not yet |
15 | Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs | not yet |
14 | SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training | not yet |
14 | HealthBench: Evaluating Large Language Models Towards Improved Human Health | not yet |
14 | A survey of agent interoperability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP) | not yet |
14 | 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models | not yet |
13 | ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | |
13 | Future Circular Collider Feasibility Study Report: Volume 1, Physics, Experiments, Detectors | not yet |
12 | MMaDA: Multimodal Large Diffusion Language Models | |
12 | CoT-Kinetics: A Theoretical Modeling Assessing LRM Reasoning Process | not yet |
12 | The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models | not yet |
11 | The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models | not yet |
11 | The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | not yet |
11 | AdaptThink: Reasoning Models Can Learn When to Think | not yet |
11 | ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems | |
11 | Scalable Chain of Thoughts via Elastic Reasoning | not yet |
11 | HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | not yet |
11 | Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models | not yet |
11 | Practical Efficiency of Muon for Pretraining | not yet |
11 | Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions | not yet |
10 | Avocado Price Prediction Using a Hybrid Deep Learning Model: TCN-MLP-Attention Architecture | not yet |
10 | MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining | not yet |
10 | X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains | not yet |
10 | Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning | not yet |
10 | Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions | not yet |
10 | Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning | not yet |
9 | WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning | not yet |
9 | Think Only When You Need with Large Hybrid-Reasoning Models | not yet |
9 | Aya Vision: Advancing the Frontier of Multilingual Multimodality | not yet |
9 | Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation | not yet |
9 | LlamaFirewall: An open source guardrail system for building secure AI agents | not yet |
9 | HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models | not yet |
9 | Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning | not yet |
8 | Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data | not yet |
8 | SETransformer: A Hybrid Attention-Based Architecture for Robust Human Activity Recognition | not yet |
8 | General-Reasoner: Advancing LLM Reasoning Across All Domains | not yet |
8 | Thinkless: LLM Learns When to Think | not yet |
8 | CTLformer: A Hybrid Denoising Model Combining Convolutional Layers and Self-Attention for Enhanced CT Image Reconstruction | not yet |
8 | Group-in-Group Policy Optimization for LLM Agent Training | not yet |
8 | DanceGRPO: Unleashing GRPO on Visual Generation | not yet |
8 | Crosslingual Reasoning through Test-Time Scaling | not yet |
8 | GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data | not yet |
8 | TWIST: Teleoperated Whole-Body Imitation System | not yet |
7 | One-shot Entropy Minimization | not yet |
7 | LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | not yet |
7 | AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting | not yet |
7 | PhyX: Does Your Model Have the "Wits" for Physical Reasoning? | not yet |
7 | Reasoning Models Better Express Their Confidence | not yet |
7 | DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning | not yet |
7 | Robin: A multi-agent system for automating scientific discovery | not yet |
7 | AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning | not yet |
7 | Cloud-Based AI Systems: Leveraging Large Language Models for Intelligent Fault Detection and Autonomous Self-Healing | not yet |
7 | Seeing Sound, Hearing Sight: Uncovering Modality Bias and Conflict of AI models in Sound Localization | not yet |
7 | J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning | not yet |
7 | Generative AI for Autonomous Driving: Frontiers and Opportunities | not yet |
7 | User Behavior Analysis in Privacy Protection with Large Language Models: A Study on Privacy Preferences with Limited Data | not yet |
7 | UniVLA: Learning to Act Anywhere with Task-centric Latent Actions | not yet |
7 | Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging | not yet |
7 | Personalized Risks and Regulatory Strategies of Large Language Models in Digital Advertising | not yet |
7 | FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models | not yet |
7 | Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards | not yet |
7 | R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation | not yet |
7 | On the generalization of language models from in-context learning and finetuning: a controlled study | |
7 | PDCS: A Primal-Dual Large-Scale Conic Programming Solver with GPU Enhancements | not yet |
7 | Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems | not yet |
6 | MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration | not yet |
6 | Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents | |
6 | Can Large Reasoning Models Self-Train? | not yet |
6 | SageAttention2++: A More Efficient Implementation of SageAttention2 | not yet |
6 | Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution | not yet |
6 | AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning | not yet |
6 | MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | not yet |
6 | ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing | not yet |
6 | Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL | not yet |
6 | Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement | not yet |
6 | S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | not yet |
6 | Flow-GRPO: Training Flow Matching Models via Online RL | not yet |
6 | TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation | not yet |
6 | Vision-Language-Action Models: Concepts, Progress, Applications and Challenges | not yet |
6 | OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | not yet |
6 | R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | not yet |
6 | Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents | not yet |
6 | LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey | not yet |
6 | Sum Rate Maximization for NOMA-Assisted Uplink Pinching-Antenna Systems | not yet |
6 | T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation | not yet |
6 | Base Models Beat Aligned Models at Randomness and Creativity | not yet |
5 | AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | |
5 | OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation | not yet |
5 | Large Language Models Often Know When They Are Being Evaluated | not yet |
5 | Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better | not yet |
5 | CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models | not yet |
5 | Dissipative Preparation of Many-Body Quantum States: Towards Practical Quantum Advantage | not yet |
5 | ImgEdit: A Unified Image Editing Dataset and Benchmark | not yet |
5 | Research on feature fusion and multimodal patent text based on graph attention network | not yet |
5 | HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters | not yet |
5 | Convergence Analysis of Adaptive Finite Element Algorithms for a Regularized Variational Model of Quasi-Static Brittle Fracture in "Strain-Limiting" Elastic Solids | not yet |
5 | SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond | not yet |
5 | Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation | not yet |
5 | Brownian Bridge Augmented Surrogate Simulation and Injection Planning for Geological CO$_2$ Storage | not yet |
5 | From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning | not yet |
5 | LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning | not yet |
5 | Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning | not yet |
5 | Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey | not yet |
5 | Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities | not yet |
5 | Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought | not yet |
5 | Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | not yet |
5 | UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | not yet |
5 | Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens | not yet |
5 | Mean Flows for One-step Generative Modeling | not yet |
5 | Optimizing Anytime Reasoning via Budget Relative Policy Optimization | not yet |
5 | Cross-Cloud Data Privacy Protection: Optimizing Collaborative Mechanisms of AI Systems by Integrating Federated Learning and LLMs | not yet |
5 | Harnessing the Universal Geometry of Embeddings | not yet |
5 | HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | not yet |
5 | SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning | not yet |
5 | GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning | not yet |
5 | Agent Name Service (ANS): A Universal Directory for Secure AI Agent Discovery and Interoperability | not yet |
5 | Large Language Models Are More Persuasive Than Incentivized Human Persuaders | not yet |
5 | Energy-Efficient Resource Allocation for NOMA-Assisted Uplink Pinching-Antenna Systems | not yet |