217 | Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities | not yet |
59 | Kimi K2: Open Agentic Intelligence | not yet |
30 | WebSailor: Navigating Super-human Reasoning for Web Agent | not yet |
29 | GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning | not yet |
28 | MedGemma Technical Report | not yet |
19 | Group Sequence Policy Optimization | not yet |
16 | WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization | not yet |
15 | Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning | not yet |
14 | Rethinking Data Protection in the (Generative) Artificial Intelligence Era | not yet |
13 | Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety | |
12 | GTA1: GUI Test-time Scaling Agent | not yet |
11 | A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence | not yet |
11 | Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy | not yet |
10 | Agentic Reinforced Policy Optimization | not yet |
10 | Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | not yet |
10 | Kwai Keye-VL Technical Report | not yet |
9 | GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding | not yet |
9 | A Survey of Context Engineering for Large Language Models | not yet |
9 | Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models | not yet |
9 | AI4Research: A Survey of Artificial Intelligence for Scientific Research | not yet |
8 | Persona Vectors: Monitoring and Controlling Character Traits in Language Models | |
8 | $\pi^3$: Permutation-Equivariant Visual Geometry Learning | not yet |
8 | Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity | not yet |
8 | First Return, Entropy-Eliciting Explore | not yet |
8 | MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment | not yet |
7 | Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling | not yet |
7 | Gemini 2.5 Pro Capable of Winning Gold at IMO 2025 | |
7 | PyVision: Agentic Vision with Dynamic Tooling | not yet |
7 | Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving | not yet |
7 | PaddleOCR 3.0 Technical Report | not yet |
7 | MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent | |
6 | AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data | |
6 | Agentic Web: Weaving the Next Web with AI Agents | |
6 | GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset | not yet |
6 | GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning | |
6 | MobileUse: A GUI Agent with Hierarchical Reflection for Autonomous Mobile Operation | not yet |
6 | ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning | not yet |
6 | Inverse Scaling in Test-Time Compute | not yet |
6 | Scaling RL to Long Videos | not yet |
6 | Unitary designs in nearly optimal depth | not yet |
6 | A Survey on Latent Reasoning | not yet |
6 | Machine Learning-Based Prediction of Metal-Organic Framework Materials: A Comparative Analysis of Multiple Models | not yet |
6 | Generalizing Verifiable Instruction Following | not yet |
6 | Solving the Hubbard model with Neural Quantum States | not yet |
6 | Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling | not yet |
6 | Clinical NLP with Attention-Based Deep Learning for Multi-Disease Prediction | not yet |
6 | Toward Edge General Intelligence with Multiple-Large Language Model (Multi-LLM): Architecture, Trust, and Orchestration | not yet |
5 | X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again | not yet |
5 | Checklists Are Better Than Reward Models For Aligning Language Models | not yet |
5 | Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains | not yet |
5 | Controllable Video Generation: A Survey | not yet |
5 | Step-Audio 2 Technical Report | not yet |
5 | PromptArmor: Simple yet Effective Prompt Injection Defenses | not yet |
5 | Collaborative Distillation Strategies for Parameter-Efficient Language Model Deployment | not yet |
5 | The Invisible Leash: Why RLVR May Not Escape Its Origin | not yet |
5 | Voxtral | not yet |
5 | Resurrect Mask AutoRegressive Modeling for Efficient and Scalable Image Generation | not yet |
5 | Accelerated free energy estimation in ab initio path integral Monte Carlo simulations | not yet |
5 | MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models | not yet |
5 | A water-rich interior in the temperate sub-Neptune K2-18 b revealed by JWST | not yet |
5 | Streaming 4D Visual Geometry Transformer | not yet |
5 | GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them? | not yet |
5 | From Physics to Foundation Models: A Review of AI-Driven Quantitative Remote Sensing Inversion | not yet |
5 | When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors | not yet |
5 | MemOS: A Memory OS for AI System | not yet |
5 | Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards | not yet |
5 | MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group Relative Policy Optimization | not yet |
5 | Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory | not yet |
5 | Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search | not yet |
5 | Neural simulation-based inference of the Higgs trilinear self-coupling via off-shell Higgs production | not yet |
5 | RoboBrain 2.0 Technical Report | not yet |
5 | STELLA: Self-Evolving LLM Agent for Biomedical Research | not yet |
5 | Transferable Modeling Strategies for Low-Resource LLM Tasks: A Prompt and Alignment-Based Approach | not yet |
5 | Collaborative Multi-Agent Reinforcement Learning Approach for Elastic Cloud Resource Scaling | not yet |
5 | Capsule Network-Based Semantic Intent Modeling for Human-Computer Interaction | not yet |
4 | Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving | not yet |
4 | Magentic-UI: Towards Human-in-the-loop Agentic Systems | not yet |
4 | Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach | not yet |
4 | Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding | not yet |
4 | On the Energy Distribution of the Galactic Center Excess' Sources | not yet |
4 | GR-3 Technical Report | not yet |
4 | Solving Formal Math Problems by Decomposition and Iterative Reflection | not yet |
4 | CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | not yet |
4 | VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning | not yet |
4 | Autonomous Resource Management in Microservice Systems via Reinforcement Learning | not yet |
4 | EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos | not yet |
4 | The Evolving Role of Large Language Models in Scientific Innovation: Evaluator, Collaborator, and Scientist | not yet |
4 | NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models | not yet |
4 | Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs | not yet |
4 | Accelerating Drug Discovery Through Agentic AI: A Multi-Agent Approach to Laboratory Automation in the DMTA Cycle | not yet |
4 | Simulating Three-dimensional Turbulence with Physics-informed Neural Networks | not yet |
4 | Schoolyard Greening, Child Health, and Neighborhood Change: A Comparative Study of Urban U.S. Cities | not yet |
4 | Greening Schoolyards and the Spatial Distribution of Property Values in Denver, Colorado | not yet |
4 | Defending Against Prompt Injection With a Few DefensiveTokens | not yet |
4 | MIRIX: Multi-Agent Memory System for LLM-Based Agents | not yet |
4 | Dynamic Chunking for End-to-End Hierarchical Sequence Modeling | |
4 | Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data | not yet |
4 | What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models | |