[arXiv Paper Ranking: Papers Published in January 2025] 2501.xxxxx

Posted at 2025-04-04

AI paper explainer YouTube channel: AI時代の羅針盤 (Compass for the AI Era)

This is a ranking of cs-category papers published around January 2025 (ID: 2501.xxxxx), ordered by citation count. The ranking is updated from time to time.
(Updated April 4, 2025)

Citations	Title	Video
821	DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning	not yet
85	s1: Simple test-time scaling	not yet
78	Kimi k1.5: Scaling Reinforcement Learning with LLMs	not yet
67	rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
44	Cosmos World Foundation Model Platform for Physical AI	not yet
39	2 OLMo 2 Furious	not yet
36	The Lessons of Developing Process Reward Models in Mathematical Reasoning
34	Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models	not yet
32	Humanity's Last Exam
31	Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling	not yet
27	LTX-Video: Realtime Video Latent Diffusion	not yet
26	MiniMax-01: Scaling Foundation Models with Lightning Attention	not yet
25	Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought	not yet
24	SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
22	VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction	not yet
22	Titans: Learning to Memorize at Test Time
19	Search-o1: Agentic Search-Enhanced Large Reasoning Models
17	Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
17	LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs	not yet
17	REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models	not yet
16	On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis	not yet
16	FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving	not yet
14	Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
14	A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models	not yet
14	VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding	not yet
14	Evolving Deeper LLM Thinking
14	Open Problems in Machine Unlearning for AI Safety
14	Agent Laboratory: Using LLM Agents as Research Assistants
13	On the Computational Capability of Graph Neural Networks: A Circuit Complexity Bound Perspective	not yet
12	Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling	not yet
12	Neural Algorithmic Reasoning for Hypergraphs with Looped Transformers	not yet
12	FAST: Efficient Action Tokenization for Vision-Language-Action Models	not yet
12	Do generative video models understand physical principles?
12	LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token	not yet
12	Virgo: A Preliminary Exploration on Reproducing o1-like MLLM	not yet
12	Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models	not yet
11	O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning	not yet
11	PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models	not yet
11	Training Medical Large Vision-Language Models with Abnormal-Aware Feedback	not yet
10	Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step	not yet
10	Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation	not yet
10	Imagine while Reasoning in Space: Multimodal Visualization-of-Thought
10	A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges	not yet
10	Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning	not yet
10	VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling	not yet
10	Retrieval-Augmented Generation with Graphs (GraphRAG)	not yet
9	International AI Safety Report	not yet
9	Open Problems in Mechanistic Interpretability	not yet
9	Qwen2.5-1M Technical Report	not yet
9	UI-TARS: Pioneering Automated GUI Interaction with Native Agents
9	RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems?	not yet
9	Detection of AI Deepfake and Fraud in Online Payments Using GAN-Based Models	not yet
9	Tensor Product Attention Is All You Need	not yet
9	Multi-Agent Collaboration Mechanisms: A Survey of LLMs	not yet
9	Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains	not yet
9	Circuit Complexity Bounds for Visual Autoregressive Model	not yet
8	o3-mini vs DeepSeek-R1: Which One is Safer?	not yet
8	Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge	not yet
8	InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	not yet
8	InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model	not yet
8	Pinching Antennas: Principles, Applications and Challenges	not yet
8	A General Framework for Inference-time Scaling and Steering of Diffusion Models	not yet
8	MinMo: A Multimodal Large Language Model for Seamless Voice Interaction	not yet
8	URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics	not yet
8	Rotatable Antenna Enabled Wireless Communication: Modeling and Optimization	not yet
8	Test-Time Compute: from System-1 Thinking to System-2 Thinking	not yet
7	SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer	not yet
7	Molecular-driven Foundation Model for Oncologic Pathology	not yet
7	Baichuan-Omni-1.5 Technical Report	not yet
7	Enhancing Intent Understanding for Ambiguous prompt: A Human-Machine Co-Adaption Strategy	not yet
7	Reasoning Language Models: A Blueprint	not yet
7	VideoWorld: Exploring Knowledge Learning from Unlabeled Videos	not yet
7	O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning	not yet
7	Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion	not yet
7	OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis	not yet
7	CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings	not yet
6	SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling	not yet
6	Multimodal Large Language Models for Image, Text, and Speech Data Augmentation: A Survey	not yet
6	Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation
6	Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate	not yet
6	SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model	not yet
6	Improving Video Generation with Human Feedback	not yet
6	Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks	not yet
6	Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training	not yet
6	RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation	not yet
6	Inference-Time Alignment in Diffusion Models with Reward-Guided Generation: Tutorial and Review	not yet
6	Diffusion Adversarial Post-Training for One-Step Video Generation	not yet
6	Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning	not yet
6	InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection	not yet
6	ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling	not yet
5	Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming	not yet
5	Probing topological matter and fermion dynamics on a neutral-atom quantum computer	not yet
5	AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders	not yet
5	How Linguistics Learned to Stop Worrying and Love the Language Models	not yet
5	Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies	not yet
5	Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models	not yet
5	Fanar: An Arabic-Centric Multimodal Generative AI Platform	not yet
5	Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos	not yet
5	Multi-Level Attention and Contrastive Learning for Enhanced Text Classification with an Optimized Transformer	not yet
5	GAMED-Snake: Gradient-aware Adaptive Momentum Evolution Deep Snake Model for Multi-organ Segmentation	not yet
5	Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
5	A Survey on Multi-Turn Interaction Capabilities of Large Language Models	not yet
5	Quantum-Centric Algorithm for Sample-Based Krylov Diagonalization	not yet
5	Vision-Language Models Do Not Understand Negation	not yet
5	Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG	not yet
5	Enhancing Automated Interpretability with Output-Centric Feature Descriptions	not yet
5	WebWalker: Benchmarking LLMs in Web Traversal	not yet
5	Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI	not yet
5	Multi-subject Open-set Personalization in Video Generation	not yet
5	Enabling Scalable Oversight via Self-Evolving Critic	not yet
5	Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark	not yet
5	ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning	not yet
5	LLM4SR: A Survey on Large Language Models for Scientific Research	not yet
5	Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives	not yet
5	Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control	not yet
5	The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input	not yet
5	EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation	not yet
5	SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation	not yet
5	Object-level Visual Prompts for Compositional Image Generation	not yet
5	Nested Attention: Semantic-aware Attention Values for Concept Personalization	not yet
5	LEO-Split: A Semi-Supervised Split Learning Framework over LEO Satellite Networks	not yet
5	CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries	not yet
5	OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning	not yet
5	Dual Diffusion for Unified Image Generation and Understanding	not yet
4	Reward-Guided Speculative Decoding for Efficient LLM Reasoning	not yet
4	Efficient Reasoning with Hidden Thinking	not yet
4	Diffusion Autoencoders are Scalable Image Tokenizers	not yet
4	GuardReasoner: Towards Reasoning-based LLM Safeguards	not yet
4	MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding	not yet
4	Sparse Autoencoders Can Interpret Randomly Initialized Transformers	not yet
4	Large Language Models for Code Generation: The Practitioners Perspective	not yet
4	Parameter-Efficient Fine-Tuning for Foundation Models	not yet
4	UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models	not yet
4	Low-dimensional adaptation of diffusion models: Convergence in total variation	not yet
4	Continuous 3D Perception Model with Persistent State	not yet
4	MMVU: Measuring Expert-Level Multi-Discipline Video Understanding	not yet
4	Poison-RAG: Adversarial Data Poisoning Attacks on Retrieval-Augmented Generation in Recommender Systems	not yet
4	Tell me about yourself: LLMs are aware of their learned behaviors	not yet
4	Generative Physical AI in Vision: A Survey	not yet
4	Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments	not yet
4	Infrastructure for AI Agents	not yet
4	A Simple Aerial Detection Baseline of Multimodal Language Models	not yet
4	Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians	not yet
4	What Limits LLM-based Human Simulation: LLMs or Our Design?	not yet
4	Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models	not yet
4	GameFactory: Creating New Games with Generative Interactive Videos	not yet
4	CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation	not yet
4	Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards	not yet
4	MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation	not yet

Note: Citation counts are taken from NASA ADS data as of the update date.
https://ui.adsabs.harvard.edu/
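
For readers who want to reuse the table, here is a minimal sketch of how the tab-separated rows could be parsed and re-sorted by citation count. It is not how this ranking was actually produced. The function name `parse_ranking` and the tuple layout are assumptions made for illustration, and the sample rows are copied from the table above. Rows with no third column get an empty video field.

```python
def parse_ranking(text):
    """Parse tab-separated 'citations<TAB>title<TAB>video' rows.

    Hypothetical helper for illustration only. Lines whose first
    column is not a plain integer (such as a header row) are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2 or not parts[0].strip().isdigit():
            continue  # header, blank line, or anything that is not a data row
        citations = int(parts[0].strip())
        title = parts[1].strip()
        # The third column is absent on some rows in the table.
        video = parts[2].strip() if len(parts) > 2 else ""
        rows.append((citations, title, video))
    # Sort by citation count, highest first; ties keep their input order.
    return sorted(rows, key=lambda row: row[0], reverse=True)


sample = (
    "Citations\tTitle\tVideo\n"
    "85\ts1: Simple test-time scaling\tnot yet\n"
    "32\tHumanity's Last Exam\n"
    "821\tDeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning\tnot yet\n"
)

ranked = parse_ranking(sample)
for citations, title, video in ranked:
    print(citations, title, video, sep="\t")
```

Because Python's sort is stable, papers with the same citation count stay in the order they were listed.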
