[arXiv Paper Ranking: Published December 2024] 2412.xxxxx

Posted at 2025-04-04

AI paper explainer YouTube channel: AI時代の羅針盤

This page ranks papers in the cs category published around December 2024 (ID: 2412.xxxxx) by citation count. The ranking is updated from time to time.
(Updated April 4, 2025)

Citations	Title	Video
595	Qwen2.5 Technical Report	not yet
456	DeepSeek-V3 Technical Report	not yet
262	OpenAI o1 System Card	not yet
147	Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	not yet
94	HunyuanVideo: A Systematic Framework For Large Video Generative Models
82	Phi-4 Technical Report
50	DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	not yet
48	Do NOT Think That Much for 2+3
43	Training Large Language Models to Reason in a Continuous Latent Space
39	Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference	not yet
38	Open-Sora Plan: Open-Source Large Video Generation Model
37	ProcessBench: Identifying Process Errors in Mathematical Reasoning	not yet
36	Open-Sora: Democratizing Efficient Video Production for All	not yet
36	Structured 3D Latents for Scalable and Versatile 3D Generation	not yet
34	Deliberative Alignment: Reasoning Enables Safer Language Models	not yet
33	Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems	not yet
30	LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods	not yet
26	Free Process Rewards without Process Labels	not yet
25	Alignment faking in large language models	not yet
25	Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
23	LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks	not yet
23	Flow Matching Guide and Code	not yet
23	Flexible-Antenna Systems: A Pinching-Antenna Perspective	not yet
22	Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search	not yet
22	NVILA: Efficient Frontier Visual Language Models	not yet
22	Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation	not yet
20	HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
20	Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces	not yet
20	Byte Latent Transformer: Patches Scale Better Than Tokens
20	Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier	not yet
20	GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot	not yet
19	TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation	not yet
18	Large Concept Models: Language Modeling in a Sentence Representation Space
18	VisionZip: Longer is Better but Not Necessary in Vision Language Models
18	PaliGemma 2: A Family of Versatile VLMs for Transfer
17	Fast Gradient Computation for RoPE Attention in Almost Linear Time	not yet
17	Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
16	Experimental Demonstration of Logical Magic State Distillation	not yet
16	MetaMorph: Multimodal Understanding and Generation via Instruction Tuning	not yet
16	ExBody2: Advanced Expressive Humanoid Whole-Body Control	not yet
16	Frontier Models are Capable of In-context Scheming	not yet
16	o1-Coder: an o1 Replication for Coding	not yet
15	Token-Budget-Aware LLM Reasoning	not yet
15	ARC Prize 2024: Technical Report
15	Best-of-N Jailbreaking	not yet
14	Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers	not yet
14	Motion Prompting: Controlling Video Generation with Motion Trajectories	not yet
13	CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models	not yet
13	Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning	not yet
13	Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control	not yet
13	From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
13	The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity	not yet
12	Formal Mathematical Reasoning: A New Frontier in AI	not yet
12	FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching	not yet
12	TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
12	RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation	not yet
12	Compressed Chain of Thought: Efficient Reasoning Through Dense Representations	not yet
12	Apollo: An Exploration of Video Understanding in Large Multimodal Models
12	Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
11	An analytic theory of creativity in convolutional diffusion models	not yet
11	LMFusion: Adapting Pretrained Language Models for Multimodal Generation	not yet
11	Flex Attention: A Programming Model for Generating Optimized Attention Kernels	not yet
11	InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences	not yet
10	A Survey on Large Language Model Acceleration based on KV Cache Management	not yet
10	MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes
10	Jasper and Stella: distillation of SOTA embedding models	not yet
10	DRT: Deep Reasoning Translation via Long Chain-of-Thought	not yet
10	Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis	not yet
10	Parallelized Autoregressive Visual Generation	not yet
10	AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling	not yet
10	Autoregressive Video Generation without Vector Quantization	not yet
10	Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models	not yet
10	LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers	not yet
10	InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions	not yet
10	The BrowserGym Ecosystem for Web Agent Research	not yet
10	Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction	not yet
10	Liquid: Language Models are Scalable and Unified Multi-modal Generators	not yet
10	Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey	not yet
10	[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster	not yet
9	OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis	not yet
9	Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders	not yet
9	LearnLM: Improving Gemini for Learning
9	Offline Reinforcement Learning for LLM Multi-Step Reasoning	not yet
9	Score-based Generative Diffusion Models for Social Recommendations	not yet
9	Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models	not yet
9	Entropy-Regularized Process Reward Model	not yet
9	TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies	not yet
9	FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models	not yet
9	UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics	not yet
9	A Consolidated Volatility Prediction with Back Propagation Neural Network and Genetic Algorithm	not yet
9	On Evaluating the Durability of Safeguards for Open-Weight LLMs	not yet
9	Gated Delta Networks: Improving Mamba2 with Delta Rule	not yet
9	BatchTopK Sparse Autoencoders	not yet
9	Comprehensive Evaluation of Multimodal AI Models in Medical Imaging Diagnosis: From Data Augmentation to Preference-Based Comparison	not yet
9	MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale	not yet
9	Evaluating and Aligning CodeLLMs on Human Preference	not yet
9	RandAR: Decoder-only Autoregressive Visual Generation in Random Orders	not yet
9	Scaling New Frontiers: Insights into Large Recommendation Models	not yet
8	Training Software Engineering Agents and Verifiers with SWE-Gym	not yet
8	Aria-UI: Visual Grounding for GUI Instructions	not yet
8	Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback	not yet
8	Categorical Symmetries in Spin Models with Atom Arrays	not yet
8	GUI Agents: A Survey	not yet
8	RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement	not yet
8	Fault-Tolerant Operation and Materials Science with Neutral Atom Logical Qubits	not yet
8	Hierarchical Split Federated Learning: Convergence Analysis and System Optimization	not yet
8	On the Expressive Power of Modern Hopfield Networks	not yet
8	International Scientific Report on the Safety of Advanced AI (Interim Report)	not yet
8	Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression	not yet
8	U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs	not yet
8	Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models	not yet
8	An Automated Data Mining Framework Using Autoencoders for Feature Extraction and Dimensionality Reduction	not yet
7	TradingAgents: Multi-Agents LLM Financial Trading Framework	not yet
7	SegKAN: High-Resolution Medical Image Segmentation with Long-Distance Dependencies	not yet
7	DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers	not yet
7	KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis	not yet
7	Progressive Multimodal Reasoning via Active Retrieval	not yet
7	MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval	not yet
7	Agent-SafetyBench: Evaluating the Safety of LLM Agents	not yet
7	Minimum Data Rate Maximization for Uplink Pinching-Antenna Systems	not yet
7	Large Language Model Enhanced Recommender Systems: A Survey	not yet
7	SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents	not yet
7	Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance	not yet
7	C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness	not yet
7	Reinforcement Learning Enhanced LLMs: A Survey	not yet
7	SCBench: A KV Cache-Centric Analysis of Long-Context Methods	not yet
7	SPT: Sequence Prompt Transformer for Interactive Image Segmentation	not yet
7	Simple Guidance Mechanisms for Discrete Diffusion Models	not yet
7	A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions	not yet
7	APOLLO: SGD-like Memory, AdamW-level Performance	not yet
7	EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
7	NaVILA: Legged Robot Vision-Language-Action Model for Navigation	not yet
7	Advanced Risk Prediction and Stability Assessment of Banks Using Time Series Transformer Models	not yet
7	Navigation World Models	not yet
7	ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning	not yet
7	Enhancing Recommendation Systems with GNNs and Addressing Over-Smoothing	not yet
7	Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications	not yet
7	HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving	not yet
7	Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review	not yet
7	FullStack Bench: Evaluating LLMs as Full Stack Coders	not yet
7	Task Singular Vectors: Reducing Task Interference in Model Merging	not yet
6	VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation	not yet
6	GME: Improving Universal Multimodal Retrieval by Multimodal LLMs	not yet
6	Universal Machine Learning Interatomic Potentials are Ready for Phonons	not yet
6	Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning	not yet
6	Multi-LLM Text Summarization	not yet
6	AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving	not yet
6	MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark	not yet
6	How to Synthesize Text Data without Model Collapse?	not yet
6	Numerical Pruning for Efficient Autoregressive Models	not yet
6	Wonderland: Navigating 3D Scenes from a Single Image	not yet
6	ExecRepoBench: Multi-level Executable Code Completion Evaluation	not yet
6	A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges	not yet
6	SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models	not yet
6	ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data	not yet

Note: Citation counts are taken from NASA ADS data as of the update date.
https://ui.adsabs.harvard.edu/
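
The ordering used in the table above (descending citation count) can be sketched in a few lines of Python. This is only an illustration, not the script behind this page: it assumes the rows are available as tab-separated text in the same `citations<TAB>title[<TAB>video]` layout as the table, and the names `Paper`, `parse_rows`, and `rank` are hypothetical helpers introduced here.

```python
from typing import NamedTuple


class Paper(NamedTuple):
    citations: int
    title: str
    video: str  # empty string when the row has no third column


def parse_rows(tsv_text: str) -> list[Paper]:
    """Parse 'citations<TAB>title[<TAB>video]' lines into Paper records."""
    papers = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # Skip the header row or any line whose first field is not a number.
        if len(fields) < 2 or not fields[0].strip().isdigit():
            continue
        video = fields[2].strip() if len(fields) > 2 else ""
        papers.append(Paper(int(fields[0]), fields[1].strip(), video))
    return papers


def rank(papers: list[Paper]) -> list[Paper]:
    """Sort by citation count, highest first; ties keep their input order."""
    return sorted(papers, key=lambda p: p.citations, reverse=True)


if __name__ == "__main__":
    # A few rows copied from the table above, deliberately out of order.
    sample = (
        "Citations\tTitle\tVideo\n"
        "82\tPhi-4 Technical Report\n"
        "595\tQwen2.5 Technical Report\tnot yet\n"
        "456\tDeepSeek-V3 Technical Report\tnot yet\n"
    )
    for paper in rank(parse_rows(sample)):
        print(paper.citations, paper.title, sep="\t")
```

Because Python's `sorted` is stable, papers with the same citation count stay in the order they were read, which matches how tied rows appear in the table.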
