【arXiv Paper Ranking: Papers Published in November 2024】2411.xxxxx

Posted at 2025-01-15 (last updated at 2025-04-04)

AI paper explainer YouTube channel: AI時代の羅針盤 (Compass for the AI Era)

This ranking covers papers in the cs category released around November 2024 (IDs 2411.xxxxx), ordered by citation count. The ranking is updated periodically.
(Updated April 4, 2025)

Citations	Title	Video
83	Tulu 3: Pushing Frontiers in Open Language Model Post-Training	not yet
72	A Survey on LLM-as-a-Judge	not yet
58	LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
48	From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge	not yet
45	Generative Agent Simulations of 1,000 People
42	Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
40	FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
30	O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?	not yet
30	Measuring short-form factuality in large language models	not yet
29	Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization	not yet
27	OminiControl: Minimal and Universal Control for Diffusion Transformer	not yet
27	Logical computation demonstrated with a neutral atom quantum processor	not yet
26	Randomized Autoregressive Visual Generation	not yet
25	OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models	not yet
25	How Far is Video Generation from World Model: A Physical Law Perspective
24	RedPajama: an Open Dataset for Training Large Language Models	not yet
24	Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM	not yet
22	Does Prompt Formatting Have Any Impact on LLM Performance?	not yet
22	Scaling Laws for Precision
21	Enhancing LLM Reasoning with Reward-guided Tree Search	not yet
21	Circuit Complexity Bounds for RoPE-based Transformer Architecture	not yet
20	VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models	not yet
20	How to Build a Quantum Supercomputer: Scaling from Hundreds to Millions of Qubits	not yet
19	Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
18	Multimodal Whole Slide Foundation Model for Pathology	not yet
18	VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models	not yet
18	SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models	not yet
18	Taming Rectified Flow for Inversion and Editing	not yet
17	CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models	not yet
17	Identity-Preserving Text-to-Video Generation by Frequency Decomposition	not yet
17	Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision	not yet
17	Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models	not yet
17	WavChat: A Survey of Spoken Dialogue Models	not yet
17	Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models	not yet
17	HourVideo: 1-Hour Video-Language Understanding	not yet
17	Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent	not yet
16	Large Language Model-Brained GUI Agents: A Survey	not yet
16	On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality	not yet
16	DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion	not yet
15	Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations	not yet
15	Emotion-Aware Interaction Design in Intelligent User Interface Using Multi-Modal Deep Learning	not yet
15	Towards evaluations-based safety cases for AI scheming	not yet
15	Personalization of Large Language Models: A Survey	not yet
14	Self-Generated Critiques Boost Reward Modeling for Language Models	not yet
14	Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency	not yet
14	Hymba: A Hybrid-head Architecture for Small Language Models
14	The Surprising Effectiveness of Test-Time Training for Few-Shot Learning	not yet
13	RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts	not yet
13	Learning Humanoid Locomotion with Perceptive Internal Model	not yet
13	OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs	not yet
13	SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization	not yet
13	Metric Learning for Tag Recommendation: Tackling Data Sparsity and Cold Start Issues	not yet
13	A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness	not yet
13	Improving Steering Vectors by Targeting Sparse Autoencoder Features	not yet
12	Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers	not yet
12	OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining	not yet
12	DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
12	Self-Supervised Learning in Deep Networks: A Pathway to Robust Few-Shot Classification	not yet
12	A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL	not yet
12	Safety case template for frontier AI: A cyber inability argument	not yet
12	JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation	not yet
12	DiT4Edit: Diffusion Transformer for Image Editing	not yet
12	Hunyuan3D 1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation	not yet
12	Addressing Representation Collapse in Vector Quantized Models with One Linear Layer	not yet
12	Vision-Language Models Can Self-Improve Reasoning via Reflection	not yet
12	DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models	not yet
12	AutoGLM: Autonomous Foundation Agents for GUIs	not yet
12	PatternBoost: Constructions in Mathematics with a Little Help from AI	not yet
11	Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability	not yet
11	Scaling Speech-Text Pre-training with Synthetic Interleaved Data	not yet
11	Enhancing Few-Shot Learning with Integrated Data and GAN Model Approaches	not yet
11	All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages	not yet
11	Optimizing Gesture Recognition for Seamless UI Interaction Using Convolutional Neural Networks	not yet
11	Graph Neural Network-Based Entity Extraction and Relationship Reasoning in Complex Knowledge Graphs	not yet
11	OASIS: Open Agent Social Interaction Simulations with One Million Agents	not yet
11	Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model	not yet
11	ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning	not yet
11	MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs	not yet
11	MdEval: Massively Multilingual Code Debugging	not yet
11	Rule Based Rewards for Language Model Safety	not yet
11	Survey of Cultural Awareness in Language Models: Text and Beyond	not yet
11	Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations	not yet
10	VLSBench: Unveiling Visual Leakage in Multimodal Safety	not yet
10	INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge	not yet
10	CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation	not yet
10	Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models	not yet
10	Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data	not yet
10	ShowUI: One Vision-Language-Action Model for GUI Visual Agent	not yet
10	Adaptive Cache Management for Complex Storage Systems Using CNN-LSTM-Based Spatiotemporal Prediction	not yet
10	SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
10	High-fidelity universal gates in the $^{171}$Yb ground state nuclear spin qubit	not yet
10	Zero-Shot Automatic Annotation and Instance Segmentation using LLM-Generated Datasets: Eliminating Field Imaging and Manual Annotation for Deep Learning Model Development	not yet
10	The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use	not yet
10	LoRA-LiteE: A Computationally Efficient Framework for Chatbot Preference-Tuning	not yet
10	Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows	not yet
10	Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives	not yet
10	From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond	not yet
10	Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level	not yet
10	TableGPT2: A Large Multimodal Model with Tabular Data Integration	not yet
10	Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis	not yet
9	GRAPE: Generalizing Robot Policy via Preference Alignment	not yet
9	AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers	not yet
9	TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability	not yet
9	SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation	not yet
9	WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model	not yet
9	MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs	not yet
9	XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models	not yet
9	Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios	not yet
9	Multimodal Autoregressive Pre-training of Large Vision Encoders	not yet
9	BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games	not yet
9	Disentangling Memory and Reasoning Ability in Large Language Models	not yet
9	A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation	not yet
9	BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices	not yet
9	Large Wireless Model (LWM): A Foundation Model for Wireless Channels	not yet
9	Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents	not yet
9	A Survey on Kolmogorov-Arnold Network	not yet
9	Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks	not yet
9	LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
9	GUI Agents with Foundation Models: A Comprehensive Survey	not yet
9	Advanced RAG Models with Graph Structures: Optimizing Complex Knowledge Reasoning and Text Generation	not yet
9	Distributionally Robust Optimization	not yet
9	Attacking Vision-Language Computer Agents via Pop-ups	not yet
9	WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning	not yet
9	On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback	not yet
9	DexHub and DART: Towards Internet Scale Robot Data Collection	not yet
9	GameGen-X: Interactive Open-world Game Video Generation
9	A Public Dataset Tracking Social Media Discourse about the 2024 U.S. Presidential Election on Twitter/X	not yet
9	RSL-SQL: Robust Schema Linking in Text-to-SQL Generation	not yet
8	Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS	not yet
8	Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration	not yet
8	Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training	not yet
8	LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training	not yet
8	FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression	not yet
8	Evaluating the Robustness of Analogical Reasoning in Large Language Models	not yet
8	When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training	not yet
8	Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension	not yet
8	Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
8	Understanding Chain-of-Thought in LLMs through Information Theory	not yet
8	ACE2: Accurately learning subseasonal to decadal atmospheric variability and forced responses	not yet
8	AnimateAnything: Consistent and Controllable Animation for Video Generation	not yet
8	Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs	not yet
8	Golden Noise for Diffusion Models: A Learning Framework
8	Game-theoretic LLM: Agent Workflow for Negotiation Games	not yet
8	Autoregressive Models in Vision: A Survey	not yet
8	LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions	not yet
8	Quantum speedups in solving near-symmetric optimization problems by low-depth QAOA	not yet
8	Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding	not yet
8	Science and Project Planning for the Forward Physics Facility in Preparation for the 2024-2026 European Particle Physics Strategy Update	not yet
8	Evaluation data contamination in LLMs: how do we measure it and (when) does it matter?	not yet
8	Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback	not yet
8	What do sin$(x)$ and arcsinh$(x)$ have in Common?	not yet
8	Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement	not yet
8	A Lorentz-Equivariant Transformer for All of the LHC	not yet
8	Project Sid: Many-agent simulations toward AI civilization	not yet
7	Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation	not yet
7	$H^3$Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs	not yet
7	I2VControl: Disentangled and Unified Video Motion Synthesis Control	not yet

※ Citation counts are taken from NASA ADS data as of the update date.
https://ui.adsabs.harvard.edu/
