1751 |
The Llama 3 Herd of Models |
not yet |
558 |
Qwen2 Technical Report |
 |
65 |
PaliGemma: A versatile 3B VLM for transfer |
not yet |
64 |
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models |
not yet |
57 |
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling |
 |
50 |
Qwen2-Audio Technical Report |
not yet |
46 |
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output |
not yet |
38 |
Random unitaries in extremely low depth |
not yet |
37 |
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision |
 |
35 |
OpenHands: An Open Platform for AI Software Developers as Generalist Agents |
 |
35 |
Learning to (Learn at Test Time): RNNs with Expressive Hidden States |
not yet |
34 |
Agentless: Demystifying LLM-based Software Engineering Agents |
not yet |
31 |
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens |
not yet |
30 |
Gymnasium: A Standard Interface for Reinforcement Learning Environments |
not yet |
28 |
LLM Critics Help Catch LLM Bugs |
 |
27 |
A Survey on Mixture of Experts |
not yet |
25 |
Apple Intelligence Foundation Language Models |
 |
25 |
KAN or MLP: A Fairer Comparison |
not yet |
25 |
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation |
not yet |
25 |
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving |
not yet |
24 |
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention |
 |
24 |
Open-TeleVision: Teleoperation with Immersive Active Visual Feedback |
not yet |
23 |
Compact Language Models via Pruning and Knowledge Distillation |
 |
23 |
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders |
not yet |
22 |
MUSE: Machine Unlearning Six-Way Evaluation for Language Models |
not yet |
21 |
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge |
 |
21 |
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models |
not yet |
20 |
Discrete Flow Matching |
not yet |
20 |
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models |
not yet |
20 |
Jailbreak Attacks and Defenses Against Large Language Models: A Survey |
not yet |
20 |
TokenPacker: Efficient Visual Projector for Multimodal LLM |
not yet |
20 |
AI Agents That Matter |
not yet |
19 |
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies |
not yet |
19 |
MambaVision: A Hybrid Mamba-Transformer Vision Backbone |
 |
19 |
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence |
not yet |
19 |
TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets |
not yet |
18 |
Stable Audio Open |
 |
18 |
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning |
not yet |
18 |
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control |
not yet |
18 |
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions |
not yet |
18 |
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs |
not yet |
17 |
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process |
not yet |
17 |
A polynomial-time classical algorithm for noisy quantum circuits |
not yet |
17 |
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine |
not yet |
17 |
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models |
not yet |
17 |
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems |
 |
16 |
Tora: Trajectory-oriented Diffusion Transformer for Video Generation |
not yet |
16 |
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding |
not yet |
16 |
Deep Time Series Models: A Comprehensive Survey and Benchmark |
not yet |
16 |
Vision language models are blind |
not yet |
16 |
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages |
not yet |
16 |
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control |
not yet |
16 |
A Review of Large Language Models and Autonomous Agents in Chemistry |
not yet |
16 |
Tree Search for Language Model Agents |
not yet |
15 |
Adaptive Training of Grid-Dependent Physics-Informed Kolmogorov-Arnold Networks |
not yet |
15 |
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models |
not yet |
15 |
Open Problems in Technical AI Governance |
not yet |
15 |
Shape of Motion: 4D Reconstruction from a Single Video |
not yet |
15 |
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
not yet |
15 |
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation |
not yet |
15 |
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis |
not yet |
15 |
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds |
not yet |
15 |
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion |
not yet |
15 |
SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models |
not yet |
15 |
Tarsier: Recipes for Training and Evaluating Large Video Description Models |
not yet |
14 |
Benchmarking and fidelity response theory of high-fidelity Rydberg entangling gates |
not yet |
14 |
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval |
not yet |
14 |
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey |
not yet |
14 |
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More |
not yet |
14 |
Consent in Crisis: The Rapid Decline of the AI Data Commons |
not yet |
14 |
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? |
not yet |
14 |
Distilling System 2 into System 1 |
 |
14 |
7th ABAW Competition: Multi-Task Learning and Compound Expression Recognition |
not yet |
13 |
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs |
not yet |
13 |
Internal Consistency and Self-Feedback in Large Language Models: A Survey |
not yet |
13 |
A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks |
 |
13 |
SEED-Story: Multimodal Long Story Generation with Large Language Model |
 |
13 |
Scalable, high-fidelity all-electronic control of trapped-ion qubits |
not yet |
13 |
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI |
not yet |
13 |
OffsetBias: Leveraging Debiased Data for Tuning Evaluators |
not yet |
13 |
RegMix: Data Mixture as Regression for Language Model Pre-training |
not yet |
12 |
ShieldGemma: Generative AI Content Moderation Based on Gemma |
not yet |
12 |
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases |
not yet |
12 |
The Art of Saying No: Contextual Noncompliance in Language Models |
not yet |
12 |
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models |
not yet |
12 |
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends |
not yet |
12 |
Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models |
not yet |
12 |
AdaPI: Facilitating DNN Model Adaptivity for Efficient Private Inference in Edge Computing |
not yet |
12 |
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition |
not yet |
12 |
Bunny-VisionPro: Real-Time Bimanual Dexterous Teleoperation for Imitation Learning |
not yet |
12 |
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models |
not yet |
12 |
Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models |
not yet |
11 |
Can Editing LLMs Inject Harm? |
not yet |
11 |
Preliminary WMT24 Ranking of General MT Systems and LLMs |
not yet |
11 |
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency |
 |
11 |
VILA$^2$: VILA Augmented VILA |
not yet |
11 |
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach |
not yet |
11 |
Advanced AI Framework for Enhanced Detection and Assessment of Abdominal Trauma: Integrating 3D Segmentation with 2D CNN and RNN Models |
not yet |
11 |
Movable Antenna-Enhanced Wireless Communications: General Architectures and Implementation Methods |
not yet |
11 |
Interim report for the International Muon Collider Collaboration (IMCC) |
not yet |
11 |
Does Refusal Training in LLMs Generalize to the Past Tense? |
not yet |
11 |
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism |
not yet |
11 |
Deconstructing What Makes a Good Optimizer for Language Models |
not yet |
11 |
Controlling Space and Time with Diffusion Models |
not yet |
11 |
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation |
not yet |
11 |
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? |
not yet |
11 |
Mixture of A Million Experts |
 |
11 |
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs |
not yet |
11 |
Quantum coarsening and collective dynamics on a programmable quantum simulator |
not yet |
11 |
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs |
not yet |
11 |
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents |
not yet |
10 |
Recursive Introspection: Teaching Language Model Agents How to Self-Improve |
not yet |
10 |
Demystifying Verbatim Memorization in Large Language Models |
not yet |
10 |
NV-Retriever: Improving text embedding models with effective hard-negative mining |
not yet |
10 |
BOND: Aligning LLMs with Best-of-N Distillation |
not yet |
10 |
Prover-Verifier Games improve legibility of LLM outputs |
not yet |
10 |
A Comprehensive Survey on Kolmogorov Arnold Networks (KAN) |
not yet |
10 |
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? |
not yet |
10 |
Benchmarking Vision Language Models for Cultural Understanding |
not yet |
10 |
Robotic Control via Embodied Chain-of-Thought Reasoning |
not yet |
10 |
Autoregressive Speech Synthesis without Vector Quantization |
not yet |
10 |
Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning |
not yet |
10 |
Entropy Law: The Story Behind Data Compression and LLM Performance |
not yet |
10 |
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions |
not yet |
10 |
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models |
not yet |
10 |
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs |
 |
10 |
Benchmarking Complex Instruction-Following with Multiple Constraints Composition |
not yet |
10 |
MMedAgent: Learning to Use Medical Tools with Multi-modal Agent |
not yet |
10 |
Learning tensor networks with tensor cross interpolation: new algorithms and libraries |
not yet |
10 |
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? |
not yet |
10 |
Searching for Best Practices in Retrieval-Augmented Generation |
not yet |
10 |
UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI |
not yet |
9 |
Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs |
not yet |
9 |
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts |
 |
9 |
Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models |
not yet |
9 |
Machine Unlearning in Generative AI: A Survey |
not yet |
9 |
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher |
 |
9 |
RLCoder: Reinforcement Learning for Repository-Level Code Completion |
not yet |
9 |
Inferring turbulent velocity and temperature fields and their statistics from Lagrangian velocity measurements using physics-informed Kolmogorov-Arnold Networks |
not yet |
9 |
When Can Transformers Count to n? |
not yet |
9 |
Reduced Effectiveness of Kolmogorov-Arnold Networks on Functions with Noise |
not yet |
9 |
Differential Privacy of Cross-Attention with Provable Guarantee |
not yet |
9 |
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation |
not yet |
9 |
Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review |
not yet |
9 |
Semantic Operators: A Declarative Model for Rich, AI-based Analytics Over Text Data |
not yet |
9 |
LAB-Bench: Measuring Capabilities of Language Models for Biology Research |
not yet |
9 |
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training |
not yet |
9 |
$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ |
not yet |
9 |
Rectifier: Code Translation with Corrector via LLMs |
not yet |
9 |
3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes |
not yet |
9 |
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps |
not yet |
9 |
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale |
not yet |
9 |
LoRA-GA: Low-Rank Adaptation with Gradient Approximation |
not yet |
9 |
On scalable oversight with weak LLMs judging strong LLMs |
not yet |
9 |
Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials |
not yet |
9 |
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations |
not yet |
9 |
Efficient Long-distance Latent Relation-aware Graph Neural Network for Multi-modal Emotion Recognition in Conversations |
not yet |
8 |
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention |
not yet |
8 |
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies |
not yet |
8 |
Know Your Limits: A Survey of Abstention in Large Language Models |
not yet |
8 |
Exploring the Limitations of Kolmogorov-Arnold Networks in Classification: Insights to Software Training and Hardware Implementation |
not yet |
8 |
Adaptive Robot Detumbling of a Non-Rigid Satellite |
not yet |
8 |
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents |
not yet |
8 |
Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning |
not yet |
8 |
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? |
not yet |
8 |
Flow as the Cross-Domain Manipulation Interface |
not yet |
8 |
Knowledge Mechanisms in Large Language Models: A Survey and Perspective |
not yet |
8 |
Falcon2-11B Technical Report |
not yet |
8 |
Weak-to-Strong Reasoning |
not yet |
8 |
Differential Privacy Mechanisms in Neural Tangent Kernel Regression |
not yet |
8 |
Surgical Robot Transformer (SRT): Imitation Learning for Surgical Tasks |
not yet |
8 |
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval |
not yet |
8 |
AutoFlow: Automated Workflow Generation for Large Language Model Agents |
not yet |
8 |
LLM Inference Serving: Survey of Recent Advances and Opportunities |
not yet |
8 |
LongLaMP: A Benchmark for Personalized Long-form Text Generation |
not yet |
8 |
UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers |
not yet |
8 |
Lean-STaR: Learning to Interleave Thinking and Proving |
 |
8 |
The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges |
not yet |
8 |
A Neural Matrix Decomposition Recommender System Model based on the Multimodal Large Language Model |
not yet |
8 |
The Solar and Geomagnetic Storms in May 2024: A Flash Data Report |
not yet |
8 |
On Leakage of Code Generation Evaluation Datasets |
not yet |
8 |
Advanced Financial Fraud Detection Using GNN-CL Model |
not yet |
8 |
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study |
not yet |
8 |
Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU |
not yet |
8 |
Logical Operators and Fold-Transversal Gates of Bivariate Bicycle Codes |
not yet |
8 |
Stacked Intelligent Metasurfaces for Wireless Sensing and Communication: Applications and Challenges |
not yet |
8 |
Large-scale quantum reservoir learning with an analog quantum computer |
not yet |
8 |
Research on Autonomous Robots Navigation based on Reinforcement Learning |
not yet |
8 |
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models |
not yet |
8 |
FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs |
not yet |
8 |
EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning |
not yet |
8 |
On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs) |
not yet |
8 |
Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles |
not yet |
8 |
Diffusion Models and Representation Learning: A Survey |
not yet |
8 |
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP |
not yet |
8 |
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks |
not yet |
7 |
Enhanced Self-Checkout System for Retail Based on Improved YOLOv10 |
not yet |
7 |
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models |
not yet |
7 |
COEFF-KANs: A Paradigm to Address the Electrolyte Field with KANs |
not yet |
7 |
Towards Effective and Efficient Continual Pre-training of Large Language Models |
not yet |
7 |
PersonaGym: Evaluating Persona Agents and LLMs |
not yet |
7 |
Physics Informed Kolmogorov-Arnold Neural Networks for Dynamical Analysis via Efficent-KAN and WAV-KAN |
not yet |
7 |
AI Safety in Generative AI Large Language Models: A Survey |
not yet |
7 |
Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption |
not yet |
7 |
Financial Statement Analysis with Large Language Models |
not yet |
7 |
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation |
not yet |
7 |
PyBench: Evaluating LLM Agent on various real-world coding tasks |
not yet |
7 |
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence |
not yet |
7 |
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads |
not yet |
7 |
Data driven weather forecasts trained and initialised directly from observations |
not yet |
7 |
BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM |
not yet |
7 |
Not All Noises Are Created Equally: Diffusion Noise Selection and Optimization |
not yet |
7 |
SciCode: A Research Coding Benchmark Curated by Scientists |
not yet |
7 |
DropKAN: Regularizing KANs by masking post-activations |
not yet |
7 |
Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems |
not yet |
7 |
Reasoning with Large Language Models, a Survey |
 |
7 |
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models |
not yet |
7 |
Tailoring Solution Accuracy for Fast Whole-body Model Predictive Control of Legged Robots |
not yet |
7 |
AccDiffusion: An Accurate Method for Higher-Resolution Image Generation |
not yet |
7 |
Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences |
not yet |
7 |
Video Diffusion Alignment via Reward Gradients |
not yet |
7 |
Lynx: An Open Source Hallucination Evaluation Model |
not yet |
7 |
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks |
not yet |
7 |
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models |
not yet |
7 |
Decentralized Adaptive Aerospace Transportation of Unknown Loads Using A Team of Robots |
not yet |
7 |
Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs |
not yet |
7 |
Diffusion Model-Based Video Editing: A Survey |
not yet |
7 |
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation |
not yet |
7 |
Enabling 6G Performance in the Upper Mid-Band by Transitioning From Massive to Gigantic MIMO |
not yet |
7 |
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation |
not yet |
7 |
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts |
not yet |
7 |
RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation |
not yet |
7 |
Trustworthy Classification through Rank-Based Conformal Prediction Sets |
not yet |
7 |
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild |
not yet |
7 |
MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis |
not yet |
7 |
BM25S: Orders of magnitude faster lexical search via eager sparse scoring |
not yet |
7 |
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts |
not yet |
7 |
Effect of a Process Mining based Pre-processing Step in Prediction of the Critical Health Outcomes |
not yet |
7 |
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization |
not yet |
7 |
Meta 3D TextureGen: Fast and Consistent Texture Generation for 3D Objects |
not yet |
7 |
DrugCLIP: Contrastive Drug-Disease Interaction For Drug Repurposing |
not yet |
7 |
PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning |
not yet |
7 |
UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks |
not yet |
7 |
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models |
not yet |
7 |
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs |
not yet |
7 |
Tracking the 2024 US Presidential Election Chatter on Tiktok: A Public Multimodal Dataset |
not yet |
7 |
Non-Hermitian skin effect in arbitrary dimensions: non-Bloch band theory and classification |
not yet |
7 |
MIRAI: Evaluating LLM Agents for Event Forecasting |
not yet |
7 |
Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies |
not yet |
7 |
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning |
not yet |
7 |
Quantum Circuit Synthesis and Compilation Optimization: Overview and Prospects |
not yet |
7 |
Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation |
not yet |
6 |
Berkeley Humanoid: A Research Platform for Learning-based Control |
not yet |
6 |
DKL-KAN: Scalable Deep Kernel Learning using Kolmogorov-Arnold Networks |
not yet |
6 |
ThinK: Thinner Key Cache by Query-Driven Pruning |
not yet |
6 |
Diffusion Feedback Helps CLIP See Better |
not yet |
6 |
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle |
not yet |
6 |
Autonomous Navigation of Unmanned Vehicle Through Deep Reinforcement Learning |
not yet |
6 |
LoRA-Pro: Are Low-Rank Adapters Properly Optimized? |
not yet |
6 |
LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover |
not yet |
6 |
From Text to Insight: Large Language Models for Materials Science Data Extraction |
not yet |
6 |
Separable DeepONet: Breaking the Curse of Dimensionality in Physics-Informed Machine Learning |
not yet |
6 |
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data |
 |
6 |
NNsight and NDIF: Democratizing Access to Foundation Model Internals |
not yet |
6 |
EVLM: An Efficient Vision-Language Model for Visual Understanding |
not yet |
6 |
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference |
not yet |
6 |
Towards Trustworthy AI: A Review of Ethical and Robust Large Language Models |
not yet |
6 |
GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model |
not yet |
6 |
Non-Fermi liquid and antiferromagnetic correlations with hole doping in the bilayer two-orbital Hubbard model of La$_3$Ni$_2$O$_7$ at zero temperature |
not yet |
6 |
Retrieval-Augmented Generation for Natural Language Processing: A Survey |
not yet |
6 |
R+X: Retrieval and Execution from Everyday Human Videos |
not yet |
6 |
The Foundation Model Transparency Index v1.1: May 2024 |
not yet |
6 |
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore |
not yet |
6 |
PQCache: Product Quantization-based KVCache for Long Context LLM Inference |
not yet |
6 |
IMAGDressing-v1: Customizable Virtual Dressing |
not yet |
6 |
The Better Angels of Machine Personality: How Personality Relates to LLM Safety |
not yet |
6 |
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression |
not yet |
6 |
Scaling Diffusion Transformers to 16 Billion Parameters |
not yet |
6 |
Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild |
not yet |
6 |
Integrating Amortized Inference with Diffusion Models for Learning Clean Distribution from Corrupted Images |
not yet |
6 |
Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics |
not yet |
6 |
Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena |
not yet |
6 |
What Makes and Breaks Safety Fine-tuning? A Mechanistic Study |
not yet |
6 |
InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation |
not yet |
6 |
Real-time gravitational-wave inference for binary neutron stars using machine learning |
not yet |
6 |
Human-like Episodic Memory for Infinite Context LLMs |
not yet |
6 |
Evaluating AI Evaluation: Perils and Prospects |
not yet |
6 |
Benchmarking quantum computers |
not yet |
6 |
Still-Moving: Customized Video Generation without Customized Video Data |
not yet |
6 |
WildGaussians: 3D Gaussian Splatting in the Wild |
not yet |
6 |
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting |
not yet |
6 |
Source Code Summarization in the Era of Large Language Models |
 |
6 |
Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities |
not yet |
6 |
Fine-Tuning Large Language Models with User-Level Differential Privacy |
not yet |
6 |
Video-to-Audio Generation with Hidden Alignment |
not yet |
6 |
From Principles to Rules: A Regulatory Approach for Frontier AI |
not yet |
6 |
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models |
not yet |
6 |
RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models |
not yet |
6 |
Variational Best-of-N Alignment |
not yet |
6 |
On the Limitations of Compute Thresholds as a Governance Strategy |
not yet |
6 |
Language Representations Can be What Recommenders Need: Findings and Potentials |
not yet |
6 |
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models |
not yet |
6 |
Configurable DOA Estimation using Incremental Learning |
not yet |
6 |
AgentInstruct: Toward Generative Teaching with Agentic Flows |
not yet |
6 |
Solving Motion Planning Tasks with a Scalable Generative Model |
not yet |
6 |
LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning |
not yet |
6 |
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency |
not yet |
6 |
LDP: A Local Diffusion Planner for Efficient Robot Navigation and Collision Avoidance |
not yet |
6 |
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application |
not yet |
6 |
Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing |
not yet |
6 |
ColPali: Efficient Document Retrieval with Vision Language Models |
not yet |
6 |
$\text{Memory}^3$: Language Modeling with Explicit Memory |
not yet |
6 |
Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents |
not yet |
6 |
Posterior Sampling with Denoising Oracles via Tilted Transport |
not yet |
6 |
LiteSearch: Efficacious Tree Search for LLM |
not yet |
5 |
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? |
not yet |
5 |
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models |
not yet |
5 |
Model Attribution in LLM-Generated Disinformation: A Domain Generalization Approach with Supervised Contrastive Learning |
not yet |
5 |
Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks |
not yet |
5 |
AI-Assisted Generation of Difficult Math Questions |
not yet |
5 |
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification |
not yet |
5 |
Rethinking the Function of Neurons in KANs |
not yet |
5 |
Theia: Distilling Diverse Vision Foundation Models for Robot Learning |
not yet |
5 |
F-KANs: Federated Kolmogorov-Arnold Networks |
not yet |
5 |
Mixture of Nested Experts: Adaptive Processing of Visual Tokens |
not yet |
5 |
OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation |
not yet |
5 |
Hybrid summary statistics: neural weak lensing inference beyond the power spectrum |
not yet |
5 |
Effects of Scale on Language Model Robustness |
 |
5 |
Transformers on Markov Data: Constant Depth Suffices |
not yet |
5 |
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries |
not yet |
5 |
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization |
not yet |
5 |
AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies |
not yet |
5 |
A Basis-Free Phase Space Electronic Hamiltonian That Recovers Beyond Born-Oppenheimer Electronic Momentum and Current Density |
not yet |
5 |
Data-driven Koopman operator predictions of turbulent dynamics in models of shear flows |
not yet |
5 |
Predicting Stock Prices with FinBERT-LSTM: Integrating News Sentiment Analysis |
not yet |
5 |
Implications of the laser excitation of the Th-229 nucleus for dark matter searches |
not yet |
5 |
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity |
not yet |
5 |
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? |
not yet |
5 |
Imposter.AI: Adversarial Attacks with Hidden Intentions towards Aligned Large Language Models |
not yet |
5 |
A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model |
not yet |
5 |
XAI meets LLMs: A Survey of the Relation between Explainable AI and Large Language Models |
not yet |
5 |
Deep State Space Recurrent Neural Networks for Time Series Forecasting |
not yet |
5 |
Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning |
not yet |
5 |
FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting |
not yet |
5 |
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities |
not yet |
5 |
NeuroBind: Towards Unified Multimodal Representations for Neural Signals |
not yet |
5 |
Understanding Reference Policies in Direct Preference Optimization |
not yet |
5 |
Beyond Dropout: Robust Convolutional Neural Networks Based on Local Feature Masking |
not yet |
5 |
Beyond Augmentation: Empowering Model Robustness under Extreme Capture Environments |
not yet |
5 |
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis |
not yet |
5 |
Are Large Language Models Capable of Generating Human-Level Narratives? |
not yet |
5 |
Research on Image Super-Resolution Reconstruction Mechanism based on Convolutional Neural Network |
not yet |