146 |
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools |
not yet |
106 |
OpenVLA: An Open-Source Vision-Language-Action Model |
not yet |
93 |
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs |
not yet |
87 |
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence |
not yet |
79 |
Depth Anything V2 |
not yet |
77 |
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale |
not yet |
76 |
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation |
not yet |
74 |
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark |
not yet |
71 |
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs |
not yet |
61 |
A Survey on Large Language Models for Code Generation |
|
59 |
Nemotron-4 340B Technical Report |
not yet |
58 |
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline |
not yet |
57 |
Scaling and evaluating sparse autoencoders |
not yet |
52 |
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions |
not yet |
51 |
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts |
not yet |
49 |
U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation |
not yet |
47 |
Convolutional Kolmogorov-Arnold Networks |
not yet |
46 |
Autoregressive Image Generation without Vector Quantization |
not yet |
45 |
The Prompt Report: A Systematic Survey of Prompting Techniques |
|
44 |
Refusal in Language Models Is Mediated by a Single Direction |
not yet |
41 |
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing |
not yet |
40 |
DataComp-LM: In search of the next generation of training sets for language models |
not yet |
39 |
Long Context Transfer from Language to Vision |
not yet |
39 |
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions |
not yet |
38 |
CodeGemma: Open Code Models Based on Gemma |
not yet |
38 |
Improving Alignment and Robustness with Circuit Breakers |
not yet |
36 |
Scaling Synthetic Data Creation with 1,000,000,000 Personas |
not yet |
36 |
HelpSteer2: Open-source dataset for training top-performing reward models |
not yet |
34 |
Improve Mathematical Reasoning in Language Models by Automated Process Supervision |
not yet |
33 |
An Empirical Study of Mamba-based Language Models |
not yet |
33 |
Kolmogorov-Arnold Networks for Time Series: Bridging Predictive Power and Interpretability |
not yet |
33 |
AIFS -- ECMWF's data-driven forecasting system |
not yet |
32 |
Time Series Modeling for Heart Rate Prediction: From ARIMA to Transformers |
not yet |
31 |
Mixture-of-Agents Enhances Large Language Model Capabilities |
|
30 |
fKAN: Fractional Kolmogorov-Arnold Networks with trainable Jacobi basis functions |
not yet |
30 |
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models |
not yet |
29 |
GKAN: Graph Kolmogorov-Arnold Networks |
not yet |
29 |
A Temporal Kolmogorov-Arnold Transformer for Time Series Forecasting |
not yet |
29 |
FourierKAN-GCF: Fourier Kolmogorov-Arnold Network -- An Effective and Efficient Feature Transformation for Graph Collaborative Filtering |
not yet |
28 |
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey |
not yet |
27 |
LiveBench: A Challenging, Contamination-Free LLM Benchmark |
not yet |
27 |
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation |
not yet |
26 |
A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges |
not yet |
26 |
An Image is Worth 32 Tokens for Reconstruction and Generation |
not yet |
26 |
PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction |
not yet |
26 |
Safety Alignment Should Be Made More Than Just a Few Tokens Deep |
not yet |
26 |
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers |
not yet |
26 |
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild |
not yet |
26 |
MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding |
not yet |
26 |
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search |
not yet |
26 |
Kolmogorov-Arnold Network for Satellite Image Classification in Remote Sensing |
not yet |
25 |
CU-Net: a U-Net architecture for efficient brain-tumor segmentation on BraTS 2019 dataset |
not yet |
25 |
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling |
not yet |
24 |
APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking |
not yet |
24 |
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models |
|
23 |
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs |
not yet |
23 |
KAGNNs: Kolmogorov-Arnold Networks meet Graph Learning |
not yet |
23 |
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training |
not yet |
23 |
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding |
not yet |
23 |
Streamlining and standardizing software citations with The Software Citation Station |
not yet |
23 |
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation |
not yet |
22 |
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack |
not yet |
22 |
Simple and Effective Masked Diffusion Language Models |
not yet |
22 |
CRAG -- Comprehensive RAG Benchmark |
not yet |
21 |
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance |
not yet |
21 |
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks |
not yet |
21 |
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors |
not yet |
21 |
Suitability of KANs for Computer Vision: A preliminary investigation |
not yet |
21 |
TextGrad: Automatic "Differentiation" via Text |
not yet |
21 |
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model |
not yet |
21 |
Simplified and Generalized Masked Diffusion for Discrete Data |
not yet |
21 |
Credit Card Fraud Detection Using Advanced Transformer Model |
not yet |
21 |
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms |
not yet |
21 |
To Believe or Not to Believe Your LLM |
|
21 |
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling |
not yet |
21 |
Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization |
not yet |
20 |
Simulating Classroom Education with LLM-Empowered Agents |
not yet |
20 |
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs |
not yet |
20 |
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? |
not yet |
20 |
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning |
not yet |
20 |
Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation |
not yet |
20 |
CodeR: Issue Resolving with Multi-Agent and Task Graphs |
not yet |
20 |
BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models |
not yet |
19 |
Demonstrating the Efficacy of Kolmogorov-Arnold Networks in Vision Tasks |
not yet |
19 |
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold |
not yet |
19 |
HumanPlus: Humanoid Shadowing and Imitation from Humans |
not yet |
19 |
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs |
not yet |
19 |
Grounding Image Matching in 3D with MASt3R |
not yet |
19 |
MAIRA-2: Grounded Radiology Report Generation |
not yet |
19 |
Are We Done with MMLU? |
not yet |
19 |
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation |
not yet |
18 |
RouteLLM: Learning to Route LLMs with Preference Data |
|
18 |
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs |
not yet |
18 |
rKAN: Rational Kolmogorov-Arnold Networks |
not yet |
18 |
BSRBF-KAN: A combination of B-splines and Radial Basis Functions in Kolmogorov-Arnold Networks |
not yet |
18 |
Incompressibility and spectral gaps of random circuits |
not yet |
18 |
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B |
not yet |
17 |
Finite basis Kolmogorov-Arnold networks: domain decomposition for data-driven and physics-informed problems |
not yet |
17 |
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT |
not yet |
17 |
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models |
not yet |
17 |
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees |
not yet |
17 |
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning |
not yet |
17 |
Towards Infinite-Long Prefix in Transformer |
not yet |
17 |
Vision-LSTM: xLSTM as Generic Vision Backbone |
not yet |
17 |
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models |
not yet |
17 |
RaDe-GS: Rasterizing Depth in Gaussian Splatting |
not yet |
16 |
Symbolic Learning Enables Self-Evolving Agents |
not yet |
16 |
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs |
not yet |
16 |
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA |
not yet |
16 |
GraphKAN: Enhancing Feature Extraction with Graph Kolmogorov Arnold Networks |
not yet |
16 |
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models |
not yet |
16 |
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges |
not yet |
16 |
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning |
not yet |
16 |
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models |
not yet |
16 |
Scaling Large-Language-Model-based Multi-Agent Collaboration |
not yet |
16 |
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures |
not yet |
16 |
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models |
not yet |
16 |
PowerInfer-2: Fast Large Language Model Inference on a Smartphone |
not yet |
16 |
MotionClone: Training-Free Motion Cloning for Controllable Video Generation |
not yet |
16 |
Benchmark Data Contamination of Large Language Models: A Survey |
not yet |
16 |
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit |
not yet |
16 |
Spectroscopy and modeling of $^{171}$Yb Rydberg states for high-fidelity two-qubit gates |
not yet |
16 |
How to Understand Whole Software Repository? |
not yet |
16 |
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models |
not yet |
15 |
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale |
not yet |
15 |
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm |
not yet |
15 |
Are Language Models Actually Useful for Time Series Forecasting? |
not yet |
15 |
One Thousand and One Pairs: A "novel" challenge for long-context language models |
not yet |
15 |
Blind Baselines Beat Membership Inference Attacks for Foundation Models |
not yet |
15 |
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference |
not yet |
15 |
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging |
not yet |
15 |
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models |
not yet |
15 |
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference |
not yet |
15 |
Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL |
not yet |
15 |
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks |
not yet |
15 |
Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification |
not yet |
15 |
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark |
not yet |
15 |
Advanced Payment Security System: XGBoost, LightGBM and SMOTE Integrated |
not yet |
15 |
Towards Scalable Automated Alignment of LLMs: A Survey |
not yet |
14 |
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets |
|
14 |
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models |
not yet |
14 |
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding |
not yet |
14 |
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models |
not yet |
14 |
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents |
not yet |
14 |
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback |
not yet |
14 |
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions |
not yet |
14 |
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance |
not yet |
14 |
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models |
not yet |
14 |
On the Effects of Data Scale on UI Control Agents |
not yet |
14 |
Guiding a Diffusion Model with a Bad Version of Itself |
not yet |
14 |
ReLU-KAN: New Kolmogorov-Arnold Networks that Only Need Matrix Addition, Dot Multiplication, and ReLU |
not yet |
14 |
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration |
not yet |
14 |
BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling |
not yet |
13 |
Changing Answer Order Can Decrease MMLU Accuracy |
not yet |
13 |
Kolmogorov-Arnold Graph Neural Networks |
not yet |
13 |
CodeRAG-Bench: Can Retrieval Augment Code Generation? |
not yet |
13 |
Instruction Pre-Training: Language Models are Supervised Multitask Learners |
not yet |
13 |
WebCanvas: Benchmarking Web Agents in Online Environments |
not yet |
13 |
$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains |
not yet |
13 |
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities |
not yet |
13 |
STAR: Scale-wise Text-to-image generation via Auto-Regressive representations |
not yet |
13 |
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers |
not yet |
13 |
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding |
not yet |
13 |
OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning |
not yet |
13 |
What If We Recaption Billions of Web Images with LLaMA-3? |
not yet |
13 |
Large Language Model Unlearning via Embedding-Corrupted Prompts |
not yet |
13 |
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models |
not yet |
13 |
Towards Semantic Equivalence of Tokenization in Multimodal LLM |
not yet |
13 |
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image |
not yet |
13 |
RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation |
not yet |
13 |
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models |
not yet |
13 |
Transformers need glasses! Information over-squashing in language tasks |
not yet |
13 |
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt |
not yet |
13 |
Scalable MatMul-free Language Modeling |
not yet |
13 |
iKAN: Global Incremental Learning with KAN for Human Activity Recognition Across Heterogeneous Datasets |
not yet |
13 |
Unlocking Guidance for Discrete State-Space Diffusion and Flow Models |
not yet |
12 |
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy |
not yet |
12 |
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation |
not yet |
12 |
Understanding and Mitigating Language Confusion in LLMs |
not yet |
12 |
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation |
not yet |
12 |
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models |
not yet |
12 |
Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration |
not yet |
12 |
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking |
not yet |
12 |
Consistency Models Made Easy |
not yet |
12 |
A Benchmarking Study of Kolmogorov-Arnold Networks on Tabular Data |
not yet |
12 |
Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data |
not yet |
12 |
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% |
not yet |
12 |
GUICourse: From General Vision Language Models to Versatile GUI Agents |
not yet |
12 |
From Pixels to Prose: A Large Dataset of Dense Image Captions |
not yet |
12 |
Training-free Camera Control for Video Generation |
not yet |
12 |
Pandora: Towards General World Model with Natural Language Actions and Video States |
not yet |
12 |
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models |
not yet |
12 |
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices |
not yet |
12 |
One-Step Effective Diffusion Network for Real-World Image Super-Resolution |
not yet |
12 |
Delving into ChatGPT usage in academic writing through excess vocabulary |
not yet |
12 |
Parallelizing Linear Transformers with the Delta Rule over Sequence Length |
not yet |
12 |
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters |
not yet |
12 |
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States |
not yet |
12 |
LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model |
not yet |
12 |
Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models |
not yet |
12 |
Enhance Image-to-Image Generation with LLaVA-generated Prompts |
not yet |
12 |
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching |
not yet |
12 |
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs |
not yet |
12 |
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses |
not yet |
12 |
Dimba: Transformer-Mamba Diffusion Models |
not yet |
12 |
$\Delta$-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers |
not yet |
11 |
The Remarkable Robustness of LLMs: Stages of Inference? |
not yet |
11 |
Resolving Discrepancies in Compute-Optimal Scaling of Language Models |
not yet |
11 |
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models |
not yet |
11 |
Application of Multimodal Fusion Deep Learning Model in Disease Recognition |
not yet |
11 |
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference |
not yet |
11 |
Image anomaly detection and prediction scheme based on SSA optimized ResNet50-BiGRU model |
not yet |
11 |
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model |
not yet |
11 |
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces |
not yet |
11 |
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens |
not yet |
11 |
Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG |
not yet |
11 |
Diffusion Models in Low-Level Vision: A Survey |
not yet |
11 |
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences |
not yet |
11 |
A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners |
not yet |
11 |
LVBench: An Extreme Long Video Understanding Benchmark |
not yet |
11 |
McEval: Massively Multilingual Code Evaluation |
not yet |
11 |
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models |
not yet |
11 |
LLM Dataset Inference: Did you train on my dataset? |
not yet |
11 |
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization |
not yet |
11 |
RATT: A Thought Structure for Coherent and Correct LLM Reasoning |
not yet |
11 |
Safeguarding Large Language Models: A Survey |
not yet |
11 |
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities |
not yet |
11 |
The Dawn of Natural Language to SQL: Are We Fully Ready? |
not yet |
10 |
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding |
not yet |
10 |
Revisiting Backdoor Attacks against Large Vision-Language Models |
not yet |
10 |
Navigating LLM Ethics: Advancements, Challenges, and Future Directions |
not yet |
10 |
Localized statistics decoding: A parallel decoding algorithm for quantum low-density parity-check codes |
not yet |
10 |
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers |
not yet |
10 |
Preference Tuning For Toxicity Mitigation Generalizes Across Languages |
not yet |
10 |
AudioBench: A Universal Benchmark for Audio Large Language Models |
not yet |
10 |
Adversarial Attacks on Multimodal Agents |
not yet |
10 |
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs |
not yet |
10 |
DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer |
not yet |
10 |
Avoiding Copyright Infringement via Large Language Model Unlearning |
not yet |
10 |
L4GM: Large 4D Gaussian Reconstruction Model |
not yet |
10 |
ControlVAR: Exploring Controllable Visual Autoregressive Modeling |
not yet |
10 |
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding |
not yet |
10 |
Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? |
not yet |
10 |
RGFN: Synthesizable Molecular Generation Using GFlowNets |
not yet |
10 |
Scaling Laws in Linear Regression: Compute, Parameters, and Data |
not yet |
10 |
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models |
not yet |
10 |
Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena |
not yet |
10 |
BAKU: An Efficient Transformer for Multi-Task Policy Learning |
not yet |
10 |
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification |
not yet |
10 |
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models |
not yet |
10 |
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models |
not yet |
10 |
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion |
not yet |
10 |
Multistep Distillation of Diffusion Models via Moment Matching |
not yet |
10 |
Does your data spark joy? Performance gains from domain upsampling at the end of training |
not yet |
10 |
The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches |
not yet |
10 |
Demystifying the Compression of Mixture-of-Experts Through a Unified Framework |
not yet |
10 |
DrEureka: Language Model Guided Sim-To-Real Transfer |
not yet |
10 |
An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation |
not yet |
10 |
The Geometry of Categorical and Hierarchical Concepts in Large Language Models |
not yet |
10 |
Learning Temporally Consistent Video Depth from Video Diffusion Priors |
not yet |
10 |
UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation |
not yet |
10 |
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback |
not yet |
9 |
Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? |
not yet |
9 |
VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation |
not yet |
9 |
Exploration of Multi-Scale Image Fusion Systems in Intelligent Medical Image Analysis |
not yet |
9 |
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS |
not yet |
9 |
Following Length Constraints in Instructions |
not yet |
9 |
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models |
not yet |
9 |
Steering Without Side Effects: Improving Post-Deployment Control of Language Models |
not yet |
9 |
Application of Computer Deep Learning Model in Diagnosis of Pulmonary Nodules |
not yet |
9 |
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level |
not yet |
9 |
How Do Large Language Models Acquire Factual Knowledge During Pretraining? |
not yet |
9 |
Task Me Anything |
not yet |
9 |
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs |
not yet |
9 |
MASAI: Modular Architecture for Software-engineering AI Agents |
not yet |
9 |
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling |
not yet |
9 |
Step-level Value Preference Optimization for Mathematical Reasoning |
not yet |
9 |
GenQA: Generating Millions of Instructions from a Handful of Prompts |
not yet |
9 |
Quantifying Variance in Evaluation Benchmarks |
not yet |
9 |
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages |
not yet |
9 |
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models |
not yet |
9 |
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation |
not yet |
9 |
Coupled Ocean-Atmosphere Dynamics in a Machine Learning Earth System Model |
not yet |
9 |
Judging the Judges: A Systematic Investigation of Position Bias in Pairwise Comparative Assessments by LLMs |
not yet |
9 |
Zero-shot Image Editing with Reference Imitation |
not yet |
9 |
AI Sandbagging: Language Models can Strategically Underperform on Evaluations |
not yet |
9 |
Needle In A Multimodal Haystack |
not yet |
9 |
A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures |
not yet |
9 |
WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts |
not yet |
9 |
Hello Again! LLM-powered Personalized Agent for Long-term Dialogue |
not yet |
9 |
Mamba YOLO: SSMs-Based YOLO For Object Detection |
not yet |
9 |
Deep Learning Powered Estimate of The Extrinsic Parameters on Unmanned Surface Vehicles |
not yet |
9 |
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? |
not yet |
9 |
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments |
not yet |
9 |
Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving |
not yet |
9 |
Event3DGS: Event-Based 3D Gaussian Splatting for High-Speed Robot Egomotion |
not yet |
9 |
Exploring the Potential of Polynomial Basis Functions in Kolmogorov-Arnold Networks: A Comparative Study of Different Groups of Polynomials |
not yet |
9 |
Cross-Modal Safety Alignment: Is textual unlearning all you need? |
not yet |
9 |
Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation |
not yet |
9 |
DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors |
not yet |
9 |
Are you still on track!? Catching LLM Task Drift with Activations |
not yet |
9 |
Automatic Instruction Evolving for Large Language Models |
not yet |
9 |
Towards Rationality in Language and Multimodal Agents: A Survey |
not yet |
9 |
Exploration of Attention Mechanism-Enhanced Deep Learning Models in the Mining of Medical Textual Data |
not yet |
8 |
Decoding-Time Language Model Alignment with Multiple Objectives |
not yet |
8 |
AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies |
not yet |
8 |
MotionBooth: Motion-Aware Customized Text-to-Video Generation |
not yet |
8 |
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models |
not yet |
8 |
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models |
not yet |
8 |
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration |
not yet |
8 |
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation |
not yet |
8 |
Continuous Aperture Array (CAPA)-Based Wireless Communications: Capacity Characterization |
not yet |
8 |
Risk thresholds for frontier AI |
not yet |
8 |
Fantastic Copyrighted Beasts and How (Not) to Generate Them |
not yet |
8 |
Transferable Boltzmann Generators |
not yet |
8 |
DASB - Discrete Audio and Speech Benchmark |
not yet |
8 |
CityGPT: Empowering Urban Spatial Cognition of Large Language Models |
not yet |
8 |
SpatialBot: Precise Spatial Understanding with Vision Language Models |
not yet |
8 |
DF40: Toward Next-Generation Deepfake Detection |
not yet |
8 |
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding |
not yet |
8 |
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models |
not yet |
8 |
WPO: Enhancing RLHF with Weighted Preference Optimization |
not yet |
8 |
Transcendence: Generative Models Can Outperform The Experts That Train Them |
not yet |
8 |
Can LLM be a Personalized Judge? |
not yet |
8 |
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models |
not yet |
8 |
A Survey on Human Preference Learning for Large Language Models |
not yet |
8 |
Predict Click-Through Rates with Deep Interest Network Model in E-commerce Advertising |
not yet |
8 |
Detecting and Evaluating Medical Hallucinations in Large Vision Language Models |
not yet |
8 |
STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis |
not yet |
8 |
LRM-Zero: Training Large Reconstruction Models with Synthesized Data |
not yet |
8 |
Understanding Hallucinations in Diffusion Models through Mode Interpolation |
not yet |
8 |
Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models |
not yet |
8 |
Multi-Agent Software Development through Cross-Team Collaboration |
not yet |
8 |
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing |
not yet |
8 |
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs |
not yet |
8 |
RVT-2: Learning Precise Manipulation from Few Demonstrations |
not yet |
8 |
Discovering Preference Optimization Algorithms with and for Large Language Models |
not yet |
8 |
Large Language Models Must Be Taught to Know What They Don't Know |
not yet |
8 |
Designing a Dashboard for Transparency and Control of Conversational AI |
not yet |
8 |
Trim 3D Gaussian Splatting for Accurate Geometry Representation |
not yet |
8 |
Effectively Compress KV Heads for LLM |
not yet |
8 |
Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models |
not yet |
8 |
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation |
not yet |
8 |
RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection |
not yet |
8 |
6DMA Enhanced Wireless Network with Flexible Antenna Position and Rotation: Opportunities and Challenges |
not yet |
8 |
Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents |
not yet |
8 |
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas: A Survey |
not yet |
8 |
UltraMedical: Building Specialized Generalists in Biomedicine |
not yet |
8 |
Lean Workbook: A large-scale Lean problem set formalized from natural language math problems |
not yet |
8 |
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data |
not yet |
8 |
Understanding the Impact of Negative Prompts: When and How Do They Take Effect? |
not yet |
8 |
HYDRA: Model Factorization Framework for Black-Box LLM Personalization |
not yet |
8 |
Leveraging KANs For Enhanced Deep Koopman Operator Discovery |
not yet |
8 |
RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models |
not yet |
8 |
Process-Driven Autoformalization in Lean 4 |
not yet |
8 |
DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs |
not yet |
8 |
Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation |
not yet |
8 |
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback |
not yet |
8 |
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models |
not yet |
7 |
SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting |
not yet |
7 |
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models |
not yet |
7 |
Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model |
not yet |
7 |
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models |
not yet |
7 |
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation |
not yet |
7 |
Evaluating Copyright Takedown Methods for Language Models |
not yet |
7 |
Online Learning of Multiple Tasks and Their Relationships: Testing on Spam Email Data and EEG Signals Recorded in Construction Fields |
not yet |
7 |
On the Evaluation of Large Language Models in Unit Test Generation |
not yet |
7 |
Point-SAM: Promptable 3D Segmentation Model for Point Clouds |
not yet |
7 |
Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models |
not yet |
7 |
MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool |
not yet |
7 |
A Complete Survey on LLM-based AI Chatbots |
not yet |
7 |
Dreamitate: Real-World Visuomotor Policy Learning via Video Generation |
not yet |
7 |
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation |
not yet |
7 |
Adam-mini: Use Fewer Learning Rates To Gain More |
not yet |
7 |
WARP: On the Benefits of Weight Averaged Rewarded Policies |
not yet |
7 |
Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs |
not yet |
7 |
LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction |
not yet |
7 |
Can LLM Graph Reasoning Generalize beyond Pattern Memorization? |
not yet |
7 |
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs |
not yet |
7 |
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention |
not yet |
7 |
Image Conductor: Precision Control for Interactive Video Synthesis |
not yet |
7 |
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation |
not yet |
7 |
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression |
not yet |
7 |
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models |
not yet |
7 |
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data |
not yet |
7 |
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs |
not yet |
7 |
Timo: Towards Better Temporal Reasoning for Language Models |
not yet |
7 |
CityBench: Evaluating the Capabilities of Large Language Model as World Model |
not yet |
7 |
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets |
not yet |
7 |
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation |
not yet |
7 |
Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma? |
not yet |
7 |
Coding Speech through Vocal Tract Kinematics |
not yet |
7 |
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents |
not yet |
7 |
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI |
not yet |
7 |
TSI-Bench: Benchmarking Time Series Imputation |
not yet |
7 |
AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention |
not yet |
7 |
AgentReview: Exploring Peer Review Dynamics with LLM Agents |
not yet |
7 |
VoCo-LLaMA: Towards Vision Compression with Large Language Models |
not yet |