| 40 | Movie Gen: A Cast of Media Foundation Models | |
| 36 | GPT-4o System Card | not yet |
| 24 | GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | |
| 14 | Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | |
| 11 | HART: Efficient Visual Generation with Hybrid Autoregressive Transformer | not yet |
| 11 | Pixtral 12B | not yet |
| 11 | Moshi: a speech-text foundation model for real-time dialogue | not yet |
| 10 | Self-Supervised Graph Neural Networks for Enhanced Feature Extraction in Heterogeneous Information Networks | not yet |
| 10 | A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining | not yet |
| 10 | Adversarial Neural Networks in Medical Imaging Advancements and Challenges in Semantic Segmentation | not yet |
| 10 | A Survey on Diffusion Models for Inverse Problems | not yet |
| 9 | Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens | |
| 9 | Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation | not yet |
| 9 | Balancing Innovation and Privacy: Data Security Strategies in Natural Language Processing Applications | not yet |
| 9 | Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation | not yet |
| 9 | SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference | not yet |
| 9 | MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | not yet |
| 9 | Loong: Generating Minute-level Long Videos with Autoregressive Language Models | not yet |
| 8 | Predicting Liquidity Coverage Ratio with Gated Recurrent Units: A Deep Learning Model for Risk Management | not yet |
| 8 | Efficient and Aesthetic UI Design with a Deep Learning-Based Interface Generation Tree Algorithm | not yet |
| 8 | Optimizing YOLOv5s Object Detection through Knowledge Distillation algorithm | not yet |
| 8 | GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation | not yet |
| 8 | Applying Hybrid Graph Neural Networks to Strengthen Credit Risk Analysis | not yet |
| 8 | Video Instruction Tuning With Synthetic Data | not yet |
| 7 | O1 Replication Journey: A Strategic Progress Report -- Part 1 | not yet |
| 7 | Optimizing Retrieval-Augmented Generation with Elasticsearch for Enhanced Question-Answering Systems | not yet |
| 7 | Automated Genre-Aware Article Scoring and Feedback Using Large Language Models | not yet |
| 7 | HSR-Enhanced Sparse Attention Acceleration | not yet |
| 7 | Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | not yet |
| 7 | Pyramidal Flow Matching for Efficient Video Generative Modeling | |
| 7 | Differential Transformer | |
| 7 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | not yet |
| 7 | Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding | not yet |
| 6 | Data Scaling Laws in Imitation Learning for Robotic Manipulation | not yet |
| 6 | Allegro: Open the Black Box of Commercial-Level Video Generation Model | |
| 6 | Jailbreaking and Mitigation of Vulnerabilities in Large Language Models | not yet |
| 6 | Blockchain-Based Trust and Transparency in Airline Reservation Systems using Microservices Architecture | not yet |
| 6 | Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations | not yet |
| 6 | Generative AI and Its Impact on Personalized Intelligent Tutoring Systems | not yet |
| 6 | SpeGCL: Self-supervised Graph Spectrum Contrastive Learning without Positive Samples | not yet |
| 6 | Looped ReLU MLPs May Be All You Need as Practical Programmable Computers | not yet |
| 6 | Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow | not yet |
| 6 | MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | |
| 6 | Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge | not yet |
| 6 | IC3M: In-Car Multimodal Multi-object Monitoring for Abnormal Status of Both Driver and Passengers | not yet |
| 5 | Exposing Cross-Platform Coordinated Inauthentic Activity in the Run-Up to the 2024 U.S. Election | not yet |
| 5 | Deep Learning for Medical Text Processing: BERT Model Fine-Tuning and Comparative Study | not yet |
| 5 | Optimizing Travel Itineraries with AI Algorithms in a Microservices Architecture: Balancing Cost, Time, Preferences, and Sustainability | not yet |
| 5 | PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction | not yet |
| 5 | One-Step Diffusion Distillation through Score Implicit Matching | not yet |
| 5 | Jailbreaking LLM-Controlled Robots | not yet |
| 5 | Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws | not yet |
| 5 | Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent | not yet |
| 5 | Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix | not yet |
| 5 | Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies | not yet |
| 5 | SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | not yet |
| 5 | How to Construct Random Unitaries | not yet |
| 5 | The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models | not yet |
| 5 | AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents | not yet |
| 5 | Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis | not yet |
| 5 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | not yet |
| 5 | DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation | not yet |
| 5 | Aria: An Open Multimodal Native Mixture-of-Experts Model | not yet |
| 5 | Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation | not yet |
| 5 | ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | not yet |
| 5 | LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | not yet |
| 5 | HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly | not yet |
| 5 | How to Train Long-Context Language Models (Effectively) | not yet |
| 5 | ImageFolder: Autoregressive Image Generation with Folded Tokens | not yet |
| 4 | No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images | not yet |
| 4 | $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control | not yet |
| 4 | Deep Learning with HM-VGG: AI Strategies for Multi-modal Image Analysis | not yet |
| 4 | Unearthing a Billion Telegram Posts about the 2024 U.S. Presidential Election: Development of a Public Dataset | not yet |
| 4 | Human-Centric eXplainable AI in Education | not yet |
| 4 | LLM-Slice: Dedicated Wireless Network Slicing for Large Language Models | not yet |
| 4 | Quantum linear system algorithm with optimal queries to initial state preparation | not yet |
| 4 | Graph Contrastive Learning via Cluster-refined Negative Sampling for Semi-supervised Text Classification | not yet |
| 4 | YOLOv11: An Overview of the Key Architectural Enhancements | not yet |
| 4 | LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | |
| 4 | The XLZD Design Book: Towards the Next-Generation Liquid Xenon Observatory for Dark Matter and Neutrino Physics | not yet |
| 4 | Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages | not yet |
| 4 | Iterative Methods via Locally Evolving Set Process | not yet |
| 4 | MCSFF: Multi-modal Consistency and Specificity Fusion Framework for Entity Alignment | not yet |
| 4 | Beamforming Optimization for Continuous Aperture Array (CAPA)-based Communications | not yet |
| 4 | DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation | not yet |
| 4 | ALOHA Unleashed: A Simple Recipe for Robot Dexterity | not yet |
| 4 | MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models | not yet |
| 4 | MIRROR: A Novel Approach for the Automated Evaluation of Open-Ended Question Generation | not yet |
| 4 | CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos | not yet |
| 4 | Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities | not yet |
| 4 | TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models | not yet |
| 4 | When Attention Sink Emerges in Language Models: An Empirical View | not yet |
| 4 | Rethinking Legal Judgement Prediction in a Realistic Scenario in the Era of Large Language Models | not yet |
| 4 | Locking Down the Finetuned LLMs Safety | not yet |
| 4 | The Ingredients for Robotic Diffusion Transformers | not yet |
| 4 | Targeted Vaccine: Safety Alignment for Large Language Models against Harmful Fine-Tuning via Layer-wise Perturbation | not yet |
| 4 | Impurities and polarons in bosonic quantum gases: a review on recent progress | not yet |
| 4 | Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes | not yet |
| 4 | Improved List Size for Folded Reed-Solomon Codes | not yet |
| 4 | Ocean-omni: To Understand the World with Omni-modality | not yet |
| 4 | RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | not yet |
| 4 | SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection | not yet |
| 4 | Automated Creation of Digital Cousins for Robust Policy Learning | not yet |
| 4 | Toward hybrid quantum simulations with qubits and qumodes on trapped-ion platforms | not yet |
| 4 | IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation | not yet |
| 4 | F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching | |
| 4 | Dynamic metastability in the self-attention model | not yet |
| 4 | Manifolds, Random Matrices and Spectral Gaps: The geometric phases of generative diffusion | not yet |
| 4 | MDAP: A Multi-view Disentangled and Adaptive Preference Learning Framework for Cross-Domain Recommendation | not yet |
| 4 | Strong Model Collapse | |
| 4 | Stochastic Runge-Kutta Methods: Provable Acceleration of Diffusion Models | not yet |
| 4 | CAR: Controllable Autoregressive Modeling for Visual Generation | not yet |
| 4 | Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning | not yet |
| 4 | Dynamic Diffusion Transformer | not yet |
| 4 | AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark | not yet |
| 4 | MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences | not yet |
| 4 | AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models | not yet |
| 4 | Iterated Radical Expansions and Convergence | not yet |
| 4 | Deep Learning Alternatives of the Kolmogorov Superposition Theorem | not yet |
| 4 | CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL | not yet |
| 4 | Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown | not yet |
| 4 | The Patterns of Life Human Mobility Simulation | not yet |
| 3 | Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning | not yet |
| 3 | Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model | not yet |
| 3 | A note on polynomial-time tolerant testing stabilizer states | not yet |
| 3 | MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding | not yet |
| 3 | Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse | |
| 3 | Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models | not yet |
| 3 | Enhancing Resilience and Scalability in Travel Booking Systems: A Microservices Approach to Fault Tolerance, Load Balancing, and Service Discovery | not yet |
| 3 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | not yet |
| 3 | Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences | not yet |
| 3 | Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality | not yet |
| 3 | Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs | not yet |
| 3 | CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation | not yet |
| 3 | AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents | not yet |
| 3 | 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors | not yet |
| 3 | A Survey of Conversational Search | not yet |
| 3 | Pruning Foundation Models for High Accuracy without Retraining | not yet |
| 3 | Transversal non-Clifford gates for quantum LDPC codes on sheaves | not yet |
| 3 | DepthSplat: Connecting Gaussian Splatting and Depth | not yet |
| 3 | DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control | not yet |
| 3 | DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering | not yet |
| 3 | L3DG: Latent 3D Gaussian Diffusion | not yet |
| 3 | aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Completion | not yet |
| 3 | Generative Reward Models | not yet |
| 3 | FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression | not yet |
| 3 | WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines | not yet |
| 3 | Latent Action Pretraining from Videos | |
| 3 | Agent-as-a-Judge: Evaluate Agents with Agents | |
| 3 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | |
| 3 | Boosting Camera Motion Control for Video Diffusion Transformers | not yet |
| 3 | MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling | not yet |
| 3 | Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling | not yet |
| 3 | Animate-X: Universal Character Image Animation with Enhanced Motion Representation | not yet |
| 3 | MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models | not yet |
| 3 | Safety-Aware Fine-Tuning of Large Language Models | not yet |
| 3 | Lessons Learned: A Smart Campus Environment Using LoRaWAN | not yet |
| 3 | OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | not yet |
| 3 | VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model | not yet |
| 3 | ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback | not yet |
| 3 | Language model developers should report train-test overlap | not yet |
| 3 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | not yet |
| 3 | Efficient Quantum Pseudorandomness from Hamiltonian Phase States | not yet |
| 3 | Hybrid Summary Statistics | not yet |
| 3 | ReinDiffuse: Crafting Physically Plausible Motions with Reinforced Diffusion Model | not yet |
| 3 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | not yet |
| 3 | Personalized Visual Instruction Tuning | not yet |
| 3 | AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs | not yet |
| 3 | MIBench: A Comprehensive Benchmark for Model Inversion Attack and Defense | not yet |
| 3 | Gibbs state preparation for commuting Hamiltonian: Mapping to classical Gibbs sampling | not yet |
| 3 | Hammer: Robust Function-Calling for On-Device Language Models via Function Masking | not yet |
| 3 | Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution | not yet |
| 3 | Equivariant Neural Functional Networks for Transformers | not yet |
| 3 | Sinc Kolmogorov-Arnold Network and Its Applications on Physics-informed Neural Networks | not yet |
| 3 | A survey of Zarankiewicz problem in geometry | not yet |
| 3 | GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs | not yet |
| 3 | LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding | not yet |
| 3 | Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations | not yet |
| 3 | SteerDiff: Steering towards Safe Text-to-Image Diffusion Models | not yet |
| 3 | LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations | |
| 3 | ControlAR: Controllable Image Generation with Autoregressive Models | not yet |
| 3 | Analyzing black-hole ringdowns II: data conditioning | not yet |
| 3 | Learning classical density functionals for ionic fluids | not yet |
| 3 | RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning | |
| 3 | Urban Anomalies: A Simulated Human Mobility Dataset with Injected Anomalies | not yet |
| 3 | A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts | not yet |
| 3 | softmax is not enough (for sharp out-of-distribution) | not yet |
| 3 | MERIT: Multimodal Wearable Vital Sign Waveform Monitoring | not yet |
| 3 | Enhanced Credit Score Prediction Using Ensemble Deep Learning Model | not yet |
| 3 | Transferable Unsupervised Outlier Detection Framework for Human Semantic Trajectories | not yet |