| 17 | From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge | not yet |
| 14 | Emotion-Aware Interaction Design in Intelligent User Interface Using Multi-Modal Deep Learning | not yet |
| 13 | LLaVA-CoT: Let Vision Language Models Reason Step-by-Step | |
| 12 | Self-Supervised Learning in Deep Networks: A Pathway to Robust Few-Shot Classification | not yet |
| 12 | Metric Learning for Tag Recommendation: Tackling Data Sparsity and Cold Start Issues | not yet |
| 11 | Tulu 3: Pushing Frontiers in Open Language Model Post-Training | not yet |
| 11 | Logical computation demonstrated with a neutral atom quantum processor | not yet |
| 10 | Enhancing Few-Shot Learning with Integrated Data and GAN Model Approaches | not yet |
| 10 | Optimizing Gesture Recognition for Seamless UI Interaction Using Convolutional Neural Networks | not yet |
| 10 | Graph Neural Network-Based Entity Extraction and Relationship Reasoning in Complex Knowledge Graphs | not yet |
| 10 | Adaptive Cache Management for Complex Storage Systems Using CNN-LSTM-Based Spatiotemporal Prediction | not yet |
| 10 | Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM | not yet |
| 9 | Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions | |
| 9 | Advanced RAG Models with Graph Structures: Optimizing Complex Knowledge Reasoning and Text Generation | not yet |
| 8 | Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data | not yet |
| 8 | A Survey on LLM-as-a-Judge | not yet |
| 8 | Enhancing LLM Reasoning with Reward-guided Tree Search | not yet |
| 8 | LoRA-LiteE: A Computationally Efficient Framework for Chatbot Preference-Tuning | not yet |
| 8 | Circuit Complexity Bounds for RoPE-based Transformer Architecture | not yet |
| 8 | FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI | |
| 8 | Scaling Laws for Precision | |
| 7 | OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining | not yet |
| 7 | A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation | not yet |
| 7 | Generative Agent Simulations of 1,000 People | |
| 7 | Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs | not yet |
| 7 | Reinforcement Learning for Adaptive Resource Scheduling in Complex System Environments | not yet |
| 7 | OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models | not yet |
| 7 | Measuring short-form factuality in large language models | not yet |
| 7 | Randomized Autoregressive Visual Generation | not yet |
| 6 | WavChat: A Survey of Spoken Dialogue Models | not yet |
| 6 | RedPajama: an Open Dataset for Training Large Language Models | not yet |
| 6 | Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | not yet |
| 6 | ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning | not yet |
| 6 | DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion | not yet |
| 6 | How Far is Video Generation from World Model: A Physical Law Perspective | |
| 6 | What do sin$(x)$ and arcsinh$(x)$ have in Common? | not yet |
| 5 | Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency | not yet |
| 5 | MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs | not yet |
| 5 | Robust Graph Neural Networks for Stability Analysis in Dynamic Networks | not yet |
| 5 | Large Wireless Model (LWM): A Foundation Model for Wireless Channels | not yet |
| 5 | The Surprising Effectiveness of Test-Time Training for Abstract Reasoning | not yet |
| 5 | Data-Driven Control of Large-Scale Networks with Formal Guarantees: A Small-Gain Free Approach | not yet |
| 5 | Quantum speedups in solving near-symmetric optimization problems by low-depth QAOA | not yet |
| 5 | DiT4Edit: Diffusion Transformer for Image Editing | not yet |
| 5 | MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs | not yet |
| 5 | Distributionally Robust Optimization | not yet |
| 5 | GenXD: Generating Any 3D and 4D Scenes | |
| 5 | MdEval: Massively Multilingual Code Debugging | not yet |
| 5 | Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation | not yet |
| 5 | Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | not yet |
| 5 | Leveraging Microservices Architecture for Dynamic Pricing in the Travel Industry: Algorithms, Scalability, and Impact on Revenue and Customer Satisfaction | not yet |
| 4 | Multimodal Whole Slide Foundation Model for Pathology | not yet |
| 4 | CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models | not yet |
| 4 | Self-Generated Critiques Boost Reward Modeling for Language Models | not yet |
| 4 | Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training | not yet |
| 4 | O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | not yet |
| 4 | Exploring the Use of Machine Learning Weather Models in Data Assimilation | not yet |
| 4 | Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models | not yet |
| 4 | Is there any Trinity of Gravity, to start with? | not yet |
| 4 | Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction | not yet |
| 4 | Trajectory Tracking Using Frenet Coordinates with Deep Deterministic Policy Gradient | not yet |
| 4 | SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory | |
| 4 | ACE2: Accurately learning subseasonal to decadal atmospheric variability and forced responses | not yet |
| 4 | Number it: Temporal Grounding Videos like Flipping Manga | not yet |
| 4 | Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models | not yet |
| 4 | Polarization Aware Movable Antenna | not yet |
| 4 | A Next-Generation Approach to Airline Reservations: Integrating Cloud Microservices with AI and Blockchain for Enhanced Operational Performance | not yet |
| 4 | Sufficient Context: A New Lens on Retrieval Augmented Generation Systems | not yet |
| 4 | SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | not yet |
| 4 | HourVideo: 1-Hour Video-Language Understanding | not yet |
| 4 | SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | not yet |
| 4 | Measure-to-measure interpolation using Transformers | not yet |
| 4 | Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks | |
| 4 | Continuous-Time State Estimation Methods in Robotics: A Survey | not yet |
| 4 | Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level | not yet |
| 4 | Long Context RAG Performance of Large Language Models | not yet |
| 4 | A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness | not yet |
| 4 | Towards evaluations-based safety cases for AI scheming | not yet |
| 4 | Efficient Hamiltonian, structure and trace distance learning of Gaussian states | not yet |
| 4 | MuCol Milestone Report No. 5: Preliminary Parameters | not yet |
| 4 | Addressing Representation Collapse in Vector Quantized Models with One Linear Layer | not yet |
| 4 | PageRank Bandits for Link Prediction | not yet |
| 4 | Strengthening DeFi Security: A Static Analysis Approach to Flash Loan Vulnerabilities | not yet |
| 4 | Rule Based Rewards for Language Model Safety | not yet |
| 4 | GameGen-X: Interactive Open-world Game Video Generation | |
| 4 | PatternBoost: Constructions in Mathematics with a Little Help from AI | not yet |
| 4 | A Lorentz-Equivariant Transformer for All of the LHC | not yet |
| 3 | Universal non-Hermitian transport in disordered systems | not yet |
| 3 | INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge | not yet |
| 3 | Large Language Model-Brained GUI Agents: A Survey | not yet |
| 3 | Information geometry of bosonic Gaussian thermal states | not yet |
| 3 | TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | not yet |
| 3 | SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation | not yet |
| 3 | OracleSage: Towards Unified Visual-Linguistic Understanding of Oracle Bone Scripts through Cross-Modal Knowledge Fusion | not yet |
| 3 | Anytime Acceleration of Gradient Descent | not yet |
| 3 | Scaling Speech-Text Pre-training with Synthetic Interleaved Data | not yet |
| 3 | ShowUI: One Vision-Language-Action Model for GUI Visual Agent | not yet |
| 3 | VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models | not yet |
| 3 | All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages | not yet |
| 3 | Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training | not yet |
| 3 | When Spatial meets Temporal in Action Recognition | not yet |
| 3 | Global Challenge for Safe and Secure LLMs Track 1 | not yet |
| 3 | Robust Detection of Watermarks for Large Language Models Under Human Edits | not yet |
| 3 | VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | not yet |
| 3 | IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose | not yet |
| 3 | Optimizing Airline Reservation Systems with Edge-Enabled Microservices: A Framework for Real-Time Data Processing and Enhanced User Responsiveness | not yet |
| 3 | An introduction to relativistic spin hydrodynamics | not yet |
| 3 | High-fidelity universal gates in the $^{171}$Yb ground state nuclear spin qubit | not yet |
| 3 | Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation | not yet |
| 3 | Towards Open-Vocabulary Audio-Visual Event Localization | not yet |
| 3 | Debiasing Watermarks for Large Language Models via Maximal Coupling | not yet |
| 3 | BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices | not yet |
| 3 | MagicQuill: An Intelligent Interactive Image Editing System | not yet |
| 3 | On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse | |
| 3 | Toward Democratized Generative AI in Next-Generation Mobile Edge Networks | not yet |
| 3 | A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look | not yet |
| 3 | Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows | not yet |
| 3 | Enhancing Link Prediction with Fuzzy Graph Attention Networks and Dynamic Negative Sampling | not yet |
| 3 | Real-time Monitoring and Analysis of Track and Field Athletes Based on Edge Computing and Deep Reinforcement Learning Algorithm | not yet |
| 3 | SPIKANs: Separable Physics-Informed Kolmogorov-Arnold Networks | not yet |
| 3 | A Survey on Kolmogorov-Arnold Network | not yet |
| 3 | Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks | not yet |
| 3 | M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | not yet |
| 3 | MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views | not yet |
| 3 | Taming Rectified Flow for Inversion and Editing | not yet |
| 3 | Mixing time of quantum Gibbs sampling for random sparse Hamiltonians | not yet |
| 3 | Non-Reciprocal Beyond Diagonal RIS: Multiport Network Models and Performance Benefits in Full-Duplex Systems | not yet |
| 3 | Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? | not yet |
| 3 | Self-Consistency Preference Optimization | not yet |
| 3 | Evaluation data contamination in LLMs: how do we measure it and (when) does it matter? | not yet |
| 3 | MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue | not yet |
| 3 | StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding | not yet |
| 3 | Hypergraphs as Weighted Directed Self-Looped Graphs: Spectral Properties, Clustering, Cheeger Inequality | not yet |
| 3 | V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization | not yet |
| 3 | RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation | not yet |
| 3 | A Comprehensive Study on Quantization Techniques for Large Language Models | not yet |
| 3 | SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models | not yet |
| 3 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | not yet |
| 3 | Combining Induction and Transduction for Abstract Reasoning | not yet |
| 3 | TableGPT2: A Large Multimodal Model with Tabular Data Integration | not yet |
| 3 | ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model | not yet |
| 3 | xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism | not yet |
| 3 | UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models | not yet |
| 3 | NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference | not yet |
| 3 | Novel Topology and Manipulation of Scattering Singularities in Complex non-Hermitian Systems | not yet |
| 3 | AutoGLM: Autonomous Foundation Agents for GUIs | not yet |
| 3 | IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI | not yet |
| 3 | Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement | not yet |
| 3 | Quantum-centric computation of molecular excited states with extended sample-based quantum diagonalization | not yet |
| 3 | A Public Dataset Tracking Social Media Discourse about the 2024 U.S. Presidential Election on Twitter/X | not yet |
| 3 | Project Sid: Many-agent simulations toward AI civilization | not yet |