270 |
Gemma 2: Improving Open Language Models at a Practical Size |
 |
200 |
SAM 2: Segment Anything in Images and Videos |
 |
129 |
LLaVA-OneVision: Easy Visual Task Transfer |
not yet |
107 |
MiniCPM-V: A GPT-4V Level MLLM on Your Phone |
 |
101 |
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer |
not yet |
84 |
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters |
 |
57 |
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery |
 |
50 |
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model |
not yet |
48 |
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation |
not yet |
32 |
CogVLM2: Visual Language Models for Image and Video Understanding |
not yet |
31 |
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 |
not yet |
28 |
KAN 2.0: Kolmogorov-Arnold Networks Meet Science |
not yet |
28 |
VITA: Towards Open-Source Interactive Omni Multimodal LLM |
 |
26 |
Generative Verifiers: Reward Modeling as Next-Token Prediction |
not yet |
26 |
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models |
not yet |
25 |
Self-Taught Evaluators |
not yet |
24 |
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities |
 |
24 |
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models |
not yet |
23 |
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 |
 |
22 |
ControlNeXt: Powerful and Efficient Control for Image and Video Generation |
not yet |
21 |
Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness |
not yet |
20 |
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents |
 |
20 |
Tamper-Resistant Safeguards for Open-Weight LLMs |
not yet |
19 |
Identification of Prognostic Biomarkers for Stage III Non-Small Cell Lung Carcinoma in Female Nonsmokers Using Machine Learning |
not yet |
18 |
Building and better understanding vision-language models: insights and future directions |
not yet |
18 |
Imagen 3 |
 |
18 |
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining |
not yet |
17 |
Adaptive Friction in Deep Learning: Enhancing Optimizers with Sigmoid and Tanh Function |
not yet |
17 |
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models |
not yet |
16 |
A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models |
not yet |
15 |
Automated Design of Agentic Systems |
not yet |
15 |
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers |
 |
15 |
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models |
not yet |
14 |
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling |
 |
14 |
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming |
not yet |
14 |
Optimizing Automated Picking Systems in Warehouse Robots Using Machine Learning |
not yet |
14 |
Diffusion Models Are Real-Time Game Engines |
 |
14 |
LongVILA: Scaling Long-Context Visual Language Models for Long Videos |
not yet |
14 |
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search |
not yet |
14 |
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future |
not yet |
13 |
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet |
 |
13 |
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time |
not yet |
13 |
Real-Time Video Generation with Pyramid Attention Broadcast |
not yet |
13 |
A universal neutral-atom quantum computer with individual optical addressing and non-destructive readout |
not yet |
13 |
Evaluating Modern Approaches in 3D Scene Reconstruction: NeRF vs Gaussian-Based Methods |
not yet |
13 |
Attention Mechanism and Context Modeling System for Text Mining Machine Translation |
not yet |
13 |
Advanced User Credit Risk Prediction Model using LightGBM, XGBoost and Tabnet with SMOTEENN |
not yet |
12 |
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders |
 |
12 |
A Survey on Benchmarks of Multimodal Large Language Models |
not yet |
12 |
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs |
not yet |
12 |
Harnessing Earnings Reports for Stock Predictions: A QLoRA-Enhanced LLM Approach |
 |
12 |
Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy |
not yet |
12 |
Enhancing Healthcare through Large Language Models: A Study on Medical Question Answering |
not yet |
12 |
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI |
not yet |
12 |
Segment Anything in Medical Images and Videos: Benchmark and Deployment |
not yet |
12 |
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine |
not yet |
11 |
OmniRe: Omni Urban Scene Reconstruction |
not yet |
11 |
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling |
not yet |
11 |
Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input |
not yet |
11 |
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale |
not yet |
11 |
A Tighter Complexity Analysis of SparseGPT |
not yet |
11 |
Critique-out-Loud Reward Models |
not yet |
11 |
Graph Retrieval-Augmented Generation: A Survey |
not yet |
11 |
Dynamic Hypergraph-Enhanced Prediction of Sequential Medical Visits |
not yet |
11 |
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents |
not yet |
11 |
Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes |
not yet |
11 |
Language Model Can Listen While Speaking |
not yet |
10 |
Review: Quantum Metrology and Sensing with Many-Body Systems |
not yet |
10 |
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? |
not yet |
10 |
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models |
not yet |
10 |
Algorithm Research of ELMo Word Embedding and Deep Learning Multimodal Transformer in Image Description |
not yet |
10 |
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities |
not yet |
10 |
A comparative study of generative adversarial networks for image recognition algorithms based on deep learning and traditional methods |
not yet |
10 |
Research on Autonomous Driving Decision-making Strategies based Deep Reinforcement Learning |
not yet |
10 |
A Survey of Mamba |
not yet |
10 |
DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency |
not yet |
9 |
Text classification optimization algorithm based on graph neural network |
not yet |
9 |
MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents |
not yet |
9 |
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM |
not yet |
9 |
Rhyme-aware Chinese lyric generator based on GPT |
not yet |
9 |
Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning |
not yet |
9 |
A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning |
not yet |
9 |
ECG-FM: An Open Electrocardiogram Foundation Model |
not yet |
9 |
Applying Conditional Generative Adversarial Networks for Imaging Diagnosis |
not yet |
9 |
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving |
not yet |
8 |
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model |
not yet |
8 |
Quantum Convolutional Neural Networks are (Effectively) Classically Simulable |
not yet |
8 |
Convolutional Neural Networks for Predictive Modeling of Lung Disease |
not yet |
8 |
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation |
not yet |
8 |
LLM Pruning and Distillation in Practice: The Minitron Approach |
not yet |
8 |
Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models |
not yet |
8 |
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation |
not yet |
8 |
The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models |
not yet |
8 |
Robust Domain Generalization for Multi-modal Object Recognition |
not yet |
8 |
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? |
not yet |
8 |
Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws |
not yet |
8 |
Segment anything model 2: an application to 2D and 3D medical images |
not yet |
8 |
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement |
not yet |
7 |
Machine Learning-Based Research on the Adaptability of Adolescents to Online Education |
not yet |
7 |
Foundation Models for Music: A Survey |
not yet |
7 |
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates |
not yet |
7 |
Research on Improved U-net Based Remote Sensing Image Segmentation Algorithm |
not yet |
7 |
Cross-border Commodity Pricing Strategy Optimization via Mixed Neural Network for Time Series Analysis |
not yet |
7 |
ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation |
not yet |
7 |
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models |
not yet |
7 |
Asymptotically Good Quantum Codes with Transversal Non-Clifford Gates |
not yet |
7 |
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents |
not yet |
7 |
HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction |
not yet |
7 |
Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation |
not yet |
7 |
Biomedical SAM 2: Segment Anything in Biomedical Images and Videos |
not yet |
7 |
Interactive 3D Medical Image Segmentation with SAM 2 |
not yet |
7 |
CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models |
not yet |
7 |
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs |
not yet |
7 |
Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation |
not yet |
7 |
Contrastive Graph Representation Learning with Adversarial Cross-view Reconstruction and Information Bottleneck |
not yet |
6 |
Self-Improving Diffusion Models with Synthetic Data |
not yet |
6 |
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration |
not yet |
6 |
A Survey on Evaluation of Multimodal Large Language Models |
 |
6 |
The Mamba in the Llama: Distilling and Accelerating Hybrid Models |
not yet |
6 |
Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs |
not yet |
6 |
The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities |
not yet |
6 |
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models |
not yet |
6 |
LLM-PBE: Assessing Data Privacy in Large Language Models |
not yet |
6 |
Scalable Autoregressive Image Generation with Mamba |
not yet |
6 |
Transformers are Minimax Optimal Nonparametric In-Context Learners |
not yet |
6 |
To Code, or Not To Code? Exploring Impact of Code in Pre-training |
 |
6 |
LoopSplat: Loop Closure by Registering 3D Gaussian Splats |
not yet |
6 |
Classifier-Free Guidance is a Predictor-Corrector |
not yet |
6 |
Evaluating Research Quality with Large Language Models: An Analysis of ChatGPT's Effectiveness with Different Settings and Inputs |
not yet |
6 |
Physics-Informed Kolmogorov-Arnold Networks for Power System Dynamics |
not yet |
6 |
Fast John Ellipsoid Computation with Differential Privacy Optimization |
not yet |
6 |
Polynomial-time tolerant testing stabilizer states |
not yet |
6 |
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization |
not yet |
6 |
SMILES-Mamba: Chemical Mamba Foundation Models for Drug ADMET Prediction |
not yet |
6 |
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning |
not yet |
6 |
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling |
not yet |
6 |
Understanding the Performance and Estimating the Cost of LLM Fine-Tuning |
not yet |
6 |
SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation |
not yet |
6 |
Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations |
not yet |
6 |
Floquet engineering of interactions and entanglement in periodically driven Rydberg chains |
not yet |
6 |
Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information |
not yet |
6 |
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization |
not yet |
6 |
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid |
not yet |
6 |
Transformers are Universal In-context Learners |
not yet |
6 |
Deep Learning in Medical Image Classification from MRI-based Brain Tumor Images |
not yet |
6 |
OmniParser for Pure Vision Based GUI Agent |
not yet |
6 |
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models |
not yet |
5 |
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning |
not yet |
5 |
Can We Leave Deepfake Data Behind in Training Deepfake Detector? |
not yet |
5 |
Safety Layers in Aligned Large Language Models: The Key to LLM Security |
not yet |
5 |
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners |
not yet |
5 |
A Survey on Evaluating Large Language Models in Code Generation Tasks |
not yet |
5 |
FA-YOLO: Research On Efficient Feature Selection YOLO Improved Algorithm Based On FMDS and AGMF Modules |
not yet |
5 |
Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding |
not yet |
5 |
LeMON: Learning to Learn Multi-Operator Networks |
not yet |
5 |
In-Context Imitation Learning via Next-Token Prediction |
not yet |
5 |
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation |
not yet |
5 |
TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering |
not yet |
5 |
Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods |
not yet |
5 |
Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning |
not yet |
5 |
Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos |
not yet |
5 |
DynaSurfGS: Dynamic Surface Reconstruction with Planar-based Gaussian Splatting |
not yet |
5 |
PIE: Parkour with Implicit-Explicit Learning Framework for Legged Robots |
not yet |
5 |
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic |
not yet |
5 |
Selective Preference Optimization via Token-Level Reward Function Estimation |
not yet |
5 |
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities |
not yet |
5 |
The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence |
not yet |
5 |
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications |
not yet |
5 |
Nothing in Excess: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering |
not yet |
5 |
TrackGo: A Flexible and Efficient Method for Controllable Video Generation |
not yet |
5 |
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding |
not yet |
5 |
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments |
not yet |
5 |
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations |
not yet |
5 |
HMoE: Heterogeneous Mixture of Experts for Language Modeling |
not yet |
5 |
AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews |
not yet |
5 |
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model |
not yet |
5 |
Transferring Backdoors between Large Language Models by Knowledge Distillation |
not yet |
5 |
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering |
not yet |
5 |
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges |
not yet |
5 |
A Hassle-free Algorithm for Private Learning in Practice: Don't Use Tree Aggregation, Use BLTs |
not yet |
5 |
FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering |
not yet |
5 |
Language Models as Models of Language |
not yet |
5 |
Can LLMs Replace Manual Annotation of Software Engineering Artifacts? |
not yet |
5 |
Affective Computing in the Era of Large Language Models: A Survey from the NLP Perspective |
not yet |
5 |
MMREC: LLM Based Multi-Modal Recommender System |
not yet |
5 |
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases |
not yet |
5 |
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks |
not yet |
5 |
Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey |
not yet |
5 |
Synthesizing Text-to-SQL Data from Weak and Strong LLMs |
 |
5 |
Huge Ensembles Part I: Design of Ensemble Weather Forecasts using Spherical Fourier Neural Operators |
not yet |
5 |
Compromising Embodied Agents with Contextual Backdoor Attacks |
not yet |
5 |
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation |
not yet |
5 |
Designing Multi-layered Runtime Guardrails for Foundation Model Based Agents: Swiss Cheese Model for AI Safety by Design |
not yet |
5 |
Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2 |
not yet |
5 |
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability |
not yet |
5 |
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks |
not yet |
5 |
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework |
not yet |
5 |
BioRAG: A RAG-LLM Framework for Biological Question Reasoning |
not yet |
5 |
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities |
not yet |
5 |
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model |
not yet |
5 |
Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology |
not yet |
5 |
Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions |
not yet |
5 |
Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs |
not yet |
4 |
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers |
not yet |
4 |
Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling |
not yet |
4 |
Beyond Preferences in AI Alignment |
not yet |
4 |
Efficient LLM Scheduling by Learning to Rank |
not yet |
4 |
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation |
not yet |
4 |
SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models |
not yet |
4 |
OctFusion: Octree-based Diffusion Models for 3D Shape Generation |
not yet |
4 |
Agentic Retrieval-Augmented Generation for Time Series Analysis |
not yet |
4 |
One-layer transformers fail to solve the induction heads task |
not yet |
4 |
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data |
not yet |
4 |
TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers |
not yet |
4 |
Segment Any Mesh: Zero-shot Mesh Part Segmentation via Lifting Segment Anything 2 to 3D |
not yet |
4 |
Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation |
not yet |
4 |
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler |
not yet |
4 |
Evidential Deep Partial Multi-View Classification With Discount Fusion |
not yet |
4 |
Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias |
not yet |
4 |
Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey |
not yet |
4 |
NanoFlow: Towards Optimal Large Language Model Serving Throughput |
not yet |
4 |
SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging |
not yet |
4 |
DUNE Phase II: Scientific Opportunities, Detector Concepts, Technological Solutions |
not yet |
4 |
Non-Homophilic Graph Pre-Training and Prompt Learning |
not yet |
4 |
MEDCO: Medical Education Copilots Based on A Multi-Agent Framework |
not yet |
4 |
Better Debugging: Combining Static Analysis and LLMs for Explainable Crashing Fault Localization |
not yet |
4 |
A Deconfounding Approach to Climate Model Bias Correction |
not yet |
4 |
Let Community Rules Be Reflected in Online Content Moderation |
not yet |
4 |
How Susceptible are LLMs to Influence in Prompts? |
not yet |
4 |
Hermes 3 Technical Report |
not yet |
4 |
AppAgent v2: Advanced Agent for Flexible Mobile Interactions |
not yet |
4 |
DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework |
not yet |
4 |
Iterative Object Count Optimization for Text-to-image Diffusion Models |
not yet |
4 |
Bidirectional Gated Mamba for Sequential Recommendation |
not yet |
4 |
Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation |
not yet |
4 |
HITS: High-coverage LLM-based Unit Test Generation via Method Slicing |
not yet |
4 |
KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? |
not yet |
4 |
GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting |
not yet |
4 |
Kilometer-Scale Convection Allowing Model Emulation using Generative Diffusion Modeling |
not yet |
4 |
AnyGraph: Graph Foundation Model in the Wild |
not yet |
4 |
Privacy-preserving Universal Adversarial Defense for Black-box Models |
not yet |
4 |
OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction |
not yet |
4 |
SoK: Runtime Integrity |
not yet |
4 |
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models |
not yet |
4 |
NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction |
not yet |
4 |
FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant |
not yet |
4 |
BLADE: Benchmarking Language Model Agents for Data-Driven Science |
not yet |
4 |
Out-of-distribution generalization via composition: a lens through induction heads in Transformers |
not yet |
4 |
V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models |
not yet |
4 |
LLMJudge: LLMs for Relevance Judgments |
not yet |
4 |
The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation |
not yet |
4 |
A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models |
not yet |
4 |
Study of MRI-compatible Notched Plastic Ultrasonic Stator with FEM Simulation and Holography Validation |
not yet |
4 |
Activation Space Selectable Kolmogorov-Arnold Networks |
not yet |
4 |
TurboEdit: Instant text-based image editing |
not yet |
4 |
Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors |
not yet |
4 |
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding |
not yet |
4 |
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning |
not yet |
4 |
Squeezed states of light after high-harmonic generation in excited atomic systems |
not yet |
4 |
Kolmogorov-Arnold Networks (KAN) for Time Series Classification and Robust Analysis |
not yet |
4 |
Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion |
not yet |
4 |
The Design of Autonomous UAV Prototypes for Inspecting Tunnel Construction Environment |
not yet |
4 |
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area |
not yet |
4 |
Prompt-Based Segmentation at Multiple Resolutions and Lighting Conditions using Segment Anything Model 2 |
not yet |
4 |
Stabilizer bootstrapping: A recipe for efficient agnostic tomography and magic estimation |
not yet |
4 |
Med42-v2: A Suite of Clinical LLMs |
not yet |
4 |
LaWa: Using Latent Space for In-Generation Image Watermarking |
not yet |
4 |
A Decoding Acceleration Framework for Industrial Deployable LLM-based Recommender Systems |
not yet |
4 |
Research on Heterogeneous Computation Resource Allocation based on Data-driven Method |
not yet |
4 |
Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network |
not yet |
4 |
VACoDe: Visual Augmented Contrastive Decoding |
not yet |
4 |
Revisiting Multi-Modal LLM Evaluation |
not yet |
4 |
Performance Analysis of FAS-Aided NOMA-ISAC: A Backscattering Scenario |
not yet |
4 |
Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles |
not yet |
4 |
Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions |
not yet |
4 |
Building Machines that Learn and Think with People |
 |
4 |
Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models |
not yet |
4 |
Achieving Human Level Competitive Robot Table Tennis |
not yet |
4 |
From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems |
not yet |
4 |
Improving LLM-based Unit test generation via Template-based Repair |
not yet |
4 |
500xCompressor: Generalized Prompt Compression for Large Language Models |
not yet |
4 |
Data-Driven Stochastic Closure Modeling via Conditional Diffusion Model and Neural Operator |
not yet |
4 |
Quantum simulation of dynamical gauge theories in periodically driven Rydberg atom arrays |
not yet |
4 |
Operationalizing Contextual Integrity in Privacy-Conscious Assistants |
not yet |
4 |
First search for dark photon dark matter with a MADMAX prototype |
not yet |
4 |
Potential Hessian Ascent: The Sherrington-Kirkpatrick Model |
not yet |
4 |
SpecRover: Code Intent Extraction via LLMs |
not yet |
4 |
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models |
not yet |
4 |
Reinforcement Learning for an Efficient and Effective Malware Investigation during Cyber Incident Response |
not yet |
4 |
Differentiable MadNIS-Lite |
not yet |
4 |
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models |
not yet |
4 |
Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement |
not yet |
4 |
VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling |
not yet |
4 |
IG-SLAM: Instant Gaussian SLAM |
not yet |
4 |
On the Resilience of Multi-Agent Systems with Malicious Agents |
not yet |
4 |
Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention |
not yet |
4 |
Matched Guiding and Controlled Injection in Dark-Current-Free, 10-GeV-Class, Channel-Guided Laser Plasma Accelerators |
not yet |
4 |
AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models |
not yet |
4 |
End-to-End Protocol for High-Quality QAOA Parameters with Few Shots |
not yet |
4 |
SegStitch: Multidimensional Transformer for Robust and Efficient Medical Imaging Segmentation |
not yet |
4 |
3D U-KAN Implementation for Multi-modal MRI Brain Tumor Segmentation |
not yet |
4 |
Generative Learning of the Solution of Parametric Partial Differential Equations Using Guided Diffusion Models and Virtual Observations |
not yet |
3 |
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model |
not yet |
3 |
Deep Feature Embedding for Tabular Data |
not yet |
3 |
Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning |
not yet |
3 |
Maven: A Multimodal Foundation Model for Supernova Science |
not yet |
3 |
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems |
 |
3 |
Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation |
not yet |
3 |
SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge |
not yet |
3 |
Explicit Folded Reed-Solomon and Multiplicity Codes Achieve Relaxed Generalized Singleton Bounds |
not yet |
3 |
Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty |
not yet |
3 |
Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation |
not yet |
3 |
GINN-KAN: Interpretability pipelining with applications in Physics Informed Neural Networks |
not yet |
3 |
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning |
not yet |
3 |
Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild |
not yet |