arXiv:2509.21543 [pdf, ps, other] cs.RO
Plan2Evolve: LLM Self-Evolution for Improved Planning Capability via Automated Domain Generation
Authors: Jinbang Huang, Zhiyuan Li, Zhanguang Zhang, Xingyue Quan, Jianye Hao, Yingxue Zhang
Abstract: Large Language Models (LLMs) have recently shown strong potential in…
Submitted 25 September, 2025; originally announced September 2025.
Comments: 25 pages, 7 figures
arXiv:2509.20705 [pdf, ps, other] cs.RO
Building Information Models to Robot-Ready Site Digital Twins (BIM2RDT): An Agentic AI Safety-First Framework
Authors: Reza Akhavian, Mani Amani, Johannes Mootz, Robert Ashe, Behrad Beheshti
Abstract: …sensing, and autonomous field operations can dramatically enhance digital management in the construction industry. This paper introduces BIM2RDT (Building Information Models to Robot-Ready Site Digital Twins), an agentic artificial intelligence (AI) framework designed to…
Submitted 24 September, 2025; originally announced September 2025.
arXiv:2509.20070 [pdf, ps, other] cs.RO
LLM Trainer: Automated Robotic Data Generating via Demonstration Augmentation using LLMs
Authors: Abraham George, Amir Barati Farimani
Abstract: We present LLM Trainer, a fully automated pipeline that leverages the world knowledge of Large Language Models (…
Submitted 24 September, 2025; originally announced September 2025.
Comments: 9 pages, 5 figures, 4 tables. Submitted to ICRA 2026
arXiv:2509.05338 [pdf, ps, other] cs.RO cs.AI
Plantbot: Integrating Plant and Robot through LLM Modular Agent Networks
Authors: Atsushi Masumori, Norihiro Maruyama, Itsuki Doi, johnsmith, Hiroki Sato, Takashi Ikegami
Abstract: We introduce Plantbot, a hybrid lifeform that connects a living plant with a mobile robot through a network of large language model (…
Submitted 1 September, 2025; originally announced September 2025.
arXiv:2509.03842 cs.RO cs.AI
INGRID: Intelligent Generative Robotic Design Using Large Language Models
Authors: Guanglu Jia, Ceng Zhang, Gregory S. Chirikjian
Abstract: The integration of large language models (LLMs) into…
Submitted 5 October, 2025; v1 submitted 3 September, 2025; originally announced September 2025.
Comments: We are revising it
arXiv:2509.03212 [pdf, ps, other] cs.CV
AIVA: An AI-based Virtual Companion for Emotion-aware Interaction
Authors: Chenxi Li
Abstract: Recent advances in Large Language Models (LLMs) have significantly improved natural language understanding and generation, enhancing Human-Computer Interaction (HCI). However,…
Submitted 3 September, 2025; originally announced September 2025.
arXiv:2509.02998 [pdf, ps, other] cs.CY cs.CR
Integrating Generative AI into Cybersecurity Education: A Study of OCR and Multimodal LLM-assisted Instruction
Authors: Karan Patel, Yu-Zheng Lin, Gaurangi Raul, Bono Po-Jen Shih, Matthew W. Redondo, Banafsheh Saber Latibari, Jesus Pacheco, Soheil Salehi, Pratik Satam
Abstract: This full paper describes an LLM-assisted instruction integrated with a virtual cybersecurity lab platform. The digital…
Submitted 3 September, 2025; originally announced September 2025.
Comments: 9 pages, 3 figures, accepted by IEEE FIE 2025
arXiv:2509.00646 [pdf, ps, other] cs.CY cs.AI
RAG-PRISM: A Personalized, Rapid, and Immersive Skill Mastery Framework with Adaptive Retrieval-Augmented Tutoring
Authors: Gaurangi Raul, Yu-Zheng Lin, Karan Patel, Bono Po-Jen Shih, Matthew W. Redondo, Banafsheh Saber Latibari, Jesus Pacheco, Soheil Salehi, Pratik Satam
Abstract: The rapid digital transformation of Fourth Industrial Revolution (4IR) systems is reshaping workforce needs, widening skill gaps, especially for older workers. With growing emphasis on STEM skills such as…
Submitted 30 August, 2025; originally announced September 2025.
Comments: 9 pages, 5 figures, Accepted by IEEE FIE 2025
arXiv:2509.00098 [pdf] physics.ins-det cond-mat.mtrl-sci
Operating advanced scientific instruments with AI agents that learn on the job
Authors: Aikaterini Vriza, Michael H. Prince, Tao Zhou, Henry Chan, Mathew J. Cherukara
Abstract: …a focus on usability, reproducibility, and intuitive human-instrument interaction. In this work, we explore the integration of agentic AI, powered by Large Language Models (LLMs), as a…
Submitted 27 August, 2025; originally announced September 2025.
arXiv:2508.05427 [pdf, ps, other] cs.AI
Large Language Models Transform Organic Synthesis From Reaction Prediction to Automation
Authors: Kartar Kumar Lohana Tharwani, Rajesh Kumar, Sumita, Numan Ahmed, Yong Tang
Abstract: Large language models (LLMs) are beginning to reshape how chemists plan and run reactions in organic synthesis. Trained on millions of reported…
Submitted 7 August, 2025; originally announced August 2025.
arXiv:2508.05342 [pdf, ps, other] cs.RO cs.AI
Information-Theoretic Graph Fusion with Vision-Language-Action Model for Policy Reasoning and Dual Robotic Control
Authors: Shunlei Li, Longsen Gao, Jin Wang, Chang Che, Xi Xiao, Jiuwen Cao, Yingbai Hu, Hamid Reza Karimi
Abstract: Teaching robots dexterous skills from human videos remains challenging due to the reliance on low-level trajectory imitation, which fails to generalize across object types, spatial layouts, and manipulator configurations. We propose Graph-Fused Vision-Language-Action (GF-VLA), a framework that enables dual-arm…
Submitted 7 August, 2025; originally announced August 2025.
Comments: Journal under review
arXiv:2508.05064 [pdf, ps, other] cs.GR cs.CL cs.CV
A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding
Authors: Mahmoud Chick Zaouali, Todd Charter, Yehor Karpichev, Brandon Haworth, Homayoun Najjaran
Abstract: Gaussian Splatting has rapidly emerged as a transformative technique for real-time 3D scene representation, offering a highly efficient and expressive alternative to Neural Radiance Fields (NeRF). Its ability to render complex scenes with high fidelity has enabled progress across domains such as scene reconstruction,…
Submitted 18 August, 2025; v1 submitted 7 August, 2025; originally announced August 2025.
arXiv:2507.20509 [pdf, ps, other] cs.RO cs.AI eess.SY
LLMs-guided adaptive compensator: Bringing Adaptivity to Automatic Control Systems with Large Language Models
Authors: Zhongchao Zhou, Yuxi Lu, Yaonan Zhu, Yifan Zhao, Bin He, Liang He, Wenwen Yu, Yusuke Iwasawa
Abstract: With rapid advances in code generation, reasoning, and problem-solving, Large Language Models (LLMs) are increasingly applied in…
Submitted 28 July, 2025; originally announced July 2025.
arXiv:2507.16068 [pdf, ps, other] cs.RO cs.AI cs.LG cs.MA
Compositional Coordination for Multi-Robot Teams with Large Language Models
Authors: Zhehui Huang, Guangyao Shi, Yuwei Wu, Vijay Kumar, Gaurav S. Sukhatme
Abstract: Multi-robot coordination has traditionally relied on a mission-specific and expert-driven pipeline, where natural language mission descriptions are manually translated by domain experts into mathematical formulation, algorithm design, and executable code. This conventional process is labor-intensive, inaccessible to non-experts, and inflexible to changes in…
Submitted 24 July, 2025; v1 submitted 21 July, 2025; originally announced July 2025.
Comments: 9 pages, 4 figures
arXiv:2507.10472 [pdf, ps, other] cs.CL
MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking
Authors: Mohamed T. Younes, Omar Walid, Mai Hassan, Ali Hamdi
Abstract: This paper introduces an innovative Applicant Tracking System (ATS) enhanced by a novel Robotic process automation (RPA) framework or as further referred to as MLAR. Traditional recruitment processes often encounter bottlenecks in resume screening and candidate shortlisting due to time and resource constraints. MLAR addresses these challenges employing Large…
Submitted 14 July, 2025; originally announced July 2025.
arXiv:2507.10087 [pdf, ps, other] cs.RO
Foundation Model Driven Robotics: A Comprehensive Review
Authors: Muhammad Tayyab Khan, Ammar Waheed
Abstract: The rapid emergence of foundation models, particularly Large Language Models (LLMs) and Vision-Language Models (VLMs), has introduced a…
Submitted 14 July, 2025; originally announced July 2025.
arXiv:2507.01930 [pdf, ps, other] cs.RO
Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations
Authors: Wenhao Wang, Yanyan Li, Long Jiao, Jiawei Yuan
Abstract: Recent advances in Large Language Models (LLMs) have revolutionized mobile…
Submitted 22 August, 2025; v1 submitted 2 July, 2025; originally announced July 2025.
Comments: 12 pages, 9 figures
arXiv:2507.01485 [pdf, ps, other] cs.RO cs.AI cs.MA q-bio.QM
BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments
Authors: Yibo Qiu, Zan Huang, Zhiyu Wang, Handi Liu, Yiling Qiao, Yifeng Hu, Shu'ang Sun, Hangke Peng, Ronald X Xu, Mingzhai Sun
Abstract: Large language models (LLMs) and vision-language models (VLMs) have the potential to…
Submitted 2 July, 2025; originally announced July 2025.
arXiv:2507.01206 [pdf, ps, other] cs.RO cs.HC
2024 NASA SUITS Report: LLM-Driven Immersive Augmented Reality User Interface for Robotics and Space Exploration
Authors: Kathy Zhuang, Zixun Huang, Yukun Song, Rui Li, Yinuo Zhou, Allen Y. Yang
Abstract: …poses challenges in machine perception, especially for tasks like 3D object pose estimation in complex, dynamic environments. Our project addresses critical issues in human-robot interaction within mobile AR, focusing on non-intrusive, spatially aware interfaces. We present URSA, an…
Submitted 1 July, 2025; originally announced July 2025.
arXiv:2506.22477 [pdf] cs.NI cs.AI cs.ET cs.RO
doi: 10.1109/RICAI64321.2024.10911316
Innovative Research on IoT Architecture and Robotic Operating Platforms: Applications of Large Language Models and Generative AI
Authors: Huiwen Han
Abstract: This paper introduces an innovative design for robotic operating platforms, underpinned by a…
Submitted 21 June, 2025; originally announced June 2025.
Comments: Published in: 2024 6th International Conference on Robotics, Intelligent Control and Artificial Intelligence (RICAI), IEEE Xplore, DOI: 10.1109/RICAI64321.2024.10911316. © 2024 IEEE
Journal ref: 2024 6th International Conference on Robotics, Intelligent Control and Artificial Intelligence (RICAI), 2024, IEEE Xplore, pp. 881-886
arXiv:2506.21191 [pdf, ps, other] cs.CL cs.SD eess.AS
Prompt-Guided Turn-Taking Prediction
Authors: Koji Inoue, Mikey Elmers, Yahui Fu, Zi Haur Pang, Divesh Lala, Keiko Ochi, Tatsuya Kawahara
Abstract: Turn-taking prediction models are essential components in spoken dialogue systems and conversational robots. Recent approaches leverage…
Submitted 3 July, 2025; v1 submitted 26 June, 2025; originally announced June 2025.
Comments: This paper has been accepted for presentation at SIGdial Meeting on Discourse and Dialogue 2025 (SIGDIAL 2025) and represents the author's version of the work
arXiv:2506.16473 [pdf, ps, other] cs.HC cs.AI cs.CL
Do We Talk to Robots Like Therapists, and Do They Respond Accordingly? Language Alignment in AI Emotional Support
Authors: Sophie Chiang, Guy Laban, Hatice Gunes
Abstract: …dialogue, it is important to understand how closely their interactions resemble those in traditional therapy settings. This study investigates whether the concerns shared with a robot align with those shared in human-to-human (H2H) therapy sessions, and whether…
Submitted 19 June, 2025; originally announced June 2025.
arXiv:2506.13189 [pdf, ps, other] cs.HC cs.RO
Multimodal "Puppeteer": An Exploration of Robot Teleoperation Via Virtual Counterpart with LLM-Driven Voice and Gesture Interaction in Augmented Reality
Authors: Yuchong Zhang, Bastian Orthmann, Shichen Ji, Michael Welle, Jonne Van Haastregt, Danica Kragic
Abstract: The integration of robotics and augmented reality (AR) holds…
Submitted 16 June, 2025; originally announced June 2025.
Comments: This work has been submitted to the IEEE TVCG for possible publication
arXiv:2506.10106 [pdf, ps, other] cs.RO cs.AI
One For All: LLM-based Heterogeneous Mission Planning in Precision Agriculture
Authors: Marcos Abel Zuzuárregui, Mustafa Melih Toslak, Stefano Carpin
Abstract: Artificial intelligence is transforming precision agriculture, offering farmers new tools to streamline their daily operations. While these technological advances promise increased efficiency, they often introduce additional complexity and steep learning curves that are particularly challenging for non-technical users who must balance tech adoption with exis…
Submitted 11 June, 2025; originally announced June 2025.
Comments: Accepted to International Federation of Automatic Control (IFAC) Sensing, Control and Automation Technologies for Agriculture - 8th AGRICONTROL 2025
Journal ref: International Federation of Automatic Control (IFAC) Sensing, Control and Automation Technologies for Agriculture - 8th AGRICONTROL 2025
arXiv:2506.06205 [pdf, other] cs.RO cs.AI
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning
Authors: Sheng Chen, Peiyu He, Jiaxin Hu, Ziyang Liu, Yansheng Wang, Tao Xu, Chi Zhang, Chongchong Zhang, Chao An, Shiyu Cai, Duo Cao, Kangping Chen, Shuai Chu, Tianwei Chu, Mingdi Dan, Min Du, Weiwei Fang, Pengyou Fu, Junkai Hu, Xiaowei Jiang, Zhaodi Jiang, Fuxuan Li, Jun Li, Minghui Li, Mingyao Li, et al. (46 additional authors not shown)
Abstract: Modern robot navigation systems encounter difficulties in diverse and complex indoor environments. Traditional approaches rely on multiple modules with small models or rule-based systems and thus lack adaptability to new environments. To address this, we developed Astra, a comprehensive dual-model architecture, Astra-Global and Astra-Local, for mobile…
Submitted 6 June, 2025; originally announced June 2025.
Comments: Astra Technical Report
arXiv:2506.02507 [pdf, ps, other] cs.RO
AURA: Autonomous Upskilling with Retrieval-Augmented Agents
Authors: Alvin Zhu, Yusuke Tanaka, Andrew Goldberg, Dennis Hong
Abstract: Designing reinforcement learning curricula for agile robots traditionally requires extensive manual tuning of reward functions, environment randomizations, and training configurations. We introduce AURA (Autonomous Upskilling with Retrieval-Augmented Agents), a schema-validated curriculum reinforcement learning (RL) framework that leverages Large Language Mo…
Submitted 23 September, 2025; v1 submitted 3 June, 2025; originally announced June 2025.
arXiv:2505.20733 [pdf] cs.AI
E2E Process Automation Leveraging Generative AI and IDP-Based Automation Agent: A Case Study on Corporate Expense Processing
Authors: Cheonsu Jeong, Seongmin Sim, Hyoyoung Cho, Sungsu Kim, Byounggwan Shin
Abstract: This paper presents an intelligent work automation approach in the context of contemporary digital transformation by integrating generative AI and Intelligent Document Processing (IDP) technologies with an Automation Agent to realize End-to-End (E2E) automation of corporate financial expense processing tasks. While traditional…
Submitted 10 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.
arXiv:2505.20658 [pdf, ps, other] cs.CL
Enhancing Transformation from Natural Language to Signal Temporal Logic Using LLMs with Diverse External Knowledge
Authors: Yue Fang, Zhi Jin, Jie An, Hongshen Chen, Xiaohong Chen, Naijun Zhan
Abstract: …Logic (TL), especially Signal Temporal Logic (STL), enables precise formal specification, making it widely used in cyber-physical systems such as autonomous driving and robotics. Automatically…
Submitted 24 July, 2025; v1 submitted 26 May, 2025; originally announced May 2025.
Comments: 11 pages, 5 figures, published to ACL 2025
arXiv:2505.03315 [pdf, other] cs.AI
Artificial Behavior Intelligence: Technology, Challenges, and Future Directions
Authors: Kanghyun Jo, Jehwan Choi, Kwanho Kim, Seongmin Kim, Duy-Linh Nguyen, Xuan-Thuy Vo, Adri Priadana, Tien-Dat Tran
Abstract: …and predicting human behavior has emerged as a core capability in various AI application domains such as autonomous driving, smart healthcare, surveillance systems, and social robotics. This paper defines the technical framework of Artificial Behavior Intelligence (ABI), which comprehensively analyzes and interprets human posture, facial expressions, emotion…
Submitted 6 May, 2025; originally announced May 2025.
Comments: 9 pages, 6 figures, Pre-print for IWIS2025
arXiv:2504.13201 [pdf, ps, other] cs.CR cs.LG cs.MA
CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation
Authors: Jirui Yang, Zheyu Lin, Zhihui Lu, Yinggui Wang, Lei Wang, Tao Wei, Xin Du, Shuhan Yang
Abstract: Large Language Models (LLMs) are increasingly becoming the cognitive core of Embodied Intelligence (EI) systems, such as…
Submitted 31 July, 2025; v1 submitted 14 April, 2025; originally announced April 2025.
arXiv:2504.02477 [pdf, ps, other] cs.RO cs.CV
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Authors: Xiaofeng Han, Shunpeng Chen, Zenghuang Fu, Zhe Feng, Lue Fan, Dong An, Changwei Wang, Li Guo, Weiliang Meng, Xiaopeng Zhang, Rongtao Xu, Shibiao Xu
Abstract: Robot vision has greatly benefited from advancements in multimodal fusion techniques and vision-language models (VLMs). We systematically review the applications of multimodal fusion in key…
Submitted 15 July, 2025; v1 submitted 3 April, 2025; originally announced April 2025.
Comments: 27 pages, 11 figures, survey paper submitted to Information Fusion
arXiv:2503.19276 [pdf] cs.CV cs.AI
Context-Aware Semantic Segmentation: Enhancing Pixel-Level Understanding with Large Language Models for Advanced Vision Applications
Authors: Ben Rahman
Abstract: …strides in pixel-level image understanding, yet it remains limited in capturing contextual and semantic relationships between objects. Current models, such as CNN and Transformer-based architectures, excel at identifying pixel-level features but fail to distinguish semantically similar objects (e.g., "doctor" vs. "nurse" in a hospital scene)…
Submitted 24 March, 2025; originally announced March 2025.
arXiv:2503.15762 [pdf, other] cs.AI
Dialogic Learning in Child-Robot Interaction: A Hybrid Approach to Personalized Educational Content Generation
Authors: Elena Malnatsky, Shenghui Wang, Koen V. Hindriks, Mike E. U. Ligthart
Abstract: Dialogic learning fosters motivation and deeper understanding in education through purposeful and structured dialogues. Foundational models offer a transformative potential for child-…
Submitted 19 March, 2025; originally announced March 2025.
arXiv:2503.09829 [pdf, other] cs.RO cs.LG eess.SY
SE(3)-Equivariant Robot Learning and Control: A Tutorial Survey
Authors: Joohwan Seo, Soochul Yoo, Junwoo Chang, Hyunseok An, Hyunwoo Ryu, Soomi Lee, Arvind Kruthiventy, Jongeun Choi, Roberto Horowitz
Abstract: Recent advances in deep learning and Transformers have driven major breakthroughs in…
Submitted 23 April, 2025; v1 submitted 12 March, 2025; originally announced March 2025.
Comments: Accepted to International Journal of Control, Automation and Systems (IJCAS)
arXiv:2503.07885 [pdf, other] cs.RO cs.AI
Safety Guardrails for LLM-Enabled Robots
Authors: Zachary Ravichandran, Alexander Robey, Vijay Kumar, George J. Pappas, Hamed Hassani
Abstract: Although the integration of large language models (LLMs) into…
Submitted 10 March, 2025; originally announced March 2025.
arXiv:2503.03889 [pdf, other] cs.RO cs.LG eess.SY
Pretrained LLMs as Real-Time Controllers for Robot Operated Serial Production Line
Authors: Muhammad Waseem, Kshitij Bhatta, Chen Li, Qing Chang
Abstract: The manufacturing industry is undergoing a transformative shift, driven by cutting-edge technologies like 5G, AI, and cloud computing. Despite these advancements, effective system control, which is crucial for optimizing production efficiency, remains a complex challenge due to the intricate, knowledge-dependent nature of manufacturing processes and the reli…
Submitted 5 March, 2025; originally announced March 2025.
Comments: 20 pages, 7 figures
arXiv:2502.12498 [pdf, ps, other] cs.RO
USPilot: An Embodied Robotic Assistant Ultrasound System with Large Language Model Enhanced Graph Planner
Authors: Mingcong Chen, Siqi Fan, Guanglin Cao, Yun-hui Liu, Hongbin Liu
Abstract: In the era of Large Language Models (LLMs), embodied artificial intelligence presents…
Submitted 29 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.
arXiv:2502.09142 [pdf, other] cs.HC cs.RO
LLM-Driven Augmented Reality Puppeteer: Controller-Free Voice-Commanded Robot Teleoperation
Authors: Yuchong Zhang, Bastian Orthmann, Michael C. Welle, Jonne Van Haastregt, Danica Kragic
Abstract: The integration of robotics and augmented reality (AR) presents…
Submitted 13 February, 2025; originally announced February 2025.
Comments: Accepted as conference proceeding in International Conference on Human-Computer Interaction 2025 (HCI International 2025)
arXiv:2502.06419 [pdf, other] cs.RO
Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models
Authors: Tianshuo Xu, Hao Lu, Xu Yan, Yingjie Cai, Bingbing Liu, Yingcong Chen
Abstract: Large Language Models (LLMs) have made substantial advancements in the field of…
Submitted 10 February, 2025; originally announced February 2025.
Comments: Accepted in 2025 IEEE International Conference on Robotics and Automation (ICRA)
arXiv:2502.03945 [pdf, other] cs.CL
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond
Authors: Mardhiyah Sanni, Tassallah Abdullahi, Devendra D. Kayande, Emmanuel Ayodele, Naome A. Etori, Michael S. Mollel, Moshood Yekini, Chibuzor Okocha, Lukman E. Ismaila, Folafunmi Omofoye, Boluwatife A. Adewale, Tobi Olatunji
Abstract: Speech technologies are transforming interactions across various sectors, from healthcare to call centers and…
Submitted 6 February, 2025; originally announced February 2025.
Comments: 19 pages, 5 figures
arXiv:2502.03814 [pdf, ps, other] cs.RO cs.AI
Large Language Models for Multi-Robot Systems: A Survey
Authors: Peihan Li, Zijian An, Shams Abrar, Lifeng Zhou
Abstract: The rapid advancement of Large Language Models (LLMs) has opened new possibilities in Multi-…
Submitted 13 September, 2025; v1 submitted 6 February, 2025; originally announced February 2025.
arXiv:2501.19259 [pdf, other] cs.RO cs.CV cs.LG cs.NE eess.SY
Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge
Authors: Amogh Joshi, Sourav Sanyal, Kaushik Roy
Abstract: …into autonomous systems has been limited. Traditional Natural Language Processing (NLP) systems struggle with context and intent understanding, severely restricting human-robot interaction. Recent advancements in Large Language Models (…
Submitted 26 April, 2025; v1 submitted 31 January, 2025; originally announced January 2025.
Comments: Accepted for publication at the International Joint Conference on Neural Networks (IJCNN) 2025
arXiv:2501.18733 [pdf, other] cs.RO cs.AI
Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation
Authors: Yuelei Li, Ge Yan, Annabella Macaluso, Mazeyu Ji, Xueyan Zou, Xiaolong Wang
Abstract: The recent advancements in visual reasoning capabilities of large multimodal models (LMMs) and the semantic enrichment of 3D feature fields have expanded the horizons of robotic capabilities. These developments hold significant potential for bridging the gap between high-level reasoning from LMMs and low-level control policies utilizing 3D feature fields. In…
Submitted 30 January, 2025; originally announced January 2025.
arXiv:2501.17206 [pdf, other] cs.AI cs.RO
Integrating Reinforcement Learning and AI Agents for Adaptive Robotic Interaction and Assistance in Dementia Care
Authors: Fengpei Yuan, Nehal Hasnaeen, Ran Zhang, Bryce Bible, Joseph Riley Taylor, Hairong Qi, Fenghui Yao, Xiaopeng Zhao
Abstract: This study explores a novel approach to advancing dementia care by integrating socially assistive robotics, reinforcement learning (RL), large language models (…
Submitted 28 January, 2025; originally announced January 2025.
Comments: 18 pages, 12 figures
arXiv:2501.15411 [pdf] cs.CY cs.CL
The Potential of Large Language Models in Supply Chain Management: Advancing Decision-Making, Efficiency, and Innovation
Authors: Raha Aghaei, Ali A. Kiaei, Mahnaz Boush, Javad Vahidi, Zeynab Barzegar, Mahan Rofoosheh
Abstract: The integration of large language models (LLMs) into supply chain management (SCM) is revolutionizing the industry by improving decision-making, predictive analytics, and operational efficiency. This white paper explores the…
Submitted 26 January, 2025; originally announced January 2025.
arXiv:2501.08944 [pdf, other] cs.MA
Physical AI Agents: Integrating Cognitive Intelligence with Real-World Action
Authors: Fouad Bousetouane
Abstract: …with objects, environments, and dynamic conditions. This need has led to the emergence of Physical AI Agents--systems that integrate cognitive reasoning, powered by specialized LLMs, with precise physical actions to perform real-world tasks. This work introduces Physical AI Agents as an evolution of shared principles with Vertical AI Agents, tailored for p…
Submitted 15 January, 2025; originally announced January 2025.
Comments: 27 pages, 3 figures
arXiv:2412.19595 [pdf, other] cs.RO cs.AI
SocRATES: Towards Automated Scenario-based Testing of Social Navigation Algorithms
Authors: Shashank Rao Marpally, Pranav Goyal, Harold Soh
Abstract: …social navigation methods and benchmarks primarily focus on proxemics and task efficiency. While these factors are important, qualitative aspects such as perceptions of a robot's social competence are equally crucial for successful adoption and integration into human environments. We propose a more comprehensive evaluation of social navigation through sc…
Submitted 27 December, 2024; originally announced December 2024.
Comments: 7 pages, 5 figures
arXiv:2412.11387 [pdf, other] cs.RO cs.AI
How Can LLMs and Knowledge Graphs Contribute to Robot Safety? A Few-Shot Learning Approach
Authors: Abdulrahman Althobaiti, Angel Ayala, JingYing Gao, Ali Almutairi, Mohammad Deghat, Imran Razzak, Francisco Cruz
Abstract: Large Language Models (LLMs) are…
Submitted 15 December, 2024; originally announced December 2024.
arXiv:2412.11026 [pdf, other] cs.CV cs.AI
SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation
Authors: Hang Zhang, Zhuoling Li, Jun Liu
Abstract: Dynamic scenes contain intricate spatio-temporal information, crucial for mobile robots, UAVs, and autonomous driving systems to make informed decisions. Parsing these scenes into semantic triplets for accurate Scene Graph Generation (SGG) is highly challenging due to the fluctuating spatio-temporal complexity. Inspired by th…
Submitted 6 May, 2025; v1 submitted 14 December, 2024; originally announced December 2024.
Comments: 29 pages, 7 figures
arXiv:2412.10856 [pdf, other] cs.LG cs.PF
RWKV-Lite: Deeply Compressed RWKV for Resource-Constrained Devices
Authors: Wonkyo Choe, Yangfeng Ji, Felix Xiaozhu Lin
Abstract: To deploy LLMs on resource-constrained platforms such as mobile…
Submitted 31 January, 2025; v1 submitted 14 December, 2024; originally announced December 2024.
arXiv:2412.05515 [pdf, other] cs.RO cs.CV cs.LG
doi: 10.3233/FAIA241014
Video2Reward: Generating Reward Function from Videos for Legged Robot Behavior Learning
Authors: Runhao Zeng, Dingjie Zhou, Qiwei Liang, Junlin Liu, Hui Li, Changxin Huang, Jianqiang Li, Xiping Hu, Fuchun Sun
Abstract: Learning behavior in legged robots presents a significant challenge due to its inherent instability and complex constraints. Recent research has proposed the use of a large language model (…
Submitted 6 December, 2024; originally announced December 2024.
Comments: 8 pages, 6 figures, ECAI2024
Journal ref: Proceedings of the 27th European Conference on Artificial Intelligence (ECAI 2024), Santiago de Compostela, Spain, October 19-24, 2024. Frontiers in Artificial Intelligence and Applications, vol. 392, IOS Press, pp. 4369-4376
arXiv:2412.00435 [pdf, other] cs.AI cs.HC cs.RO
Benchmark Real-time Adaptation and Communication Capabilities of Embodied Agent in Collaborative Scenarios
Authors: Shipeng Liu, Boshen Zhang, Zhehui Huang
Abstract: Advancements in Large Language Models (LLMs) have opened…
Submitted 30 November, 2024; originally announced December 2024.
Comments: 16 pages, 8 figures
MSC Class: 68T05
ACM Class: I.2.9
arXiv:2411.05107 [pdf, other] cs.RO
MissionGPT: Mission Planner for Mobile Robot based on Robotics Transformer Model
Authors: Vladimir Berman, Artem Bazhenov, Dzmitry Tsetserukou
Abstract: This paper presents a novel approach to building mission planners based on neural networks with Transformer architecture and Large Language Models (…
Submitted 7 November, 2024; originally announced November 2024.
arXiv:2410.21418 [pdf, other] cs.AI cs.CL
Large Language Models for Manufacturing
Authors: Yiwei Li, Huaqin Zhao, Hanqi Jiang, Yi Pan, Zhengliang Liu, Zihao Wu, Peng Shu, Jie Tian, Tianze Yang, Shaochen Xu, Yanjun Lyu, Parker Blenk, Jacob Pence, Jason Rupram, Eliza Banu, Ninghao Liu, Linbing Wang, Wenzhan Song, Xiaoming Zhai, Kenan Song, Dajiang Zhu, Beiwen Li, Xianqiao Wang, Tianming Liu
Abstract: The rapid advances in Large Language Models (LLMs) have the potential to…
Submitted 28 October, 2024; originally announced October 2024.
arXiv:2410.17602 [pdf, other] cs.RO cs.AI
Integrating Large Language Models for UAV Control in Simulated Environments: A Modular Interaction Approach
Authors: Abhishek Phadke, Alihan Hadimlioglu, Tianxing Chu, Chandra N Sekharan
Abstract: The intersection of LLMs (Large Language Models) and UAV (Unoccupied Aerial Vehicles) technology represents a promising field of research with the potential to enhance UAV capabilities significantly. This study explores the application of…
Submitted 23 October, 2024; originally announced October 2024.
arXiv:2410.15863 [pdf, other] cs.RO
Task-oriented Robotic Manipulation with Vision Language Models
Authors: Nurhan Bulus Guran, Hanchi Ren, Jingjing Deng, Xianghua Xie
Abstract: Vision Language Models (VLMs) play a crucial role in robotic manipulation by enabling…
Submitted 20 May, 2025; v1 submitted 21 October, 2024; originally announced October 2024.
arXiv:2410.03645 [pdf, other] cs.RO cs.AI cs.CV cs.LG
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs
Authors: Pu Hua, Minghuan Liu, Annabella Macaluso, Yunfeng Lin, Weinan Zhang, Huazhe Xu, Lirui Wang
Abstract: Robotic simulation today remains challenging to scale up due to the human efforts required to create diverse simulation tasks and scenes. Simulation-trained policies also face scalability issues as many sim-to-real methods focus on a single task. To address these challenges, this work proposes GenSim2, a scalable framework that leverages coding…
Submitted 4 October, 2024; originally announced October 2024.
Comments: CoRL 2024. Project website: https://gensim2.github.io/
arXiv:2409.19710 [pdf, other] q-bio.NC cs.CL cs.LG cs.SD eess.AS eess.SP q-bio.QM
A multimodal LLM for the non-invasive decoding of spoken text from brain recordings
Authors: Youssef Hmamouche, Ismail Chihab, Lahoucine Kdouri, Amal El Fallah Seghrouchni
Abstract: …learning. These points make the problem of the non-invasive decoding of text from fMRI recordings very challenging. In this paper, we propose an end-to-end multimodal LLM for decoding spoken text from fMRI signals. The proposed architecture is founded on (i) an encoder derived from a specific…
Submitted 29 September, 2024; originally announced September 2024.
Comments: 15 pages, 4 figures
arXiv:2409.16900 [pdf, ps, other] cs.RO cs.AI cs.CL cs.HC
doi: 10.3233/FAIA241488
A Roadmap for Embodied and Social Grounding in LLMs
Authors: Sara Incao, Carlo Mazzola, Giulia Belgiovine, Alessandra Sciutti
Abstract: The fusion of Large Language Models (LLMs) and…
Submitted 25 September, 2024; originally announced September 2024.
Comments: Accepted Version of a conference paper presented at Robophilosophy Conference 2024
ACM Class: I.2.7; I.2.9; J.4; F.3.2; D.3.1
Journal ref: Incao, S., Mazzola, C., Belgiovine, G., Sciutti, A., 2025, A Roadmap for Embodied and Social Grounding in LLMs. In J. Seibt, P. Fazekas, & O. S. Quick (Eds.), Social Robots with AI: Prospects, Risks, and Responsible Methods, IOS Press
arXiv:2409.13445 [pdf, other] cs.RO cs.CL
Selective Exploration and Information Gathering in Search and Rescue Using Hierarchical Learning Guided by Natural Language Input
Authors: Dimitrios Panagopoulos, Adolfo Perrusquia, Weisi Guo
Abstract: In recent years, robots and autonomous systems have become increasingly integral to our daily lives, offering solutions to complex problems across various domains. Their application in search and rescue (SAR) operations, however, presents unique challenges. Comprehensively exploring the disaster-stricken area is often infeasible due to the vastness of the te…
Submitted 20 September, 2024; originally announced September 2024.
Comments: Pre-print version of the accepted paper to appear in IEEE International Conference on Systems, Man and Cybernetics (SMC) 2024
arXiv:2409.02636 [pdf, other] cs.RO eess.SY
Mamba as a motion encoder for robotic imitation learning
Authors: Toshiaki Tsuji
Abstract: Recent advancements in imitation learning, particularly with the integration of LLM techniques, are set to significantly improve…
Submitted 25 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.
Comments: 8 pages, 9 figures
arXiv:2408.12093 [pdf, other] cs.RO cs.CV
LLM-enhanced Scene Graph Learning for Household Rearrangement
Authors: Wenhao Li, Zhiyuan Yu, Qijin She, Zhinan Yu, Yuqing Lan, Chenyang Zhu, Ruizhen Hu, Kai Xu
Abstract: …functionality with user preference alignment directly from the scene itself, without relying on human intervention. To do so, we work with scene graph representation and propose LLM-enhanced scene graph learning which…
Submitted 12 September, 2024; v1 submitted 21 August, 2024; originally announced August 2024.
Comments: Accepted to SIGGRAPH Asia 2024
arXiv:2408.08188 [pdf, other] cs.RO cs.AI cs.LO
Nl2Hltl2Plan: Scaling Up Natural Language Understanding for Multi-Robots Through Hierarchical Temporal Logic Task Representation
Authors: Shaojun Xu, Xusheng Luo, Yutong Huang, Letian Leng, Ruixuan Liu, Changliu Liu
Abstract: To enable non-experts to specify long-horizon, multi-robot collaborative tasks, language models are increasingly used to translate natural language commands into formal specifications. However, because translation can occur in multiple ways, such translations may lack accuracy or lead to inefficient multi-…
Submitted 5 December, 2024; v1 submitted 15 August, 2024; originally announced August 2024.
arXiv:2408.02882 [pdf, other] cs.AI cs.CR cs.LG
Compromising Embodied Agents with Contextual Backdoor Attacks
Authors: Aishan Liu, Yuguang Zhou, Xianglong Liu, Tianyuan Zhang, Siyuan Liang, Jiakai Wang, Yanjun Pu, Tianlin Li, Junqi Zhang, Wenbo Zhou, Qing Guo, Dacheng Tao
Abstract: Large language models (LLMs) have…
Submitted 5 August, 2024; originally announced August 2024.
arXiv:2407.14229 [pdf, other] cs.RO cs.AI
Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models
Authors: Dionis Totsila, Quentin Rouxel, Jean-Baptiste Mouret, Serena Ivaldi
Abstract: …multi-contact placement pipeline leveraging large language models and vision language models. Our method is a key component for language-assisted teleoperation and human-robot cooperation, where human operators can instruct the…
Submitted 9 December, 2024; v1 submitted 19 July, 2024; originally announced July 2024.
arXiv:2407.09890 [pdf, other] cs.RO
Speech-Guided Sequential Planning for Autonomous Navigation using Large Language Model Meta AI 3 (Llama3)
Authors: Alkesh K. Srivastava, Philip Dames
Abstract: In social robotics, a pivotal focus is enabling…
Submitted 26 September, 2024; v1 submitted 13 July, 2024; originally announced July 2024.
Comments: Accepted at the 16th International Conference on Social Robotics + AI
arXiv:2407.04292 [pdf, ps, other] cs.AR cs.RO
doi: 10.1145/3695053.3731099
DaDu-Corki: Algorithm-Architecture Co-Design for Embodied AI-powered Robotic Manipulation
Authors: Yiyang Huang, Yuhui Hao, Bo Yu, Feng Yan, Yuxin Yang, Feng Min, Yinhe Han, Lin Ma, Shaoshan Liu, Qiang Liu, Yiming Gan
Abstract: Embodied AI robots have the potential to fundamentally improve the way human beings live and manufacture. Continued progress in the burgeoning field of using large language models to control…
Submitted 8 June, 2025; v1 submitted 5 July, 2024; originally announced July 2024.
arXiv:2406.12651 [pdf, other] cs.RO cs.AI cs.CL cs.HC
Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics
Authors: Huan Xu, Jinlin Wu, Guanglin Cao, Zhen Chen, Zhen Lei, Hongbin Liu
Abstract: …diagnostic methodologies, significantly enhancing patient outcomes across various medical domains. Despite its advancements, integrating ultrasound technology with robotic systems for automated scans presents challenges, including limited command understanding and dynamic execution capabilities. To address these challenges, this paper introduces a novel Ultr…
Submitted 18 June, 2024; originally announced June 2024.
Comments: This work has been accepted by MICCAI 2024
arXiv:2405.11286 [pdf, other] cs.CV
Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
Authors: Zeyu Zhang, Yiran Wang, Biao Wu, Shuo Chen, Zhiyuan Zhang, Shiya Huang, Wenbo Zhang, Meng Fang, Ling Chen, Yang Zhao
Abstract: …years, there has been significant interest in creating 3D avatars and motions, driven by their diverse applications in areas like film-making, video games, AR/VR, and human-robot interaction. However, current efforts primarily concentrate on either generating the 3D avatar mesh alone or producing motion sequences, with integrating these two aspects proving t…
Submitted 30 August, 2024; v1 submitted 18 May, 2024; originally announced May 2024.
Comments: Accepted to BMVC 2024
arXiv:2405.03547 [pdf, other] cs.LG cs.AI cs.NE
Position: Leverage Foundational Models for Black-Box Optimization
Authors: Xingyou Song, Yingtao Tian, Robert Tjarko Lange, Chansoo Lee, Yujin Tang, Yutian Chen
Abstract: Undeniably, Large Language Models (LLMs) have stirred an extraordinary wave of innovation in the machine learning research domain, resulting in substantial impact across diverse fields such as reinforcement learning,…
Submitted 9 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.
Comments: International Conference on Machine Learning (ICML) 2024
arXiv:2404.13050 [pdf, other] cs.CL cs.AI
FlowMind: Automatic Workflow Generation with LLMs
Authors: Zhen Zeng, William Watson, Nicole Cho, Saba Rahimi, Shayleen Reynolds, Tucker Balch, Manuela Veloso
Abstract: The rapidly evolving field of Robotic Process Automation (RPA) has made significant strides in automating repetitive processes, yet its effectiveness diminishes in scenarios requiring spontaneous or unpredictable tasks demanded by users. This paper introduces a novel approach, FlowMind, leveraging the capabilities of Large Language Models (…
Submitted 16 March, 2024; originally announced April 2024.
Comments: Published in ACM ICAIF 2023
arXiv:2404.11267 [pdf, other] cs.RO
Towards Human Awareness in Robot Task Planning with Large Language Models
Authors: Yuchen Liu, Luigi Palmieri, Sebastian Koch, Ilche Georgievski, Marco Aiello
Abstract: The recent breakthroughs in the research on Large Language Models (LLMs) have triggered a…
Submitted 17 April, 2024; originally announced April 2024.
Journal ref: Long-Term Human Motion Prediction (LHMP) Workshop of ICRA 2024 (https://motionpredictionicra2024.github.io/)
arXiv:2404.08825 [pdf, other] cs.RO cs.AI
Inverse Kinematics for Neuro-Robotic Grasping with Humanoid Embodied Agents
Authors: Jan-Gerrit Habekost, Connor Gäde, Philipp Allgeuer, Stefan Wermter
Abstract: This paper introduces a novel zero-shot motion planning method that allows users to quickly design smooth robot motions in Cartesian space. A Bézier curve-based Cartesian plan is…
Submitted 6 November, 2024; v1 submitted 12 April, 2024; originally announced April 2024.
Comments: Published at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024
arXiv:2404.05134 [pdf, other] cs.RO
doi: 10.1109/ICRA57147.2024.10610183
LLM-BT: Performing Robotic Adaptive Tasks based on Large Language Models and Behavior Trees
Authors: Haotian Zhou, Yunhan Lin, Longwu Yan, Jihong Zhu, Huasong Min
Abstract: Large Language Models (LLMs) have been widely utilized to perform complex…
Submitted 7 April, 2024; originally announced April 2024.
Comments: 7 pages, 11 figures, to be published at ICRA 2024
Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA)
arXiv:2403.10171 [pdf] cs.AI cs.CV
AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation
Authors: Arkajit Datta, Tushar Verma, Rajat Chawla, Mukunda N. S, Ishaan Bhola
Abstract: In recent advancements within the domain of Large Language Models (LLMs), there has been a notable emergence of agents capable of addressing…
Submitted 27 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.
Comments: Accepted in MIPR-2024
arXiv:2402.16664 [pdf, other] cs.IR
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery
Authors: Yuyang Du, Kexin Chen, Yue Zhan, Chang Han Low, Tao You, Mobarakol Islam, Ziyu Guo, Yueming Jin, Guangyong Chen, Pheng-Ann Heng
Abstract: …(VQA) is crucial for promoting surgical education. In practice, the needs of trainees are constantly evolving, such as learning more surgical types, adapting to different robots, and learning new surgical instruments and techniques for various surgeries. However, patient data privacy often restricts the availability of old data when updating the model, neces…
Submitted 23 October, 2024; v1 submitted 26 February, 2024; originally announced February 2024.
Comments: This paper has been accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA)
arXiv:2401.11061 [pdf, other] cs.CV cs.AI cs.RO
doi: 10.1109/IROS58592.2024.10801790
PhotoBot: Reference-Guided Interactive Photography via Natural Language
Authors: Oliver Limoyo, Jimmy Li, Dmitriy Rivkin, Jonathan Kelly, Gregory Dudek
Abstract: We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to cha…
Submitted 25 December, 2024; v1 submitted 19 January, 2024; originally announced January 2024.
Comments: In Proceedings of the IEEE/RSJ International Conference on Intelligent Robotics and Systems (IROS'24), Abu Dhabi, UAE, Oct. 14-18, 2024
arXiv:2401.03217 [pdf, other] cs.RO cs.HC
doi: 10.1145/3610977.3634966
Understanding Large-Language Model (LLM)-powered Human-Robot Interaction
Authors: Callie Y. Kim, Christine P. Lee, Bilge Mutlu
Abstract: Large-language models (LLMs) hold significant promise in improving human-…
Submitted 6 January, 2024; originally announced January 2024.
Comments: 10 pages, 4 figures. Callie Y. Kim and Christine P. Lee contributed equally to the work. To be published in Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI '24), March 11--14, 2024, Boulder, CO, USA
arXiv:2312.09348 [pdf, other] cs.RO
LLM-MARS: Large Language Model for Behavior Tree Generation and NLP-enhanced Dialogue in Multi-Agent Robot Systems
Authors: Artem Lykov, Maria Dronova, Nikolay Naglov, Mikhail Litvinov, Sergei Satsevich, Artem Bazhenov, Vladimir Berman, Aleksei Shcherbak, Dzmitry Tsetserukou
Abstract: This paper introduces LLM-MARS, the first technology that utilizes a Large Language Model based Artificial Intelligence for Multi-Agent…
Submitted 14 December, 2023; originally announced December 2023.
Comments: 2023 IEEE. This work has been submitted to IEEE for possible publication. arXiv admin note: text overlap with arXiv:2305.19352
arXiv:2312.07214 [pdf, other] cs.HC cs.AI cs.RO
doi: 10.3389/frobt.2024.1347538
Exploring Large Language Models to Facilitate Variable Autonomy for Human-Robot Teaming
Authors: Younes Lakhnati, Max Pascher, Jens Gerken
Abstract: In a rapidly evolving digital landscape autonomous tools and robots are becoming commonplace. Recognizing the significance of this development, this paper explores the integration of Large Language Models (…
Submitted 21 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.
Comments: Frontiers in Robotics and AI, Variable Autonomy for Human-Robot Teaming
Journal ref: Front. Robot. AI 11:1347538 2024
arXiv:2312.00374 [pdf, other] cs.CR
The Philosopher's Stone: Trojaning Plugins of Large Language Models
Authors: Tian Dong, Minhui Xue, Guoxing Chen, Rayne Holland, Yan Meng, Shaofeng Li, Zhen Liu, Haojin Zhu
Abstract: Open-source Large Language Models (LLMs) have recently gained popularity because of their comparable performance to proprietary…
Submitted 11 September, 2024; v1 submitted 1 December, 2023; originally announced December 2023.
Comments: Accepted by NDSS Symposium 2025. Please cite this paper as "Tian Dong, Minhui Xue, Guoxing Chen, Rayne Holland, Yan Meng, Shaofeng Li, Zhen Liu, Haojin Zhu. The Philosopher's Stone: Trojaning Plugins of Large Language Models. In the 32nd Annual Network and Distributed System Security Symposium (NDSS 2025)."
arXiv:2311.10763 [pdf, other] cs.CL cs.LG
Comparing Generalization in Learning with Limited Numbers of Exemplars: Transformer vs. RNN in Attractor Dynamics
Authors: Rui Fukushima, Jun Tani
Abstract: ChatGPT, a widely-recognized large language model (LLM), has recently gained substantial attention for its performance scaling, attributed to the billions of web-sourced natural language sentences used for training. Its underlying architecture,…
Submitted 14 November, 2023; originally announced November 2023.
arXiv:2311.07150 [pdf, other] cs.RO cs.AI cs.CL cs.CV
Interaction is all You Need? A Study of Robots Ability to Understand and Execute
Authors: Kushal Koshti, Nidhir Bhavsar
Abstract: This paper aims to address a critical challenge in robotics, which is enabling them to operate seamlessly in human environments through natural language interactions. Our primary focus is to equip…
Submitted 13 November, 2023; originally announced November 2023.
arXiv:2311.06640 [pdf, other] cs.RO cs.HC cs.MM
NewsGPT: ChatGPT Integration for Robot-Reporter
Authors: Abdelhadi Hireche, Abdelkader Nasreddine Belkacem, Sadia Jamil, Chao Chen
Abstract: The integration of large language models (LLMs) with social…
Submitted 11 November, 2023; originally announced November 2023.
arXiv:2310.04981 [pdf, other] cs.CV cs.LG
Compositional Semantics for Open Vocabulary Spatio-semantic Representations
Authors: Robin Karlsson, Francisco Lepe-Salazar, Kazuya Takeda
Abstract: General-purpose mobile robots need to complete tasks without exact human instructions. Large language models (…
Submitted 7 October, 2023; originally announced October 2023.
Comments: Under review
ACM Class: I.2.10; I.2.9
arXiv:2310.04566 [pdf, other] cs.RO cs.AI cs.CV cs.LG
Knolling Bot: Learning Robotic Object Arrangement from Tidy Demonstrations
Authors: Yuhang Hu, Zhizhuo Zhang, Xinyue Zhu, Ruibo Liu, Philippe Wyder, Hod Lipson
Abstract: …to new objects and environments. Inspired by advancements in natural language processing (NLP), this paper introduces a self-supervised learning framework that allows robots to understand and replicate the concept of tidiness from demonstrations of well-organized layouts, akin to using conversational datasets to train Large Language Models(…
Submitted 15 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.
Comments: This work has been submitted to the IEEE for possible publication
arXiv:2305.19352 [pdf, other] cs.RO
LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model
Authors: Artem Lykov, Dzmitry Tsetserukou
Abstract: This paper presents a novel approach in autonomous robot control, named…
Submitted 30 May, 2023; originally announced May 2023.
Comments: 10 pages, 5 figures
arXiv:2304.14844 [pdf, other] cs.RO
doi: 10.1007/978-3-031-40725-3_45
Using Large Language Models for Interpreting Autonomous Robots Behaviors
Authors: Miguel A. González-Santamarta, Laura Fernández-Becerra, David Sobrín-Hidalgo, Ángel Manuel Guerrero-Higueras, Irene González, Francisco J. Rodríguez Lera
Abstract: The deployment of autonomous robots in various domains has raised significant concerns about their trustworthiness and accountability. This study explores the potential of Large Language Models (…
Submitted 28 April, 2023; originally announced April 2023.
ChatGPT
Please show me 64 LLM papers on arXiv that mention Robot Transformer.
It is difficult to list 64 papers on arXiv that currently mention "Robot Transformer," but here are some of the main related papers.
🤖 Main "Robot Transformer"-related papers
RT-1: Robotics Transformer for Real-World Control at Scale
A paper by Anthony Brohan et al. that explores applying Transformers to real-world robot control.
arXiv https://arxiv.org/abs/2212.06817?utm_source=chatgpt.com
HiRT: Enhancing Robotic Control with Hierarchical Transformers
A study that aims to improve robot control performance using hierarchical Transformers.
arXiv https://arxiv.org/html/2410.05273v3?utm_source=chatgpt.com
LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model
Proposes a Transformer-based approach that fine-tunes the Stanford Alpaca 7B model to generate robot behavior trees.
arXiv https://arxiv.org/abs/2305.19352?utm_source=chatgpt.com
LLM-MARS: Large Language Model for Behavior Tree Generation and NLP-enhanced Dialogue in Multi-Agent Robot Systems
LLM-MARS, built on the Falcon 7B model, enhances behavior tree generation and dialogue capabilities in multi-agent robot systems.
arXiv https://arxiv.org/abs/2312.09348?utm_source=chatgpt.com
Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback
Introduces a new approach that separates high-level planning from low-level control and uses closed-loop state feedback to strengthen robot task planning.
arXiv https://arxiv.org/html/2402.08546v2?utm_source=chatgpt.com
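These papers present important research results on the application of Transformers in robotics.
If you are looking for more detailed information or papers on a specific topic, I recommend using arXiv's search function with keywords such as "Robot Transformer" or "Robotics Transformer."
If anything is unclear or you are looking for papers in a particular field, feel free to let me know.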