
Review on large language models in transportation

XIAO Jian-li, QIU Xue, ZHANG Yang, SU Hai-sheng, LI Zhi-peng, ZHANG Chuan-ming

Citation: XIAO Jian-li, QIU Xue, ZHANG Yang, SU Hai-sheng, LI Zhi-peng, ZHANG Chuan-ming. Review on large language models in transportation[J]. Journal of Traffic and Transportation Engineering, 2025, 25(1): 8-28. doi: 10.19818/j.cnki.1671-1637.2025.01.002

doi: 10.19818/j.cnki.1671-1637.2025.01.002

Funds:

    National Natural Science Foundation of China 92370201

    National Natural Science Foundation of China 61603257

    Fundamental Research Funds for the Central Universities 22120230311

Author biography:

    XIAO Jian-li (1982-), male, born in Tianmen, Hubei, is an associate professor at the University of Shanghai for Science and Technology and holds a doctorate in engineering; his research covers artificial intelligence and big data

  • Chinese Library Classification number: U491
  • Abstract: This paper examines in depth how large language model (LLM) technology advances the transportation domain, demonstrating its great potential for improving traffic management and control, enhancing traffic safety, and accelerating autonomous driving. It systematically reviews the basic concepts and development history of LLMs, large vision models, and large multimodal models. For several transportation LLMs, it summarizes their model architectures and training methods, and surveys the main applications of LLMs in transportation, covering traffic management and control, traffic safety, and autonomous driving. The findings show that, in traffic management and control, LLMs can improve traffic signal control and traffic state prediction and open new possibilities for urban traffic management, reducing not only congestion but also environmental pollution; in traffic safety, LLMs markedly improve traffic accident analysis and prediction compared with traditional models, and by learning deeply from historical accident data they can identify accident-prone areas and periods, enabling preventive measures that raise overall traffic safety; in autonomous driving, the shift from traditional models to multimodal autonomous driving models not only improves the decision-making and environmental adaptability of autonomous driving systems but also delivers a safer and more comfortable driving experience. The paper highlights the potential and value of LLMs in today's transportation domain and offers practical recommendations for building more intelligent and efficient transportation systems, such as reducing the computational cost of transportation LLMs and improving their real-time performance and reliability.
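Spatio-temporal LLMs of the kind this survey covers (e.g. UrbanGPT, ST-LLM) typically cast traffic state prediction as conditional text generation: recent sensor readings are serialized into a textual prompt that the model then completes. A minimal sketch of that serialization step follows; the sensor name and prompt wording are illustrative assumptions, not the template used by any of the cited papers.

```python
def build_traffic_prompt(sensor_id: str,
                         readings: list[tuple[str, int]],
                         horizon: int = 3) -> str:
    """Serialize recent traffic flow readings into a prompt for an LLM forecaster.

    `readings` is a list of (timestamp, vehicles-per-5-minutes) pairs. The
    wording below is a hypothetical template for illustration only.
    """
    history = "; ".join(f"{t} -> {flow} veh/5min" for t, flow in readings)
    return (
        f"Traffic sensor {sensor_id} recorded the following flow history: {history}. "
        f"Predict the flow for the next {horizon} 5-minute intervals."
    )

# Example: three morning-peak readings from a hypothetical sensor "S101"
prompt = build_traffic_prompt("S101", [("08:00", 120), ("08:05", 135), ("08:10", 150)])
```

The resulting string would be passed to the LLM as-is; the model's textual completion is then parsed back into numeric forecasts.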

     

  • Figure 1. Development history of LLMs

    Figure 2. Task types of large multimodal models

    Figure 3. Model architecture of MiniGPT-4

    Figure 4. Overall architecture of UrbanGPT model

    Figure 5. Architecture of UniST model

    Figure 6. Applications of LLMs in traffic management and control

    Figure 7. Applications of LLMs in traffic safety

    Figure 8. Applications of LLMs in autonomous driving

    Table 1. Download links for TransGPT models

    Model              Download link
    TransGPT-7B-v0     https://huggingface.co/DUOMO-Lab/TransGPT-v0
    TransGPT-MM-6B-v0  https://huggingface.co/DUOMO-Lab/TransGPT-MM-v0
    TransGPT-MM-6B-v1  https://huggingface.co/DUOMO-Lab/TransGPT-MM-v1
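The checkpoints in Table 1 are published as Hugging Face Hub repositories, so they can in principle be loaded with the `transformers` library. A hedged sketch follows: the `Auto*` classes and the `trust_remote_code` flag are assumptions about these particular repositories, so check the model cards before relying on it.

```python
# Repository IDs taken from Table 1
REPO_IDS = {
    "TransGPT-7B-v0": "DUOMO-Lab/TransGPT-v0",
    "TransGPT-MM-6B-v0": "DUOMO-Lab/TransGPT-MM-v0",
    "TransGPT-MM-6B-v1": "DUOMO-Lab/TransGPT-MM-v1",
}

def load_transgpt(name: str = "TransGPT-7B-v0"):
    """Download a TransGPT checkpoint from the Hugging Face Hub.

    Loading details (tokenizer class, remote code) are assumptions about the
    repos, not documented in this article; several GB of weights are fetched.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy, heavy import
    repo_id = REPO_IDS[name]
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    return tokenizer, model

def ask(question: str, name: str = "TransGPT-7B-v0", max_new_tokens: int = 64) -> str:
    """Query a TransGPT model with a single transportation question."""
    tokenizer, model = load_transgpt(name)
    inputs = tokenizer(question, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Usage would look like `ask("Why does urban traffic congestion form?")`, keeping in mind that the 7B model needs a GPU with enough memory for inference.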

    Table 2. Common datasets in transportation

    nuScenes[73]: nearly 1 000 complex driving scenes collected in Boston and Singapore; 1.4 million images, 390 000 lidar sweeps, and 1.4 million human-annotated 3D bounding boxes. https://nuscenes.org/nuscenes
    Mapillary Vistas Dataset[74]: a large-scale street-level image dataset of 25 000 high-resolution images with 66 object categories, 37 of which carry instance-specific labels. https://www.mapillary.com/dataset/vistas
    ApolloCar3D[75]: 5 277 driving images and over 60 000 car instances, each fitted with an industry-grade 3D CAD model with absolute dimensions and semantically labeled keypoints. https://apolloscape.auto/car_instance.html
    BDD100K: 100 000 videos with rich annotations, including image-level tags, object bounding boxes, drivable areas, lane markings, and full-frame instance segmentation; geographically, environmentally, and weather diverse. http://bdd-data.berkeley.edu/
    The SYNTHIA Dataset[76]: pixel-accurate semantic annotations for 13 classes: sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist, and lane marking. https://synthia-dataset.net/
    KUL Belgium Traffic Sign Dataset: thousands of distinct traffic signs with more than 10 000 traffic sign annotations; 4 video sequences recorded by 8 high-resolution cameras mounted on a van, totaling over 3 h. https://btsd.ethz.ch/shareddata/
    Bosch Small Traffic Lights Dataset[77]: 13 427 camera images at 1 280 pixels × 720 pixels with about 24 000 annotated traffic lights; annotations include each light's bounding box and current state. https://hci.iwr.uni-heidelberg.de/content/bosch-small-traffic-lights-dataset
    GTSRB[78]: the German traffic sign recognition benchmark, with more than 40 classes and over 50 000 images. https://benchmark.ini.rub.de/gtsrb_dataset.html
    Tsinghua-Tencent 100K[79]: a large traffic sign benchmark of over 100 000 images containing 30 000 traffic signs, covering variations in illumination and weather. https://cg.cs.tsinghua.edu.cn/traffic-sign/
    MS COCO[80]: a large object detection and segmentation dataset aimed at scene understanding, built from complex everyday scenes with precise segmentation and localization; 91 categories, 328 000 images, and 2.5 million labels. https://cocodataset.org/
    UA-DETRAC[81]: a multi-object detection and tracking benchmark with over 140 000 frames, 8 250 annotated vehicles, and 1.21 million labeled object bounding boxes. https://www.kaggle.com/datasets/dtrnngc/ua-detrac-dataset
    BoxCars[82]: 116 000 vehicle images captured by multiple surveillance cameras from multiple viewpoints. https://github.com/JakubSochor/BoxCars
  • [1] DIMITRAKOPOULOS G, DEMESTICHAS P. Intelligent transportation systems[J]. IEEE Vehicular Technology Magazine, 2010, 5(1): 77-84.
    [2] LIN Yang-xin, WANG Ping, MA Meng. Intelligent transportation system (ITS): concept, challenge and opportunity[C]//IEEE. 2017 IEEE 3rd International Conference on Big Data Security on Cloud (Bigdatasecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS). New York: IEEE, 2017: 167-172.
    [3] OUYANG L, WU J, JIANG X, et al. Training language models to follow instructions with human feedback[C]//ACM. Advances in Neural Information Processing Systems. New York: ACM, 2022: 27730-27744.
    [4] ACHIAM J, ADLER S, AGARWAL S, et al. GPT-4 technical report[R]. San Francisco: OpenAI, 2023.
    [5] ZHANG Si-yao, FU Dao-cheng, LIANG Wen-zhe, et al. TrafficGPT: viewing, processing and interacting with traffic foundation models[J]. Transport Policy, 2024, 150: 95-105.
    [6] ZHOU Xing-cheng, LIU Ming-yu, YURTSEVER E, et al. Vision language models in autonomous driving and intelligent transportation systems[J]. arXiv, 2023, DOI: 10.48550/arXiv.2310.14414.
    [7] SHOAIB M R, EMARA H M, ZHAO Jun. A survey on the applications of frontier AI, foundation models, and large language models to intelligent transportation systems[C]//IEEE. 2023 International Conference on Computer and Applications (ICCA). New York: IEEE, 2023: 1-7.
    [8] CUI Can, MA Yun-sheng, CAO Xue, et al. A survey on multimodal large language models for autonomous driving[C]//IEEE. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE, 2024: 958-979.
    [9] ZHENG Ou, ABDEL-ATY M, WANG Dong-dong, et al. ChatGPT is on the horizon: could a large language model be suitable for intelligent traffic safety research and applications?[J]. arXiv, 2023, DOI: 10.48550/arXiv.2303.05382.
    [10] CUI Can, MA Yun-sheng, CAO Xu, et al. Receive, reason, and react: drive as you say, with large language models in autonomous vehicles[J]. IEEE Intelligent Transportation Systems Magazine, 2024, 16(4): 81-94. doi: 10.1109/MITS.2024.3381793
    [11] MCCORDUCK P, CFE C. Machines Who Think: a Personal Inquiry into the History and Prospects of Artificial Intelligence[M]. Natick: A.K. Peters, 2004.
    [12] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324. doi: 10.1109/5.726791
    [13] CRESWELL A, WHITE T, DUMOULIN V, et al. Generative adversarial networks: an overview[J]. IEEE Signal Processing Magazine, 2018, 35(1): 53-65. doi: 10.1109/MSP.2017.2765202
    [14] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//MIT Press. Proceedings of the 31st International Conference on Neural Information Processing Systems. Massachusetts: MIT Press, 2017: 6000-6010.
    [15] DEVLIN J, CHANG Ming-wei, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv, 2018, DOI: 10.48550/arXiv.1810.04805.
    [16] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[R]. San Francisco: OpenAI, 2018.
    [17] BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners[C]//ACM. Advances in Neural Information Processing Systems. New York: ACM, 2020: 1877-1901.
    [18] CHOWDHERY A, NARANG S, DEVLIN J, et al. PaLM: scaling language modeling with pathways[J]. Journal of Machine Learning Research, 2023, 24(240): 11342-11436.
    [19] TAYLOR R, KARDAS M, CUCURULL G, et al. Galactica: a large language model for science[J]. arXiv, 2022, DOI: 10.48550/arXiv.2211.09085.
    [20] TOUVRON H, LAVRIL T, IZACARD G, et al. LLaMA: open and efficient foundation language models[J]. arXiv, 2023, DOI: 10.48550/arXiv.2302.13971.
    [21] WEI J, TAY Y, BOMMASANI R, et al. Emergent abilities of large language models[J]. arXiv, 2022, DOI: 10.48550/arXiv.2206.07682.
    [22] SANH V, WEBSON A, RAFFEL C, et al. Multitask prompted training enables zero-shot task generalization[J]. arXiv, 2021, DOI: 10.48550/arXiv.2110.08207.
    [23] WEI J, WANG Xue-zhi, SCHUURMANS D, et al. Chain-of-thought prompting elicits reasoning in large language models[J]. arXiv, 2022, DOI: 10.48550/arXiv.2201.11903.
    [24] TAO Chao-fan, LIU Qian, DOU Long-xu. Scaling laws with vocabulary: larger models deserve larger vocabularies[C]// MIT Press. Proceedings of the 38th International Conference on Neural Information Processing Systems. Massachusetts: MIT Press, 2024: 1-33.
    [25] HOFFMANN J, BORGEAUD S, MENSCH A, et al. Training compute-optimal large language models[J]. arXiv, 2022, DOI: 10.48550/arXiv.2203.15556.
    [26] ZHAO W X, ZHOU Kun, LI Jun-yi, et al. A survey of large language models[J]. arXiv, 2023, DOI: 10.48550/arXiv.2303.18223.
    [27] OQUAB M, DARCET T, MOUTAKANNI T, et al. DINOv2: learning robust visual features without supervision[J]. arXiv, 2023, DOI: 10.48550/arXiv.2304.07193.
    [28] CARON M, TOUVRON H, MISRA I, et al. Emerging properties in self-supervised vision transformers[C]//IEEE. 2021 IEEE/CVF International Conference on Computer Vision(ICCV). New York: IEEE, 2021: 9630-9640.
    [29] ZHOU Jing-hao, WEI Chen, WANG Hui-yu, et al. iBOT: image BERT pre-training with online tokenizer[J]. arXiv, 2021, DOI: 10.48550/arXiv.2111.07832.
    [30] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]//PMLR. International Conference on Machine Learning. New York: PMLR, 2021: 8748-8763.
    [31] TUO Yu-xiang, XIANG Wang-meng, HE Jun-yan, et al. AnyText: multilingual visual text generation and editing[J]. arXiv, 2023, DOI: 10.48550/arXiv.2311.03054.
    [32] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]//MIT Press. Proceedings of the 33rd International Conference on Neural Information Processing Systems. Massachusetts: MIT Press, 2020: 6840-6851.
    [33] MA Jian, ZHAO Ming-jun, CHEN Chen, et al. GlyphDraw: seamlessly rendering text with intricate spatial structures in text-to-image Generation[J]. arXiv, 2023, DOI: 10.48550/arXiv.2303.17870.
    [34] CHEN Jing-ye, HUANG Yu-pan, LYU Teng-chao, et al. Textdiffuser: diffusion models as text painters[C]//MIT Press. Proceedings of the 37th International Conference on Neural Information Processing Systems. Massachusetts: MIT Press, 2023: 1-35.
    [35] JIANG Yu-ming, WU Tian-xing, YANG Shuai, et al. Videobooth: diffusion-based video generation with image prompts[C]//IEEE. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2024: 6689-6700.
    [36] RUIZ N, LI Yuan-zhen, JAMPANI V, et al. DreamBooth: fine tuning text-to-image diffusion models for subject-driven generation[C]//IEEE. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2023: 22500-22510.
    [37] WANG Yi, LI Kun-chang, LI Xin-hao, et al. Computer Vision-ECCV 2024[M]. Berlin: Springer International Publishing, 2024.
    [38] ZHAO Long, GUNDAVARAPU N B, YUAN Liang-zhe, et al. Videoprism: a foundational visual encoder for video understanding[J]. arXiv, 2024, DOI: 10.48550/arXiv.2402.13217.
    [39] LIU Ye, LI Si-yuan, WU Yang, et al. Umt: unified multi-modal transformers for joint video moment retrieval and highlight detection[C]//IEEE. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2022: 3032-3041.
    [40] ZHU De-yao, CHEN Jun, SHEN Xiao-qian, et al. MiniGPT-4: enhancing vision-language understanding with advanced large language models[J]. arXiv, 2023, DOI: 10.48550/arXiv.2304.10592.
    [41] CHIANG W L, LI Z, LIN Z, et al. Vicuna: an open-source chatbot impressing GPT-4 with 90%* ChatGPT quality[R/OL]. 2023, https://vicuna.lmsys.org.
    [42] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[J]. arXiv, 2020, DOI: 10.48550/arXiv.2010.11929.
    [43] SHARMA P, DING Nan, GOODMAN S, et al. Conceptual captions: a cleaned, hypernymed, image alt-text dataset for automatic image captioning[C]//USAACL. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Stroudsburg: USAACL, 2018: 2556-2565.
    [44] ORDONEZ V, KULKARNI G, BERG T. Im2text: describing images using 1 million captioned photographs[C]//ACM. Proceedings of the 25th International Conference on Neural Information Processing Systems. New York: ACM, 2011: 1143-1151.
    [45] SCHUHMANN C, VENCU R, BEAUMONT R, et al. Laion-400m: open dataset of CLIP-filtered 400 million image-text pairs[J]. arXiv, 2021, DOI: 10.48550/arXiv.2111.02114.
    [46] ZANG Yu-hang, LI Wei, HAN Jun, et al. Contextual object detection with multimodal large language models[J]. arXiv, 2023, DOI: 10.48550/arXiv.2305.18279.
    [47] CARION N, MASSA F, SYNNAEVE G, et al. Computer Vision-ECCV 2020[M]. Berlin: Springer International Publishing, 2020.
    [48] HE K M, GKIOXARI G, DOLLáR P, et al. Mask R-CNN[C]//IEEE. 2017 IEEE International Conference on Computer Vision(ICCV). New York: IEEE, 2017: 2980-2988.
    [49] YANG Zheng-yuan, LI Lin-jie, LIN K, et al. The dawn of LMMs: preliminary explorations with GPT-4V(ision)[J]. arXiv, 2023, DOI: 10.48550/arXiv.2309.17421.
    [50] ANIL R, BORGEAUD S, ALAYRAC J B, et al. Gemini: a family of highly capable multimodal models[J]. arXiv, 2023, DOI: 10.48550/arXiv.2312.11805.
    [51] HENDRYCKS D, BURNS C, BASART S, et al. Measuring massive multitask language understanding[J]. arXiv preprint, 2020, DOI: 10.48550/arXiv.2009.03300.
    [52] DONG Xiao-yi, ZHANG Pan, ZANG Yu-hang, et al. InternLM-XComposer2-4KHD: a pioneering large vision-language model handling resolutions from 336 pixels to 4KHD[J]. arXiv, 2024, DOI: 10.48550/arXiv.2404.06512.
    [53] MATHEW M, KARATZAS D, JAWAHAR C V. DocVQA: a dataset for VQA on document images[C]//IEEE. 2021 IEEE Winter Conference on Applications of Computer Vision(WACV). New York: IEEE, 2021: 2200-2209.
    [54] MASRY A, LONG Do X, TAN J Q, et al. ChartQA: a benchmark for question answering about charts with visual and logical reasoning[C]//USAACL. Findings of the Association for Computational Linguistics: ACL 2022. Stroudsburg: USAACL, 2022: 2263-2279.
    [55] SINGH A, NATARAJAN V, SHAH M, et al. Towards VQA models that can read[C]//IEEE. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 8317-8326.
    [56] ROHRBACH A, HENDRICKS L A, BURNS K, et al. Object hallucination in image captioning[J]. arXiv, 2018, DOI: 10.48550/arXiv.1809.02156.
    [57] LIU Yu-liang, LI Zhang, HUANG Ming-xin, et al. OCRBench: on the hidden mystery of OCR in large multimodal models[J]. arXiv, 2023, DOI: 10.48550/arXiv.2305.07895.
    [58] CONTRIBUTORS O C. Opencompass: a universal evaluation platform for foundation models[R]. GitHub Repository, 2023.
    [59] BAI Jin-ze, BAI Shuai, YANG Shu-sheng, et al. Qwen-VL: a versatile vision-language model with versatile abilities[J]. arXiv, 2023, DOI: 10.48550/arXiv.2308.12966.
    [60] WANG Wei-han, LYU Qing-song, YU Wen-meng, et al. CogVLM: visual expert for pretrained language models[J]. arXiv, 2023, DOI: 10.48550/arXiv.2311.03079.
    [61] YOUNG A, CHEN Bei, LI Chao, et al. Yi: open foundation models by 01. ai[J]. arXiv, 2024, DOI: 10.48550/arXiv.2403.04652.
    [62] WANG Peng, WEI Xiang, HU Fang-xu, et al. TransGPT: multi-modal generative pre-trained transformer for transportation[C]//IEEE. 2024 International Conference on Computational Linguistics and Natural Language Processing (CLNLP). New York: IEEE, 2024: 96-100.
    [63] DU Zheng-xiao, QIAN Yu-jie, LIU Xiao, et al. GLM: general language model pretraining with autoregressive blank infilling[C]//ACL. Proceedings of the 60th Annual Meeting of the Association for Computational linguistics. Stroudsburg: ACL, 2022: 320-335.
    [64] LI Zhong-hang, XIA Liang-hao, TANG Jia-bin, et al. UrbanGPT: spatio-temporal large language models[J]. arXiv, 2024, DOI: 10.48550/arXiv.2403.00813.
    [65] GUAN Wei-sheng, XIAO Jian-li. A review on parameters prediction of traffic flow by combining spatio-temporal features[J]. Journal of University of Shanghai for Science and Technology, 2022, 44(6): 592-602.
    [66] LONG Bai-chao, GUAN Wei-sheng, XIAO Jian-li. Spatio-temporal traffic flow prediction method based on data encoding and decoding[J]. Journal of University of Shanghai for Science and Technology, 2023, 45(2): 120-127.
    [67] YU Bing, YIN Hao-teng, ZHU Zhan-xing. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting[J]. arXiv, 2017, DOI: 10.48550/arXiv.1709.04875.
    [68] BAI Lei, YAO Li-na, LI Can, et al. Adaptive graph convolutional recurrent network for traffic forecasting[C]//ACM. Proceedings of the 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 17804-17815.
    [69] YUAN Yuan, DING Jing-tao, FENG Jie, et al. UniST: a prompt-empowered universal model for urban spatio-temporal prediction[C]//ACM. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York: ACM, 2024: 4095-4106.
    [70] ZHANG Jun-bo, ZHENG Yu, QI De-kang. Deep spatio-temporal residual networks for citywide crowd flows prediction[C]//ACM. Proceedings of the AAAI Conference on Artificial Intelligence. New York: ACM, 2017: 1655-1661.
    [71] LIU Ling-bo, ZHANG Rui-mao, PENG Jie-feng, et al. Attentive crowd flow machines[C]//ACM. Proceedings of the 26th ACM International Conference on Multimedia. New York: ACM, 2018: 1553-1561.
    [72] JIN K H, WI J A, LEE E J, et al. TrafficBERT: pre-trained model with large-scale data for long-range traffic flow forecasting[J]. Expert Systems with Applications, 2021, 186: 115738. doi: 10.1016/j.eswa.2021.115738
    [73] CAESAR H, BANKITI V, LANG A H, et al. NuScenes: a multimodal dataset for autonomous driving[C]//IEEE. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2020: 11621-11631.
    [74] NEUHOLD G, OLLMANN T, BULÒ S R, et al. The mapillary vistas dataset for semantic understanding of street scenes[C]//IEEE. 2017 IEEE International Conference on Computer Vision(ICCV). New York: IEEE, 2017: 5000-5009.
    [75] SONG Xi-bin, WANG Peng, ZHOU Ding-fu, et al. ApolloCar3D: a large 3D car instance understanding benchmark for autonomous driving[C]//IEEE. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). New York: IEEE, 2019: 5447-5457.
    [76] ROS G, SELLART L, MATERZYNSKA J, et al. The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes[C]//IEEE. 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 3234-3243.
    [77] BEHRENDT K, NOVAK L, BOTROS R. A deep learning approach to traffic lights: detection, tracking, and classification[C]//IEEE. 2017 IEEE International Conference on Robotics and Automation (ICRA). New York: IEEE, 2017: 1370-1377.
    [78] STALLKAMP J, SCHLIPSING M, SALMEN J, et al. The German traffic sign recognition benchmark: a multi-class classification competition[C]//IEEE. The 2011 International Joint Conference on Neural Networks. New York: IEEE, 2011: 1453-1460.
    [79] ZHU Zhe, LIANG Dun, ZHANG Song-hai, et al. Traffic-sign detection and classification in the wild[C]//IEEE. 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 2110-2118.
    [80] LIN T Y, MAIRE M, BELONGIE S, et al. Computer Vision-ECCV 2014[M]. Berlin: Springer International Publishing, 2014.
    [81] WEN Long-yin, DU Da-wei, CAI Zhao-wei, et al. UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking[J]. Computer Vision and Image Understanding, 2020, 193: 102907. doi: 10.1016/j.cviu.2020.102907
    [82] SOCHOR J, HEROUT A, HAVEL J. BoxCars: 3D boxes as CNN input for improved fine-grained vehicle recognition[C]//IEEE. 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 3006-3015.
    [83] DA Long-chao, GAO Min-quan, MEI Hao, et al. Prompt to transfer: sim-to-real transfer for traffic signal control with prompt learning[J]. arXiv, 2023, DOI: 10.48550/arXiv.2308.14284.
    [84] QIN Yan-yan, LUO Qin-zhong, HE Zheng-bing. Management and control method of dedicated lanes for mixed traffic flows with connected and automated vehicles[J]. Journal of Traffic and Transportation Engineering, 2023, 23(3): 221-231. doi: 10.19818/j.cnki.1671-1637.2023.03.017
    [85] HANNA J, STONE P. Grounded action transformation for robot learning in simulation[C]//ACM. Proceedings of the AAAI Conference on Artificial Intelligence. New York: ACM, 2017: 4931-4932.
    [86] LAI Si-qi, XU Zhao, ZHANG Wei-jia, et al. Large language models as traffic signal control agents: capacity and opportunity[J]. arXiv, 2023, DOI: 10.48550/arXiv.2312.16044.
    [87] LIU Chen-xi, YANG Sun, XU Qian-xiong, et al. Spatial-temporal large language model for traffic prediction[J]. arXiv, 2024, DOI: 10.48550/arXiv.2401.10134.
    [88] HU Zuo-an, DENG Jin-cheng, HAN Jin-li, et al. Review on application of graph neural network in traffic prediction[J]. Journal of Traffic and Transportation Engineering, 2023(5): 39-61. doi: 10.19818/j.cnki.1671-1637.2023.05.003
    [89] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[EB/OL]. (2020-09-18)[2024-12-01], https://insightcivic.s3.us-east-1.amazonaws.com/language-models.pdf.
    [90] TOUVRON H, MARTIN L, STONE K, et al. LLaMA 2: open foundation and fine-tuned chat models[J]. arXiv, 2023, DOI: 10.48550/arXiv.2307.09288.
    [91] ZHOU Jie, CUI Gan-qu, HU Sheng-ding, et al. Graph neural networks: a review of methods and applications[J]. AI Open, 2020, 1: 57-81. doi: 10.1016/j.aiopen.2021.01.001
    [92] WU Bing, WANG Wen-xuan, LI Lin-bo, et al. Longitudinal control model for connected autonomous vehicles influenced by multiple preceding vehicles[J]. Journal of Traffic and Transportation Engineering, 2020, 20(2): 184-194. doi: 10.19818/j.cnki.1671-1637.2020.02.015
    [93] ZHENG O, ABDEL-ATY M, WANG D D, et al. TrafficSafetyGPT: tuning a pre-trained large language model to a domain-specific expert in transportation safety[J]. arXiv, 2023, DOI: 10.48550/arXiv.2307.15311.
    [94] JONGWIRIYANURAK N, ZENG Z C, WANG M H, et al. Framework for motorcycle risk assessment using onboard panoramic camera (short paper)[C]//Roger B, Dianna S, Sarah W, et al. Leibniz International Proceedings in Informatics (LIPIcs). Leeds: Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2023: 44: 1-44: 7.
    [95] LIU Hao-tian, LI Chun-yuan, WU Qing-yang, et al. Visual instruction tuning[C]// MIT Press. Proceedings of the 37th International Conference on Neural Information Processing Systems. Massachusetts: MIT Press, 2023: 1-25.
    [96] ARTEAGA C, PARK J W. A large language model framework to uncover underreporting in traffic crashes[J]. Journal of Safety Research, 2023, 92: 1-13.
    [97] TAY Y, DEHGHANI M, TRAN V Q, et al. UL2: unifying language learning paradigms[J]. arXiv, 2022, DOI: 10.48550/arXiv.2205.05131.
    [98] WANG Le-ning, REN Yi-long, JIANG Han, et al. AccidentGPT: accident analysis and prevention from V2X environmental perception with multi-modal large model[J]. arXiv, 2023, DOI: 10.48550/arXiv.2312.13156.
    [99] XU Run-sheng, TU Zheng-zhong, XIANG Hao, et al. CoBEVT: cooperative bird's eye view semantic segmentation with sparse transformers[J]. arXiv, 2022, DOI: 10.48550/arXiv.2207.02202.
    [100] MEHR E, JOURDAN A, THOME N, et al. DiscoNet: shapes learning on disconnected manifolds for 3D editing[C]//IEEE. Proceedings of the IEEE/CVF International Conference on Computer Vision. New York: IEEE, 2019: 3473-3482.
    [101] XU Run-sheng, XIANG Hao, TU Zheng-zhong, et al. Computer vision - ECCV 2022[M]. Berlin: Springer International Publishing, 2022.
    [102] WANG Run-min, ZHU Yu, ZHAO Xiang-mo, et al. Research progress on test scenario of autonomous driving[J]. Journal of Traffic and Transportation Engineering, 2021, 21(2): 21-37. doi: 10.19818/j.cnki.1671-1637.2021.02.003
    [103] WANG Xiao-feng, ZHU Zheng, HUANG Guan, et al. DriveDreamer: towards real-world-driven world models for autonomous driving[J]. arXiv, 2023, DOI: 10.48550/arXiv.2309.09777.
    [104] KIM S W, PHILION J, TORRALBA A, et al. DriveGAN: towards a controllable high-quality neural simulation[C]//IEEE. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2021: 5816-5825.
    [105] JIN Y, SHEN X, PENG H, et al. SurrealDriver: designing generative driver agent simulation framework in urban contexts based on large language model[J]. arXiv, 2023, DOI: 10.48550/arXiv.2309.13193.
    [106] WEN Li-cheng, FU Dao-cheng, LI Xin, et al. DiLu: a knowledge-driven approach to autonomous driving with large language models[J]. arXiv, 2023, DOI: 10.48550/arXiv.2309.16292.
    [107] XI Z, SUKTHANKAR G. A graph representation for autonomous driving[C]//MIT Press. Proceedings of the 36th International Conference on Neural Information Processing Systems. Massachusetts: MIT Press, 2022: 1-11.
    [108] XU Zhen-hua, ZHANG Yu-jia, XIE En-ze, et al. DriveGPT4: interpretable end-to-end autonomous driving via large language model[J]. IEEE Robotics and Automation Letters, 2024, 9: 8186-8193.
    [109] JIN Bu, LIU Xin-yu, ZHENG Yu-peng, et al. ADAPT: action-aware driving caption transformer[C]//IEEE. 2023 IEEE International Conference on Robotics and Automation (ICRA). New York: IEEE, 2023: 7554-7561.
    [110] MAO Jia-geng, QIAN Yu-xi, YE Jun-jie, et al. GPT-Driver: learning to drive with GPT[J]. arXiv, 2023, DOI: 10.48550/arXiv.2310.01415.
    [111] WANG Yu-qi, HE Jia-wei, FAN Lue, et al. Driving into the future: multiview visual forecasting and planning with world model for autonomous driving[J]. arXiv, 2023, DOI: 10.48550/arXiv.2311.17918.
    [112] JIANG Bo, CHEN Shao-yu, XU Qing, et al. VAD: vectorized scene representation for efficient autonomous driving[C]//IEEE. 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE, 2023: 8340-8350.
    [113] GAO Rui-yuan, CHEN Kai, XIE En-ze, et al. Magicdrive: street view generation with diverse 3D geometry control[J]. arXiv, 2023, DOI: 10.48550/arXiv.2310.02601.
    [114] YANG Kai-rui, MA En-hui, PENG Ji-bin, et al. BEVControl: accurately controlling street-view elements with multi-perspective consistency via bev sketch layout[J]. arXiv, 2023, DOI: 10.48550/arXiv.2308.01661.
    [115] MA Yun-sheng, CUI Can, CAO Xu, et al. LaMPilot: an open benchmark dataset for autonomous driving with language model programs[C]//IEEE. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2024: 15141-15151.
    [116] TREIBER M, HENNECKE A, HELBING D. Congested traffic states in empirical observations and microscopic simulations[J]. Physical Review E, 2000, 62(2): 1805-1824.
    [117] KESTING A, TREIBER M, HELBING D. General lane-changing model MOBIL for car-following models[J]. Transportation Research Record, 2007, 1999(1): 86-94.
    [118] SHAO Hao, HU Yu-xuan, WANG Le-tian, et al. LMDrive: closed-loop end-to-end driving with large language models[C]//IEEE. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2024: 15120-15130.
    [119] WANG Wen-hai, XIE Jiang-wei, HU Chuan-yang, et al. DriveMLM: aligning multi-modal large language models with behavioral planning states for autonomous driving[J]. arXiv, 2023, DOI: 10.48550/arXiv.2312.09245.
    [120] FAN Hao-yang, ZHU Fan, LIU Chang-chun, et al. Baidu apollo em motion planner[J]. arXiv, 2018, DOI: 10.48550/arXiv.1807.08048.
    [121] SHAO Hao, WANG Le-tian, CHEN Ruo-bing, et al. Safety-enhanced autonomous driving using interpretable sensor fusion transformer[C]//PMLR. Conference on Robot Learning. New York: PMLR, 2023: 726-737.
    [122] SIMA Chong-hao, RENZ K, CHITTA K, et al. DriveLM: driving with graph visual question answering[J]. arXiv, 2023, DOI: 10.48550/arXiv.2312.14150.
    [123] HU Yi-han, YANG Jia-zhi, CHEN Li, et al. Planning-oriented autonomous driving[C]//IEEE. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2023: 17853-17862.
    [124] HAN Bing-ye, DU Zeng-ming, DAI Lei, et al. Modeling the dynamic performance of transportation infrastructure using panel data model in state-space specifications[J]. Journal of Traffic and Transportation Engineering (English Edition), 2023, 10(3): 441-453.
    [125] OLAYODE O I, DU B, SEVERINO A, et al. Systematic literature review on the applications, impacts, and public perceptions of autonomous vehicles in road transportation system[J]. Journal of Traffic and Transportation Engineering (English Edition), 2023, 10(6): 1037-1060.
Publication history
  • Received: 2024-07-25
  • Published: 2025-02-25
