Volume 22 Issue 3
Jun.  2022
Citation: YANG Biao, YAN Guo-cheng, LIU Zhan-wen, LIU Xiao-feng. Perception of moving objects in traffic scenes based on heterogeneous graph learning[J]. Journal of Traffic and Transportation Engineering, 2022, 22(3): 238-250. doi: 10.19818/j.cnki.1671-1637.2022.03.019

Perception of moving objects in traffic scenes based on heterogeneous graph learning

doi: 10.19818/j.cnki.1671-1637.2022.03.019
Funds:

National Key Research and Development Program of China 2018AAA0100800

National Natural Science Foundation of China 52172302

China Postdoctoral Science Foundation 2021M701042

Jiangsu Postdoctoral Science Foundation 2021K187B

Changzhou Science and Technology Project CJ20200083

Postgraduate Research and Practice Innovation Program of Jiangsu Province KYCX21_2831

Jiangsu Provincial Science and Technology Planning Project BK20221380

More Information
  • Author Bio:

    YANG Biao(1987-), male, associate professor, PhD, yb6864171@cczu.edu.cn

    LIU Zhan-wen(1983-), female, professor, PhD, zwliu@chd.edu.cn

  • Received Date: 2021-12-26
  • Publish Date: 2022-06-25
  • Abstract: In order to improve the operation efficiency and transportation safety of unmanned vehicles in traffic scenes, the perception of moving objects in traffic scenes was investigated based on heterogeneous graph learning. In view of the influence of complex interaction relations between moving objects on their motions in actual traffic scenes, an integrated perception framework of multi-object detection-tracking-prediction was proposed based on heterogeneous graph learning. YOLOv5 and DeepSORT were combined to detect and track the moving objects, and the trajectories of the objects were obtained. The long short-term memory (LSTM) network was used to learn the objects' motion information from their historical trajectories, and a heterogeneous graph was introduced to learn the interaction information between the objects and improve the prediction accuracy of the trajectories of moving objects. The LSTM network was also utilized to decode the objects' motion and interaction information to obtain their future trajectories, and the method was evaluated on the public transportation datasets Argoverse, Apollo, and NuScenes to verify its effectiveness. Analysis results show that the combination of YOLOv5 and DeepSORT can realize the detection and tracking of moving objects and achieves a detection accuracy of 75.4% and a continuous tracking rate of 61.4% for moving objects in traffic scenes. The heterogeneous graph can effectively capture the complex interaction relations between moving objects, and the captured interaction relations improve the accuracy of trajectory prediction. The average displacement error of moving objects reduces by 63.0% after the interaction relations captured by the heterogeneous graph are added. As a result, it is effective to consider the interaction relations between moving objects in traffic scenes. The historical and future motion information of moving objects can be perceived by introducing the heterogeneous graph to capture the interaction relations between moving objects, so as to facilitate unmanned vehicles to better understand complex traffic scenes. 4 tabs, 9 figs, 36 refs.

     

  • FullText


    By detecting targets around an unmanned vehicle, passable areas can be planned, but missed detections may pose risks to trajectory planning. Moving targets contain rich temporal information, and tracking the moving targets around the vehicle can effectively avoid missed detections and provide more information for trajectory planning. For tracking the moving targets around vehicles, Wang et al.[5] used autoencoders to learn target feature extraction and classification, achieving tracking of targets around the vehicle in complex environments; Tao et al.[6] used a deep Siamese network to learn a target matching function and matched the target in the initial frame with candidate samples in the frame to be tracked; Nam et al.[7] learned shared representations of targets from multiple annotated video sequences to assist the network in target tracking; Yang et al.[8] found that using 3D convolutional neural networks to extract the spatiotemporal features of moving targets can effectively perceive them; Wang et al.[9] found that using convolutional layers at different levels to represent targets from different perspectives can effectively separate targets from distractors with similar appearances.

    Detecting and tracking the moving targets around the vehicle can provide information for the trajectory planning of unmanned vehicles. However, a vehicle traveling at high speed also needs to predict the motion trajectories of surrounding targets so that potential collisions can be avoided in advance, which ensures safety and improves driving comfort. Regarding the trajectory prediction problem, Lee et al.[10] proposed using recurrent neural networks for long-term trajectory prediction; Gupta et al.[11] added adversarial training to make the generated trajectories more realistic; Walker et al.[12] predicted dense short-term trajectories using variational autoencoders; Cai et al.[13] proposed a method for predicting pedestrian trajectories from a first-person perspective; Giuliari et al.[14] found that Transformer networks can accurately predict pedestrian trajectories even when parts of the trajectories are missing; Kitani et al.[15] proposed an optimal control method to predict the future trajectories of vehicles.

    The above target detection, tracking, and trajectory prediction methods all help unmanned vehicles plan their motion trajectories from different perspectives. In recent years, researchers have proposed methods that integrate detection, tracking, and trajectory prediction: Luo et al.[16] proposed using multi-scan LiDAR point cloud data to achieve 3D object detection and motion trajectory prediction; Liang et al.[17] used point cloud data and high-definition map information as inputs, employed a two-stage tracking framework to track targets, and on this basis applied a long short-term memory (LSTM)[18] neural network to predict target trajectories. These methods can track vehicles/pedestrians and predict their trajectories, but they rely on expensive LiDAR to obtain point cloud data, and their understanding of the interaction behaviors between targets is insufficient, which affects the accuracy of tracking and trajectory prediction.

    There are multiple moving targets in real traffic scenes, and the interaction behaviors between targets are complex, which places high demands on trajectory prediction. In terms of interaction perception, Alahi et al.[19] modeled pedestrian interactions with LSTM and predicted trajectories; Ma et al.[20] used game-theoretic concepts to simulate interactions between pedestrians; Bisagno et al.[21] proposed clustering pedestrian trajectories with similar motion trends and applied an improved Social LSTM to model interactions within the crowd; Yang et al.[22] proposed predicting pedestrian intentions by fusing pedestrian motion information with scene information and capturing human-vehicle interaction information in advance. These methods can model the interactions between targets, but the factors that influence target interactions are still insufficiently explored and utilized. The influence relationship between targets is determined by various factors, such as the motion direction, distance, and front/back position of the targets, and these factors can be combined into different influence relationships. By making full use of these factors to establish the influence relationships between targets, the interactions between targets can be modeled effectively.

    To address the above issues, this paper proposes a detection-tracking-prediction perception (DTPP) algorithm based on heterogeneous graph learning for perceiving moving targets in traffic scenes. The lightweight YOLOv5[23] object detection network is introduced to detect moving targets in traffic scenes in real time, and the DeepSORT multi-target tracking network proposed by Wojke et al.[24] is combined with it to track the targets. To handle the complex influence relationships between targets in the scene, a heterogeneous graph network is used to encode the mutual influence between multiple targets. To accurately predict the future trajectory of a target, an LSTM-based encoder is used to learn the target's motion pattern; after the target interaction encoding and the target motion pattern encoding are fused, an LSTM-based decoder network predicts the target's future motion trajectory. The contributions of this paper are as follows: a vision-based cascade framework that integrates object detection, tracking, and trajectory prediction is proposed, providing unmanned vehicles with the current and future trajectory information of scene targets (pedestrians/vehicles); to cope with the complex interaction behaviors of targets in the scene, multiple interaction relationships (far/near distance, same/opposite motion direction, front/back position) are defined from target positions and motion states, and heterogeneous graph networks are introduced to learn the interaction effects between targets, thereby improving trajectory prediction accuracy. The target interaction information also offers the possibility of improving target tracking accuracy in the future. The algorithm is validated on the public datasets Argoverse, NuScenes, and Apollo, and the experimental results demonstrate its effectiveness.

    In recent years, researchers have proposed two-stage object detection methods represented by regions with convolutional neural network features (R-CNN)[25] to detect pedestrian/vehicle targets in traffic scenes. Fast R-CNN[26] uses a fully convolutional neural network to extract object detection boxes, reduces the computational cost through shared convolutional layers, and applies a neural network to classify the detection boxes; Faster R-CNN[27] proposes a region proposal network to quickly generate candidate regions and alternately trains the region proposal network and the detection network so that they share parameters. These two-stage detection methods have slow inference speeds and are difficult to apply in real time. Therefore, single-stage object detection methods represented by YOLO[28] have been proposed; the single shot multibox detector (SSD)[29] uses a faster single-stage strategy to improve algorithm speed and is widely used in various real-time detection tasks. This paper introduces YOLOv5 to detect pedestrians and vehicles in traffic scenes and combines it with the DeepSORT network to track the targets in real time, providing the historical observed trajectories of the targets.

    The YOLOv5 object detection network mainly consists of three parts: the feature extraction layer, the path aggregation network (PANet), and the output layer. The feature extraction layer is responsible for extracting useful features from the image; it uses cross stage partial (CSP) networks to reduce the computational cost while enhancing the learning ability of the convolutional neural network, and spatial pyramid pooling (SPP) to avoid the image distortion caused by cropping and scaling and to solve the problem of repeated feature extraction. PANet enhances the low-level features of the image through path aggregation while fusing features. The output layer classifies the features with a one-dimensional convolutional neural network to achieve object detection.

    The simple online and realtime tracking (SORT)[30] algorithm estimates target trajectories with a Kalman filter and matches the predicted and detected boxes with the Hungarian algorithm, but its tracking performance is poor under long-term occlusion. On the basis of SORT, the DeepSORT algorithm uses cascade matching to compute the Mahalanobis distance and deep appearance features of targets between consecutive frames, and the Hungarian algorithm is used to associate the target detection boxes and prediction boxes, which improves the tracking effect under occlusion. Figure 1 shows the DeepSORT tracking process.
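
    The association step described above can be illustrated with a minimal Python sketch: Kalman-predicted boxes are matched to detected boxes by running the Hungarian algorithm on an IoU-based cost matrix. The function names and the IoU cost are illustrative assumptions for exposition, not the exact DeepSORT implementation, which additionally uses the Mahalanobis distance and deep appearance features.

```python
# Minimal sketch of the box association step: Hungarian matching on an IoU cost.
# Names and thresholds are illustrative, not the DeepSORT implementation itself.
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def associate(predicted_boxes, detected_boxes, iou_threshold=0.3):
    """Match Kalman-predicted boxes to detections; return (pred_idx, det_idx) pairs."""
    cost = np.ones((len(predicted_boxes), len(detected_boxes)))
    for i, p in enumerate(predicted_boxes):
        for j, d in enumerate(detected_boxes):
            cost[i, j] = 1.0 - iou(p, d)            # low cost = high overlap
    rows, cols = linear_sum_assignment(cost)         # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 1.0 - iou_threshold]
```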

    Figure  1.  DeepSORT tracking process
    Figure  2.  LSTM network cycle unit

    Figure 3 shows the computation process of GTNs. First, 1 × 1 multi-channel convolutions select effective adjacency matrices from the adjacency matrices containing heterogeneous relationships. Then, multiple effective adjacency matrices are multiplied to obtain the meta-path adjacency matrix. Finally, a graph convolutional network (GCN) is applied to the meta-path to generate new node representations.

    Figure  3.  GTNs operation process

    Figure 4 shows the framework of the proposed DTPP algorithm. YOLOv5 detects the moving targets (vehicles/pedestrians) in the scene, and DeepSORT tracks them to obtain the observed trajectories of the moving targets. A heterogeneous graph is established with the moving targets as nodes and input into GTNs to encode the mutual influence between different targets. The LSTM-based motion encoding network M-LSTM encodes the target motion patterns, the result is concatenated with the encoding result of the GTNs, and the concatenation is fed into the LSTM-based decoding network D-LSTM to decode the future trajectories of the targets. The algorithm mainly consists of three modules: a lightweight network based on YOLOv5 and DeepSORT that detects and tracks the moving targets around the unmanned vehicle; GTNs based on heterogeneous graph learning that encode the interaction influence between moving targets; and an LSTM-based motion pattern encoder/decoder that encodes/decodes the target motion patterns from the observed trajectories and generates the future trajectories.

    Figure  4.  Framework of DTPP algorithm

    This paper first uses YOLOv5 to detect the moving targets in the scene and then uses the multi-object tracking network DeepSORT to track them. The specific steps are as follows: first, the targets' positions, aspect ratios, heights, and velocity information are input into the Kalman filter to estimate the targets' positions at the current and next moments; then, the Mahalanobis distance and the appearance similarity of the targets in two consecutive frames are calculated; finally, the targets are matched according to the calculated similarities, and the best prediction is selected to update the target trajectories.
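
    The following sketch shows how such a detection-tracking loop could be assembled, assuming the publicly released ultralytics/yolov5 model from torch.hub and the third-party deep_sort_realtime package as stand-ins for the detector and tracker; the video path and the class filtering are illustrative.

```python
# Minimal sketch of a YOLOv5 + DeepSORT loop, assuming the ultralytics/yolov5
# torch.hub model and the deep_sort_realtime package; file name and class
# filtering are illustrative.
import cv2
import torch
from deep_sort_realtime.deepsort_tracker import DeepSort

detector = torch.hub.load('ultralytics/yolov5', 'yolov5s')    # lightweight detector
tracker = DeepSort(max_age=30)                                 # appearance-based tracker
trajectories = {}                                              # track_id -> [(x, y), ...]

cap = cv2.VideoCapture('traffic_scene.mp4')                    # hypothetical input video
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # YOLOv5 expects RGB; each detection row is (x1, y1, x2, y2, confidence, class).
    det = detector(frame[:, :, ::-1]).xyxy[0].cpu().numpy()
    det = det[(det[:, 5] == 0) | (det[:, 5] == 2)]             # keep persons and cars (COCO ids)
    det_list = [([x1, y1, x2 - x1, y2 - y1], conf, int(cls))
                for x1, y1, x2, y2, conf, cls in det]
    # DeepSORT: Kalman prediction + cascade matching with appearance features.
    for trk in tracker.update_tracks(det_list, frame=frame):
        if not trk.is_confirmed():
            continue
        x1, y1, x2, y2 = trk.to_ltrb()
        trajectories.setdefault(trk.track_id, []).append(((x1 + x2) / 2, (y1 + y2) / 2))
cap.release()
```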

    The moving pedestrian or vehicle targets detected by YOLOv5 are denoted as g_i, where i is the target number. DeepSORT is used for tracking, and the trajectory coordinate of g_i at moment t is denoted as P_{i, t}=(x_{i, t}, y_{i, t}), where (x_{i, t}, y_{i, t}) are the horizontal and vertical coordinates of target i at t. The observed trajectory covers the moments t=1, 2, …, T, where T is the last observation moment. The goal of this paper is to predict the positions of the pedestrian and vehicle targets at the future moments t=T+1, …, R based on their observed trajectories, where R is the last prediction moment.

    The motion of pedestrians and vehicles usually follows specific patterns: pedestrians tend to move towards their destinations at a certain speed, and vehicles run steadily or accelerate. This paper uses the LSTM-based motion pattern encoder M-LSTM to encode the targets' motion patterns. First, the coordinates of the observed trajectory of moving target i at moment t are embedded into a fixed-length vector e_{i, t}; then e_{i, t} is input into M-LSTM to generate the motion pattern encoding m_i, as follows

    \boldsymbol{e}_{i, t}=\Phi\left(x_{i, t}, y_{i, t} ; \boldsymbol{W}_{1}\right) (1)
    \boldsymbol{m}_{i}=\mathrm{M}-\operatorname{LSTM}\left(\boldsymbol{h}_{i, t-1}, \boldsymbol{e}_{i, t} ; \boldsymbol{W}_{2}\right) (2)

    where Φ(·) is the embedding function with learnable weight W_1; h_{i, t-1} is the hidden state of M-LSTM at moment t-1; W_2 is the learnable weight of M-LSTM.

    After the motion pattern encodings of all targets in the same scene are stacked, the motion state matrix M of the scene targets is obtained.
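
    A minimal PyTorch sketch of the motion pattern encoder in Eqs. (1)-(2) is given below; the embedding and hidden dimensions are illustrative assumptions, and the class name MotionEncoder is hypothetical.

```python
# Minimal sketch of the M-LSTM motion pattern encoder in Eqs. (1)-(2); the
# dimensions and class name are illustrative assumptions.
import torch
import torch.nn as nn


class MotionEncoder(nn.Module):
    def __init__(self, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Linear(2, embed_dim)                    # phi(x, y; W1) in Eq. (1)
        self.m_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, observed):                                 # observed: (N, T, 2)
        e = torch.relu(self.embed(observed))                     # e_{i, t}
        _, (h, _) = self.m_lstm(e)                               # recurrence of Eq. (2) over t
        return h.squeeze(0)                                      # (N, hidden_dim): rows m_i of M


# Example: encode 5 objects observed over 8 frames.
M = MotionEncoder()(torch.randn(5, 8, 2))
```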

    To capture the mutual influence between different moving targets, this paper takes the pedestrian/vehicle targets in the traffic scene as the nodes of a heterogeneous graph and uses their motion encodings as the node features. Meanwhile, relationship matrices are established from the distance between targets (far/near), their front/back positions, and whether their motion directions are the same or opposite. An influence range is set for each relationship, so that the continuous values are converted into discrete values of 0 and 1; the resulting binary matrices are used as the edges of the graph, yielding a heterogeneous graph composed of multiple types of edges. GTNs are applied to fuse the multiple heterogeneous relationships and encode the interaction influence between targets, which improves trajectory prediction performance. The specific process is as follows.

    First, the distance edge matrix A_n is defined based on the positions of all moving targets at moment T, as follows

    \operatorname{dist}\left(\boldsymbol{P}_{i, T}, \boldsymbol{P}_{j, T}\right)=\sqrt{\left(x_{i, T}-x_{j, T}\right)^{2}+\left(y_{i, T}-y_{j, T}\right)^{2}} (3)
    a_{\mathrm{n}, i, j}= \begin{cases}1 & \operatorname{dist}\left(\boldsymbol{P}_{i, T}, \boldsymbol{P}_{j, T}\right) \leqslant 3 \\ 0 & \text { otherwise }\end{cases} (4)

    where dist(·) is the distance function; (x_{i, T}, y_{i, T}) and (x_{j, T}, y_{j, T}) are the positions of targets i and j at T; P_{i, T} and P_{j, T} are the trajectory coordinates of i and j at T; a_{n, i, j} is the element in the i-th row and j-th column of A_n. When a_{n, i, j} equals 1, the distance between i and j is within the interaction distance of 3, and the two targets should consider each other's motion states to keep a safe distance and avoid collisions.

    The kinematic velocities v_{i, T} and v_{j, T} of targets i and j are calculated from their observed trajectories at T, and the cosine of the included angle between their motion directions is computed as follows

    \cos \left(\boldsymbol{v}_{i, T}, \boldsymbol{v}_{j, T}\right)=\frac{\boldsymbol{v}_{i, T} \cdot \boldsymbol{v}_{j, T}}{\left\|\boldsymbol{v}_{i, T}\right\|\left\|\boldsymbol{v}_{j, T}\right\|} (5)

    Then, the same-direction edge matrix A_s and the opposite-direction edge matrix A_p of the targets are calculated from the cosine of the included angle between their motion directions at T, as follows

    a_{\mathrm{s}, i, j}= \begin{cases}1 & 0.5<\cos \left(\boldsymbol{v}_{i, T}, \boldsymbol{v}_{j, T}\right)<1 \\ 0 & \text { otherwise }\end{cases} (6)
    a_{\mathrm{p}, i, j}= \begin{cases}1 & -1<\cos \left(\boldsymbol{v}_{i, T}, \boldsymbol{v}_{j, T}\right)<-0.5 \\ 0 & \text { otherwise }\end{cases} (7)

    where a_{s, i, j} and a_{p, i, j} are the elements in the i-th row and j-th column of A_s and A_p, respectively. Targets i and j move in the same direction when a_{s, i, j} equals 1, and they move in opposite directions when a_{p, i, j} equals 1.

    Then, the relative position vector r_{i, j} of targets i and j at T is calculated as follows

    \boldsymbol{r}_{i, j}=\boldsymbol{P}_{i, T}-\boldsymbol{P}_{j, T} (8)

    On this basis, the front position matrix A_q and the back position matrix A_b are defined as follows

    a_{\mathrm{q}, i, j}= \begin{cases}1 & 0.5<\cos \left(\boldsymbol{r}_{i, j}, \boldsymbol{v}_{i, T}\right)<1 \\ 0 & \text { otherwise }\end{cases} (9)
    a_{\mathrm{b}, i, j}= \begin{cases}1 & -1<\cos \left(\boldsymbol{r}_{i, j}, \boldsymbol{v}_{i, T}\right)<-0.5 \\ 0 & \text { otherwise }\end{cases} (10)

    where a_{q, i, j} and a_{b, i, j} are the elements in the i-th row and j-th column of the front position matrix A_q and the back position matrix A_b, respectively.
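
    The construction of the five relation matrices in Eqs. (3)-(10) can be sketched as follows; the function relation_matrices is hypothetical, and the velocities are assumed to be estimated beforehand from the observed trajectories.

```python
# Minimal sketch of Eqs. (3)-(10): build the five binary relation matrices from
# positions P and velocities V at the last observed moment T. The 3 m distance
# bound and the +/-0.5 cosine bounds follow the formulas above; the function
# name is hypothetical.
import numpy as np


def relation_matrices(P, V, dist_thresh=3.0):
    """P, V: (N, 2) arrays of positions and velocities at moment T."""
    N = P.shape[0]
    A_n, A_s, A_p, A_q, A_b = (np.zeros((N, N)) for _ in range(5))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            A_n[i, j] = float(np.linalg.norm(P[i] - P[j]) <= dist_thresh)         # Eq. (4)
            cos_v = V[i] @ V[j] / (np.linalg.norm(V[i]) * np.linalg.norm(V[j]) + 1e-9)
            A_s[i, j] = float(cos_v > 0.5)                                         # Eq. (6)
            A_p[i, j] = float(cos_v < -0.5)                                        # Eq. (7)
            r = P[i] - P[j]                                                        # Eq. (8)
            cos_r = r @ V[i] / (np.linalg.norm(r) * np.linalg.norm(V[i]) + 1e-9)
            A_q[i, j] = float(cos_r > 0.5)                                         # Eq. (9)
            A_b[i, j] = float(cos_r < -0.5)                                        # Eq. (10)
    return np.stack([A_n, A_s, A_p, A_q, A_b])               # stacked edges, cf. H in Eq. (11)
```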

    The heterogeneous graph edge matrix H is obtained by concatenating the five relationship matrices as follows

    \boldsymbol{H}=\operatorname{Concat}\left(\boldsymbol{A}_{\mathrm{n}}, \boldsymbol{A}_{\mathrm{s}}, \boldsymbol{A}_{\mathrm{p}}, \boldsymbol{A}_{\mathrm{q}}, \boldsymbol{A}_{\mathrm{b}}\right) (11)
    where Concat(·) denotes the concatenation operation.

    GTNs are used to learn the node representations on the established heterogeneous graph. First, the graph transformer (GT) layer generates meta-paths from H, which can be expressed as follows

    \boldsymbol{Q}_{k}=\sum\limits_{k \in C} \boldsymbol{\alpha}_{k} \boldsymbol{H}_{k} (12)
    \boldsymbol{A}=\boldsymbol{Q}_{1} \boldsymbol{Q}_{2} \cdots \boldsymbol{Q}_{k} (13)

    where C denotes all types of relationships; α_k is the selection weight matrix of the k-th relationship; H_k is the heterogeneous graph edge matrix of the k-th relationship; Q_k is the transition matrix; A is the meta-path adjacency matrix generated by multiplying all transition matrices, and it contains valuable information for trajectory prediction after the multiple heterogeneous relationships are fused.

    After the meta-path adjacency matrix is obtained from the GT layer, GCN is applied to it to obtain the node representation Z, which can be expressed as follows

    \boldsymbol{Z}=\sigma[\boldsymbol{D}(\boldsymbol{A})(\boldsymbol{A}+\boldsymbol{I}) \boldsymbol{M} \boldsymbol{W}] (14)

    where D(·) denotes the calculation of the degree matrix; I is the identity matrix; W is the training weight of the GCN; σ(·) is the Softmax activation function. The i-th row of Z, denoted as z_i, represents the interaction encoding between target i and its surrounding targets.
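
    A minimal PyTorch sketch of the GT-layer selection, meta-path multiplication, and GCN step in Eqs. (12)-(14) is shown below; the number of GT channels, the feature dimensions, and the class name are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of Eqs. (12)-(14): soft selection of relation matrices with a
# 1x1 convolution, meta-path multiplication, and one GCN layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphTransformerEncoder(nn.Module):
    def __init__(self, num_relations=5, num_channels=2, in_dim=64, out_dim=64):
        super().__init__()
        # 1x1 multi-channel convolution over the relation axis (selection weights alpha_k).
        self.select = nn.Conv2d(num_relations, num_channels, kernel_size=1, bias=False)
        self.gcn_weight = nn.Linear(in_dim, out_dim, bias=False)   # W in Eq. (14)

    def forward(self, H, M):             # H: (num_relations, N, N), M: (N, in_dim)
        alpha = torch.softmax(self.select.weight, dim=1)            # (C, K, 1, 1)
        Q = F.conv2d(H.unsqueeze(0), alpha).squeeze(0)              # transition matrices, Eq. (12)
        A = Q[0]
        for k in range(1, Q.shape[0]):                              # meta-path adjacency, Eq. (13)
            A = A @ Q[k]
        A_hat = A + torch.eye(A.shape[0])                           # A + I
        D_inv = torch.diag(1.0 / (A_hat.sum(dim=1) + 1e-9))         # degree normalization D(A)
        return torch.softmax(D_inv @ A_hat @ self.gcn_weight(M), dim=-1)   # Z, Eq. (14)


# Example: 5 relation matrices over 6 objects with 64-dim motion encodings M.
Z = GraphTransformerEncoder()(torch.rand(5, 6, 6), torch.randn(6, 64))
```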

    To comprehensively consider the motion characteristics of a moving target and the influence of surrounding targets on its motion, the motion pattern encoding m_i and the interaction encoding z_i are concatenated as follows

    \boldsymbol{f}_{i}=\operatorname{Concat}\left(\boldsymbol{m}_{i}, \boldsymbol{z}_{i}\right) (15)

    where f_i is the concatenated vector containing the motion pattern of target i and the influence of its surrounding targets.

    Finally, the LSTM-based decoder D-LSTM is used to predict the future trajectory of the moving target, with f_i as its initial state, as follows

    \boldsymbol{d}_{i, t+1}=\mathrm{D}-\operatorname{LSTM}\left(\boldsymbol{d}_{i, t}, \boldsymbol{f}_{i} ; \boldsymbol{W}_{3}\right) (16)
    \left(\tilde{x}_{i, t+1}, \tilde{y}_{i, t+1}\right)=\delta\left(\boldsymbol{d}_{i, t+1}\right) (17)

    where D-LSTM(·) is the motion pattern decoder, and W_3 is its learnable weight; d_{i, t} is the hidden state of the decoder for target i at moment t; δ(·) is a linear layer; (\tilde{x}_{i, t+1}, \tilde{y}_{i, t+1}) is the predicted position of target i at moment t+1.

    The algorithm generates a predicted trajectory for each moving target. The L2 distance between the predicted and real trajectories is used as the loss function, which is defined as follows

    L=\sum\limits_{i=1}^{N}\left\|\boldsymbol{Y}_{i}-\tilde{\boldsymbol{Y}}_{i}\right\|_{2} (18)

    where L is the loss value; N is the number of targets; Y_i is the true trajectory of target i; \tilde{\boldsymbol{Y}}_{i} is the trajectory predicted by the algorithm.
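
    The decoder and loss in Eqs. (15)-(18) can be sketched as follows; the hidden dimensions, the class name TrajectoryDecoder, and the per-step form of the L2 loss are illustrative assumptions.

```python
# Minimal sketch of Eqs. (15)-(18): concatenate m_i and z_i into f_i, roll the
# D-LSTM forward for the prediction horizon, and compute an L2 loss.
import torch
import torch.nn as nn


class TrajectoryDecoder(nn.Module):
    def __init__(self, motion_dim=64, interact_dim=64, hidden_dim=128):
        super().__init__()
        self.init = nn.Linear(motion_dim + interact_dim, hidden_dim)  # maps f_i to the initial state
        self.d_lstm = nn.LSTMCell(hidden_dim, hidden_dim)             # D-LSTM with weight W3
        self.delta = nn.Linear(hidden_dim, 2)                         # delta(.) in Eq. (17)

    def forward(self, m, z, steps):
        f = torch.relu(self.init(torch.cat([m, z], dim=-1)))          # f_i, Eq. (15)
        h, c, x = f, torch.zeros_like(f), f                           # f_i as initial state
        preds = []
        for _ in range(steps):                                        # t = T+1, ..., R
            h, c = self.d_lstm(x, (h, c))                             # d_{i, t+1}, Eq. (16)
            preds.append(self.delta(h))                               # predicted (x, y), Eq. (17)
            x = h
        return torch.stack(preds, dim=1)                              # (N, steps, 2)


def l2_loss(pred, truth):
    """Sum of per-step L2 distances over all objects, one reading of Eq. (18)."""
    return torch.norm(pred - truth, dim=-1).sum()


# Example: 5 objects, 64-dim motion and interaction encodings, 12 predicted steps.
pred = TrajectoryDecoder()(torch.randn(5, 64), torch.randn(5, 64), steps=12)
```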

    The average displacement error E is the mean L2 distance between the true trajectory and the predicted trajectory over all time steps within the prediction horizon, calculated as follows

    E=\frac{\sum\limits_{i \in O} \sum\limits_{t=T+1}^{R} \sqrt{\left(\tilde{x}_{i, t}-x_{i, t}\right)^{2}+\left(\tilde{y}_{i, t}-y_{i, t}\right)^{2}}}{N R} (19)

    The final displacement error F is the L2 distance between the true trajectory and the predicted trajectory at the last time step, calculated as follows

    F=\frac{\sum\limits_{i \in O} \sqrt{\left(\tilde{x}_{i, R}-x_{i, R}\right)^{2}+\left(\tilde{y}_{i, R}-y_{i, R}\right)^{2}}}{N} (20)
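
    Both metrics can be computed with a few lines of NumPy, as sketched below; note that the mean here divides by the number of predicted steps per object, which is one reading of the normalization in Eq. (19).

```python
# Minimal sketch of the ADE/FDE metrics in Eqs. (19)-(20) for predicted and
# ground-truth arrays of shape (N objects, prediction steps, 2).
import numpy as np


def ade_fde(pred, truth):
    dist = np.linalg.norm(pred - truth, axis=-1)    # per-object, per-step L2 distance
    return dist.mean(), dist[:, -1].mean()          # ADE over all steps, FDE at the last step


# Example with 5 objects predicted for 12 steps.
pred, truth = np.random.rand(5, 12, 2), np.random.rand(5, 12, 2)
ade, fde = ade_fde(pred, truth)
```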
    Figure  5.  Comparison of distant target detection effects
    Figure  6.  Comparison of detection effects when light is confused
    Figure  7.  Multi-object tracking effects
    Figure  8.  Visualization of trajectory prediction on Argoverse database
    Figure  9.  Visualization of trajectory prediction on Apollo database
    Table  1.  Ablation test results (Apollo/Argoverse)
    Kalman filter Motion pattern encoding Moving target interaction encoding ADE
    14.31/2.94
    8.58/2.61
    4.26/2.49
    Table  2.  Comparison between Faster R-CNN and YOLOv5
    Algorithm mAP/% FPS/Hz FLOPs/B
    Faster R-CNN 67.0 21.7 23.0
    YOLOv5 75.4 52.8 17.0
    Table  3.  Comparison between SORT and DeepSORT
    Algorithm MOTA MOTP MT/% ML/% FPS/Hz
    SORT 59.8 79.6 25.4 22.7 60
    DeepSORT 61.4 79.1 32.8 18.2 40
    Table  4.  Performance comparison among different trajectory prediction methods on three public databases (ADE/FDE)
    Algorithm Argoverse Apollo NuScenes
    CVM 3.19/7.09 25.01/55.40 4.77/6.87
    LSTM 2.61/5.52 8.58/16.04 2.62/4.33
    SGAN 2.53/7.85 16.41/29.79 1.77/3.87
    NMMP 2.57/7.43 12.90/25.28 2.21/3.89
    Social-STGCNN 2.54/6.78 4.60/5.68 1.32/2.43
    DTPP 2.49/5.35 4.26/7.58 1.25/2.21
  • [1]
    CHEN Xiao-zhi, KUNDU K, ZHANG Z, et al. Monocular 3D object detection for autonomous driving[C]//IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 2147-2156.
    [2]
    XU Bin, CHEN Zhen-zhong. Multi-level fusion based 3D object detection from monocular images[C]//IEEE. 31st IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 2345-2353.
    [3]
    WANG Yan, CHAO Wei-lun, GARG D, et al. Pseudo-lidar from visual depth estimation: bridging the gap in 3D object detection for autonomous driving[C]//IEEE. 32nd IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 8445-8453.
    [4]
    LIU Ze, CAI Ying-feng, WANG Hai, et al. Robust target recognition and tracking of self-driving cars with radar and camera information fusion under severe weather conditions[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(7): 6640-6653.
    [5]
    WANG Nai-yan, YEUNG D Y. Learning a deep compact image representation for visual tracking[J]. Advances in Neural Information Processing Systems, 2013, 26(1): 809-817.
    [6]
    TAO Ran, GAVVES E, SMEULDERS A W M. Siamese instance search for tracking[C]//IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 1420-1429.
    [7]
    NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]//IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 4293-4302.
    [8]
    YANG Biao, ZHAN Wei-qin, WANG Pin, et al. Crossing or not? Context-based recognition of pedestrian crossing intention in the urban environment[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8): 5338-5349.
    [9]
    WANG Li-jun, OUYANG Wan-li, WANG Xiao-gang, et al. Visual tracking with fully convolutional networks[C]//IEEE. 15th International Conference on Computer Vision. New York: IEEE, 2015: 3119-3127.
    [10]
    LEE N, CHOI W, VERNAZA P, et al. Desire: distant future prediction in dynamic scenes with interacting agents[C]//IEEE. 30th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 336-345.
    [11]
    GUPTA A, JOHNSON J, LI Fei-fei, et al. Social GAN: socially acceptable trajectories with generative adversarial networks[C]//IEEE. 31st IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 2255-2264.
    [12]
    WALKER J, DOERSCH C, GUPTA A, et al. An uncertain future: forecasting from static images using variational autoencoders[C]//Springer. 14th European Conference on Computer Vision. Berlin: Springer, 2016: 835-851.
    [13]
    CAI Ying-feng, DAI Lei, WANG Hai, et al. Pedestrian motion trajectory prediction in intelligent driving from far shot first-person perspective video[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(6): 5298-5313.
    [14]
    GIULIARI F, HASAN I, CRISTANI M, et al. Transformer networks for trajectory forecasting[C]//IEEE. 25th International Conference on Pattern Recognition. New York: IEEE, 2021: 10335-10342.
    [15]
    KITANI K M, ZIEBART B D, BAGNELL J A, et al. Activity forecasting[C]//Springer. 10th European Conference on Computer Vision. Berlin: Springer, 2012: 201-214.
    [16]
    LUO Wen-jie, YANG Bin, URTASUN R. Fast and furious: real time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net[C]//IEEE. 31st IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 3569-3577.
    [17]
    LIANG Ming, YANG Bin, ZENG Wen-yuan, et al. PnPNet: end-to-end perception and prediction with tracking in the loop[C]//IEEE. 33rd IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 11553-11562.
    [18]
    SHI Xing-jian, CHEN Zhou-rong, WANG Hao, et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting[J]. Advances in Neural Information Processing Systems, 2015, 28(1): 802-810.
    [19]
    ALAHI A, GOEL K, RAMANATHAN V, et al. Social LSTM: human trajectory prediction in crowded spaces[C]//IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 961-971.
    [20]
    MA W C, HUANG D A, LEE N, et al. Forecasting interactive dynamics of pedestrians with fictitious play[C]//IEEE. 30th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 774-782.
    [21]
    BISAGNO N, ZHANG B, CONCI N. Group LSTM: group trajectory prediction in crowded scenarios[C]//Springer. 16th European Conference on Computer Vision. Berlin: Springer, 2018: 213-225.
    [22]
    YANG Biao, FAN Fu-cheng, YANG Ji-cheng, et al. Recognition of pedestrians' street-crossing intentions based on action prediction and environment context[J]. Automotive Engineering, 2021, 43(7): 1066-1076. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-QCGC202107015.htm
    [23]
    REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//IEEE. 30th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 7263-7271.
    [24]
    WOJKE N, BEWLEY A, PAULUS D. Simple online and realtime tracking with a deep association metric[C]//IEEE. 24th IEEE International Conference on Image Processing (ICIP). New York: IEEE, 2017: 3645-3649.
    [25]
    GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//IEEE. 27th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2014: 580-587.
    [26]
    GIRSHICK R. Fast R-CNN[C]//IEEE. 15th International Conference on Computer Vision. New York: IEEE, 2015: 1440-1448.
    [27]
    REN Shao-qing, HE Kai-ming, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 28(1): 91-99.
    [28]
    REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//IEEE. 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 779-788.
    [29]
    LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Springer. 14th European Conference on Computer Vision. Berlin: Springer, 2016: 21-37.
    [30]
    BEWLEY A, GE Zong-yuan, OTT L, et al. Simple online and realtime tracking[C]//IEEE. 23rd IEEE International Conference on Image Processing (ICIP). New York: IEEE, 2016: 3464-3468.
    [31]
    YUN S, JEONG M, KIM R, et al. Graph transformer networks[J]. Advances in Neural Information Processing Systems, 2019, 32(1): 11983-11993.
    [32]
    CHANG Ming-fang, LAMBERT J, SANGKLOY P, et al. Argoverse: 3D tracking and forecasting with rich maps[C]//IEEE. 32nd IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 8748-8757.
    [33]
    MA Yue-xin, ZHU X, ZHANG Si-bo, et al. TrafficPredict: trajectory prediction for heterogeneous traffic-agents[C]//AAAI. 33rd AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019: 6120-6127.
    [34]
    CAESAR H, BANKITI V, LANG A H, et al. NuScenes: a multimodal dataset for autonomous driving[C]//IEEE. 33rd IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 11621-11631.
    [35]
    HU Yue, CHEN Si-heng, ZHANG Ya, et al. Collaborative motion prediction via neural motion message passing[C]//IEEE. 33rd IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 6319-6328.
    [36]
    MOHAMED A, QIAN Kun, ELHOSEINY M, et al. Social-STGCNN: a social spatio-temporal graph convolutional neural network for human trajectory prediction[C]//IEEE. 33rd IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 14424-14432.