The learning-based end-to-end paradigm has become a promising alternative to modular pipelines in urban autonomous driving. Recent works using reinforcement learning or imitation learning have demonstrated the benefit of optimizing the overall driving pipeline with fewer stacks. However, low sample efficiency remains the key problem for both imitation learning and reinforcement learning. Additionally, scaling up to an actual driving policy is still challenging for current learning-based approaches due to the complexity and diversity of the urban traffic environment. We present a framework that leverages a deep neural network as a feature extractor to learn a low-dimensional representation from high-dimensional input, accelerating the reinforcement learning process. Our experiments show that the agent vehicle can learn an optimal policy directly from visual inputs, achieving performance competitive with cutting-edge methods. Moreover, our approach improves sample efficiency and policy performance in diverse and sophisticated high-dimensional environments with better robustness.
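To make the core idea concrete, the sketch below illustrates (in NumPy) how a feature extractor can compress a high-dimensional visual observation into a compact latent vector that the RL policy consumes. This is a minimal toy illustration, not the paper's architecture: the class names, latent dimension, and fixed random-projection weights are all hypothetical, whereas in the actual framework the encoder would be a trained deep network.

```python
import numpy as np


class Encoder:
    """Toy stand-in for the learned feature extractor: maps a
    high-dimensional image observation to a compact latent vector."""

    def __init__(self, obs_dim: int, latent_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Fixed random projection for illustration; in practice these
        # weights would be learned (e.g. by a CNN trained end-to-end
        # or with a representation-learning objective).
        self.w = rng.standard_normal((obs_dim, latent_dim)) / np.sqrt(obs_dim)

    def encode(self, obs: np.ndarray) -> np.ndarray:
        return np.tanh(obs.ravel() @ self.w)


class Policy:
    """Linear policy over the latent state; stands in for the RL agent,
    which now learns over tens of numbers instead of thousands of pixels."""

    def __init__(self, latent_dim: int, action_dim: int, seed: int = 1):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((latent_dim, action_dim)) * 0.1

    def act(self, latent: np.ndarray) -> np.ndarray:
        # Bounded continuous actions, e.g. steering and throttle in [-1, 1].
        return np.tanh(latent @ self.w)


# A 64x64 RGB camera frame (12288 values) is compressed to a
# 32-dimensional latent before the policy acts on it.
obs = np.random.default_rng(2).random((64, 64, 3))
encoder = Encoder(obs.size, latent_dim=32)
policy = Policy(latent_dim=32, action_dim=2)
latent = encoder.encode(obs)
action = policy.act(latent)
print(latent.shape, action.shape)
```

The point of the sketch is the interface, not the weights: the RL loop only ever sees the 32-dimensional latent, which is what drives the sample-efficiency gain the abstract describes.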