Abstract: In mechanized agricultural production, the accuracy and speed with which a tomato-picking robot recognizes tomato fruit at multiple growth stages underpin the efficiency of its hand-eye coordination. To increase recognition speed for multi-growth-stage tomato fruit while maintaining or improving accuracy, this paper improves the YOLOv5s model by incorporating the architecture of the lightweight MobileNetV3 model. First, the standard convolutions in the YOLOv5s backbone network are replaced with depthwise separable convolutions, which reduces the amount of convolution computation. Second, the inverted residual structure with a linear bottleneck is integrated to obtain more features in a high-dimensional space while performing the convolution operations in a low-dimensional space. Third, an attention mechanism is inserted into the last layer of the network to highlight salient features and improve accuracy. The results show that the recognition accuracy of the improved YOLOv5 model remains above 98%, that its recognition speed on a CPU is 0.88 frames per second faster than that of YOLOv5s, and that its recognition speed on a GPU is 90 frames per second faster than that of YOLOv5s. Finally, a recognition software system for multi-growth-stage tomato fruit is designed and developed using an Intel RealSense D435i depth camera and PyQt. The software system further verifies the feasibility of the improved YOLOv5 model and lays a foundation for the design of visual recognition software for agricultural picking robots.
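To make the claimed reduction in convolution computation concrete, the following minimal sketch compares the multiply-accumulate count of a standard convolution with that of a depthwise separable convolution (a per-channel depthwise convolution followed by a 1×1 pointwise convolution). The function names and the layer dimensions are illustrative assumptions, not values taken from the paper or from the YOLOv5s configuration.

```python
def standard_conv_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates for a k x k standard convolution
    producing an h x w output map (stride and padding ignored)."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates for a depthwise separable convolution:
    a k x k depthwise step (one filter per input channel) followed
    by a 1 x 1 pointwise step that mixes channels."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise


def cost_ratio(k, c_in, c_out, h, w):
    """Separable cost divided by standard cost.
    Algebraically this equals 1 / c_out + 1 / k**2."""
    return depthwise_separable_macs(k, c_in, c_out, h, w) / standard_conv_macs(
        k, c_in, c_out, h, w
    )


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels, 80 x 80 output map.
    k, c_in, c_out, h, w = 3, 64, 128, 80, 80
    print("standard :", standard_conv_macs(k, c_in, c_out, h, w))
    print("separable:", depthwise_separable_macs(k, c_in, c_out, h, w))
    print("ratio    :", cost_ratio(k, c_in, c_out, h, w))
```

For a 3×3 kernel the ratio approaches 1/9 as the number of output channels grows, which is the usual argument for why this substitution lightens a backbone; the actual speed-up measured on a CPU or GPU also depends on memory access and implementation details that this count does not capture.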