Optimization of Deep Learning Techniques for Real-Time Detection and Tracking of Marine Species Utilizing YOLOv8 and Deep SORT Algorithms
Abstract
Monitoring coastal seafloor vegetation is crucial for preserving biodiversity, supporting ecosystems, and ensuring sustainable fisheries. Methods such as remote sensing, surveys, and DNA collection provide detailed data on species and their health. However, challenges such as poor water clarity and technical limitations hinder these efforts. Despite advances in deep learning, effective species identification methods are still lacking.
This thesis proposes a deep learning system capable of accurately detecting and categorizing marine species. Because underwater conditions such as poor lighting and muddy water make detection difficult, effective algorithms for real-time species detection are essential for informed decision-making and ecosystem management. The system consists of two primary steps: object detection and object tracking. The species detector was trained using YOLOv8 as its foundation, and a modified version of Deep SORT was employed for tracking in the video stream. You Only Look Once (YOLO) is a highly efficient, single-stage object detection algorithm that predicts bounding boxes and class labels in a single forward pass of a neural network. Deep SORT is a state-of-the-art technique for multi-object tracking in real time. Both algorithms are therefore suitable for real-time applications.
The topic of this thesis was suggested by Bjørn Christian Weinbach at Vestlandsforsking, who wants to explore and optimize algorithms for object detection and segmentation as part of his doctoral research. He also supplied the data used in my master's thesis. The data was collected by recording videos with the Blueye Pioneer underwater drone. From these videos, 150,000 images were extracted, of which only 17,000 were annotated because annotation is time-consuming. Image annotation is a crucial part of data preparation and is essential for many machine learning and computer vision tasks.
This process involves adding detailed information to images, such as bounding boxes and labels. Tools such as SuperAnnotate and CVAT simplify and streamline the annotation process.
The experiments explore the impact of color using the RGB, CMYK, and grayscale color models. They also show the effect of image enhancement techniques such as gray transformation, histogram equalization, and Retinex methods [1]. Data augmentation was used to increase dataset diversity without collecting new data. Detection on the original data achieved a 95% F1-score and 98% precision. These results are higher than those obtained with the other techniques mentioned above, which indicates that preserving color is the best option.
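To make the reported metrics concrete, the following minimal sketch shows one common way to derive precision, recall, and F1-score for bounding-box detections. It is an illustration only, not the evaluation code used in the thesis: the greedy matching strategy, the IoU threshold of 0.5, the function names, and the example boxes are all assumptions chosen for clarity.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_scores(
    predictions: List[Box],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the thesis
) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for one image and one class.

    Greedy matching: each ground-truth box may be claimed by at most
    one prediction, so duplicate detections count as false positives.
    """
    unmatched = list(ground_truth)
    true_pos = 0
    for pred in predictions:
        # Pick the still-unmatched ground-truth box with the largest overlap.
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            true_pos += 1
            unmatched.remove(best)
    false_pos = len(predictions) - true_pos
    false_neg = len(unmatched)
    precision = true_pos / (true_pos + false_pos) if predictions else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical boxes: two predictions overlap an annotation, one does not.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
predictions = [(1, 1, 10, 10), (20, 20, 30, 31), (70, 70, 80, 80)]
precision, recall, f1 = detection_scores(predictions, ground_truth)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

In this toy case two of three predictions match an annotation and one annotation is missed, so precision, recall, and F1 all come out at two thirds. A full evaluation would aggregate the counts over all images and classes before computing the scores.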