
[LOGBOOK] Object Detection Implementation Using YOLOv4-Tiny for Autonomous Surface Vehicles

Firizqi Aditya
13 December 2025
Object Detection · YOLOv4-Tiny · Computer Vision · Jetson Nano · Autonomous Surface Vehicle

Object Detection Using YOLOv4-Tiny

Object detection is a core perception capability required for autonomous surface vehicles (ASVs) to safely navigate and execute competition tasks such as Evacuation Route, Debris Clearance, Emergency Response Sprint, and Navigate the Marina.

This logbook documents the implementation and evaluation of YOLOv4-Tiny as the primary object detection model deployed on the NVIDIA Jetson Nano for real-time autonomous operation.


1. Model Selection Rationale

During early development, both YOLOv4 and YOLOv4-Tiny were evaluated to determine the most suitable architecture for real-time inference on embedded hardware.

Although YOLOv4 provides higher detection accuracy (mAP), experimental evaluation shows that it suffers from:

  • Lower frame rates in complex scenes
  • Higher computational load
  • Less stable real-time performance on embedded platforms

To ensure reliable perception during autonomous missions, YOLOv4-Tiny was selected due to its significantly higher and more stable frame rate while maintaining acceptable detection accuracy.


2. Dataset Preparation and Training Configuration

The object detection model was trained using a custom dataset with the following characteristics:

  • Total images: 500 labeled images
  • Number of classes: 7
  • Dataset split:
    • 70% training
    • 20% validation
    • 10% testing
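
As an illustration, this split can be generated with a short script like the sketch below; the directory layout and file names are assumptions, not the actual dataset structure. Darknet consumes plain text files listing one image path per line.

    import random
    from pathlib import Path

    # Illustrative paths; the real dataset layout may differ.
    image_dir = Path("data/obj")             # labeled images with matching YOLO .txt annotations
    images = sorted(image_dir.glob("*.jpg"))

    random.seed(42)                          # fixed seed so the split is reproducible
    random.shuffle(images)

    n = len(images)
    n_train = int(0.70 * n)
    n_valid = int(0.20 * n)

    splits = {
        "train": images[:n_train],
        "valid": images[n_train:n_train + n_valid],
        "test":  images[n_train + n_valid:],   # remaining ~10%
    }

    # Write one Darknet-style list file per split, one image path per line.
    for name, subset in splits.items():
        Path("data/{}.txt".format(name)).write_text("\n".join(str(p) for p in subset) + "\n")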

2.1 Training Resolution Selection

Based on experimental evaluation, the training input resolution was fixed at 640 × 352 pixels.

This resolution was selected because:

  • It provides a strong balance between detection accuracy and inference speed
  • It matches the aspect ratio of the onboard camera stream
  • It minimizes unnecessary image scaling during inference
  • Both dimensions are multiples of 32, as required by the network's 32× downsampling stride

2.2 Input Stream Resolution

The live camera input stream used during inference testing was also configured to 640 × 352, ensuring consistency between:

  • Training data resolution
  • Network input size
  • Real-time deployment conditions

This configuration reduces distortion and improves detection stability during continuous operation.
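
As an illustration, a minimal capture loop that keeps the live stream at the network resolution could look like the sketch below; the camera index and OpenCV capture backend are assumptions rather than the vehicle's actual camera configuration.

    import cv2

    NET_W, NET_H = 640, 352   # must match the YOLOv4-Tiny network input size

    # Assumed V4L2 camera at index 0; the real system may use a CSI/GStreamer pipeline instead.
    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, NET_W)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, NET_H)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # If the driver cannot deliver 640 x 352 natively, resize once here so the
        # detector always sees the same resolution it was trained with.
        if (frame.shape[1], frame.shape[0]) != (NET_W, NET_H):
            frame = cv2.resize(frame, (NET_W, NET_H))
        # frame is now ready to be handed to the detection / preprocessing step.

    cap.release()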


3. Training and Optimization Process

All models were trained using the Darknet framework with consistent hyperparameter configurations:

  • Maximum iterations: 6000 (max_batches)
  • Identical learning rate scheduling
  • Uniform data augmentation strategy
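
For illustration only, the sketch below shows one way the [net]-level settings could be patched programmatically so that every training run uses the same values; the cfg file name is an assumption.

    from pathlib import Path

    # Values taken from the training setup described above; the cfg path is illustrative.
    net_settings = {"width": "640", "height": "352", "max_batches": "6000"}
    cfg_path = Path("cfg/yolov4-tiny-custom.cfg")

    patched = []
    for line in cfg_path.read_text().splitlines():
        key = line.split("=")[0].strip()
        if "=" in line and key in net_settings:
            line = "{}={}".format(key, net_settings[key])
        patched.append(line)

    cfg_path.write_text("\n".join(patched) + "\n")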

The trained YOLOv4-Tiny model was converted to a TensorRT engine and deployed on the NVIDIA Jetson Nano with FP16 precision, enabling accelerated inference and a reduced computational load.
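
A minimal sketch of building such an FP16 engine is shown below. It assumes the Darknet weights have already been exported to ONNX (the converter step is not shown), the file names are illustrative, and the calls target the TensorRT 7.x/8.x Python API shipped with JetPack on the Nano.

    import tensorrt as trt

    ONNX_PATH = "yolov4-tiny-640x352.onnx"    # assumed name of the exported model
    ENGINE_PATH = "yolov4-tiny-640x352.trt"

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open(ONNX_PATH, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise SystemExit("ONNX parsing failed")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28           # 256 MB workspace, modest for the Nano
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)     # enable half-precision inference

    engine = builder.build_engine(network, config)
    with open(ENGINE_PATH, "wb") as f:
        f.write(engine.serialize())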


4. Performance Evaluation Methodology

Performance evaluation focused on two primary metrics:

  • Mean Average Precision (mAP@0.5) for detection accuracy
  • Frames Per Second (FPS) for real-time performance

Inference tests were conducted by running direct detection on a live camera stream using the Jetson Nano platform under realistic operational conditions.
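
The sketch below illustrates how the FPS figures can be collected over a live stream; run_detector() stands in for the actual YOLOv4-Tiny TensorRT inference call, and the camera source is an assumption.

    import time
    import cv2

    def run_detector(frame):
        """Placeholder for the YOLOv4-Tiny TensorRT inference call."""
        return []

    cap = cv2.VideoCapture(0)       # assumed camera source
    frame_count = 0
    t_start = time.time()

    while frame_count < 300:        # average over a fixed number of frames
        ok, frame = cap.read()
        if not ok:
            break
        run_detector(frame)
        frame_count += 1

    elapsed = time.time() - t_start
    print("Average FPS over {} frames: {:.1f}".format(frame_count, frame_count / elapsed))
    cap.release()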


5. Experimental Results Summary

The table below summarizes the observed performance of YOLOv4 and YOLOv4-Tiny across multiple input resolutions.

Table 1. Object Detection Performance on Jetson Nano (TensorRT FP16)

Model         Input Resolution   mAP@0.5   FPS
YOLOv4        416 × 416          79.02%    12 – 13
YOLOv4        608 × 608          84.41%    5 – 6
YOLOv4        640 × 352          84.88%    9 – 11
YOLOv4-Tiny   416 × 416          72.78%    40 – 42
YOLOv4-Tiny   608 × 608          79.55%    19 – 20
YOLOv4-Tiny   640 × 352          81.69%    29 – 32
YOLOv4-Tiny   960 × 544          83.17%    12 – 14

6. Analysis and Design Decision

Although YOLOv4 achieves higher absolute mAP values, its frame rate decreases significantly as input resolution increases, making it less suitable for real-time autonomous operation.

YOLOv4-Tiny demonstrates:

  • Shorter training time
  • Significantly higher and more stable FPS
  • Competitive mAP that can be improved through hyperparameter tuning and dataset refinement

The 640 × 352 resolution provides an optimal operating point where detection accuracy remains high while maintaining real-time performance suitable for complex autonomous tasks.


7. System Integration

Detection outputs from YOLOv4-Tiny are integrated into the ROS-based perception pipeline and combined with:

  • 2D LiDAR data
  • GPS and IMU information

The fused perception output supports obstacle avoidance, waypoint adjustment, and autonomous decision-making.
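
A minimal sketch of publishing detections into a ROS (rospy) pipeline is shown below; the topic names, the use of vision_msgs, and run_detector() are illustrative assumptions rather than the vehicle's actual node.

    #!/usr/bin/env python
    import rospy
    from cv_bridge import CvBridge
    from sensor_msgs.msg import Image
    from vision_msgs.msg import Detection2D, Detection2DArray, ObjectHypothesisWithPose

    bridge = CvBridge()

    def run_detector(frame):
        """Placeholder for the YOLOv4-Tiny TensorRT inference call.
        Expected to return (class_id, score, cx, cy, w, h) tuples."""
        return []

    def on_image(msg):
        frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        out = Detection2DArray()
        out.header = msg.header
        for class_id, score, cx, cy, w, h in run_detector(frame):
            det = Detection2D()
            det.header = msg.header
            det.bbox.center.x = cx
            det.bbox.center.y = cy
            det.bbox.size_x = w
            det.bbox.size_y = h
            det.results.append(ObjectHypothesisWithPose(id=class_id, score=score))
            out.detections.append(det)
        pub.publish(out)

    rospy.init_node("yolov4_tiny_detector")
    pub = rospy.Publisher("/perception/detections", Detection2DArray, queue_size=1)
    rospy.Subscriber("/camera/image_raw", Image, on_image, queue_size=1, buff_size=2**24)
    rospy.spin()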


Conclusion

The implementation of YOLOv4-Tiny at 640 × 352 resolution successfully meets the real-time perception requirements of the autonomous surface vehicle.

By aligning dataset resolution, training configuration, and live inference input size, the system achieves:

  • Stable real-time detection
  • Efficient resource utilization
  • Reliable performance in complex maritime environments

This approach represents a balanced and practical object detection solution for embedded autonomous maritime systems.



About the Author

Logbook & experiments documented by Firizqi Aditya. Dedicated to advancing autonomous maritime systems.