Product Details
A DJI/ArduPilot drone streams video to a Jetson Nano/Xavier or directly to a server via RTSP/WebRTC. Frames are stabilized, denoised, and exposure-balanced so small targets remain visible at higher altitudes. A YOLOv12 model, fine-tuned on aerial datasets across multiple heights and viewpoints, detects small human figures; detections are fed to Deep SORT to assign IDs and produce smooth tracks despite brief occlusions or abrupt motion. Operators view the stream in a web dashboard, toggle overlays, define geofences, receive instant intrusion alerts, and export incident clips and reports.

Real-world challenges
At 50–100 m, a person can be only a few dozen pixels tall; illumination and motion vary rapidly; bandwidth is limited; and backgrounds are dynamic (trees, waves, crowds). The system therefore needs small-object sensitivity without a flood of false positives, low end-to-end latency so that alerts remain actionable, and robust tracking that does not lose targets during short occlusions.
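
To make the pixel-size problem concrete, here is a rough pinhole-camera estimate of how tall a person appears in the image at a given distance. It is only a sketch: the 1.7 m person height, the 1920 px frame width, and the 80° horizontal field of view are illustrative assumptions, not specifications of any particular drone camera, and the person is assumed to be seen roughly side-on at the given slant distance.

```python
import math


def focal_length_px(image_width_px: int, hfov_deg: float) -> float:
    """Focal length in pixels for an ideal pinhole camera."""
    return image_width_px / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))


def projected_height_px(
    object_height_m: float,
    distance_m: float,
    image_width_px: int = 1920,   # assumed frame width
    hfov_deg: float = 80.0,       # assumed horizontal field of view
) -> float:
    """Approximate on-screen height of an object facing the camera."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return object_height_m * focal_length_px(image_width_px, hfov_deg) / distance_m


if __name__ == "__main__":
    for distance in (50, 80, 100):
        height = projected_height_px(1.7, distance)
        print(f"{distance:>3} m -> {height:.1f} px")
```

Under these assumed parameters a person is about 39 px tall at 50 m and about 19 px at 100 m, which is why small-object sensitivity dominates the design.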

Technical approach
We fine-tune YOLOv12 on drone imagery with altitude/gimbal diversity and apply small-object augmentation (mosaic/tiling, copy-paste of human instances, motion blur). Preprocessing adds frame stabilization, sensor denoising, and adaptive exposure. Deep SORT maintains identities across frames, while a geofencing logic layer intersects tracks with restricted polygons and filters weak detections before alerting. The design supports edge inference on Jetson (sending only metadata) or server inference for higher FPS when bandwidth is ample.
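
The geofencing step described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the `GeofenceMonitor` class, the confidence threshold, the three-frame persistence rule, and the use of the bounding box's bottom-center as the person's ground position are all assumptions chosen for the example. The polygon test is a standard ray-casting check in image coordinates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class GeofenceMonitor:
    """Hypothetical helper: alert once a confident track stays inside a zone."""

    def __init__(self, polygon: Sequence[Point],
                 min_confidence: float = 0.5, min_frames: int = 3) -> None:
        self.polygon = list(polygon)
        self.min_confidence = min_confidence
        self.min_frames = min_frames
        self._streak: Dict[int, int] = defaultdict(int)
        self._alerted: Set[int] = set()

    def update(self, tracks: Iterable[Tuple[int, Box, float]]) -> List[int]:
        """Process one frame of (track_id, box, confidence); return new alerts."""
        alerts: List[int] = []
        for track_id, (x1, y1, x2, y2), confidence in tracks:
            foot = ((x1 + x2) / 2.0, y2)  # bottom-center of the box
            if confidence >= self.min_confidence and point_in_polygon(foot, self.polygon):
                self._streak[track_id] += 1
            else:
                # Weak or outside detection: reset so the track can re-alert later.
                self._streak[track_id] = 0
                self._alerted.discard(track_id)
            if self._streak[track_id] >= self.min_frames and track_id not in self._alerted:
                self._alerted.add(track_id)
                alerts.append(track_id)
        return alerts
```

Requiring several consecutive confident frames trades a small alert delay for fewer spurious alerts; at 25 FPS a three-frame rule adds roughly 120 ms.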

Architecture & deployment
The stack includes Python, PyTorch, OpenCV, YOLOv12, Deep SORT, streaming via WebRTC/RTSP, and drone SDKs (DJI, ArduPilot). At the edge, the Jetson ingests the camera stream, runs inference, and forwards results via WebSocket/HTTP; centrally, an analysis service fuses tracks, persists them to PostgreSQL/Redis, triggers alerts, and renders the dashboard. Security defaults include TLS 1.3, JWT-based access control, role-based permissions, and continuous audit logging. The system scales horizontally by adding video workers.
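
As an illustration of what the edge device might forward in metadata-only mode, the sketch below serializes one frame's tracks to compact JSON and parses it back on the server side. The message layout and every field name (`drone_id`, `frame_index`, `bbox_xyxy`, and so on) are hypothetical and chosen for this example only; the actual wire format is not specified here, and the transport (WebSocket or HTTP) is left out.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TrackRecord:
    track_id: int
    bbox_xyxy: List[float]   # pixel coordinates: x1, y1, x2, y2
    confidence: float


@dataclass
class FrameMetadata:
    drone_id: str
    frame_index: int
    timestamp_s: float
    tracks: List[TrackRecord] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to compact JSON suitable for a WebSocket or HTTP body."""
        return json.dumps(asdict(self), separators=(",", ":"))


def parse_frame_metadata(payload: str) -> FrameMetadata:
    """Rebuild the message on the receiving side."""
    raw = json.loads(payload)
    tracks = [TrackRecord(**item) for item in raw.pop("tracks")]
    return FrameMetadata(tracks=tracks, **raw)


if __name__ == "__main__":
    message = FrameMetadata(
        drone_id="uav-01",
        frame_index=1200,
        timestamp_s=1700000000.25,
        tracks=[TrackRecord(7, [412.0, 188.0, 431.0, 226.0], 0.87)],
    )
    payload = message.to_json()
    print(len(payload), "bytes:", payload)
```

A message like this is on the order of a hundred bytes per tracked person, which is what makes metadata-only forwarding attractive on constrained links.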

Performance targets
The targets are reliable person detection at altitudes of ≥80 m, ≥25 FPS on Jetson Xavier or ≥40 FPS on standard server GPUs, and end-to-end latency <300 ms with WebRTC. Quality targets include mAP@0.5 ≥0.6, recall ≥0.9 inside restricted areas, and a false-positive rate (FPR) ≤0.05 after alert filtering. On constrained links, edge mode reduces bandwidth by 50–80% by sending metadata instead of full frames.
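
One way to check a test run against these targets is sketched below. The threshold values are taken from the paragraph above (using the Jetson Xavier FPS figure); the `Targets` and `check_targets` names are hypothetical, and the way true negatives are counted for the FPR (for example, per evaluated frame without an intruder) is an assumption that an evaluation protocol would need to define.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Targets:
    min_fps: float = 25.0          # Jetson Xavier target
    max_latency_ms: float = 300.0  # strict upper bound, end to end
    min_map50: float = 0.6
    min_recall: float = 0.9
    max_fpr: float = 0.05


def recall(tp: int, fn: int) -> float:
    """Fraction of real intrusions that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of negative cases that still raised an alert."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def check_targets(fps: float, latency_ms: float, map50: float,
                  tp: int, fp: int, fn: int, tn: int,
                  targets: Targets = Targets()) -> Dict[str, bool]:
    """Return a pass/fail flag per target for one evaluation run."""
    return {
        "fps": fps >= targets.min_fps,
        "latency": latency_ms < targets.max_latency_ms,
        "map50": map50 >= targets.min_map50,
        "recall": recall(tp, fn) >= targets.min_recall,
        "fpr": false_positive_rate(fp, tn) <= targets.max_fpr,
    }


if __name__ == "__main__":
    # Made-up numbers purely to exercise the function.
    print(check_targets(fps=27.0, latency_ms=240.0, map50=0.63,
                        tp=92, fp=4, fn=8, tn=96))
```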

Differentiators & example
The solution excels at small-target detection at altitude, robust multi-frame tracking, behavior analysis, and geofence intrusion alerts. The dashboard aggregates multiple UAVs, records incidents, and generates audit-ready reports. Example: when a person enters a no-go zone at a private coastal site, the system fires an instant alert, saves a short clip with timestamp and coordinates, and dispatches it to the security team for rapid response.
