Face Tracking Crop
Face tracking crop is a vertical-conversion technique that uses face detection to keep the active speaker centered as the camera or subject moves.
Modern face detectors such as MediaPipe, RetinaFace, and YOLOv8-face can run at 100+ fps on a single GPU and locate faces as small as roughly 30x30 pixels. A face tracking crop pipeline runs detection on every frame, picks the dominant face, and centers the 9:16 crop window on it.
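The selection and centering steps can be sketched in a few lines. This is a minimal illustration, not AutoClip's implementation: the `Box` type and the `dominant_face` and `crop_window` helpers are hypothetical names, "dominant" is assumed here to mean the largest face by area, and the crop is assumed to span the full frame height.

```python
from typing import NamedTuple, Optional, Sequence


class Box(NamedTuple):
    """Axis-aligned box in pixels (hypothetical helper type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def dominant_face(faces: Sequence[Box]) -> Optional[Box]:
    """Pick the largest face by area; returns None when no face was detected.

    Assumption: 'dominant' means largest. A real pipeline might also weigh
    centrality or who is speaking.
    """
    return max(faces, key=lambda b: b.w * b.h, default=None)


def crop_window(face: Optional[Box], frame_w: int, frame_h: int) -> Box:
    """Return a full-height 9:16 window centered horizontally on the face."""
    # Width of a 9:16 window that uses the whole frame height.
    crop_w = min(frame_w, frame_h * 9 // 16)
    # With no face, fall back to the horizontal center of the frame.
    center_x = frame_w / 2 if face is None else face.x + face.w / 2
    # Clamp so the window never extends past the frame edges.
    left = min(max(center_x - crop_w / 2, 0), frame_w - crop_w)
    return Box(left, 0, crop_w, frame_h)
```

For a 1920x1080 frame the window is 607 pixels wide, and a face near the right edge pins the window against that edge rather than pushing it out of frame.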
The quality bar is smoothness. Detection boxes jitter slightly from frame to frame, so a crop that follows the raw detections produces footage that vibrates. Production pipelines apply a Kalman filter or an exponential moving average (EMA) to the box position to absorb that jitter while still responding to real motion within ~200 ms.
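As a rough sketch of the exponential moving average option, the class below smooths a single coordinate such as the crop center. `EmaSmoother` is a hypothetical name, and the convention that `smoothing` is the weight kept from the previous estimate is an assumption; some libraries define the factor the other way around.

```python
from typing import Optional


class EmaSmoother:
    """Exponential moving average over one coordinate (illustrative only)."""

    def __init__(self, smoothing: float) -> None:
        # Assumed convention: `smoothing` is the weight of the previous
        # estimate, so higher values mean a steadier but slower crop.
        if not 0.0 <= smoothing < 1.0:
            raise ValueError("smoothing must be in [0, 1)")
        self.smoothing = smoothing
        self.value: Optional[float] = None

    def update(self, measurement: float) -> float:
        """Blend a new raw detection into the running estimate."""
        if self.value is None:
            # First frame: nothing to blend with yet.
            self.value = measurement
        else:
            self.value = (
                self.smoothing * self.value
                + (1.0 - self.smoothing) * measurement
            )
        return self.value
```

The trade-off sits in that one parameter: a value close to 1 absorbs more jitter but makes the crop lag behind real movement, while a value close to 0 follows the detector almost directly.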
AutoClip uses MediaPipe's face mesh under the hood and applies a 0.7 EMA smoothing factor, the same value Adobe Premiere's Auto Reframe uses by default. For solo creators the result is invisible — viewers don't notice the crop is following them.
Frequently Asked Questions
What happens when the speaker turns away from camera?
The tracker holds the last known position for ~500 ms before resetting to a default center crop. Brief turn-aways don't cause drift; longer ones return to neutral framing.
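The hold-then-reset behavior can be sketched as a small state machine. This is an illustration under stated assumptions, not AutoClip's code: `HoldThenReset` and its parameters are hypothetical names, timestamps are passed in as seconds, and the 0.5 second default mirrors the ~500 ms figure above.

```python
from typing import Optional


class HoldThenReset:
    """Choose the crop target when the face is temporarily lost."""

    def __init__(self, default_center: float, hold_seconds: float = 0.5) -> None:
        self.default_center = default_center  # neutral framing, e.g. frame center
        self.hold_seconds = hold_seconds      # how long to trust the last detection
        self.last_center = default_center
        self.last_seen: Optional[float] = None  # timestamp of the last detection

    def target(self, face_center: Optional[float], now: float) -> float:
        """Return the x position the crop should aim for at time `now`."""
        if face_center is not None:
            # Face visible: follow it and remember when we saw it.
            self.last_center = face_center
            self.last_seen = now
            return face_center
        if self.last_seen is not None and now - self.last_seen <= self.hold_seconds:
            # Brief turn-away: hold the last known position.
            return self.last_center
        # Face gone for too long, or never seen: neutral center crop.
        return self.default_center
```

Feeding this target through a smoothing filter, rather than jumping to it, would keep the return to neutral framing from looking like a cut.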
Does face tracking work on animated avatars?
Partially. MediaPipe detects stylized 2D faces that retain human proportions — most VTuber rigs, anime intros — but struggles with mascot heads or non-humanoid avatars. AutoClip falls back to motion-energy tracking in those cases.
Put Face Tracking Crop to Work
AutoClip handles the full pipeline — viral moment detection, 9:16 reframing, captions, and auto-posting. Start clipping for free.