Video portrait relighting is critical in user-facing human photography, especially for immersive
Recent advances still fail to recover consistent relit results under dynamic illumination from a
monocular RGB stream, owing to the lack of supervision for video consistency.
In this paper, we propose a neural approach for real-time, high-quality and coherent video portrait
relighting, which jointly models the semantic, temporal and lighting consistency using a new dynamic
We propose a hybrid structure and lighting disentanglement in an encoder-decoder architecture,
combined with a multi-task and adversarial training strategy for semantic-aware consistency modeling.
We adopt a temporal modeling scheme via flow-based supervision to encode the conjugated temporal
consistency in a cross manner.
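Flow-based temporal supervision is commonly implemented as a warping loss: the previous relit frame is warped into the current frame with optical flow, and the two are compared on pixels the flow marks as valid. The sketch below shows that generic technique in NumPy. It is not the authors' scheme, and it does not model the "cross" formulation described above. The function names, the backward-flow convention `flow[y, x] = (dx, dy)`, and the `valid_mask` occlusion input are all illustrative assumptions.

```python
import numpy as np


def warp_backward(prev: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Bilinearly warp `prev` (H, W, C) into the current frame.

    Hypothetical convention: flow[y, x] = (dx, dy) points from a pixel in the
    current frame to its corresponding location in the previous frame.
    Sample locations are clamped to the image border.
    """
    h, w = prev.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (sx - x0)[..., None]  # horizontal interpolation weight
    wy = (sy - y0)[..., None]  # vertical interpolation weight
    top = prev[y0, x0] * (1 - wx) + prev[y0, x1] * wx
    bottom = prev[y1, x0] * (1 - wx) + prev[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def temporal_consistency_loss(cur, prev, flow, valid_mask):
    """Mean L1 distance between `cur` and flow-warped `prev` over valid pixels.

    `valid_mask` (H, W) is an assumed occlusion mask: 1 where the flow is
    trusted, 0 where the pixel is occluded or leaves the frame.
    """
    warped = warp_backward(prev, flow)
    diff = np.abs(cur - warped) * valid_mask[..., None]
    return diff.sum() / max(valid_mask.sum() * cur.shape[-1], 1.0)
```

During training, `cur` and `prev` would be consecutive relit outputs of the network. Penalizing this loss discourages flicker that the motion between frames does not explain.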
We also propose a lighting sampling strategy that models both illumination consistency and mutation,
enabling natural portrait light manipulation in real-world scenarios.
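To make "consistency and mutation" concrete, here is a hypothetical sampler for per-frame lighting used to build training sequences. It is an illustration, not the paper's strategy. Each frame's lighting code drifts smoothly toward a target (consistency), and with probability `mutation_prob` it jumps to an unrelated lighting (mutation). The lighting representation (a vector of size `light_dim` in [0, 1)), the drift rule, and every name here are assumptions.

```python
import numpy as np


def sample_lighting_sequence(num_frames, light_dim, mutation_prob, rng, step=0.1):
    """Return per-frame lighting codes (num_frames, light_dim) and a mutation flag per frame.

    Hypothetical scheme: smooth drift toward a target lighting (consistency),
    interrupted by random jumps to new lighting (mutation).
    """
    current = rng.random(light_dim)  # initial lighting code in [0, 1)
    target = rng.random(light_dim)   # lighting the sequence drifts toward
    seq = np.empty((num_frames, light_dim))
    mutated = np.zeros(num_frames, dtype=bool)
    for t in range(num_frames):
        if t > 0 and rng.random() < mutation_prob:
            # Mutation: jump to an unrelated lighting condition.
            current = rng.random(light_dim)
            target = rng.random(light_dim)
            mutated[t] = True
        elif t > 0:
            # Consistency: move a fraction `step` of the way toward the target.
            current = current + step * (target - current)
            if np.max(np.abs(target - current)) < 1e-2:
                target = rng.random(light_dim)  # target reached; pick a new one
        seq[t] = current
    return seq, mutated
```

For example, `sample_lighting_sequence(30, 16, 0.05, np.random.default_rng(0))` yields a mostly smooth 30-frame sequence with occasional abrupt lighting switches. Because every code stays in [0, 1), a non-mutated frame differs from its predecessor by at most `step` in any component.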
Extensive experiments demonstrate the effectiveness of our approach for consistent video portrait
light editing and relighting, even on mobile computing platforms.