Neural Video Portrait Relighting in Real-time via Consistency Modeling

Longwen Zhang1,2   Qixuan Zhang1,2   Minye Wu1,3   Jingyi Yu1   Lan Xu1
1ShanghaiTech University   2Deemos Technology   3University of Chinese Academy of Sciences


Video portrait relighting is critical in user-facing human photography, especially for immersive VR/AR experiences. Recent advances still fail to recover consistent relit results under dynamic illuminations from a monocular RGB stream, suffering from the lack of video consistency supervision. In this paper, we propose a neural approach for real-time, high-quality and coherent video portrait relighting, which jointly models semantic, temporal and lighting consistency using a new dynamic OLAT dataset. We propose a hybrid structure and lighting disentanglement in an encoder-decoder architecture, which combines a multi-task and adversarial training strategy for semantic-aware consistency modeling. We adopt a temporal modeling scheme via flow-based supervision to encode the conjugated temporal consistency in a cross manner. We also propose a lighting sampling strategy to model illumination consistency and mutation for natural portrait light manipulation in the real world. Extensive experiments demonstrate the effectiveness of our approach for consistent video portrait light-editing and relighting, even on mobile computing.
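The dynamic OLAT (one-light-at-a-time) dataset mentioned above enables relighting through linear image formation: since light transport is additive, a portrait under any target illumination can be composed as a weighted sum of its OLAT basis images, with weights taken from the target environment lighting. A minimal NumPy sketch of this composition (the array layout and function name are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def relight_from_olat(olat_images, env_weights):
    """Compose a relit portrait as a weighted sum of OLAT basis images.

    olat_images : (L, H, W, 3) array, one image per light-stage direction.
    env_weights : (L, 3) array, RGB intensity of the target illumination
                  sampled at the same L directions (hypothetical layout).
    """
    # Light transport is linear: each light contributes independently,
    # so the relit frame is a per-channel sum over all L light directions.
    return np.einsum("lhwc,lc->hwc", olat_images, env_weights)
```

Sweeping `env_weights` over a sequence of environment maps yields ground-truth relit video frames for supervision.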


The training pipeline of our approach. It consists of a structure and lighting disentanglement (Sec. 4.1), a temporal consistency modeling (Sec. 4.2) and a lighting sampling (Sec. 4.3), so as to generate consistent video relit results from an RGB stream in real-time.
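Flow-based temporal supervision of the kind described above is commonly realized by warping the previous relit frame into the current frame's coordinates with a precomputed optical flow and penalizing the difference. A minimal sketch, assuming a dense backward flow field and nearest-neighbour sampling (both simplifying assumptions, not the paper's exact loss):

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Backward-warp `frame` (H, W, 3) with a dense flow field (H, W, 2),
    using nearest-neighbour sampling as a stand-in for bilinear."""
    h, w, _ = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def temporal_consistency_loss(relit_cur, relit_prev, flow_prev_to_cur):
    """L1 penalty between the current relit frame and the previous relit
    frame warped into its coordinates."""
    warped = warp_with_flow(relit_prev, flow_prev_to_cur)
    return np.abs(relit_cur - warped).mean()
```

With zero flow this reduces to a plain L1 distance between consecutive relit frames, which is the degenerate case for a static subject.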


Our relighting results under dynamic illuminations. Each triplet includes the input frame and two relit result examples.




We will publish the code and data for training [ DOWNLOAD HERE ] (coming soon)




@inproceedings{zhang2021neural,
      title={Neural Video Portrait Relighting in Real-time via Consistency Modeling}, 
      author={Longwen Zhang and Qixuan Zhang and Minye Wu and Jingyi Yu and Lan Xu},
      booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
      year={2021}
}


The authors would like to thank all participants of the Light Stage recordings. We also thank the authors of Wang et al. [2020] for providing the results of their method for comparisons.