As an emerging task, generative frame interpolation aims to synthesize a video clip from two images (i.e., the starting and ending frames) by leveraging generative priors. Thanks to its "generative" nature, this task is expected to offer greater flexibility with respect to the input frames, enabling broader applications beyond temporal super-resolution. To demonstrate this point, we develop Framer++, a diffusion-based frame interpolator that incorporates versatile control mechanisms, including texts, trajectories, and intermediate keyframes. Beyond its strong visual quality, our model exhibits strong adaptability to the provided images, delivering smooth and coherent transitions that unlock a wide range of creative applications, including morphing, smooth editing transitions, and even seamlessly connecting two unrelated images. Recognizing that traditional frame interpolation benchmarks rely solely on consecutive video sequences and thus fall short in evaluating generative interpolation, we introduce a new benchmark, FramerBench, that assesses performance more accurately. The strong performance of Framer++ underscores the viability of generative frame interpolation as a foundational tool for creativity.