Overview
Welcome to the 1st Workshop on AI for Streaming at CVPR!
This workshop focuses on unifying new streaming technologies, computer graphics, and computer vision from a modern deep learning point of view. Streaming is a huge industry in which hundreds of millions of users demand high-quality content every day across different platforms.
Computer vision and deep learning have emerged as revolutionary forces for rendering content, image and video compression, enhancement, and quality assessment.
From neural codecs for efficient compression to deep learning-based video enhancement and quality assessment, these advanced techniques are setting new standards for streaming quality and efficiency.
Moreover, novel neural representations pose new challenges and opportunities for rendering streamable content, allowing us to redefine computer graphics pipelines and visual content.
Awards Certificates || (All) Paper Proceedings || Photo Gallery & Slides
Call for Papers (Closed)
We welcome papers addressing topics related to VR, streaming, efficient image/video (pre- & post-)processing, and neural compression. The topics include:
- Efficient Deep Learning
- Model optimization and Quantization
- Image/video quality assessment
- Image/video super-resolution and enhancement
- Compressed Input Enhancement
- Generative Models (Image & Video)
- Neural Codecs
- Real-time Rendering
- Neural Compression
- Video pre/post processing
Challenges 🚀
We are happy to host the following grand challenges focused on realistic image/video applications.
Register now for the challenges to receive email updates and news about new challenges.
The workshop challenge prize pool will be over $10,000 🚀 plus cool prizes such as PS5s.
- Real-time Compressed Image Super-Resolution (Finished) A single neural network upscales compressed images (AVIF) to 4K considering different compression factors.
- UGC Video Quality Assessment (Finished) Estimate the quality of user-generated content (UGC) videos using efficient neural networks (24-30 FPS).
- Event-based Eye Tracking (Finished)
- Mobile Real-time Video Super-Resolution (Ongoing, from Feb to July) Upscale videos compressed with AV1 in real-time on mobile devices such as the iPhone 14, from 360p to 1080p. Top teams will be invited to present their solutions and posters at the workshop; we will showcase the best models.
- Efficient Video Super-Resolution (Ongoing, from Feb to July) Upscale videos compressed with AV1 in real-time at 30-60 FPS on commercial GPUs, from 540p to 4K (see the timing sketch below). Top teams will be invited to present their solutions and posters at the workshop.
- Depth Upsampling and Refinement (Ongoing, from 22nd March to July) Given a low-resolution depth map and a high-resolution RGB image, upscale and refine the depth map. Top teams will be invited to present their solutions and posters at the workshop.
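Several of the challenges above impose explicit real-time budgets (30-60 FPS on commercial GPUs, real-time on an iPhone 14). As a rough, unofficial illustration of what such a budget means, the minimal sketch below times a deliberately tiny PyTorch upscaler on random 540p frames; the model, input size, and timing loop are assumptions for exposition, not a challenge baseline or evaluation protocol.

```python
# Minimal sketch (not an official challenge baseline): time a tiny
# pixel-shuffle upscaler on 540p input against a 30-60 FPS budget.
import time
import torch
import torch.nn as nn

class TinyUpscaler(nn.Module):
    """Illustrative 4x upscaler: a single conv followed by pixel shuffle."""
    def __init__(self, scale=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 3 * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):
        return self.body(x)

def measure_fps(model, height=540, width=960, runs=50, device="cpu"):
    """Rough frames-per-second estimate on random input of the given size."""
    model = model.to(device).eval()
    x = torch.rand(1, 3, height, width, device=device)
    with torch.no_grad():
        for _ in range(5):                      # warm-up iterations
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return runs / elapsed

if __name__ == "__main__":
    fps = measure_fps(TinyUpscaler())
    print(f"Throughput: {fps:.1f} FPS (challenge targets are 30-60 FPS)")
```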
The top-ranked participants will receive awards and be invited to present their solutions at the AIS workshop at CVPR 2024.
The challenge reports (if applicable) will be published at the AIS 2024 workshop and in the CVPR 2024 Workshops proceedings.
Participants can submit papers describing their solutions to the challenges and/or related problems (more info below).
We also invite you to check the challenges at the New Trends in Image Restoration and Enhancement (NTIRE) workshop.
Keynote Speaker
Professor Alan Bovik (HonFRPS) holds the Cockrell Family Endowed Regents Chair in Engineering in the Chandra Family Department of Electrical and Computer Engineering in the Cockrell School of Engineering at The University of Texas at Austin, where he is Director of the Laboratory for Image and Video Engineering (LIVE). He is a faculty member in the Department of Electrical and Computer Engineering, the Wireless Networking and Communication Group (WNCG), and the Institute for Neuroscience. His research interests include digital television, digital photography, visual perception, social media, and image and video processing.
Invited Speakers
Schedule Details - Arch 3A, 17th June
Please click on the title of each presentation to see the abstract. All times are local (PDT).
- 09:00 - 09:15: Opening
- 09:15 - 10:00: "Event-based Eye-Tracking Challenge & Papers Session", Qinyu Chen (Leiden University)
"Retina : Low-Power Eye Tracking with Event Camera and Spiking Hardware", Pietro Bonazzi (ETH Zurich)
"A Hybrid ANN-SNN Architecture for Low-Power and Low-Latency Visual Perception", Asude Aydin
"A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera", Yan Ru Pei
"Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN", Baoheng Zhang
- 10:00 - 10:45: "AI and Machine Learning for Video Compression" by Ryan Lei (Video Codec Specialist, Meta)
Over the past 30 years, significant advances have been achieved in video compression, making it an indispensable technology that powers today's Internet. Meanwhile, the past decade has also witnessed the great success of machine learning in many areas, especially computer vision and image processing. It is natural to ask whether and how machine learning techniques can be leveraged to improve video compression. In this talk, the speaker will first provide a high-level overview of progress in conventional video coding standard development and the challenges the community is facing. The speaker will then focus on a few trends in how machine learning can be leveraged for video compression. The first topic is how machine learning is used to further optimize the coding tools of traditional video coding, such as intra/inter prediction and loop filtering. The second is how machine learning is used to improve the coding efficiency of the overall compression system, for example through pre/post filtering, super-resolution, and layered coding. The last is how neural networks are used to build end-to-end learned video coding frameworks that fully replace the conventional video coding system. Throughout the talk, the speaker will also present examples of techniques that have worked and techniques that may not work in the near future.
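As a concrete, deliberately simplified illustration of the pre/post-filtering trend mentioned in the abstract, the sketch below defines a small residual CNN that could be applied to decoded frames to suppress compression artifacts; the architecture and loss are assumptions for exposition, not the methods covered in the talk.

```python
# Illustrative post-filter sketch (not the talk's method): a small residual CNN
# applied to decoded frames to reduce compression artifacts.
import torch
import torch.nn as nn

class PostFilter(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, decoded):
        # Predict a residual correction on top of the decoded frame.
        return decoded + self.net(decoded)

# Training would minimize distortion against the original (pre-encode) frame,
# e.g. loss = F.l1_loss(post_filter(decoded_frame), original_frame).
```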
- 10:45 - 11:15: "Video Quality Assessment Challenge and Results"
"COVER: A Comprehensive Video Quality Evaluator", Zhengzhong Tu (Texas A&M University) - 11:15 - 11:40: Paper Presentation Session #1
"Deep Video Codec Control for Vision Models", Christoph Reich (Technical University of Munich, TU Darmstadt, NEC Laboratories America, Inc., and University of Oxford)
"A Perspective on Deep Vision Performance with Standard Image and Video Codecs", Christoph Reich (Technical University of Munich, TU Darmstadt, NEC Laboratories America, Inc., and University of Oxford)
"Adaptive render-visual (REVI) streaming for virtual environments", Matthias Treder (Sony PlayStation) -
11:40 - 12:30: "Neural Compression & Realism" by Lucas Theis (Google DeepMind)
Neural compression has the potential to transform the streaming industry by enabling highly realistic outputs at very low rates. Although these techniques have yet to break into mainstream codecs, recent advances in ML and information theory continue to bring this goal closer into view. This talk will focus on how generative AI and novel coding methods allow us to generate realistic outputs at arbitrarily low rates. In particular, we will explore two approaches based on diffusion generative models. Time permitting, we will also look at recent advances in making neural compression much more computationally efficient.
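For readers new to the area, the rate/realism trade-off the talk refers to is often formalized by training a learned codec against a weighted sum of rate, distortion, and a realism term. The sketch below writes such an objective in PyTorch-style code; the module names (`encoder`, `decoder`, `entropy_model`, `discriminator`), the hard rounding, and the weights are illustrative assumptions, not the talk's formulation.

```python
# Illustrative rate-distortion-realism objective (an assumption for exposition,
# not the talk's formulation). All modules passed in are hypothetical.
import torch
import torch.nn.functional as F

def compression_loss(x, encoder, decoder, entropy_model, discriminator,
                     lambda_distortion=1.0, lambda_realism=0.1):
    y = encoder(x)                      # latent representation
    y_hat = torch.round(y)              # hard quantization (straight-through or
                                        # noise would be used in training)
    rate = entropy_model(y_hat)         # estimated bits to transmit the latent
    x_hat = decoder(y_hat)              # reconstruction
    distortion = F.mse_loss(x_hat, x)   # classic distortion term
    realism = -torch.log(discriminator(x_hat) + 1e-6).mean()  # adversarial term
    return rate + lambda_distortion * distortion + lambda_realism * realism
```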
- 12:30 - 13:45: Lunch & Poster Session
- 13:45 - 14:35: "On the Visual Quality of Pictures, Games, and GenAI" by Prof. Alan Bovik (The University of Texas at Austin)
Predicting the perceptual quality of pictures and videos is a hard problem that has been successfully addressed in many scenarios, such as quality control of streaming videos and sharing of social media pictures. In this talk, I will address how visual quality perception can be understood using principles of neuroscience and neuro-statistical models of distortion. In particular, I will review some basic vision science that makes accurate perceptual visual quality prediction possible, and how algorithms that are now used worldwide are designed. However, generated content such as gaming videos and GenAI pictures is fundamentally different from photographs, and those differences may be statistically testable. I will also discuss a recent approach to gaming content quality prediction called GAMIVAL, which combines neuro-statistical models adapted to gaming distortions with deep features that capture gaming semantics. Finally, I will discuss a very early exploration into the underlying perceptual statistics of GenAI pictures, and how these might be used to gauge quality and detectability.
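As background for the neuro-statistical models mentioned in the abstract, many no-reference quality predictors in the LIVE lab lineage (e.g. BRISQUE and NIQE) build on mean-subtracted, contrast-normalized (MSCN) luminance coefficients, whose statistics shift measurably under distortion. The sketch below computes MSCN coefficients with NumPy/SciPy; it is a minimal background illustration, not the GAMIVAL pipeline.

```python
# Minimal MSCN (mean-subtracted, contrast-normalized) coefficient sketch, the
# kind of natural-scene-statistics feature behind BRISQUE/NIQE-style models.
# Background illustration only; not the GAMIVAL pipeline.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(gray_image, sigma=7.0 / 6.0, eps=1e-3):
    """gray_image: 2D float array of luminance values in [0, 1]."""
    mu = gaussian_filter(gray_image, sigma)                     # local mean
    var = gaussian_filter(gray_image ** 2, sigma) - mu ** 2     # local variance
    sigma_map = np.sqrt(np.abs(var))                            # local contrast
    return (gray_image - mu) / (sigma_map + eps)                # normalized field

# Distortions change the distribution of these coefficients (variance, tail
# weight), which quality models typically fit with generalized Gaussians.
```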
- 14:40 - 15:00: On the Edge Real-time Image Super-Resolution Challenge and News!!
- 15:00 - 15:40: "Video Super-Resolution Using Deep Learning" by Kelvin C.K. Chan (Google DeepMind)
The demand for high-resolution content continues to grow, necessitating advanced video super-resolution (VSR) techniques. However, existing VSR methods often struggle with efficiency and generalization to real-world scenes. This talk explores the benefits of leveraging temporal information in VSR, highlighting how recurrent networks can effectively propagate and align information to improve performance in both synthetic and real-world settings. Additionally, we will delve into the potential of generative models, such as diffusion models, for VSR, motivated by their recent successes in image super-resolution.
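To make the idea of recurrent propagation concrete, the sketch below shows the control flow of a minimal unidirectional recurrent VSR loop: a feature state is carried across frames, fused with each new frame, and upsampled. The module names and shapes are illustrative assumptions in the spirit of recurrent VSR designs such as BasicVSR, not the talk's exact architecture.

```python
# Illustrative unidirectional recurrent VSR loop (an assumption for exposition):
# a feature state is propagated across frames before upsampling.
import torch
import torch.nn as nn

class RecurrentVSR(nn.Module):
    def __init__(self, feats=32, scale=4):
        super().__init__()
        self.feats = feats
        self.fuse = nn.Conv2d(3 + feats, feats, 3, padding=1)
        self.up = nn.Sequential(
            nn.Conv2d(feats, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, frames):  # frames: (batch, time, 3, H, W), low resolution
        b, t, _, h, w = frames.shape
        state = frames.new_zeros(b, self.feats, h, w)   # propagated features
        outputs = []
        for i in range(t):
            # In practice the state would be warped/aligned to the current
            # frame (e.g. via optical flow) before fusion; omitted here.
            state = torch.relu(self.fuse(torch.cat([frames[:, i], state], dim=1)))
            outputs.append(self.up(state))
        return torch.stack(outputs, dim=1)              # (batch, time, 3, sH, sW)
```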
- 15:40 - 16:10: "VideoGigaGAN: Towards Detail-rich Video Super-Resolution", Yiran Xu (University of Maryland)
- 16:10 - 16:40: "FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring", Jihyong Oh (Chung-Ang University (CAU))
- 16:45 - 17:30: "Deploying neural networks for video downscaling: why and how" by Christos Bampis (Netflix)
Video downscaling is an important component of video processing. Within adaptive streaming, it tailors streaming to different device resolutions and optimizes picture quality under varying network conditions. Video downscaling is typically done by a conventional resampling filter like Lanczos. We have deployed neural networks that perform video downscaling within the Netflix adaptive streaming pipeline. This talk describes why we embarked on this journey, how it was done and lessons learned. We also discuss possible pathways moving forward.
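For reference, the conventional baseline mentioned in the abstract is easy to reproduce: Pillow's resize supports Lanczos resampling, as in the sketch below (file names and the target resolution are placeholders). A learned downscaler would replace this resampling step in the encoding pipeline.

```python
# Conventional Lanczos downscaling baseline (the step a learned downscaler
# would replace). File names and target size are placeholders.
from PIL import Image

frame = Image.open("frame_4k.png")                      # e.g. 3840x2160 source
target = (frame.width // 2, frame.height // 2)          # e.g. downscale to 1080p
downscaled = frame.resize(target, resample=Image.Resampling.LANCZOS)
downscaled.save("frame_1080p.png")
```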
- 17:30 - 17:50: Paper Presentation Session #2
"Depth Compression and Upsampling", Jinseong Kim (RGA Inc.)
"One-Click Upgrade from 2D to 3D: Sandwiched RGB-D Video Compression for Stereoscopic Teleconferencing", Yueyu Hu (New York University)
"Low Latency Point Cloud Rendering with Learned Splatting", Yueyu Hu (New York University)
- 17:50 - 18:00: Closing Remarks & Award Ceremony [Challenge Diplomas]
The "Super-session" starts!
Virtual Presentations
(Click to see) Virtual presentations of challenge papers and regular papers.
The videos/slides will be available shortly.
[Link] "CASR : Efficient Cascade Network Structure with Channel Aligned method for 4K Real-Time Single Image Super-Resolution", Kihwan Yonn (The University of Seoul)
[Link] "Lanczos++: An ultra lightweight image super-resolution network", Biao Wu (Central R&D Institute, ZTE)
[Link] "FAPNet: An Effective Frequency Adaptive Point-based Eye Tracker", Xiaopeng LIN and Hongwei REN and Bojun CHENG (HKUST-GZ)
[Link] "MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking", Zhong Wang (USTC)
[Link] "Joint Motion Detection in Neural Videos Training", Niloufar Pourian (Intel Labs); Alexey Supikov (Intel Labs)
[Link] "Anchor-based Nested UnshuffleNet for Real-time Super-Resolution (ANUNet)", Menghan Zhou (Lenovo Research)
[Link] "SAFMN++: Improved Feature Modulation Network for Real-Time Compressed Image Super-Resolution", Long Sun (Nanjing University of Science and Technology)
[Link] "RVSR: Towards Real-Time Super-Resolution with Re-parameterization and ViT architecture", Zhiyuan Li (Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University)
[Link] "TVQE: Tencent Video Quality Evaluator", Wenhui Meng (Tencent)
Organizers
Program Committee
Radu Timofte (University of Würzburg)
Tim Seizinger (University of Würzburg)
Florin Vasluianu (University of Würzburg)
Zongwei Wu (University of Würzburg)
Ioannis Katsavounidis (Meta)
Ryan Lei (Meta)
Wen Li (Meta)
Cosmin Stejerean (Meta)
Shiranchal Taneja (Meta)
Zhi Li (Netflix)
Rakesh Ranjan (Meta Reality Labs)
Andy Bigos (Sony PlayStation)
Daniel Motilla (Sony PlayStation)
Saman Zadtootaghaj (Sony PlayStation)
Chang Gao (Delft University of Technology)
Qinyu Chen (University of Zurich and ETHZ, and Leiden University)
Zuowen Wang (University of Zurich and ETHZ)
Shih-Chii Liu (University of Zurich and ETHZ)