Netflix Maestro and Apache Iceberg Power Netflix's Incremental Processing
Published on Netflix TechBlog, Nov 20, 2023 | 16 min read
By Jun He, Yingyi Zhang, and Pawan Dixit
Netflix, a data-driven company, describes an efficient approach to incremental processing built with Netflix Maestro and Apache Iceberg. Incremental processing means processing only the new or changed data in a workflow, which can significantly reduce compute cost and execution time. In this post, the authors describe the challenges in Netflix's data workflows and explain the Incremental Processing Solution (IPS) built on Maestro and Iceberg.
Introduction
Netflix relies on data across many parts of its business, and this demands scalable, low-latency incremental processing. Common challenges include data freshness, data accuracy, and backfill operations. Existing approaches, such as lookback windows and foreach patterns, are suboptimal and cost-inefficient.
Netflix Maestro
Maestro, Netflix's data workflow orchestration platform, serves thousands of users and offers a fully managed workflow-as-a-service (WAAS). IPS is integrated into Maestro and extends it with a new trigger mechanism and a new step job type for incremental processing.
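The trigger mechanism is only named here, not specified. As a rough illustration, the following Python sketch shows one way an incremental trigger could decide whether to start a run. It compares the watched table's latest snapshot with the last snapshot it successfully processed and, if they differ, hands the captured range to the step. `Table`, `IncrementalTrigger`, `poll`, and `ack` are hypothetical names. This is not Maestro's API, and the table is reduced to an ordered list of snapshot ids.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A captured range of snapshots: (exclusive start, inclusive end).
Range = Tuple[Optional[int], int]


@dataclass
class Table:
    """Hypothetical stand-in for an Iceberg table: its snapshot ids, oldest first."""
    name: str
    snapshot_ids: List[int] = field(default_factory=list)

    def commit(self, snapshot_id: int) -> None:
        # Each write to an Iceberg table produces a new snapshot.
        self.snapshot_ids.append(snapshot_id)

    def current_snapshot_id(self) -> Optional[int]:
        return self.snapshot_ids[-1] if self.snapshot_ids else None


@dataclass
class IncrementalTrigger:
    """Hypothetical trigger that fires only when unprocessed snapshots exist."""
    table: Table
    last_processed: Optional[int] = None

    def poll(self) -> Optional[Range]:
        current = self.table.current_snapshot_id()
        if current is None or current == self.last_processed:
            return None  # no new data, so no workflow run is started
        return (self.last_processed, current)

    def ack(self, captured: Range) -> None:
        # Advance the checkpoint only after the incremental step succeeded,
        # so a failed run re-captures the same range on the next poll.
        self.last_processed = captured[1]


if __name__ == "__main__":
    events = Table("db.playback_events")
    trigger = IncrementalTrigger(events)
    events.commit(101)
    events.commit(102)
    captured = trigger.poll()
    print(captured)        # (None, 102)
    trigger.ack(captured)
    print(trigger.poll())  # None
```

Keeping `poll` separate from `ack` is one simple way to get at-least-once behavior. If the step fails, the checkpoint does not move and the same range is captured again.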
Apache Iceberg
Iceberg is a high-performance table format for large analytic tables. It enables engines such as Spark, Trino, Flink, Presto, Hive, and Impala to work with the same tables concurrently. IPS builds on Iceberg to capture incremental changes efficiently.
Incremental Change Capture Design
IPS captures incremental changes without duplicating data. It creates a new Iceberg table, called the ICDC table, that references only the newly added data files; the data itself is not copied. Because this approach is lightweight, it integrates cleanly with Maestro and lets users adopt incremental processing with little effort.
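The key idea is that the ICDC table holds only metadata. The Python sketch below models that idea under heavy simplification. An append-only table is a chain of snapshots, and each snapshot lists the data files it added. The change table is a new single-snapshot table whose file list points at the same `DataFile` objects the source already has. All class and function names are hypothetical. Real Iceberg tracks files through manifest lists and manifests rather than Python lists.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str          # location of an immutable data file
    record_count: int


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    added_files: Tuple[DataFile, ...]  # files appended by this commit


@dataclass
class TableMetadata:
    name: str
    snapshots: Dict[int, Snapshot]


def files_added_between(table: TableMetadata, start_exclusive: Optional[int],
                        end_inclusive: int) -> List[DataFile]:
    """Collect files added after start_exclusive, up to and including end_inclusive."""
    files: List[DataFile] = []
    snap_id: Optional[int] = end_inclusive
    while snap_id is not None and snap_id != start_exclusive:
        snap = table.snapshots[snap_id]
        files = list(snap.added_files) + files  # prepend to keep commit order
        snap_id = snap.parent_id
    if snap_id != start_exclusive:
        raise ValueError("start snapshot is not an ancestor of the end snapshot")
    return files


def create_icdc_table(name: str, files: List[DataFile]) -> TableMetadata:
    """Build a metadata-only table whose single snapshot points at the same files."""
    snapshot = Snapshot(snapshot_id=1, parent_id=None, added_files=tuple(files))
    return TableMetadata(name=name, snapshots={1: snapshot})


if __name__ == "__main__":
    f1, f2, f3 = (DataFile(f"data/f{i}.parquet", 100) for i in (1, 2, 3))
    source = TableMetadata("db.events", {
        1: Snapshot(1, None, (f1,)),
        2: Snapshot(2, 1, (f2,)),
        3: Snapshot(3, 2, (f3,)),
    })
    icdc = create_icdc_table("db.events_icdc", files_added_between(source, 1, 3))
    print([f.path for f in icdc.snapshots[1].added_files])
    # ['data/f2.parquet', 'data/f3.parquet']
```

`create_icdc_table` stores only references. As a result, capturing a change costs work proportional to the number of new files, not to the volume of data inside them.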
Main Advantages
The IPS design requires minimal effort from users to adopt incremental processing because it decouples user business logic from the IPS implementation. Users can mix incremental processing workflows with existing ones in multi-stage pipelines, which simplifies workflows and significantly reduces cost.
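One way to picture the decoupling is a step whose business logic takes rows as input and never decides which table those rows come from. In this hypothetical Python sketch, a platform-side `run_step` swaps the source table for its change table when incremental mode is on. The `_icdc` naming convention, the in-memory catalog, and both function names are assumptions for illustration only.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Catalog = Dict[str, List[Row]]  # hypothetical: table name -> rows


def views_by_country(rows: List[Row]) -> Dict[str, int]:
    """User business logic: it only sees rows, never how they were selected."""
    totals: Dict[str, int] = {}
    for row in rows:
        country = str(row["country"])
        totals[country] = totals.get(country, 0) + int(row["views"])
    return totals


def run_step(logic: Callable[[List[Row]], Dict[str, int]], catalog: Catalog,
             table: str, incremental: bool) -> Dict[str, int]:
    # Hypothetical platform hook: in incremental mode, read the change table instead.
    source = f"{table}_icdc" if incremental else table
    return logic(catalog[source])
```

In incremental mode the result covers only the new rows, so a later step still has to merge it into the target table. The business logic itself stays unchanged.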
Emerging Incremental Processing Patterns
While onboarding pipelines to IPS, the authors observed several emerging patterns: appending change data directly to the target table, using change data as a filter list to skip unnecessary transformations, and incorporating the captured range parameters into business logic.
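The three patterns can be sketched with in-memory rows in Python. This is a toy illustration, not IPS code. The row layout, the `user`/`date`/`value` columns, and every function name are hypothetical. Dates are ISO strings, so string comparison matches date order.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


# Pattern 1: transform the change data and append it directly to the target table.
def append_changes(target: List[Row], changes: List[Row],
                   transform: Callable[[Row], Row]) -> None:
    target.extend(transform(row) for row in changes)


# Pattern 2: use the change data as a filter list so that only affected keys
# are recomputed from the full source.
def recompute_affected(source: List[Row], changes: List[Row], key: str) -> Dict[object, int]:
    affected = {row[key] for row in changes}
    totals: Dict[object, int] = {}
    for row in source:
        if row[key] in affected:  # rows for unaffected keys are skipped
            totals[row[key]] = totals.get(row[key], 0) + int(row["value"])
    return totals


# Pattern 3: derive a range from the change data and pass it to business logic
# as a parameter, for example to bound a partition scan.
# Assumes `changes` is non-empty (min/max raise ValueError otherwise).
def captured_range(changes: List[Row], column: str) -> Tuple[str, str]:
    values = [str(row[column]) for row in changes]
    return min(values), max(values)


def rows_in_range(source: List[Row], bounds: Tuple[str, str], column: str) -> List[Row]:
    low, high = bounds
    return [row for row in source if low <= str(row[column]) <= high]
```

Pattern 2 still reads the source but aggregates only the keys that actually changed. Pattern 3 is convenient when the business logic is already written in terms of a date or partition range.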
Use Cases
Netflix's data workflows often deal with late-arriving data, which has traditionally been handled with the lookback window pattern. IPS improves cost efficiency by eliminating unnecessary reprocessing, which significantly reduces resource usage and execution time.
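The cost argument can be made concrete with a small row-counting model in Python. The model assumes a daily job, counts only rows read as input, and uses made-up arrival numbers. The function names are hypothetical. A lookback job rereads every landed row in its window on every run. An incremental job reads each row once, when it lands.

```python
from typing import Dict, List

# arrivals[d] maps event_day -> number of rows that landed on processing day d.
Arrivals = List[Dict[int, int]]


def lookback_rows_read(arrivals: Arrivals, window: int) -> int:
    """Rows read if each daily run reprocesses the last `window` event days in full."""
    landed: Dict[int, int] = {}
    total = 0
    for day, batch in enumerate(arrivals):
        for event_day, count in batch.items():
            landed[event_day] = landed.get(event_day, 0) + count
        # Reread every row landed so far for event days day-window+1 .. day.
        total += sum(n for event_day, n in landed.items() if day - window < event_day <= day)
    return total


def incremental_rows_read(arrivals: Arrivals) -> int:
    """Rows read if each run only processes the rows that landed since the last run."""
    return sum(sum(batch.values()) for batch in arrivals)


if __name__ == "__main__":
    # Toy data: 100 on-time rows per day plus 10 late rows for the previous day.
    arrivals = [{0: 100}, {1: 100, 0: 10}, {2: 100, 1: 10}]
    print(lookback_rows_read(arrivals, window=2))  # 520
    print(incremental_rows_read(arrivals))         # 320
```

With the toy data, the lookback job reads 520 rows against 320 for the incremental job, and the gap grows with the window length. The model also shows the accuracy side of the trade-off: the lookback job never reads a row that lands `window` or more days after its event day.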
Looking Forward
Netflix plans to extend IPS to more complex scenarios beyond append-only cases, including tracking the progress of table changes and supporting multiple Iceberg table change types. Managed backfill support will also be added to IPS to further improve the user experience.
Acknowledgements
The authors extend their gratitude to various contributors and leaders at Netflix for their valuable feedback and suggestions during the development of IPS.

