This is Technical Insights Series by Perry Ma | Product Lead, Real-time Compute for Apache Flink at Alibaba Cloud.
Table API, as Flink's declarative API, makes data processing more intuitive and simple. FLIP-11's goal is to enable Table API to handle stream aggregation operations elegantly.
Imagine this scenario: an e-commerce platform needs to track sales statistics for each product in real-time. This requirement seems simple but involves several key challenges:
Before FLIP-11, Table API only supported basic operations like projection, selection, and union. To implement the above statistics functionality, developers often needed to write complex code directly using DataStream API.
Let's visually compare the differences before and after the improvement:
FLIP-11 introduced two main aggregation mechanisms:
This mechanism is mainly used for scenarios requiring grouping by time or row count. For example, "calculate product sales every 5 minutes."
Supports three window types:
This mechanism is better suited for scenarios like "calculate based on current row's surrounding range." For example, "calculate the difference between each order amount and the average of the previous 10 orders."
Let's look at some practical examples:
// Example 1: Calculate product sales every 10 minutes
val result = tableEnv.from("sales")
.window(Tumble over 10.minutes on 'rowtime as 'w)
.groupBy('productId, 'w)
.select('productId, 'price.sum)
// Example 2: Calculate price difference between each order and previous 5 orders
val result = tableEnv.from("orders")
.rowWindow(SlideRows preceding 5.rows following 0.rows as 'w)
.select('orderId,
'price,
'price.avg over 'w as 'avgPrice)
Implementation involves several key points:
Choose Appropriate Window Type:
Time Window Considerations:
State Cleanup:
FLIP-11's implementation makes Flink's Table API more powerful and user-friendly for handling stream aggregation. It not only provides rich aggregation functionality but also achieves a good balance between performance and usability.
This feature was released in Flink 1.4.0 and has been further optimized in subsequent versions. If your project involves stream data aggregation processing, Table API would be an excellent choice.
184 posts | 49 followers
FollowHologres - May 31, 2022
Alibaba Clouder - October 29, 2020
Apache Flink Community - May 9, 2025
Hologres - July 16, 2021
Apache Flink Community China - August 2, 2019
Apache Flink Community China - April 17, 2023
184 posts | 49 followers
FollowRealtime Compute for Apache Flink offers a highly integrated platform for real-time data processing, which optimizes the computing of Apache Flink.
Learn MoreA real-time data warehouse for serving and analytics which is compatible with PostgreSQL.
Learn MoreA fully-managed Apache Kafka service to help you quickly build data pipelines for your big data analytics.
Learn MoreHelp media companies build a discovery service for their customers to find the most appropriate content.
Learn MoreMore Posts by Apache Flink Community