This is Technical Insights Series by Perry Ma | Product Lead, Real-time Compute for Apache Flink at Alibaba Cloud.
Imagine ordering at a restaurant. In a traditional approach, after taking your order, the waiter would stand at your table waiting for the kitchen to prepare the food before serving other customers. This is obviously inefficient, as the waiter could be serving other customers during the waiting time. This FLIP is like teaching Flink to work like a "multi-threaded waiter"!
In stream processing, it's common to access external systems to enrich data streams. For example:
Traditional synchronous access creates a major problem: the entire processing slows down while waiting for external system responses. It's like a waiter having to wait for one dish to be prepared before serving other customers, which is very inefficient.
The above diagram shows the difference between synchronous and asynchronous processing: synchronous processing is like a "serial" process, while asynchronous processing allows multiple requests simultaneously, greatly improving efficiency.
The async I/O design is like designing an efficient restaurant service system, including these key roles:
Like how restaurants can choose different serving methods, async I/O provides two modes:
Mode | Characteristics | Suitable Scenarios |
---|---|---|
Ordered | Strictly outputs in data arrival order | Scenarios requiring ordered data processing, like sequential event processing |
Unordered | Outputs as soon as processing completes | Scenarios without strict order requirements, pursuing maximum throughput |
Suppose we need to query HBase database in a data stream. Here are two common usage patterns:
public class HBaseAsyncFunction implements AsyncFunction<String, String> {
private transient Connection connection;
@Override
public void asyncInvoke(String key, AsyncCollector<String> collector) {
Get get = new Get(Bytes.toBytes(key));
Table table = connection.getTable(TableName.valueOf("test"));
((AsyncableHTableInterface) table).asyncGet(get,
new MyCallback(collector));
}
}
// Create data stream
DataStream<String> stream = AsyncDataStream.unorderedWait(
inputStream,
new HBaseAsyncFunction(),
100, // Timeout (milliseconds)
20 // Maximum concurrent requests
);
public class HBaseAsyncFunction implements AsyncFunction<String, String> {
private transient Connection connection;
@Override
public void asyncInvoke(String key, AsyncCollector<String> collector) {
Get get = new Get(Bytes.toBytes(key));
Table table = connection.getTable(TableName.valueOf("test"));
ListenableFuture<Result> future = table.asyncGet(get);
Futures.addCallback(future,
new FutureCallback<Result>() {
public void onSuccess(Result result) {
collector.collect(
Collections.singletonList(
Bytes.toString(result.getValue(
Bytes.toBytes("f"),
Bytes.toBytes("c")
))
)
);
}
public void onFailure(Throwable t) {
collector.collect(t);
}
}
);
}
}
Performance Tuning:
This FLIP was implemented in Flink 1.2. It significantly improved Flink's efficiency in processing external data, especially in scenarios requiring frequent access to external systems. The feature is now widely used in production environments for:
FLIP-12 brought efficient async I/O processing capability to Flink, like introducing a professional order system to a restaurant, allowing each waiter to efficiently handle multiple orders. By transforming synchronous operations into asynchronous ones, it greatly improved processing efficiency while maintaining code simplicity and maintainability. This improvement makes Flink more capable in handling complex data streams, especially in scenarios requiring frequent interaction with external systems.
184 posts | 48 followers
FollowApache Flink Community - March 26, 2025
Apache Flink Community - May 7, 2024
Apache Flink Community - April 9, 2024
Apache Flink Community - June 28, 2024
Apache Flink Community China - December 25, 2019
Apache Flink Community China - July 3, 2023
184 posts | 48 followers
FollowRealtime Compute for Apache Flink offers a highly integrated platform for real-time data processing, which optimizes the computing of Apache Flink.
Learn MoreAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreAlibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.
Learn MoreA real-time data warehouse for serving and analytics which is compatible with PostgreSQL.
Learn MoreMore Posts by Apache Flink Community