Flink Hudi Sink

Apache Flink is a prevalent stream-batch computing engine in the big data field, and the data lake is a new technical architecture trending in the cloud era — a combination that drove the rise of solutions based on Iceberg, Hudi, and Delta. Hudi's tagline is "Upserts, Deletes And Incremental Processing on Big Data": a platform for building streaming data lakes with incremental data pipelines. As of May 2023, more than 3600 enterprises use the Hudi-Flink integration in production, and contributors from around the world continue to strengthen it. This page collects notes on that integration: how to set the sink up, how it writes, which options control it, and the problems users most often report.

Setup. If the hudi-flink-bundle jar is placed under flink/lib, starting the SQL client with bin/sql-client.sh embedded is all that is required; otherwise you will hit all sorts of ClassNotFoundException errors. Alternatively, introduce the bundle through Maven and submit the job as a jar. The bundle is built against a specific Flink minor version (for example hudi-flink1.17-bundle), so the Flink cluster must match it. One hard requirement in every mode: the Flink writer only commits data to Hudi when a checkpoint completes, so checkpointing must be enabled or nothing will ever become visible in the table.

Entry point. HoodieTableFactory is the class through which Flink SQL creates the Hudi dynamic table source and sink. It verifies that the path option is set, applies compatibility options, checks and fills in the remaining configuration parameters, and performs handling specific to the different table modes. The Flink SQL configs it consumes control the source/sink connectors: they define record keys, pick the write operation, specify how records are merged, enable or disable asynchronous compaction, and choose the query type for reads.
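As a concrete starting point, here is a minimal sketch of a job that declares a Hudi sink table and writes a single row. It assumes the hudi-flink bundle is on the classpath; the bucket path, table name, and schema are illustrative, following the shape of the Hudi quick-start DDL.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class HudiSinkQuickstart {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // Hudi commits once per successful checkpoint
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Declare the Hudi sink table; 'path' is the table base path (a hypothetical bucket here).
        tEnv.executeSql(
            "CREATE TABLE hudi_sink (" +
            "  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED," +
            "  name VARCHAR(10)," +
            "  ts   TIMESTAMP(3)," +
            "  `partition` VARCHAR(20)" +
            ") PARTITIONED BY (`partition`) WITH (" +
            "  'connector'  = 'hudi'," +
            "  'path'       = 's3a://my-bucket/warehouse/hudi_sink'," +
            "  'table.type' = 'MERGE_ON_READ'," +
            "  'precombine.field' = 'ts'" +
            ")");

        // Write a row; executeSql is asynchronous, so wait for the job in this toy example.
        tEnv.executeSql(
            "INSERT INTO hudi_sink VALUES " +
            "('id1', 'Danny', TIMESTAMP '2024-01-01 00:00:01', 'par1')")
            .await();
    }
}
```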
How the sink writes. When a job starts, the concrete write function is selected according to the configured write.operation parameter (UPSERT by default). Records pass through a bucket-assignment stage and are then buffered inside StreamWriteFunction, which tracks an estimate of the buffer size — a log line with the estimated size is printed for every 100 records processed. The operator DAG is designed so that the final stage receives exactly as many results as the parallelism of the write operator, regardless of data skew. Transaction handling lives in StreamWriteOperatorCoordinator on the JobManager: it starts a new instant on the timeline, collects WriteMetadataEvent messages from the write tasks, and commits the instant when the checkpoint succeeds. This is why commit problems surface as "org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'hoodie_stream_write'", and why a restart without a clean checkpoint can fail with an exception about an unfinished instant. (All of this is richer than Flink's generic unified File Sink, which writes partitioned files with exactly-once guarantees in both batch and streaming but has no table semantics such as upserts or compaction.)

Where it is used. In a typical deployment the jobs fall into two types: when the business logic is simple, Flink SQL is used — for example syncing raw logs or operational databases into the Hudi ODS layer, or running multi-table joins and aggregations; when the logic is complex or needs special handling, jobs are written against the DataStream API and submitted as jars, often on YARN. For the DataStream route Hudi offers two entry points, the HoodiePipeline builder and the HoodieFlinkStreamer utility, and both construct the same dynamic-table sink underneath via HoodieTableSink#getSinkRuntimeProvider. Beyond plain Flink, the Kyuubi Flink SQL engine integrates with Hudi through the Catalog APIs by referencing the Hudi dependencies, which makes running SQL against Hudi more convenient and easier to extend than driving Flink directly.
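The official docs show the HoodiePipeline style; the sketch below follows the shape of that builder API. The in-memory source, local path, and schema are stand-ins so the example is self-contained — a real job would read from Kafka or a CDC source.

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.data.StringData;
import org.apache.flink.table.data.TimestampData;
import org.apache.hudi.common.model.HoodieTableType;
import org.apache.hudi.configuration.FlinkOptions;
import org.apache.hudi.util.HoodiePipeline;

import java.util.HashMap;
import java.util.Map;

public class HudiDataStreamSink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(30_000); // required: commits happen on checkpoints

        Map<String, String> options = new HashMap<>();
        options.put(FlinkOptions.PATH.key(), "file:///tmp/hudi_t1"); // illustrative local path
        options.put(FlinkOptions.TABLE_TYPE.key(), HoodieTableType.MERGE_ON_READ.name());
        options.put(FlinkOptions.PRECOMBINE_FIELD.key(), "ts");

        // Toy source: build RowData records from strings so the example runs standalone.
        DataStream<RowData> input = env.fromElements("id1,Alice", "id2,Bob")
            .map(line -> {
                String[] f = line.split(",");
                return (RowData) GenericRowData.of(
                    StringData.fromString(f[0]),
                    StringData.fromString(f[1]),
                    TimestampData.fromEpochMillis(System.currentTimeMillis()),
                    StringData.fromString("par1"));
            })
            .returns(TypeInformation.of(RowData.class));

        HoodiePipeline.builder("t1")
            .column("uuid VARCHAR(20)")
            .column("name VARCHAR(10)")
            .column("ts TIMESTAMP(3)")
            .column("`partition` VARCHAR(20)")
            .pk("uuid")
            .partition("partition")
            .options(options)
            .sink(input, false); // false: the input stream is unbounded

        env.execute("hudi-datastream-sink");
    }
}
```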
CDC ingestion. The most common use of the sink is change data capture: keeping track of the data changes evolving in a source system, such as a MySQL database, and materializing them into Hudi on S3 so downstream systems can act on them. Two ways of syncing CDC data into Hudi are generally recommended: use the flink-cdc-connectors sources to connect directly to the database server and sync the binlog into Hudi — the advantage being that no message queue is needed — or land the changelog in Kafka first and have Flink consume it from there. The Kafka detour is usually chosen so that several real-time jobs can reuse the same MySQL data; if every job opened its own Flink CDC connection to the tables and binlog, the load would hurt the MySQL instance. A third path sidesteps the Flink runtime entirely: Hudi provides a sink for Kafka Connect, a popular framework for moving streaming data between systems, which ingests records from Kafka topics into Hudi tables while coordinating transactions across the Connect tasks and workers.

AWS publishes a sample that builds such a streaming data lake using only an Amazon Kinesis Data Analytics application running Flink CDC connectors with the Hudi sink, and community projects push further — for example consuming Kafka and fanning out to multiple Hudi tables by business type in one job, with dynamic schema evolution driven by Apollo. Flink CDC itself is a streaming data integration tool whose source and sink connectors are used out of the box by adding the released jars and naming the connector in a YAML pipeline definition; a native Hudi pipeline sink has been a long-standing request there (discussion #3777) and is being added in pull request #4164, which routes CreateTableEvent, SchemaChangeEvent, and DataChangeEvent records into a multi-table Hudi stream-write function.
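A minimal sketch of the direct path, assuming the flink-sql-connector-mysql-cdc and hudi-flink bundle jars are on the classpath; the hostname, credentials, and table names are made up.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class MySqlCdcToHudi {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // CDC source over the MySQL binlog.
        tEnv.executeSql(
            "CREATE TABLE customers_src (" +
            "  id INT PRIMARY KEY NOT ENFORCED, name STRING, ts TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'mysql-cdc'," +
            "  'hostname' = 'mysql-host', 'port' = '3306'," +
            "  'username' = 'flink', 'password' = 'secret'," +
            "  'database-name' = 'shop', 'table-name' = 'customers')");

        // Hudi sink; upsert semantics follow from the primary key.
        tEnv.executeSql(
            "CREATE TABLE customers_hudi (" +
            "  id INT PRIMARY KEY NOT ENFORCED, name STRING, ts TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'hudi'," +
            "  'path' = 's3a://my-bucket/warehouse/customers_hudi'," +
            "  'table.type' = 'MERGE_ON_READ'," +
            "  'precombine.field' = 'ts')");

        // Continuous sync; runs until cancelled.
        tEnv.executeSql("INSERT INTO customers_hudi SELECT id, name, ts FROM customers_src");
    }
}
```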
Compaction background. Compaction is a table service employed by Hudi specifically for Merge On Read (MOR) tables: it periodically merges the updates accumulated in row-based log files into the corresponding columnar base file, producing a new version of that base file. That is why MOR tables need compaction at all — the writer lands updates in logs for ingest speed, and the logs must eventually be folded back into Parquet for read performance. Compaction is not applicable to Copy On Write (COW) tables, which rewrite base files on every commit. The distinction also shows up in the precombine field: as the Javadoc for setPreCombineField() explains, the option is read from the hoodie table metadata and the value is non-null precisely because compaction can only be performed on MOR tables — so if the setting appears to have no effect, first check whether the table really is MOR rather than COW.

With the Flink writer, compaction runs asynchronously inside the same job: after a commit, Flink generates a compaction plan, serializes it, and persists it on the timeline under the table directory; a plan operator then reads the plan back, CompactOperator executes it, and CompactionCommitSink commits the result. Flink never supports inline table services, because the DAG of Flink operators is deterministic at compile time — and for the same reason the writer does not run async clustering simultaneously with compaction. Asynchronous compaction can also lag far behind ingestion; "compaction is very, very slow" is among the most frequently reported problems, and running compaction offline in a separate job is the usual remedy.
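The schedule is governed by a handful of Hudi Flink options. A sketch of a MOR table DDL focused on those keys (the key names come from FlinkOptions; the trigger values shown are the documented choices):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class MorCompactionOptions {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        tEnv.executeSql(
            "CREATE TABLE hudi_mor (" +
            "  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, name VARCHAR(10), ts TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'hudi'," +
            "  'path' = 'file:///tmp/hudi_mor'," +
            "  'table.type' = 'MERGE_ON_READ'," +
            // In-job async compaction (the default for MOR): a plan is scheduled
            // every 5 delta commits and executed by the CompactOperator.
            "  'compaction.async.enabled' = 'true'," +
            "  'compaction.trigger.strategy' = 'num_commits'," + // or time_elapsed / num_and_time / num_or_time
            "  'compaction.delta_commits' = '5'" +
            ")");
    }
}
```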
Indexes and clustering. The write path needs an index to distinguish inserts from updates. The classic Flink choice keeps the record-key-to-file-group mapping in Flink state, and the May 2023 Hudi blog describes the challenges of the Bloom filter index and how the Hudi-Flink integration supports high-volume streaming workloads with Flink's stateful backend instead. The stateless alternative is the bucket index — user reports typically pair table.type = MOR with hoodie.index.type = BUCKET and the SIMPLE bucket-index engine — which hashes record keys into a fixed set of buckets and so needs no lookup state at all. GLOBAL_BLOOM is also usable from Flink SQL; users testing it note that the source code already supports it. For COW tables written in append mode, clustering plays the role compaction plays for MOR: an asynchronous service that rewrites small files into larger, better-organized ones. It is scheduled and executed in-job, and it is also a recurring source of trouble — jobs running in "COW + append + clustering" mode are regularly reported to fail once async clustering triggers, with ClusteringOperator logging errors while executing the clustering plan.
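A sketch of the bucket-index table definition. The index.type and hoodie.* keys are real Hudi configs; the bucket count is illustrative, and for the SIMPLE engine it cannot be changed after the table is created:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class BucketIndexTable {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        tEnv.executeSql(
            "CREATE TABLE hudi_bucketed (" +
            "  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, name VARCHAR(10), ts TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'hudi'," +
            "  'path' = 'file:///tmp/hudi_bucketed'," +
            "  'table.type' = 'MERGE_ON_READ'," +
            "  'index.type' = 'BUCKET'," +                 // hash records to buckets: no lookup state
            "  'hoodie.bucket.index.num.buckets' = '8'," + // fixed per partition for the SIMPLE engine
            "  'hoodie.index.bucket.engine' = 'SIMPLE'" +  // CONSISTENT_HASHING allows resizing
            ")");
    }
}
```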
Storage layout and query views. A Hudi table consists of three distinct parts — data files, an index, and metadata — with the metadata maintained in the form of a timeline that records every operation ever executed on the dataset under the .hoodie directory of the base path. Because Hudi keeps metadata for every operation you run, all actions remain atomic and consistent; it also explains a classic symptom: when a write to MinIO or S3 apparently produces only a .hoodie directory tree, the table was created but no commit ever completed — with the Flink writer, almost always because checkpointing is off. Three query views derive from the timeline: the read-optimized view, whose input format selects only the compacted columnar Parquet files (for roughly 500 GB of data, changes become visible with a latency of about 30 minutes); the near-real-time view, which merges base and log data for a latency of about 1-5 minutes; and the incremental view, which exposes the changes to the dataset and enables incremental pulls by downstream jobs.

Reading with Flink. The Hudi Flink connector also works as a source connector for reading a Hudi dataset stored on Amazon S3: either run a select statement against an existing table such as CustomerHudi, or create a new table with hudi specified for connector over the same path. Streaming reads turn the table into a changelog source, though the incremental query path has rough edges of its own — there is a public repository (kkrugler/flink-hudi-query-test) built specifically to demonstrate issues with the Flink incremental query.
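A sketch of the streaming-read table, using the documented read.* option keys; the base path and start commit are placeholders:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class HudiStreamingRead {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        tEnv.executeSql(
            "CREATE TABLE hudi_read (" +
            "  uuid VARCHAR(20), name VARCHAR(10), ts TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'hudi'," +
            "  'path' = 'file:///tmp/hudi_mor'," +      // same base path the writer used
            "  'table.type' = 'MERGE_ON_READ'," +
            "  'read.streaming.enabled' = 'true'," +    // poll the timeline for new commits
            "  'read.start-commit' = 'earliest'," +     // or an instant time like '20240101000000'
            "  'read.streaming.check-interval' = '4'" + // seconds between timeline polls
            ")");

        // Emits a changelog of inserts/updates as new commits land.
        tEnv.executeSql("SELECT * FROM hudi_read").print();
    }
}
```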
Dependencies and platform notes. For jar jobs the Flink Hudi bundle is introduced through Maven, and a frequent source of errors is depending on the wrong artifact — for example pulling in hudi-utilities, which targets the Spark-based utilities, and then hitting failures at runtime. Download Flink from the Apache download page, match the bundle artifact to the Flink minor version, and keep exactly one copy of the bundle on the classpath: duplicate or mismatched jars are behind many of the reported AbstractMethodError failures and read errors. On AWS, EMR Flink pairs with the Glue service, so the metadata of all S3 Hudi tables is stored in the Glue Catalog, where users can conveniently manage it — query it, alter it, delete it. Outside EMR, a typical flow is to write Hudi tables with Flink 1.16 and run the Hive Sync tool on a local box to sync metadata into the Hive Metastore, specifying sync-mode as hms and providing the required parameters; one known limitation is that from Flink SQL it is not possible to change the database name used for Hive sync.
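The Maven fragment that keeps surfacing in these reports can be completed along the following lines. The bundle artifact name matches the hudi-flink1.17-bundle jar seen in the stack traces; the version placeholder stands in for whichever Hudi release is in use, since the exact version is not given in the snippets:

```xml
<!-- For Flink jobs, depend on the Flink bundle (not hudi-utilities, which is Spark-oriented). -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-flink1.17-bundle</artifactId>
  <version>${hudi.version}</version>
  <!-- typically provided when the same jar already sits in flink/lib -->
  <scope>provided</scope>
</dependency>
```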
Troubleshooting digest. The failure reports around the sink cluster into a few recognizable shapes:

- Writes appear to do nothing: a MinIO or S3 bucket contains only the .hoodie metadata directory tree. Almost always checkpointing is not enabled, so no commit ever completes.
- "ValidationException: Unable to create a sink for writing table 'default_catalog.default_database.hudi_table'": the connector is not on the classpath, pointing back to the bundle-jar setup above.
- "IllegalArgumentException: Can not set ... CompactionOperation field CompactionPlanEvent.operation ..." when submitting a job: typically a classloading clash, with the Hudi classes present twice (for example in both the job jar and flink/lib).
- "FlinkException: Global failure triggered by OperatorCoordinator for 'hoodie_stream_write'": the commit failed on the coordinator; the root cause is in the JobManager log.
- CompactOperator logging "Executor executes action [Execute compaction for instant ... from task 0] error", and ClusteringOperator failures once async clustering is triggered.
- A job restarting from a checkpoint cannot recover and surfaces an exception about an instant on the timeline; restarts can additionally leak metaspace until the JVM is exhausted.
- Performance: COW + upsert to an S3 path is reported as far too slow (an hourly batch job sinking the last hour of data taking nearly half an hour for only 13k records), and MOR tables receiving around 10K upserts per second can overwhelm compaction. Some users disable the embedded timeline server (hoodie.embed.timeline.server=false) — for instance because AWS Managed Flink does not support it — and then report that "the timeline is still created"; note the flag controls the embedded timeline server, not the timeline itself, which always exists.

Looking ahead, current Hudi releases do not yet support Flink 2.0, which blocks users from upgrading their Flink-based pipelines: the integration relies on APIs and internal behaviors that have changed or been deprecated in Flink 2.0. Until that lands, the practical guidance is unchanged — pin the bundle to the cluster's Flink 1.x minor version and treat checkpointing as part of the table's correctness contract, since in Flink's processing flow the checkpoint target is a data target in its own right, alongside source and sink, persisting state at intervals so the job can recover consistently from node failures.
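One recurring question is whether raising concurrency — for example env.getCheckpointConfig().setMaxConcurrentCheckpoints(3) — can cause exceptions in Hudi writes. Since the coordinator drives one instant per checkpoint, overlapping checkpoints risk instant conflicts, so the conservative setup below keeps them sequential; this is a sketch, and the intervals are assumptions to tune per job:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HudiCheckpointSettings {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.enableCheckpointing(60_000);            // Hudi commits once per successful checkpoint
        env.getCheckpointConfig()
           .setMaxConcurrentCheckpoints(1);         // overlapping checkpoints risk instant conflicts
        env.getCheckpointConfig()
           .setCheckpointTimeout(10 * 60_000L);     // S3 flushes can be slow; avoid premature aborts
        env.getCheckpointConfig()
           .setMinPauseBetweenCheckpoints(30_000L); // give compaction/clustering room between commits

        // ... build the Hudi pipeline as in the earlier examples, then env.execute(...)
    }
}
```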