"AutoMQ's cloud-native architecture perfectly aligns with JD's strategy of running core infrastructure on Kubernetes. By offloading Kafka's data durability to CubeFS, we not only solved the severe storage and network redundancy issues inherent in traditional architectures but also achieved true second-level elasticity. This allows us to effortlessly handle e-commerce traffic floods while significantly reducing infrastructure costs."
Hou Zhong
Cloud Native Architect for Kafka
JD.com
The Challenge
Severe Storage and Network Resource Waste
JD.com's JDQ platform uses CubeFS (an S3-compatible object store) as its underlying storage. Traditional Apache Kafka provides durability through its ISR multi-replica mechanism, while CubeFS applies its own multi-replica mechanism underneath. Stacked together, a single write produced 9 actual copies of data (3 Kafka replicas × 3 CubeFS replicas), meaning roughly two-thirds (66.67%) of storage space was wasted on unnecessary redundancy. Kafka-level replication also consumed excessive network bandwidth, driving up total costs.
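The redundancy math above can be sketched in a few lines, assuming the common default of 3 Kafka replicas on top of 3-replica CubeFS volumes:

```python
# Effective copies when Kafka ISR replication is layered on replicated storage.
kafka_replicas = 3    # typical Kafka replication.factor
cubefs_replicas = 3   # CubeFS's own replica count

total_copies = kafka_replicas * cubefs_replicas  # each Kafka replica is itself triplicated
needed_copies = cubefs_replicas                  # durability CubeFS already provides on its own

wasted_fraction = (total_copies - needed_copies) / total_copies
print(total_copies, f"{wasted_fraction:.2%}")  # 9 copies, 66.67% wasted
```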
Lack of Cloud-Native Elasticity
Apache Kafka's "Shared-Nothing" architecture couples compute (Brokers) with local storage, making it difficult to run flexibly on Kubernetes. Scaling operations required a complex manual process: calculating partition strategies, evaluating impact, and executing data migration during off-peak hours. This process could take hours, making it impossible to utilize Kubernetes for automatic scaling to cope with the dynamic traffic peaks typical of e-commerce.
Why AutoMQ
S3-Based Storage-Compute Separation (Adapting to CubeFS)
AutoMQ decouples compute from storage and supports the standard S3 API, allowing seamless integration with JD's internal CubeFS. It offloads data durability entirely to CubeFS/S3. Unlike Kafka's ISR, AutoMQ uses a Single Leader Partition design. Data written to the Broker generates only the necessary replicas at the CubeFS layer (3 copies instead of 9), drastically reducing storage consumption and saving network bandwidth previously used for replication.
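A back-of-the-envelope model shows where the bandwidth saving comes from, assuming a replication factor of 3 and counting only write traffic:

```python
# Network traffic per written byte, as a rough unit count.
# Traditional Kafka (replication.factor=3): the leader receives the byte,
# then forwards it to 2 followers over the inter-broker network.
kafka_traffic = 1 + 2   # producer ingress + replication to 2 followers

# AutoMQ: the broker receives the byte and uploads a single copy to CubeFS/S3;
# replication for durability happens inside the storage layer instead.
automq_traffic = 1 + 1  # producer ingress + one S3 upload

saving = (kafka_traffic - automq_traffic) / kafka_traffic
print(f"{saving:.0%}")  # 33%
```

This simplified model lines up with the 33%+ bandwidth reduction reported below; real savings depend on fetch traffic and consumer fan-out as well.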
Kubernetes Native
By making Brokers stateless, AutoMQ eliminates the need for physical data copying during scaling. Partition reassignment involves only metadata updates and completes in seconds. The built-in Self-Balancing component continuously optimizes cluster traffic, and the architecture integrates perfectly with Kubernetes Autoscalers (like Karpenter), enabling automatic scaling based on load.
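Because brokers are stateless, capacity changes can be delegated to standard Kubernetes autoscaling. A minimal sketch using the standard `autoscaling/v2` HorizontalPodAutoscaler; resource names are placeholders, and a real deployment would likely scale on throughput metrics rather than CPU alone:

```yaml
# Illustrative HPA scaling an AutoMQ broker workload on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: automq-broker   # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: automq-broker # placeholder target workload
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```

Since partition reassignment is metadata-only, newly added pods can take traffic in seconds instead of waiting on data migration.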
100% Compatibility for Risk-Free Migration
Given JD's massive Kafka ecosystem (supporting 1,400+ business lines), compatibility was crucial. AutoMQ is 100% compatible with Apache Kafka protocols, ensuring that existing business applications could migrate seamlessly without any code changes or configuration modifications.
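From a client application's point of view, this kind of migration amounts to a bootstrap-address change. A minimal sketch with placeholder addresses and illustrative client options:

```python
# Producer settings before and after migration: the application code and
# every Kafka client option stay the same; only the bootstrap address moves.
kafka_config = {
    "bootstrap.servers": "kafka.internal:9092",  # placeholder address
    "acks": "all",
    "compression.type": "lz4",
}
automq_config = {**kafka_config, "bootstrap.servers": "automq.internal:9092"}

# Everything except the endpoint is identical.
changed = {k for k in kafka_config if kafka_config[k] != automq_config[k]}
print(changed)  # {'bootstrap.servers'}
```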
The Results
Storage Costs Reduced by 50%+, Bandwidth Costs Reduced by 33%+
By eliminating the "double redundancy" of storage replicas and reducing inter-broker replication traffic, JD.com achieved a substantial reduction in storage and network bandwidth resource requirements, directly translating to lower infrastructure costs.
Cluster Scaling Time Reduced from Hours to Seconds
The efficiency of scaling on Kubernetes improved dramatically, with scaling times dropping from hours to seconds. The AutoMQ cluster can now dynamically and quickly adjust capacity to handle massive traffic surges during events like "618" or "Double 11" sales. This not only reduced the operational burden on the team but also eliminated the waste associated with over-provisioning resources for peak loads.
Key Achievements
50%+ storage cost reduction
33%+ network bandwidth cost reduction
Data copies reduced from 9 to 3
Scaling in seconds, vs. hours before
Ready to transform your streaming infrastructure?
See how AutoMQ can help you achieve similar results. Get a personalized demo and pricing comparison.