Apache

Kafkesc Updates: Docker, __consumer_offsets, byte parsing and Rust

While I haven’t taken the time to blog since the Ksunami announcement, I have been ploughing away at various projects inside the Kafkesc organization, while also continuing the side objective of growing my Rust skills. So, here is a recap of a few things I have released since, and of how they are leading to substantial growth in my Rust knowledge. Ksunami gets an official Docker image: in an attempt to make adoption easier, I set up ksunami-docker so that running ksunami can be even easier, in Docker, Kubernetes or wherever you need. For example: ...

Announcing Ksunami v0.1.x

In October this year, while I was in the process of changing jobs, I started working on an open source project to monitor Kafka consumer lag. At New Relic, a previous gig, we used a lot of Kafka, and we cared equally about monitoring its usage: there are some great articles on New Relic’s own blogs, published over the years. In the process, I realised that I needed a way to spin up a Kafka cluster for development, and I needed a producer of Kafka records that was able to behave in accordance with specific scenarios. ...

TFZK - A Terraform Provider for Apache ZooKeeper

Gimme the TL;DR. A new Terraform provider, TFZK, is available, designed to interact with ZooKeeper ZNodes. The latest stable version is v1.0.3, and you should give it a go. Ah! And here is the doc. OK, I got more time - go ahead! Earlier this year I decided to scratch a long-standing itch: build a Terraform Provider for Apache ZooKeeper. While there was already one, it came with limitations that created issues in production environments: ...

Apache Hadoop on Mac OS X

For some reason I started to play with Apache Hadoop (Core): Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data. Here’s what makes Hadoop especially useful: Scalable: Hadoop can reliably store and process petabytes. Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes. Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid. Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures. Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located. Hadoop has been demonstrated on clusters with 2000 nodes. The current design target is 10,000 node clusters. I followed the Quickstart guide and I can confirm that it works on Mac OS X too, but I only managed to make it run in “standalone” mode, which is useful for first-stage development and debugging. ...
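To make the MapReduce idea in the Hadoop excerpt a little more concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API and it does not touch HDFS: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only to illustrate how work is split into small blocks, mapped independently, grouped by key, and then reduced.

```python
from collections import defaultdict


def map_phase(block):
    # Map step: emit a (word, 1) pair for every word in one block of input.
    # In a real cluster each block would be mapped on the node that stores it.
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    # Shuffle step: group all emitted values by key, which is what the
    # framework does between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce step: collapse all values for one key into a single result.
    return key, sum(values)


def word_count(blocks):
    # Run the three steps in sequence over a list of text blocks.
    pairs = [pair for block in blocks for pair in map_phase(block)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input, standing in for blocks of a much larger file.
    blocks = ["hadoop stores data", "hadoop processes data"]
    print(word_count(blocks))
```

Because each call to `map_phase` only looks at its own block, the map step could in principle run in parallel on separate machines, which is the property the excerpt attributes to Hadoop.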