Press "Enter" to skip to content

Category: Versions

Apache Kafka 2.8 Released

John Roesler announces Apache Kafka 2.8:

We are excited to announce that 2.8 introduces an early-access look at Kafka without ZooKeeper! The implementation is not yet feature complete and should not be used in production, but it is possible to start new clusters without ZooKeeper and go through basic produce and consume use cases.

At a high level, KIP-500 works by moving topic metadata and configurations out of ZooKeeper and into a new internal topic named @metadata. This topic is managed by an internal Raft quorum of “controllers” and is replicated to all brokers in the cluster. The leader of the Raft quorum serves the same role as the controller in clusters today. A node in the KIP-500 world can serve as a controller, a broker, or both, depending on the new process.roles configuration. See the README for quickstart instructions and additional details.

In addition to the headline item, there are plenty of other bugfixes and additions as well.

Leave a Comment

spkarlyr 1.6 Released

Carly Driggers announces a new release of sparklyr:

Sparklyr, an LF AI & Data Foundation Incubation Project, has released version 1.6! Sparklyr is an R Language package that lets you analyze data in Apache Spark, the well-known engine for big data processing, while using familiar tools in R. The R Language is widely used by data scientists and statisticians around the world and is known for its advanced features in statistical computing and graphics. 

Click through to see the changes.

Comments closed

Kafka Sans ZooKeeper

Ben Stopford and Ismael Juma give us a preview:

So we’re very pleased to say that the early access of the KIP-500 code has been committed to trunk and is expected to be included in the upcoming 2.8 release. For the first time, you can run Kafka without ZooKeeper. We call this the Kafka Raft Metadata mode, typically shortened to KRaft (pronounced like craft) mode.

Beware, there are some features that are not available in this early-access release. We do not yet support the use of ACLs and other security features or transactions. Also, both partition reassignment and JBOD are unsupported in KRaft mode (these are anticipated to be available in an Apache Kafka release later in the year). Hence, consider the quorum controller experimental software—we don’t advise subjecting it to production workloads. If you do try out the software, however, you’ll find a host of new advantages: It’s simpler to deploy and operate, you can run Kafka in its entirety as a single process, and it can accommodate significantly more partitions per cluster (see measurements below).

Read on for more information. This is a big deal for Kafka.

Comments closed

Using Query Store to Track Regressions after Upgrades

Grant Fritchey has another use for Query Store:

There are a lot of uses for Query Store, but one of the most interesting is as an upgrade tool. We all know that upgrades in SQL Server can be more than a little bit nerve wracking. No matter how much you tested stuff in lower environments, deploying an update to production might result in performance issues as your code hits a regression. This is even more true when upgrading from versions of SQL Server prior to 2014 to anything 2014 and above. That’s because of the new cardinality estimation engine introduced in 2014. Most queries won’t notice it. Some queries will benefit from the better estimates. A few, problematic, queries will suffer. This is where Query Store can be used as an upgrade tool.

Read on to learn how.

Comments closed

Columnstore in Standard Edition

Erik Darling looks at how powerful (or not) columnstore indexes are in SQL Server Standard Edition:

The top plan is from Standard Edition, and runs for a minute in a full serial plan. There is a non-parallel plan reason in the operator properties: MaxDOPSetToOne.

I do not have DOP set to one anywhere, that’s just the restriction kicking in. You can try it out for yourself if you have Standard Edition sitting around somewhere. I’m doing all my testing on SQL Server 2019 CU9. This is not ancient technology at the time of writing.

The bottom plan is from Enterprise/Developer Edition, where the the plan is able to run partially in parallel, and takes 28 seconds (about half the time as the serial plan).

You get what you pay for in this case.

Comments closed

Apache Spark 3.1 Released

Hyukjin Kwon, et al, announce Apache Spark 3.1:

Various new SQL features are added in this release. The widely used standard CHAR/VARCHAR data types are added as variants of the supported String types. More built-in functions (e.g., width_bucket (SPARK-21117) and regexp_extract_all (SPARK-24884) were added. The current number of built-in operators/functions has now reached 350. More DDL/DML/utility commands have been enhanced, including INSERT (SPARK-32976), MERGE (SPARK-32030) and EXPLAIN (SPARK-32337). Starting from this release, in Spark WebUI, the SQL plans are presented in a simpler and structured format (i.e. using EXPLAIN FORMATTED)

There have been quite a few advancements around the SQL side.

Comments closed

The Editions of Powershell

Jeffrey Hicks gives us an update on the Powershell landscape:

The PowerShell community is beginning another year in the world of PowerShell 7. Most of you know what that means. However, there are newcomers to our community practically every day. Or I know there are occasional or reluctant users who might not pay enough attention to understand the world of PowerShell as it stands today. I wrote this post as a kind of virtual sticky note for the PowerShell community. Feel free to reference this post in your own work so that you don’t have to explain or define “Windows PowerShell” and “PowerShell”.

Click through to learn how to differentiate the two.

Comments closed