Press "Enter" to skip to content

Category: Versions

sparklyr 1.6 Released

Carly Driggers announces a new release of sparklyr:

Sparklyr, an LF AI & Data Foundation Incubation Project, has released version 1.6! Sparklyr is an R Language package that lets you analyze data in Apache Spark, the well-known engine for big data processing, while using familiar tools in R. The R Language is widely used by data scientists and statisticians around the world and is known for its advanced features in statistical computing and graphics. 

Click through to see the changes.


Kafka Sans ZooKeeper

Ben Stopford and Ismael Juma give us a preview:

So we’re very pleased to say that the early access of the KIP-500 code has been committed to trunk and is expected to be included in the upcoming 2.8 release. For the first time, you can run Kafka without ZooKeeper. We call this the Kafka Raft Metadata mode, typically shortened to KRaft (pronounced like craft) mode.

Beware, there are some features that are not available in this early-access release. We do not yet support the use of ACLs and other security features or transactions. Also, both partition reassignment and JBOD are unsupported in KRaft mode (these are anticipated to be available in an Apache Kafka release later in the year). Hence, consider the quorum controller experimental software—we don’t advise subjecting it to production workloads. If you do try out the software, however, you’ll find a host of new advantages: It’s simpler to deploy and operate, you can run Kafka in its entirety as a single process, and it can accommodate significantly more partitions per cluster (see measurements below).

Read on for more information. This is a big deal for Kafka.
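
If you want to kick the tires yourself, the early-access workflow boils down to formatting a metadata log directory and then starting a single combined broker/controller process. Here's a rough sketch in Python that drives the scripts shipped with the 2.8 distribution; the KAFKA_HOME path and the sample KRaft properties file location are assumptions about a typical install, not anything from the announcement.

```python
import subprocess
from pathlib import Path

# Assumed install location of a Kafka 2.8 distribution; adjust to your environment.
KAFKA_HOME = Path("/opt/kafka_2.13-2.8.0")
BIN = KAFKA_HOME / "bin"
KRAFT_CONFIG = KAFKA_HOME / "config" / "kraft" / "server.properties"  # sample KRaft config shipped with 2.8

def run(*args: str) -> str:
    """Run one of the Kafka shell scripts and return its stdout."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout.strip()

# 1. Generate a cluster ID and format the metadata/log directories.
#    KRaft mode requires this step before the first start.
cluster_id = run(str(BIN / "kafka-storage.sh"), "random-uuid")
run(str(BIN / "kafka-storage.sh"), "format", "-t", cluster_id, "-c", str(KRAFT_CONFIG))

# 2. Start a single combined broker/controller process. Note: no ZooKeeper anywhere.
subprocess.Popen([str(BIN / "kafka-server-start.sh"), str(KRAFT_CONFIG)])
```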


Using Query Store to Track Regressions after Upgrades

Grant Fritchey has another use for Query Store:

There are a lot of uses for Query Store, but one of the most interesting is as an upgrade tool. We all know that upgrades in SQL Server can be more than a little bit nerve wracking. No matter how much you tested stuff in lower environments, deploying an update to production might result in performance issues as your code hits a regression. This is even more true when upgrading from versions of SQL Server prior to 2014 to anything 2014 and above. That’s because of the new cardinality estimation engine introduced in 2014. Most queries won’t notice it. Some queries will benefit from the better estimates. A few, problematic, queries will suffer. This is where Query Store can be used as an upgrade tool.

Read on to learn how.
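
Grant walks through the full process. As a rough illustration of the idea (not his exact scripts), the Python/pyodbc sketch below captures a baseline under the old compatibility level, flips to the new one, and then asks Query Store which queries got slower. The connection string, database, and target compatibility level are placeholders.

```python
import pyodbc

# Hypothetical connection string; point it at the database you're upgrading.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=MyDb;Trusted_Connection=yes;",
    autocommit=True,  # ALTER DATABASE can't run inside a multi-statement transaction
)
cur = conn.cursor()

# Turn on Query Store and capture a baseline on the old compatibility level,
# then move to the new level so the post-2014 cardinality estimator takes over.
cur.execute("ALTER DATABASE CURRENT SET QUERY_STORE = ON;")
cur.execute("ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;")

# After the workload has run for a while, flag queries whose plans span both
# compatibility levels and whose average duration moved the most.
cur.execute("""
    SELECT TOP (20)
           q.query_id,
           MAX(rs.avg_duration) - MIN(rs.avg_duration) AS duration_delta_us
    FROM sys.query_store_query AS q
    JOIN sys.query_store_plan AS p  ON p.query_id = q.query_id
    JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
    GROUP BY q.query_id
    HAVING COUNT(DISTINCT p.compatibility_level) > 1
    ORDER BY duration_delta_us DESC;
""")
for query_id, delta_us in cur.fetchall():
    print(f"query {query_id}: avg duration changed by {delta_us} microseconds")
```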


Columnstore in Standard Edition

Erik Darling looks at how powerful (or not) columnstore indexes are in SQL Server Standard Edition:

The top plan is from Standard Edition, and runs for a minute in a full serial plan. There is a non-parallel plan reason in the operator properties: MaxDOPSetToOne.

I do not have DOP set to one anywhere, that’s just the restriction kicking in. You can try it out for yourself if you have Standard Edition sitting around somewhere. I’m doing all my testing on SQL Server 2019 CU9. This is not ancient technology at the time of writing.

The bottom plan is from Enterprise/Developer Edition, where the plan is able to run partially in parallel, and takes 28 seconds (about half the time of the serial plan).

You get what you pay for in this case.
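
If you want to confirm the restriction on your own Standard Edition instance, the non-parallel plan reason shows up in the actual execution plan XML. Here's a hedged Python/pyodbc sketch (the connection string and table name are made up) that pulls the plan and looks for MaxDOPSetToOne.

```python
import pyodbc

# Hypothetical connection string and table; the table is assumed to have a columnstore index.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=MyDb;Trusted_Connection=yes;",
    autocommit=True,
)
cur = conn.cursor()

print(cur.execute("SELECT SERVERPROPERTY('Edition');").fetchone()[0])

# Ask for the actual execution plan, run a columnstore-backed query, then read the
# showplan XML that comes back as an extra result set.
cur.execute("SET STATISTICS XML ON;")
cur.execute("SELECT COUNT(*) FROM dbo.SomeColumnstoreTable;")
cur.fetchall()          # the query's own results
cur.nextset()           # advance to the showplan result set
plan_xml = cur.fetchone()[0]

if 'NonParallelPlanReason="MaxDOPSetToOne"' in plan_xml:
    print("Plan forced serial: the Standard Edition columnstore DOP restriction is in effect.")
else:
    print("No MaxDOPSetToOne reason found (e.g. Enterprise/Developer Edition).")
```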


Apache Spark 3.1 Released

Hyukjin Kwon, et al., announce Apache Spark 3.1:

Various new SQL features are added in this release. The widely used standard CHAR/VARCHAR data types are added as variants of the supported String types. More built-in functions (e.g., width_bucket (SPARK-21117) and regexp_extract_all (SPARK-24884)) were added. The current number of built-in operators/functions has now reached 350. More DDL/DML/utility commands have been enhanced, including INSERT (SPARK-32976), MERGE (SPARK-32030) and EXPLAIN (SPARK-32337). Starting from this release, in Spark WebUI, the SQL plans are presented in a simpler and structured format (i.e., using EXPLAIN FORMATTED).

There have been quite a few advancements around the SQL side.
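
As a quick taste of the additions called out above, here's a small PySpark sketch that exercises width_bucket and regexp_extract_all (using the values from the documented examples) and then asks for a plan via EXPLAIN FORMATTED.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-3-1-sql-features").getOrCreate()

# width_bucket (SPARK-21117): bucket a value into N equal-width buckets over a range.
spark.sql("SELECT width_bucket(5.3, 0.2, 10.6, 5) AS bucket").show()

# regexp_extract_all (SPARK-24884): return every regex match, not just the first.
spark.sql(r"SELECT regexp_extract_all('100-200, 300-400', '(\\d+)-(\\d+)', 1) AS starts").show()

# EXPLAIN FORMATTED: the structured plan output now also surfaced in the Spark WebUI SQL tab.
spark.sql("EXPLAIN FORMATTED SELECT width_bucket(id, 0, 100, 10) FROM range(100)").show(truncate=False)
```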


The Editions of PowerShell

Jeffrey Hicks gives us an update on the PowerShell landscape:

The PowerShell community is beginning another year in the world of PowerShell 7. Most of you know what that means. However, there are newcomers to our community practically every day. Or I know there are occasional or reluctant users who might not pay enough attention to understand the world of PowerShell as it stands today. I wrote this post as a kind of virtual sticky note for the PowerShell community. Feel free to reference this post in your own work so that you don’t have to explain or define “Windows PowerShell” and “PowerShell”.

Click through to learn how to differentiate the two.
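
The short version of the split: Windows PowerShell (powershell.exe, PSEdition "Desktop", version 5.1) versus PowerShell 7 (pwsh, PSEdition "Core"). One quick way to see which of the two a machine has is to ask each executable for its edition; here's a small sketch that shells out to both, assuming they're installed and on the PATH.

```python
import shutil
import subprocess

# Windows PowerShell ships as powershell.exe and reports PSEdition "Desktop";
# PowerShell 7 ships as pwsh and reports PSEdition "Core".
for exe in ("powershell", "pwsh"):
    if shutil.which(exe) is None:
        print(f"{exe}: not found on this machine")
        continue
    result = subprocess.run(
        [exe, "-NoProfile", "-Command", "$PSVersionTable.PSEdition"],
        capture_output=True, text=True,
    )
    print(f"{exe}: PSEdition = {result.stdout.strip()}")
```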


Security Update for SQL Server

Randolph West takes a look at a patch:

Microsoft announced updates today for all supported versions of SQL Server, for a privilege escalation vulnerability that leverages Extended Events. For security reasons no further details have been provided, but you can expect more information in the near future, now that this update is public.

Be sure to grab the latest update for your version of SQL Server.


December 2020 SQL Tools Releases

Drew Skwiers-Koballa gives us an update on where SQL Server tooling is at:

The December releases of Azure Data Studio 1.25 and SQL Server Management Studio (SSMS) 18.8 are now generally available.  Additionally, the mssql extension for Visual Studio Code has recently been updated to version 1.10.0. Read on to learn more about each of these updates and grab the latest versions of SSMS, Azure Data Studio, or the mssql extension for VS Code.

Read on to learn more.


Apache Flink 1.12.0 Released

Marta Paes and Aljoscha Krettek announce a new release of Apache Flink:

– The community has added support for efficient batch execution in the DataStream API. This is the next major milestone towards achieving a truly unified runtime for both batch and stream processing.

– Kubernetes-based High Availability (HA) was implemented as an alternative to ZooKeeper for highly available production setups.

– The Kafka SQL connector has been extended to work in upsert mode, supported by the ability to handle connector metadata in SQL DDL. Temporal table joins can now also be fully expressed in SQL, no longer depending on the Table API.

– Support for the DataStream API in PyFlink expands its usage to more complex scenarios that require fine-grained control over state and time, and it’s now possible to deploy PyFlink jobs natively on Kubernetes.

Read on for more details on these as well as other changes.
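
On the PyFlink front, a minimal DataStream job in Python looks roughly like the sketch below. This assumes PyFlink 1.12 installed from pip, and it uses the print sink (assumed available in your build) purely to keep the example self-contained.

```python
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import MapFunction

class Double(MapFunction):
    """A trivial transformation, just to exercise the Python DataStream API."""
    def map(self, value):
        return value * 2

env = StreamExecutionEnvironment.get_execution_environment()

# Build a tiny bounded stream, transform it, and send the results to the print sink.
stream = env.from_collection([1, 2, 3, 4, 5], type_info=Types.INT())
stream.map(Double(), output_type=Types.INT()).print()

env.execute("pyflink_datastream_sketch")
```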
