Press "Enter" to skip to content

Curated SQL Posts

Measure Your DBA Skills

Lee Markum has just wrapped up an interesting series:

Over the last nine weeks I took you on a journey of skills and career topics related to being a SQL Server DBA. We looked at the Production DBA, with skills and career topics from the beginning of a career, through mid-career, to Senior DBA. Then we looked at the Development DBA and their skills and career development needs. Finally, there was a wrap-up post.

To make it easier for everyone to get to these posts, I decided to bring them all together on a single page.

Click through to get a feel for where you stand on the DBA and database developer sides of the house.

New Azure Announcements

Eitan Blumin has a roundup of Azure-related announcements:

During the week of December 7th (especially on December 9th), Microsoft sent us a whole bag of goodies, announcing the general availability of features that until now were only in preview, as well as newer features that have just entered public preview.

There’s quite a lot to cover here, so let’s try to break it down by category and provide links for more details.

Click through for the list.

Using Koalas with Azure Databricks

Tomaz Kastrun continues a series on Azure Databricks:

So far, we have looked into SQL, R, and Python; this post will be about the Koalas package, an implementation of the pandas DataFrame API on Apache Spark. Data engineers and data scientists love pandas, since it makes data preparation easier, faster, and more productive, and Koalas is a direct “response” that makes writing and coding on Spark easier and more familiar. Also follow the official documentation, which has a full description of the package.

Click through for a quick demo.
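
To give a sense of why pandas users find Koalas familiar, here’s a minimal sketch (mine, not from Tomaz’s demo; it assumes the databricks.koalas package, preinstalled on recent Databricks Runtimes or available via pip install koalas):

```python
# A minimal sketch of the Koalas API; assumes databricks.koalas is available
# (preinstalled on Databricks Runtime, or via `pip install koalas`).
import databricks.koalas as ks

# Build a DataFrame with familiar pandas-style syntax...
kdf = ks.DataFrame({
    "city": ["Ljubljana", "Seattle", "Ljubljana"],
    "sales": [100, 250, 175],
})

# ...and run pandas-style operations that execute on Spark under the hood.
print(kdf.groupby("city")["sales"].sum())
```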

Automating Python Data Pipelines with SQL Agent

Joshua Higginbotham shows an old scheduling dog a new trick:

First off, we need to figure out which server we are going to run these from. For me, it was our SQL Servers dedicated to SSIS. Once this is figured out, we then need to do a custom install of Python. The key here is to make sure that when you install Python, you install it across the server itself and not at the user level. Once installed, we can then move to SQL Agent to complete the rest of the work. You’ll need to make sure the service account running SQL Agent has both permission to install libraries with Python and permission to the directory where your Python scripts live. Once permissions are set, we can start building out our SQL Agent job.

Click through for the full process.
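
As a rough sketch of what the script side might look like (mine, not Joshua’s; all paths and names are hypothetical), it helps to have the script log somewhere the service account can write and to surface failures through its exit code, since a CmdExec job step reports failure when the process exits non-zero:

```python
# A hypothetical pipeline script run by a SQL Agent CmdExec job step, e.g.:
#   "C:\Program Files\Python39\python.exe" "D:\PythonJobs\pipeline.py"
# Paths and names are placeholders; adjust them for your environment.
import logging
import sys

logging.basicConfig(
    # A directory the SQL Agent service account has write permission to.
    filename=r"D:\PythonJobs\pipeline.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def run_pipeline() -> None:
    """Placeholder for the actual data pipeline work."""
    logging.info("Pipeline run completed.")

if __name__ == "__main__":
    try:
        run_pipeline()
    except Exception:
        logging.exception("Pipeline run failed.")
        sys.exit(1)  # a non-zero exit code fails the CmdExec job step
```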

Disk Performance Testing in 2020

Glenn Berry gives us some CrystalDiskMark results:

Recently, I built a new AMD mainstream desktop system with some existing parts that I had available. This system has six storage drives, with various levels of technology and performance. I thought it would be interesting to run CrystalDiskMark 7.0.0 on each of these drives. So, here are some quick comparative CrystalDiskMark results in 2020 from those six drives.

This system has a Gigabyte B550 AORUS MASTER motherboard, which is actually a great choice for a B550 motherboard, especially if you want extra storage flexibility. AMD B550 motherboards only have PCIe 4.0 support from the CPU, not from the B550 chipset.

Glenn gets some outstanding performance from one drive and reminds us once more of how beautiful SSDs and M.2 drives are.

On-Premises SQL Server is Still Relevant

John Morehouse does not abide by Betteridge’s Law of Headlines:

I’m a firm believer that the cloud is not a fad and is not going away; it’s just an extension of a tool that we are already familiar with. The Microsoft marketing slogan is “It’s just SQL” and for the most part that is indeed true. However, that does not mean that every workload will benefit from being in the cloud. There are scenarios where it does not make sense to move things to the cloud, so let’s take a look at a few of them.

Read on for several reasons why the cloud might not be right for you.

Running Kafka on Windows (via WSL2)

Jim Galasyn shows how you can try out Apache Kafka on Windows:

Is Windows your favorite development environment? Do you want to run Apache Kafka® on Windows? Thanks to the Windows Subsystem for Linux 2 (WSL 2), now you can, and with fewer tears than in the past. Windows still isn’t the recommended platform for running Kafka with production workloads, but for trying out Kafka, it works just fine. Let’s take a look at how it’s done.

You can also get Kafka to run natively on Windows, though there are bugs around file handling, to the point where if you restart your machine while the Kafka service is running, data in partitions may become permanently inaccessible and force you to delete it before you can start Kafka again. So yeah, it’s better to use WSL or Docker containers for trying out Kafka on Windows machines.
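
If you want a quick smoke test once the broker is up under WSL 2, something like this works from the Windows side (my sketch, not from the linked post; it uses the third-party kafka-python package and assumes Kafka’s default listener on localhost:9092):

```python
# A hypothetical smoke test against a Kafka broker running under WSL 2.
# Assumes `pip install kafka-python` and a broker on localhost:9092.
from kafka import KafkaConsumer, KafkaProducer

# Produce a single message to a test topic.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("test-topic", b"hello from Windows")
producer.flush()

# Read it back, giving up after five seconds of silence.
consumer = KafkaConsumer(
    "test-topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
)
for message in consumer:
    print(message.value)
```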

Apache Flink 1.12.0 Released

Marta Paes and Aljoscha Krettek announce a new release of Apache Flink:

– The community has added support for efficient batch execution in the DataStream API. This is the next major milestone towards achieving a truly unified runtime for both batch and stream processing.

– Kubernetes-based High Availability (HA) was implemented as an alternative to ZooKeeper for highly available production setups.

– The Kafka SQL connector has been extended to work in upsert mode, supported by the ability to handle connector metadata in SQL DDL. Temporal table joins can now also be fully expressed in SQL, no longer depending on the Table API.

– Support for the DataStream API in PyFlink expands its usage to more complex scenarios that require fine-grained control over state and time, and it’s now possible to deploy PyFlink jobs natively on Kubernetes.

Read on for more details on these as well as other changes.
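
For a taste of the new PyFlink DataStream API, here is a minimal sketch adapted from the release announcement’s example (assuming `pip install apache-flink` at version 1.12):

```python
# A minimal sketch of the PyFlink DataStream API new in Flink 1.12,
# adapted from the release announcement's example.
from pyflink.common.typeinfo import Types
from pyflink.datastream import MapFunction, StreamExecutionEnvironment

class AddOne(MapFunction):
    def map(self, value):
        return value + 1

env = StreamExecutionEnvironment.get_execution_environment()

# Build a stream from a local collection and apply a map transformation.
stream = env.from_collection([1, 2, 3, 4, 5], type_info=Types.INT())
stream.map(AddOne(), output_type=Types.INT()).print()

# Trigger execution of the job graph.
env.execute("datastream_job")
```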

Moving Away from the Lambda Architecture

Xiang Zhang and Jingyu Zhu talk about migrating a project away from the Lambda architecture:

The Lambda architecture has become a popular architectural style that promises both speed and accuracy in data processing by using a hybrid approach of both batch processing and stream processing methods. But it also has some drawbacks, such as complexity and additional development/operational overheads. One of our features for Premium members on LinkedIn, Who Viewed Your Profile (WVYP), relied on a Lambda architecture for some time. The backend system supporting this feature had gone through a few architectural iterations in the past years: it started as a Kafka client processing a single Kafka topic, and eventually evolved to a Lambda architecture with more complicated processing logic. However, in an effort to pursue faster product iteration and lower operational overheads, we recently underwent a transition to make it Lambda-less. In this blog post, we’ll share some of the lessons learned in operating this system in the Lambda architecture, the decisions made in transitioning to Lambda-less, and the shifts necessary to undergo this transition.

When Lambda was first proposed back in 2015, it was intended as a compromise architecture, one trying to solve several important problems with the tools available in 2015 (well, 2013 and 2014; it was in a book, after all). I could definitely see the architecture falling into disuse within the next decade, not because it was at all bad, but because the world around it changed to the point that there is a better compromise available.
