May 2021 – Page 4 – Curated SQL

One…Million IO Requests

Published 2021-05-24 by Kevin Feasel

If, somehow, you’ve managed to see this error in your errorlog then congratulations, you’ve won an instance of SQL Server that probably won’t be doing much.
I found out about this message a few months ago, but it has been in the product for years and I went this long without ever even knowing it existed (congrats me!) until I was asked about it and coincidentally ended up finding it in an errorlog the same week. Clearly, I have too much fun packed into my weeks. I asked around, only one other person had ever found this in an errorlog before… that’s either impressive, depressing, or some perfect quantity of both – mellow it out to a smooth melancholy.

Click through to see more information about the 1000000 IO error message and when you might find it.

Storing SQL Server Connection Strings with Powershell Secret Management

Published 2021-05-24 by Kevin Feasel

Max Trinidad can keep a secret:

Finally, I came up with a practical example using the Powershell Secret Management module for storing SQL credentials. This is an excellent way of keeping your SQL connection strings information out of your scripting code. This way we just have it stored in our Vault.

Read on for a walkthrough and demonstration.

Enforcing Powershell Named Parameters

Published 2021-05-24 by Kevin Feasel

Dale Hirt is the law:

Line 3 is a named parameter called $badParam. This becomes important a little later. Lines 4, 5, 6, and 7 are named parameters. Now, how can we enforce that someone uses those named parameters?

Read on for an interesting technique to ensure that your callers pass parameters by name rather than by position.

Finding Duplicates in a Spark DataFrame

Published 2021-05-21 by Kevin Feasel

The Hadoop in Real World team shows how to find duplicate rows in a Spark DataFrame:

It is a pretty common use case to find the list of duplicate elements or rows in a Spark DataFrame and it is very easy to do with a groupBy() and a count()

Where “easy” depends on just how many columns you’re dealing with in the DataFrame.
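
As a rough illustration of the group-and-count idea in the quote, here is a minimal plain-Python sketch rather than the linked post's actual Spark code. The `find_duplicates` helper and the sample rows are hypothetical, invented for this example. The comment shows what the equivalent PySpark call would look like, assuming a DataFrame named `df` with the same columns.

```python
from collections import Counter


def find_duplicates(rows, key_columns):
    """Return {key_tuple: count} for every key that appears more than once."""
    # Group: reduce each row to a tuple of the chosen columns.
    # Count: tally how often each tuple occurs.
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    # Filter: keep only the groups with more than one member.
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data, invented for illustration.
rows = [
    {"id": 1, "name": "Ann", "city": "Durham"},
    {"id": 2, "name": "Bob", "city": "Raleigh"},
    {"id": 3, "name": "Ann", "city": "Durham"},
    {"id": 4, "name": "Ann", "city": "Cary"},
]

# Assumed PySpark equivalent for a DataFrame `df` with these columns:
#   df.groupBy("name", "city").count().filter("count > 1")
duplicates = find_duplicates(rows, ["name", "city"])
print(duplicates)
```

Every column that should take part in the comparison has to be listed in the grouping key, which is why the approach gets more tedious as the DataFrame gets wider.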

Comparing Fluentd to Logstash

Published 2021-05-21 by Kevin Feasel

Ajit Chelat compares two popular log agents:

Log collectors, or aggregators, are critical aspects of the log management infrastructure. They help collect logs from various systems and parse and groom them for ingestion into a monitoring or observability tool for further visualization and analysis. DevOps and SRE teams are quickly adding log collectors to their toolchain. With millions of users across domains, two log collectors have risen to the forefront of log collection—Fluentd and Logstash.
This article compares the two and sees which one is the best for your log management and analysis initiatives—Fluentd vs. Logstash.

Click through for the round-by-round comparison and see which one comes out on top in your scenario.

Ignoring Backups in the SQL Server Error Log

Published 2021-05-21 by Kevin Feasel

Garry Bargsley has a solution to an annoyance:

Whether you are new to SQL Server or a seasoned veteran, you will notice odd behavior in the SQL Server Error Log. When a database backup is performed, an entry is put into the SQL Error Log. The SQL Server team decided to log successful backup messages to the Error Log. If you ask most technology professionals, you will find that logging successful events is not really a common occurrence. This behavior causes a bloated Error Log that can make it hard to find what you need quickly.
Luckily, that same SQL Server team built in a solution to this situation.

Read on to see what the solution is, as well as how to use it.

Executing GitHub Actions via CLI

Published 2021-05-21 by Kevin Feasel

Kevin Chant uses the GitHub CLI:

In this post I want to share some advice about using GitHub CLI with GitHub Actions for Data Platform deployments, because I showed that at SQLDay last week.
For those who were not aware, there is a GitHub CLI you can use from the command line. You can download GitHub CLI from here.
Anyway, GitHub CLI was recently updated to support commands for GitHub Actions. GitHub Actions is the CI/CD mechanism that is now available in GitHub, which I have covered in a few other posts, including the one you can find by clicking here.

Click through to learn more.

Temporal Table Performance Scenarios

Published 2021-05-21 by Kevin Feasel

Hugo Kornelis continues a series on temporal table performance:

Welcome to part eighteen of the plansplaining series. Like the previous posts, this one too focuses on temporal tables and their effect on the execution plan. After looking at data modifications in temporal tables and at querying with the most basic form of temporal query, let’s look at the more advanced variations for temporal querying.
We’re still looking at getting data from a single query only in this post. We’ll look at joins in the next post.

Click through for these scenarios.

sqltop — SQL Server Process Viewer

Published 2021-05-21 by Kevin Feasel

Mark Wilkinson has a big announcement:

Hey folks! I’m proud to announce the first open source release of my sqltop tool! sqltop is an interactive command-line based tool to view active sessions on a SQL Server instance. In this post I’ll talk about why I wrote the tool, why I chose to write it in PowerShell, and walk through some of the challenges I faced during development.

I’ve had a chance to see this in action and it’s really cool. I’m glad Mark was able to get this open-sourced, so go check it out.

Running Dask on AKS

Published 2021-05-20 by Kevin Feasel

Tsuyoshi Matsuzaki sets up Dask as a distributed service:

In my last post, I showed you a tutorial for running Apache Spark on managed Kubernetes, Azure Kubernetes Service (AKS).
In this post, I’ll show you a tutorial for running distributed Dask workloads on AKS.
By using Dask, you can run Scikit-Learn compliant functions and jobs on data that cannot fit in memory, or run them in a distributed manner. For simplicity, I’ll use a built-in Dask ML function (dask_ml.linear_model.LinearRegression) in this tutorial. (In the same manner, you can also run regular sklearn functions.)
Cloud-managed Kubernetes can help you speed up these large ML workloads.

Click through for the process. I’ve had some positive experiences with Dask as a dashboarding tool. It’s definitely one of the better ones if you’re big into Python.
