The Hadoop in Real World team shows how to deduplicate rows in a DataFrame in Spark:
It is a pretty common use case to find the list of duplicate elements or rows in a Spark DataFrame and it is very easy to do with a groupBy() and a count().
Of course, just how “easy” it is depends on how many columns you’re dealing with in the DataFrame.
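For reference, here is a minimal PySpark sketch of the groupBy() and count() approach. This is my own illustration rather than the article’s code, and the sample DataFrame and column names are made up:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("find-duplicates").getOrCreate()

# Hypothetical sample data; the third row duplicates the first.
df = spark.createDataFrame(
    [("alice", 1), ("bob", 2), ("alice", 1)],
    ["name", "score"],
)

# Group on every column; any group with count > 1 is a duplicate row.
duplicates = (
    df.groupBy(df.columns)   # groupBy accepts a list of column names
      .count()
      .filter(F.col("count") > 1)
)

duplicates.show()
```

And if the goal is to remove the duplicates rather than list them, dropDuplicates() handles that in a single call.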