
Day: January 9, 2019

A Compendium Of Kafka Links

Manas Dash shares some interesting Kafka-related articles, case studies, and books:

Articles
1. Kafka in a Nutshell. Published on September 25, 2015, by Kevin Sookocheff. Kevin’s article covers Kafka in a nutshell. He says “Kafka is quickly becoming the backbone of many organizations’ data pipelines, and with good reason. By using Kafka as a message bus we achieve a high level of parallelism and decoupling between data producers and data consumers, making our architecture more flexible and adaptable to change.” If you have not read about Kafka yet, you must go through it. This is more like an executive summary of the what, where, and why of Kafka.

Read on for several more articles, as well as a few case studies and two books.


The Basics Of Naive Bayes Classifiers

I have the first post in a series up on using the Naive Bayes class of algorithms for classifying inputs:

Why Should We Use Naive Bayes? Is It the Best Classifier Out There?
Probably not, no. In fact, it’s typically a mediocre classifier—it’s the one you strive to beat with your fancy algorithm. So why even care about this one?
Because it’s fast, easy to understand, and it works reasonably well. In other words, this is the classifier you start with to figure out if it’s worth investing your time on a problem. If you need to hit 90% category accuracy and Naive Bayes is giving you 70%, you’re probably in good shape; if it’s giving you 20% accuracy, you might need to take another look at whether you have a viable solution given your data.
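To show how little code a baseline like this needs, here is a minimal sketch of a categorical Naive Bayes classifier with add-one (Laplace) smoothing. It is not the code from the linked series: the training rows, the label names, and the function names `train` and `predict` are all hypothetical, made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count labels and per-label feature values.

    samples is a list of (features, label) pairs, where features is a tuple
    of categorical values. All data passed in here is hypothetical.
    """
    label_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)  # (position, label) -> value counts
    vocab = defaultdict(set)               # position -> distinct values seen
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return label_counts, feature_counts, vocab


def predict(model, features):
    """Return the label with the highest log-posterior score."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        # Start from the log prior, then add one log-likelihood per feature.
        score = math.log(count / total)
        for i, value in enumerate(features):
            seen = feature_counts[(i, label)][value]
            # Add-one smoothing keeps unseen values from zeroing the score.
            score += math.log((seen + 1) / (count + len(vocab[i])))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training data: one feature (head covering), two labels.
samples = [
    (("fedora",), "weekday"),
    (("fedora",), "weekday"),
    (("none",), "weekday"),
    (("cap",), "weekend"),
    (("cap",), "weekend"),
    (("fedora",), "weekend"),
]
model = train(samples)
print(predict(model, ("fedora",)))
print(predict(model, ("cap",)))
```

The "naive" part is the inner loop: each feature contributes its own log-likelihood independently of the others, which is why training is nothing more than counting.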

Click through to learn what day it is based on what some fictional fellow is wearing as a head covering. Also, learn what it is I actually mean when I let “update your priors” slip.


Generating SQL With Biml

Cathrine Wilhelmsen shows us you can do a lot more with Biml than just generating SSIS packages:

This actually happened to me in a previous job. We had a fairly complex ETL solution for the most critical part of our Data Warehouse. Many SSIS packages, views, and stored procedures queried the tables that were replicas of the source system tables. One day, we found out that the source system would be deploying a new version of their database the following day. In every single table, some columns were removed, others added, and many changed data types.
Oh.
There was no way that we could manually update all our SSIS packages, views, and stored procedures in less than a day. Thousands of users depended on our solution. It was too late to pause the source system changes.
Oh.

That story has a happy ending.


Automation With PowerShell Desired State Configuration

Jess Pomfret takes us on a journey to a desired state:

PowerShell DSC is a platform to support the concept of Infrastructure as Code (IaC). It uses declarative syntax instead of the usual imperative syntax of PowerShell. This means that you describe your desired state rather than the specific steps needed to get there. There are two modes for DSC: push and pull. Although pull mode offers more features and scalability, we’ll look at writing our configuration and using push mode for this blog post to keep it simple.
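The declarative idea can be sketched outside of PowerShell. The Python below is a conceptual illustration only, not DSC syntax or its API: the resource names and the functions `plan_changes` and `apply_changes` are hypothetical. You state the desired end state, and a generic routine works out which changes are needed.

```python
def plan_changes(current, desired):
    """Compare current state to desired state and list the changes needed."""
    actions = []
    for name, wanted in desired.items():
        if current.get(name) != wanted:
            actions.append((name, wanted))
    return actions


def apply_changes(current, actions):
    """Return a new state dictionary with the planned changes applied."""
    updated = dict(current)
    for name, wanted in actions:
        updated[name] = wanted
    return updated


# Hypothetical resources: the desired state is data, not a list of steps.
desired = {"feature:web-server": "present", "service:sql": "running"}
current = {"feature:web-server": "absent"}

first_pass = plan_changes(current, desired)
current = apply_changes(current, first_pass)

# Planning again against the updated state finds nothing left to do.
second_pass = plan_changes(current, desired)
print(first_pass)
print(second_pass)
```

The second pass comes back empty because the state already matches, which is what makes it safe to re-run a declarative configuration.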

This post covers initial installation and some of the initial configuration, so check it out if you’re new to DSC.


A Docker-Based Sandbox For dbatools

Chrissy LeMaire takes us through using Docker to build a playground for learning the functionality inside dbatools:

I’ve long wanted to do this to help dbatools users easily create a non-production environment to test commands and safely explore our toolset. I finally made it a priority because I needed to ensure some Availability Group commands I was creating worked on Docker, too, and having some clean images permanently available was required. Also, in general, Docker is just a good thing to know for both automation and career opportunities.

Probably a little bit better to work on cmdlets you don’t know about in a sandboxed container rather than on production. Just a little bit.


Rowcount Shenanigans When Deleting In Batches

Denis Gobo takes us through a few issues you might run into when deleting data in batches:

I have always used WHILE @@rowcount > 0, but you have to be careful because @@rowcount could be 0 when your while loop starts.

Let’s take a look at an example. This is a simplified example without a WHERE clause, but let’s say you have to delete several million rows from a table with many more millions of rows and the table is replicated. In that case, you want to batch the deletes so that your log file doesn’t fill up, replication has a chance to catch up, and in general the deletes should run faster.
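The batching pattern can be sketched in Python against an in-memory SQLite database. This is an analogue, not the T-SQL from the linked post: the `events` table, the batch size, and the function name `delete_in_batches` are hypothetical. The loop always attempts one delete and only then checks the affected row count, so it does not depend on whatever row count an earlier statement left behind.

```python
import sqlite3


def delete_in_batches(conn, batch_size):
    """Delete every row from the hypothetical events table in small batches."""
    total_deleted = 0
    while True:
        # Run the delete first, then inspect its row count. The loop never
        # relies on a row count left over from a previous statement.
        cur = conn.execute(
            "DELETE FROM events WHERE id IN (SELECT id FROM events LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # commit per batch so each transaction stays small
        if cur.rowcount == 0:
            break
        total_deleted += cur.rowcount
    return total_deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO events (id) VALUES (?)", [(i,) for i in range(25)])
conn.commit()

deleted = delete_in_batches(conn, 10)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(deleted, remaining)
```

Like the quoted example, this has no filtering WHERE clause. A real purge would add one, and would want an index supporting it so that each batch does not rescan the table.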

Click through for a couple of issues you might run into other than the obvious one of “I’m scanning the entire table with every delete” if you don’t have indexing set up right.


Invoke-DbaDiagnosticQuery In dbatools

Andre Kamman walks through a particularly useful cmdlet in the dbatools package:

My answer to that is simple: I’m a major contributor to the awesome PowerShell library dbatools. What I’ve contributed to that library are commands that will help automate the running and processing of queries from the DMV library of Glenn Berry.
At some point in the life of a DBA we’ve all come across his scripts. For the longest time I would advise people to google “Glenn Berry DMV”, and it would be the top result.
The scripts, however, come in a single file per SQL Server version and you can’t run them all in one go. You would have to select a script, run it, and paste the result from Management Studio into an Excel sheet. Glenn provides an empty sheet with tabs ready to paste the various result sets in. I’ve automated this part, hope you like it!

Click through for a demonstration of this cmdlet and the useful output it generates.
