1. Kafka in a Nutshell. Published on September 25, 2015, by Kevin Sookocheff. Kevin writes that “Kafka is quickly becoming the backbone of many organization’s data pipelines,” and explains why: “By using Kafka as a message bus we achieve a high level of parallelism and decoupling between data producers and data consumers, making our architecture more flexible and adaptable to change.” If you have not read about Kafka yet, start here. The article works as an executive summary of the what, where, and why of Kafka.
Read on for several more articles, as well as a few case studies and two books.
Why Should We Use Naive Bayes? Is It the Best Classifier Out There?
Probably not, no. In fact, it’s typically a mediocre classifier—it’s the one you strive to beat with your fancy algorithm. So why even care about this one?
Because it’s fast, easy to understand, and it works reasonably well. In other words, this is the classifier you start with to figure out if it’s worth investing your time on a problem. If you need to hit 90% category accuracy and Naive Bayes is giving you 70%, you’re probably in good shape; if it’s giving you 20% accuracy, you might need to take another look at whether you have a viable solution given your data.
Click through to learn what day it is based on what some fictional fellow has as head covering. Also, learn what it is I actually mean when I let “update your priors” slip.
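To make the "start with a baseline" idea concrete, here is a minimal sketch of a categorical Naive Bayes classifier in plain Python. It is not the code from the linked post. The hat and day data, the `NaiveBayes` class name, and the choice of add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny categorical Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.label_counts = Counter(labels)
        self.total = len(labels)
        # feature_counts[label][feature_index][value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen per feature
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / self.total)  # log prior
            for i, value in enumerate(row):
                seen = self.feature_counts[label][i][value]
                # smoothed log likelihood of this value given the label
                score += math.log((seen + 1) / (count + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up observations: which hat was worn on which kind of day.
hats = [("fedora",), ("fedora",), ("cap",), ("cap",), ("cap",), ("beanie",)]
days = ["weekday", "weekday", "weekend", "weekend", "weekday", "weekend"]

model = NaiveBayes().fit(hats, days)
fedora_guess = model.predict(("fedora",))
beanie_guess = model.predict(("beanie",))
```

If a baseline this simple already lands near your target accuracy on held-out data, the problem is probably worth a fancier model. If it does not, the data may be the bottleneck.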
One tool that I’ve recently come across is kube-shell, an integrated shell for working with Kubernetes. What’s great about it is that it’s cross-platform and has IntelliSense-style completion for kubectl.
Installation is a cinch! The prerequisites are Python and pip, which can be downloaded from here.
That auto-complete is quite useful.
This actually happened to me in a previous job. We had a fairly complex ETL solution for the most critical part of our Data Warehouse. Many SSIS packages, views, and stored procedures queried the tables that were replicas of the source system tables. One day, we found out that the source system would be deploying a new version of their database the following day. In every single table, some columns were removed, others added, and many changed data types.
There was no way that we could manually update all our SSIS packages, views, and stored procedures in less than a day. Thousands of users depended on our solution. It was too late to pause the source system changes.
The story has a happy ending.
PowerShell DSC is a platform to support the concept of Infrastructure as Code (IaC). It uses declarative syntax instead of the usual imperative syntax of PowerShell. This means that you describe your desired state rather than the specific steps needed to get there. There are two modes for DSC: push and pull. Although pull mode offers more features and scalability, we’ll look at writing our configuration and using push mode for this blog post to keep it simple.
This post covers initial installation and some of the initial configuration, so check it out if you’re new to DSC.
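The declarative idea can be sketched outside of PowerShell. The Python below is not DSC syntax and does not use any DSC API. It is a toy model in which the resource names, the `plan` and `converge` helpers, and the dictionary standing in for a machine are all hypothetical. It shows only the core behaviour: you state the end result, and applying it twice changes nothing the second time.

```python
def plan(desired, current):
    """List the (resource, state) changes needed to reach `desired`."""
    return [
        (name, state)
        for name, state in desired.items()
        if current.get(name) != state
    ]


def converge(desired, current):
    """Apply only the missing changes; return how many were made."""
    changes = plan(desired, current)
    for name, state in changes:
        current[name] = state  # stand-in for a real "set" action
    return len(changes)


# Hypothetical resources and states, for illustration only.
desired = {"web-server-feature": "present", "site-folder": "present"}
machine = {"web-server-feature": "absent"}

first_run = converge(desired, machine)   # makes the two needed changes
second_run = converge(desired, machine)  # already in the desired state
```

An imperative script would list the steps themselves. The declarative version leaves the steps to the engine and checks state before acting.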
I’ve long wanted to do this to help dbatools users easily create a non-production environment to test commands and safely explore our toolset. I finally made it a priority because I needed to ensure some Availability Group commands I was creating worked on Docker, too, and having some clean images permanently available was required. Also, in general, Docker is just a good thing to know for both automation and career opportunities.
Probably a little bit better to work on cmdlets you don’t know about in a sandboxed container rather than on production. Just a little bit.
I have always used WHILE @@rowcount > 0, but you have to be careful because @@rowcount could be 0 when your WHILE loop starts.
Let’s take a look at an example. This is a simplified example without a WHERE clause, but let’s say you have to delete several million rows from a table with many more millions of rows, and the table is replicated. In that case you want to batch the deletes so that your log file doesn’t fill up and replication has a chance to catch up. In general, the deletes should also run faster.
Click through for a couple of issues you might run into other than the obvious one of “I’m scanning the entire table with every delete” if you don’t have indexing set up right.
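The same pitfall can be avoided by letting each batch's own row count decide whether to continue, rather than testing a counter that may already be zero before the first delete. The sketch below uses Python's built-in `sqlite3` module instead of SQL Server, so it is an analogy and not the T-SQL from the linked post. The `events` table, the `delete_in_batches` helper, and the batch size are assumptions for illustration.

```python
import sqlite3


def delete_in_batches(conn, batch_size=1000):
    """Delete all rows from the hypothetical `events` table in batches.

    Returns the number of batches that deleted at least one row.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN (SELECT id FROM events LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # commit per batch so each transaction stays small
        if cur.rowcount == 0:  # this batch's own count ends the loop
            break
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("x",)] * 2500)
conn.commit()

batches = delete_in_batches(conn, batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Because the loop always runs the delete at least once and checks the result afterwards, it does not depend on whatever statement happened to run before it.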
My answer to that is simple: I’m a major contributor to the awesome PowerShell library dbatools. What I’ve contributed to that library are commands that will help automate the running and processing of queries from the DMV library of Glenn Berry.
At some point in the life of a DBA we’ve all come across his scripts. For the longest time I would advise people to Google “Glenn Berry DMV”, and it will be the top result.
The scripts, however, come in a single file per SQL Server version, and you can’t run them all in one go. You would have to select a script, run it, and paste the result from Management Studio into an Excel sheet. Glenn provides an empty sheet with tabs ready to paste the various result sets in. I’ve automated this part, and I hope you like it!
Click through for a demonstration of this cmdlet and the useful output it generates.