2021-05-14 – Curated SQL

AI versus ML versus Deep Learning

Published 2021-05-14 by Kevin Feasel

Holger von Jouanne-Diedrich asks the expert:

This is our 101st blog post here on Learning Machines and we have prepared something very special for you!
Oftentimes the different concepts of data science, namely artificial intelligence (AI), machine learning (ML), and deep learning (DL) are confused… so we asked the most advanced AI in the world, OpenAI GPT-3, to write a guest post for us to provide some clarification on their definitions and how they are related.
We are most delighted to present this very impressive (and only slightly redacted) essay to you – enjoy!

The machine has learned about itself. This is where I’m glad I only believe weak AI is possible…

Learning the Basics of Kafka via Notebook

Published 2021-05-14 by Kevin Feasel

Francesco Tisiot shares a way to learn about the basics of Apache Kafka using Jupyter notebooks:

One of the best ways to learn a new technology is to try it within an assisted environment that anybody can replicate and get working within a few minutes. Notebooks excel in this field by allowing people to share and use pre-built content which includes written descriptions, media and executable code in a single page.
This blog post aims to teach you the basics of Apache Kafka Producers and Consumers through building an interactive notebook in Python. If you want to browse a full ready-made solution instead, check out our dedicated GitHub repository.

The classic tutorials tend to use a couple of command prompts and the built-in producer and consumer shell scripts. I like the notebook approach as a way of being able to review the code and results later as a refresher.
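
For a feel of the producer and consumer roles before opening the notebook, here is a toy in-memory sketch in Python. It is not the Kafka client API and not the code from the linked post; ToyTopic, produce, and consume are hypothetical names, and the model assumes a single partition where each consumer group tracks its own read offset.

```python
from collections import defaultdict


class ToyTopic:
    """An append-only log, loosely modeled on a single-partition topic."""

    def __init__(self):
        self._log = []                    # messages in arrival order
        self._offsets = defaultdict(int)  # next offset to read, per group

    def produce(self, message):
        """Append a message and return the offset it was stored at."""
        self._log.append(message)
        return len(self._log) - 1

    def consume(self, group, max_messages=10):
        """Return unread messages for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


topic = ToyTopic()
for text in ["first", "second", "third"]:
    topic.produce(text)

batch_a = topic.consume("group-a", max_messages=2)  # reads the first two
rest_a = topic.consume("group-a")                   # resumes after its offset
batch_b = topic.consume("group-b")                  # a new group starts at 0
print(batch_a, rest_a, batch_b)
```

The point of the sketch is that producing only appends, while each group of readers keeps its own position in the log.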

Translating a Result Set into a Comma-Separated List

Published 2021-05-14 by Kevin Feasel

Kiana Bergsma shows us a tried-and-true method to confuse people:

Oftentimes I have told developers, here is how you do it, and if you Google on it you will find some great samples. Now it is time that I provide my own sample. I call this the FOR XML hack since it uses the FOR XML command, without actually involving any XML at all.

I’m quite happy that STRING_AGG() is around as of SQL Server 2017, as it is a much clearer representation of how to solve this problem. If I had a dollar for every time somebody needed me to explain why I used FOR XML PATH() when I clearly wasn’t building XML, I’d have several dollars. Probably not a fistful of dollars, though.
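
To show the shape of the problem without a SQL Server instance, here is a minimal Python sketch using the standard library's sqlite3 module. It relies on SQLite's group_concat aggregate as a stand-in for STRING_AGG(); the pets table and its rows are made up for illustration, and this is not the FOR XML PATH technique from the linked post.

```python
import sqlite3

# Hypothetical sample data: one row per (owner, pet name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (owner TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO pets (owner, name) VALUES (?, ?)",
    [("Ann", "Rex"), ("Ann", "Fido"), ("Bob", "Tom")],
)

# group_concat collapses each owner's rows into one delimited string.
# SQLite does not promise the order of elements within each string.
rows = conn.execute(
    "SELECT owner, group_concat(name, ', ') "
    "FROM pets GROUP BY owner ORDER BY owner"
).fetchall()

pets_by_owner = {owner: names for owner, names in rows}
print(pets_by_owner)
```

The aggregate form reads as what it is: one row per group, with the values joined by a separator.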

Powershell: the Rest is Commentary

Published 2021-05-14 by Kevin Feasel

Kenneth Fisher stands on one foot:

Commenting your code, still super important. That piece of code that looks a bit strange because you couldn’t find another way to make it work? Better put in a note why so the next person doesn’t have to spend hours figuring out what you did and why. That block of code that pulls a list of zip files and unzips them? Explain what you are doing. The next person to look at this (who just may be you) could use a hint as to what you were thinking. Weird variable name? Heck, not so weird variable name. It couldn’t hurt to explain the purpose. Did I ever tell you I got a job because I did such a good job commenting my code during a technical test?

Read the whole thing.

VS_NEEDSNEWMETADATA in SSIS

Published 2021-05-14 by Kevin Feasel

Hadi Fadlallah discusses what was the bane of my existence for about 3 months in 2010:

In this article, we will briefly explain the VS_NEEDSNEWMETADATA SSIS exception, one of the most popular exceptions that an ETL developer may face while using SSIS. Then, we will run an experiment that reproduces this error. Then, we will show how we can fix it.

This was really annoying prior to SQL Server 2008 (at least, that’s my early-morning recollection of when the SSIS engine started trying to auto-fix this) and has been mildly annoying since. I had far too many conversations which I could summarize as “Yes, I understand that this Excel spreadsheet is basically the same, but it’s different in that the casing on one header column has changed slightly and that breaks the entire system.”
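
As a rough illustration of that header-casing complaint, the following Python sketch compares the column names a package was built against with the names a new file actually has. It is only an analogy and not how SSIS validates metadata internally; compare_columns and the sample column names are hypothetical.

```python
def compare_columns(expected, actual):
    """List the expected column names that are absent from the actual ones.

    The comparison is case-sensitive, so a header that differs only in
    casing is reported separately from one that is missing outright.
    """
    actual_by_lower = {name.lower(): name for name in actual}
    problems = []
    for name in expected:
        if name in actual:
            continue  # exact match, nothing to report
        match = actual_by_lower.get(name.lower())
        if match is not None:
            problems.append(f"case changed: expected '{name}', found '{match}'")
        else:
            problems.append(f"missing: '{name}'")
    return problems


expected = ["CustomerID", "OrderDate", "Amount"]
actual = ["CustomerId", "OrderDate", "Amount"]  # one header re-cased
problems = compare_columns(expected, actual)
print(problems)
```

A check like this, run before the load, at least turns "basically the same spreadsheet" into a specific list of what changed.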

An Introduction to Power BI Goals

Published 2021-05-14 by Kevin Feasel

Imran Burki brings us an introduction to Power BI Goals:

One of the things I love about Power BI (and Microsoft in general) is that they empower everyone in the organization to utilize their software – Power BI Goals are so easy to set up. There’s absolutely no special skillset required. You just need a Power BI Premium or Premium Per User license. Power BI Goals essentially enable you to keep track of key performance indicators in a single, unified view. Goals, and the actuals, are data driven. Goals can also be hardcoded. The data for actuals and goals must reside in a report that you can access.

Click through for an example as well.

Types of Memory Contention

Published 2021-05-14 by Kevin Feasel

Erik Darling is overdrawn at the memory bank (which was, sadly, not a very good MST3K episode):

Whoever decided to give “memory bank” its moniker was wise beyond their years, or maybe they just made a very apt observation: all memory is on loan.
Even in the context we’ll be talking about, when SQL Server has lock pages in memory enabled, the pages that are locked in memory may not have permanent residency.
If your SQL Server doesn’t have enough memory, or if various workload elements are untuned, you may hit one of these scenarios:

There are three of them, which really means there are two of them, but they can join forces in an effort to make your life a pain.

Finding Index Fragmentation

Published 2021-05-14 by Kevin Feasel

Deepthi Goguri is hunting the most dangerous predator:

The bad page splits are the ones we learned about in the previous post: when a random insert has to happen and there is no space on the page, a new page gets created during the page split. These page splits are very expensive and cause fragmentation. Good page splits occur with append-only inserts: as the pages on the right side of the index fill up, new pages get added to the right side of the index. These good page splits don’t cause any index fragmentation. SQL Server groups these two types of page splits together and does not differentiate between them. So, how do we tell the good page splits from the bad ones? Let’s learn more about this.
It is very difficult to differentiate these page splits using the existing methods we have in SQL Server, such as the perfmon Page Splits/sec counter, which reports the good and the nasty page splits together. There is also the sys.dm_db_index_operational_stats DMV and the page_split extended event for tracking page splits.
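
The distinction can be sketched with a toy Python model of an index as a list of fixed-size sorted pages. This is a simplification under stated assumptions, not SQL Server's storage engine: PAGE_CAPACITY, insert_key, and load are hypothetical, a full page always splits in half, and a "good" split is counted only when a key lands past the end of a full last page.

```python
import bisect
import random

PAGE_CAPACITY = 4  # hypothetical rows per page, tiny so splits happen quickly


def insert_key(pages, key, counts):
    """Insert key into a list of sorted pages, counting any split it causes."""
    if not pages:
        pages.append([key])
        return
    # Target the first page whose highest key is >= key, else the last page.
    idx = next((i for i, p in enumerate(pages) if key <= p[-1]), len(pages) - 1)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)
    elif idx == len(pages) - 1 and key > page[-1]:
        # Append past the end of a full last page: start a fresh page.
        pages.append([key])
        counts["good"] += 1
    else:
        # No room mid-index: move half the rows to a new page, then insert.
        half = PAGE_CAPACITY // 2
        new_page = page[half:]
        del page[half:]
        pages.insert(idx + 1, new_page)
        bisect.insort(page if key <= page[-1] else new_page, key)
        counts["bad"] += 1


def load(keys):
    """Insert every key in order and return the pages and split counts."""
    pages, counts = [], {"good": 0, "bad": 0}
    for key in keys:
        insert_key(pages, key, counts)
    return pages, counts


ascending = list(range(1, 21))
shuffled = ascending[:]
random.Random(42).shuffle(shuffled)

for label, keys in [("append-only", ascending), ("random", shuffled)]:
    pages, counts = load(keys)
    # A single combined counter cannot tell the two kinds of split apart.
    total = counts["good"] + counts["bad"]
    print(label, counts, "combined counter:", total)
```

In this model the append-only load leaves every page full, while mid-index splits leave half-empty pages behind, which is the fragmentation the excerpt describes.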

Read on to see how we can find those undesirable page splits versus the benign ones.

Day: May 14, 2021