Curated SQL Posts

An UPSERT Pattern to Avoid

Aaron Bertrand doesn’t like a common insert/update pattern:

I think everyone already knows my opinions about MERGE and why I stay away from it. But here’s another (anti-)pattern I see all over the place when people want to perform an upsert (update a row if it exists and insert it if it doesn’t):

IF EXISTS (SELECT 1 FROM dbo.t WHERE [key] = @key)
BEGIN
    UPDATE dbo.t SET val = @val WHERE [key] = @key;
END
ELSE
BEGIN
    INSERT dbo.t([key], val) VALUES(@key, @val);
END

This looks like a pretty logical flow that reflects how we think about this in real life:

Does a row already exist for this key?
YES: OK, update that row.
NO: OK, then add it.

Click through to learn why this is a bad idea.
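The core problem is a race condition: between the EXISTS check and the INSERT, another session can slip in the same key. As a minimal sketch of the commonly recommended alternative (update first under serializable semantics, and insert only if nothing was updated), against the same dbo.t:

BEGIN TRANSACTION;

UPDATE dbo.t WITH (UPDLOCK, SERIALIZABLE)  -- key-range lock blocks a competing insert
    SET val = @val
    WHERE [key] = @key;

IF @@ROWCOUNT = 0
BEGIN
    INSERT dbo.t([key], val) VALUES(@key, @val);
END

COMMIT TRANSACTION;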

Implementing an LSTM Model with Python

Mrinal Walia takes us through the concept of Long Short-Term Memory:

A simple Recurrent Neural Network has a very simple structure that forms a chain of repeating modules, each with just a single activation function such as a tanh layer. An LSTM similarly has a chain-like structure with repeating modules, but instead of the single neural network layer in an RNN, each LSTM module has four layers that interact in a very particular way, each performing its unique function in the network.
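For reference, the four interacting layers the excerpt describes are usually written as the standard LSTM gate equations (forget gate, input gate, candidate cell state, output gate); this is textbook notation, not anything specific to the article:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
h_t = o_t \odot \tanh(c_t)

Here \sigma is the sigmoid and \odot is elementwise multiplication: the forget and input gates control the cell state c_t, and the output gate controls what the hidden state h_t exposes.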

Read on for a good amount of theory followed by an example using Keras.

Generating Predictions with SQL Server ML Services

Jeffin Mathew walks us through SQL Server Machine Learning Services:

The purpose of this blog is to explore the process of running ML predictions on SQL Server using Python. We are going to train and test the data to predict information about bike sharing for a specific year. We are going to use the provided 2011 data and predict what 2012 will look like. The 2012 data already exists inside the dataset, so we will be able to compare the predicted amounts to the actual ones.

For certain use cases—especially when the data already exists in SQL Server, and especially especially when you can use native scoring—Machine Learning Services does a great job.
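As a hedged illustration of the native scoring route (the table, model storage, and column names below are placeholders, and the post itself works in Python), T-SQL's PREDICT function scores rows directly against a model serialized in the native (RevoScale) format:

DECLARE @model varbinary(max) =
    (SELECT model_object FROM dbo.Models WHERE model_name = N'bike_rentals');

SELECT d.*, p.predicted_rentals
FROM PREDICT(MODEL = @model, DATA = dbo.BikeSharing AS d)
WITH (predicted_rentals float) AS p
WHERE d.yr = 1;  -- hypothetical encoding: 0 = 2011 (training), 1 = 2012 (scoring)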

Columnstore Query Patterns

Ed Pollack walks us through some query patterns which do and don’t work very well with columnstore indexes:

Reading data from a highly compressed analytical structure is quite different from the query patterns used on transactional data. By leveraging metadata, data order, segment elimination, and compression, large tables can be quickly read and results returned in seconds (or less!).

Taking this further, read queries against columnstore indexes can be further optimized by simplifying queries and providing the query optimizer with the easiest path to the smallest columnstore scans needed to return results.

This article explores the most efficient ways to read a columnstore index and produces guidelines and best practices for analytics against large columnstore data structures.

Read on for good advice.
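As a rough sketch of a query shaped for those strengths (dbo.Sales and its columns are hypothetical): touch few columns, aggregate, and filter with a range predicate on the column the rows were ordered by at load time, so whole segments can be skipped:

SELECT SaleDate    = CAST(SaleDateTime AS date),
       TotalAmount = SUM(Amount)       -- only these columns' segments get read
FROM dbo.Sales
WHERE SaleDateTime >= '20210101'
  AND SaleDateTime <  '20210201'       -- range on the load-order column enables segment elimination
GROUP BY CAST(SaleDateTime AS date);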

Weighted Randomization using T-SQL

Louis Davidson won’t be stuck with uniform distributions:

The thing is, while I have 77 pictures of my favorite roller coaster, Expedition Everest at Disney’s Animal Kingdom, and 70 of The Tower of Terror at Hollywood Studios, I only have 2 of the Flame Tree Restaurant at Animal Kingdom and many other things that aren’t as exciting to post about. If I randomly choose attractions using a non-weighted random number generator, it would be just as likely to serve up the lesser items as the greater ones. Hence, I want my popular items to come up most frequently, but every once in a while, I want to be surprised by something different.

This is where I needed to build a weighted randomized value.

Read on to see how Louis implements this.
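This is not necessarily Louis's implementation, but as a minimal sketch of one common approach (dbo.Attraction and its columns are hypothetical), a running total turns the weights into a cumulative distribution that a single random draw can index into:

DECLARE @rand float = RAND(CHECKSUM(NEWID()));  -- uniform draw in [0, 1)

WITH weighted AS
(
    SELECT AttractionName,
           RunningWeight = SUM(Weight) OVER (ORDER BY AttractionName
                                             ROWS UNBOUNDED PRECEDING),
           TotalWeight   = SUM(Weight) OVER ()
    FROM dbo.Attraction
)
SELECT TOP (1) AttractionName
FROM weighted
WHERE RunningWeight >= @rand * TotalWeight  -- heavier items own a wider slice of [0, TotalWeight]
ORDER BY RunningWeight;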

Disabled Indexes Tell No Compression Tales

Eric Cobb gives us a warning around disabling indexes:

Here at work we have a very large, very intensive data load that disables and rebuilds indexes as part of the process. We recently added compression to many of the tables and indexes in the database because it was growing quite large (around 28TB at the time). After adding compression, we got the database size down to somewhere around 17TB.

So you can imagine our surprise when the DB size jumped back up to over 30TB after the last data load! In trying to figure out what happened I discovered that most of the data compression was gone.

That’s…not great. Eric shows us a demo as well and notes that it still applies to SQL Server 2019. I’d be apt to call it a bug, myself.
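The practical defense, sketched here with hypothetical object names, is to restate DATA_COMPRESSION explicitly when rebuilding a disabled index and then verify what each partition actually has:

-- Re-enable the index and restate the compression rather than trusting it to survive the disable
ALTER INDEX IX_BigTable_Key ON dbo.BigTable
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Check the resulting compression per partition
SELECT i.name, p.partition_number, p.data_compression_desc
FROM sys.partitions AS p
    JOIN sys.indexes AS i
        ON i.object_id = p.object_id
       AND i.index_id  = p.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.BigTable');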

Tips for Creating Azure Data Studio Database Projects

Kevin Chant offers some insights for us:

One of the options within the SQL Database Projects extension is that you can publish your project to another SQL Server database. Of course, this is only for one database.

So, what do you do if you want to update multiple databases with one project? Well, one option is to create a dacpac from your project and use that dacpac to update multiple databases.

You have a couple of options if you wish to do this.

Read on for some helpful tips.
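As an illustrative sketch (server, database, and file names are placeholders), publishing the same dacpac to each target is one SqlPackage call per database:

SqlPackage /Action:Publish /SourceFile:"MyProject.dacpac" /TargetServerName:"Server1" /TargetDatabaseName:"DatabaseA"
SqlPackage /Action:Publish /SourceFile:"MyProject.dacpac" /TargetServerName:"Server1" /TargetDatabaseName:"DatabaseB"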

Dealing with Failing SQL Agent Jobs

Garry Bargsley has started a four-part series:

SQL Server Agent Jobs are one of the core features of SQL Server. Agent Jobs perform many actions: maintenance tasks (server and database), data movement flows, replication, log shipping, data cleanup, health checks, and many more. Since Agent Jobs are a critical component in a data organization, it is important to know when Agent Jobs do not succeed. There are several ways to accomplish the monitoring of failed Agent Jobs, from easy to more complex. This four-part series is going to cover how you, the DBA, can be notified of failing Agent Jobs in your SQL Server environment.

Click through for part one, which is all about finding failed jobs and filtering down to relevant jobs using dbatools.
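The series leans on dbatools, but for comparison, a plain T-SQL sketch against msdb's job history surfaces recent failures:

SELECT j.name,
       h.run_date,     -- int, yyyymmdd
       h.run_time,     -- int, hhmmss
       h.message
FROM msdb.dbo.sysjobs AS j
    JOIN msdb.dbo.sysjobhistory AS h
        ON h.job_id = j.job_id
WHERE h.step_id = 0      -- step 0 is the overall job outcome, not an individual step
  AND h.run_status = 0   -- 0 = failed
ORDER BY h.run_date DESC, h.run_time DESC;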

Principal Component Analysis in Azure ML

Dinesh Asanka walks us through Principal Component Analysis as an Azure ML Studio data transformation technique:

We will be discussing one of the most common data reduction techniques, named Principal Component Analysis, in Azure Machine Learning in this article. After discussing basic cleaning techniques and feature selection techniques in previous articles, we now turn to a data reduction technique.

A data reduction mechanism can be used to reduce the representation of large, high-dimensional data. By using a data reduction technique, you can reduce the dimensionality, which makes the data easier to manage and to visualize. Further, you can achieve similar accuracies.
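Conceptually (this is standard PCA, not anything specific to the Azure ML module), given centered data X with n rows, the technique keeps the k eigenvectors of the covariance matrix with the largest eigenvalues and projects onto them:

C = \frac{1}{n-1} X^{\top} X, \qquad C w_i = \lambda_i w_i, \qquad Z = X W_k

where W_k stacks those top-k eigenvectors, so the reduced representation Z keeps as much of the original variance as any k-dimensional linear projection can.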

Read on for the demo.
