Press "Enter" to skip to content

Author: Kevin Feasel

ETL with Spark and Hive

Emrah Mete gives us an example of using Apache Spark for ETL into Apache Hive:

Now let’s move on to building the sample application. In the example, we will first send the data from our Linux file system to the storage layer of the Hadoop ecosystem, HDFS (the Extract step). Then we will read that data back with Spark, apply a simple Transformation, and write the result to Hive (the Load step). Hive is an infrastructure that lets us query data stored in the Hadoop ecosystem; with it, we can easily query the data in our big data environment using SQL.

Most of what relational database professionals do is pretty much what you do with Spark and Hive. There are differences in implementation and in the level of programming familiarity required, but the workflows are quite similar.
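
As a rough sketch of the flow Emrah describes (the file path, database, and table names here are all hypothetical), the read-transform-write cycle in Spark with Hive support looks something like this:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object HdfsToHiveEtl {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark write tables that Hive can query directly
    val spark = SparkSession.builder()
      .appName("HdfsToHiveEtl")
      .enableHiveSupport()
      .getOrCreate()

    // Extract: read the file previously copied from the Linux FS into HDFS
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/orders.csv") // hypothetical path

    // Transform: a simple filter and projection
    val highValue = orders
      .filter(col("amount") > 1000)
      .select("order_id", "customer_id", "amount")

    // Load: save as a Hive table, queryable with plain SQL afterwards
    highValue.write.mode("overwrite").saveAsTable("sales.high_value_orders")

    spark.stop()
  }
}
```

Once the job finishes, the table is queryable with plain SQL (SELECT * FROM sales.high_value_orders) from any Hive-connected client.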


Choosing Between Management Studio and Azure Data Studio

Brent Ozar gives us the lay of the tooling land:

Every time there’s a new release of SQL Server or SQL Server Management Studio, you can grab the latest version of SSMS and keep right on keepin’ on. Your job still functions the same way using the same tool, and the tool keeps getting better.

And it’s free. You don’t have to ask the boss for upgrade money. You can just download it, install it, and take advantage of things like the cool new estimated-vs-actual numbers in execution plans (which also cause presenters everywhere to curse, knowing that they have to redo a bunch of screenshots).

I spend a lot of time jumping back & forth between SQL Server and Postgres, and lemme just tell you, the tooling options on the other side of the fence are a hot mess.

Yeah, Management Studio is the best of the bunch. I’m using Azure Data Studio more at home but still need a couple of plugins to use it often at work. And those two beat pretty much every other tool I’ve ever worked with.


Scripting with Variables in Bash

Kellyn Pot’vin-Gorman shows how easy it can be to write Bash scripts with variables:

Let’s start with a use case of deploying an Azure database. When a customer is making the decision to build it out, there is specific information needed to deploy it, and this will continue to change as the Azure catalog is updated with new offerings. For our example, we’ll stick to a very small snippet of code, as the values we dynamically create will be reused throughout the script. This example will skip past the actual server creation, etc., and just focus on the user database creation. The server, zone, and subscription are all set in the default steps earlier on so as not to have to repeat them throughout each resource deployment step.

There’s a lot to Bash, and its programming guide runs to quite a few sheets of paper (ask me how I know), but this is one of those places where you can get a nice benefit easily.
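
Kellyn’s post uses Bash, but the underlying pattern (set the shared values once, then build every deployment command from them) carries over to any language. Here is an analogous sketch in Scala, shelling out to the Azure CLI with made-up resource names:

```scala
import scala.sys.process._

object DeployDb {
  def main(args: Array[String]): Unit = {
    // Shared values, set once and reused across every deployment step
    val resourceGroup = "demo-rg"        // hypothetical resource group
    val serverName    = "demo-sqlserver" // hypothetical server name
    val dbName        = "demo-db"        // hypothetical database name

    // Build the az CLI invocation from the variables above
    val cmd = Seq(
      "az", "sql", "db", "create",
      "--resource-group", resourceGroup,
      "--server", serverName,
      "--name", dbName,
      "--service-objective", "S0"
    )

    // Run the command, streaming its output to the console
    val exitCode = cmd.!
    if (exitCode != 0) sys.error(s"Deployment of $dbName failed")
  }
}
```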


Using dbatools for Inventory Analysis

Andreas Schubert gives us a way to learn more about our SQL Server inventories with dbatools:

With the multitude of environments that I am operating, it’s impossible to remember every server, every database, or the multiple different ways they interact with each other. Therefore, one of the first things I do when taking over a consulting engagement is to map out all those different bits of information.

Since the environments usually change pretty fast, my goal is to automate this process as much as possible.

In this series of posts, I will try to show you how I am implementing this. Of course, your requirements or implementations may differ, but hopefully this blog post can give you some ideas about your tasks too.

Click through for a script. There are also some good comments.


READPAST In Action

Erik Darling shows how READPAST is no panacea:

Locking hints can be really handy in these situations, especially the READPAST hint. The documentation for it says that it allows you to skip over row level locks (that means you can’t skip over page or object level locks).

What it leaves out is that your READPAST query may also need to try to take row level shared locks.

Read on for an example as well as an alternative which ends up being better in this case.
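
Erik’s demo is worth reading in full; just to show the shape of the hint, here’s a minimal sketch run from Scala over JDBC, with a hypothetical table and connection string:

```scala
import java.sql.DriverManager

object ReadPastDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection string; adjust for your own instance
    val url  = "jdbc:sqlserver://localhost;databaseName=TestDb;user=sa;password=changeme"
    val conn = DriverManager.getConnection(url)
    try {
      // READPAST skips rows that other sessions hold row-level locks on;
      // page- and object-level locks will still block the query
      val rs = conn.createStatement().executeQuery(
        "SELECT id, payload FROM dbo.LockingDemo WITH (READPAST);"
      )
      while (rs.next()) println(rs.getInt("id"))
    } finally conn.close()
  }
}
```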


SQL Server 2019 CTP 2.5

The SQL Server team has a new CTP out:

We’re excited to announce the monthly release of SQL Server 2019 community technology preview (CTP) 2.5. SQL Server 2019 is the first release of SQL Server to closely integrate Apache Spark™ and the Hadoop Distributed File System (HDFS) with SQL Server in a unified data platform.

This is a big one for me: lots of changes in Big Data Clusters, PolyBase on Linux, and a Java SDK. Looks like I am going to be pretty busy.


Why Optimize for Ad Hoc Workloads

Randolph West explains why optimize for ad hoc workloads should be enabled by default:

Enabling the optimize for ad hoc workloads configuration setting will reduce the amount of memory used by all query plans the first time they are executed. Instead of storing the full plan, a stub is stored in the plan cache. Only when that plan executes again is the full plan stored in memory. What this means is that any plan which runs more than once pays a small overhead on its second execution.

Read the whole argument. I don’t know that I’ve seen an instance yet where this setting was a really bad choice.
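
If you want to flip the setting on, it’s a server-level sp_configure option. A quick sketch of enabling it from Scala over JDBC (connection details hypothetical):

```scala
import java.sql.DriverManager

object EnableAdHocOptimization {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection string; adjust for your own instance
    val url  = "jdbc:sqlserver://localhost;user=sa;password=changeme"
    val conn = DriverManager.getConnection(url)
    try {
      val stmt = conn.createStatement()
      // It is an advanced option, so expose those first
      stmt.execute("EXEC sp_configure 'show advanced options', 1; RECONFIGURE;")
      // First execution now caches only a stub; the full plan is cached on the second
      stmt.execute("EXEC sp_configure 'optimize for ad hoc workloads', 1; RECONFIGURE;")
    } finally conn.close()
  }
}
```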


Testing an Event-Driven System

Andy Chambers takes us through how to test an event-driven system:

Each distinct service has a nice, pure data model with extensive unit tests, but now with new clients (and consequently new requirements) coming thick and fast, the number of these services is rapidly increasing. The testing guardian angel who sometimes visits your thoughts during your morning commute has noticed an increase in the release of bugs that could have been prevented with better integration tests.

Finally after a few incidents in production, and with velocity slowing down due to the deployment pipeline frequently being clogged up by flaky integration tests, you start to think about what you want from your test suite. You set off looking for ideas to make really solid end-to-end tests. You wonder if it’s possible to make them fast. You think about all the things you could do with the time freed up by not having to apply manual data fixes that correct for deploying bad code.

At the end of it all, hopefully you’ll arrive here and learn about the Test Machine.

Check it out. Testing these types of systems is certainly possible, but it can be a bit difficult because of the additional layers of complexity.
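
The core idea behind the Test Machine (treat the whole system as a function from input events to output events, then assert on the outputs) can be sketched without any Kafka machinery at all. A toy illustration in Scala, with made-up event types standing in for serialized topic records:

```scala
// Toy event model; the real thing would be Kafka topics and serialized records
sealed trait Event
final case class OrderPlaced(orderId: String, amount: BigDecimal) extends Event
final case class InvoiceIssued(orderId: String, amount: BigDecimal) extends Event

object BillingService {
  // The service under test, modeled as a pure function over event streams
  def handle(inputs: List[Event]): List[Event] =
    inputs.collect { case OrderPlaced(id, amount) => InvoiceIssued(id, amount) }
}

object BillingServiceTest extends App {
  // Feed a known input stream, then assert on the full output stream
  val in  = List(OrderPlaced("o-1", BigDecimal(42)))
  val out = BillingService.handle(in)
  assert(out == List(InvoiceIssued("o-1", BigDecimal(42))), s"unexpected output: $out")
  println("billing test passed")
}
```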


Monads and Monoids and Functors

Anmol Sarna explains the concept of a monad:

In functional programming, a monad is a design pattern that allows structuring programs generically while automating away boilerplate code needed by the program logic.

To simplify the above definition a bit more, we can think of monads as wrappers: you just take an object and wrap it with a monad.

Let’s just be clear on one thing: a monad is not a class or a trait, nor is it dedicated only to the Scala language. It is a concept related to functional programming.

This also includes a few examples in Scala.
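
To make the wrapper framing concrete, here’s the standard Option monad in Scala: flatMap does the unwrap-and-rewrap boilerplate, so the happy path reads straight through, and a for comprehension is just sugar over the same calls.

```scala
object MonadDemo extends App {
  // A lookup that may fail, so the result comes back wrapped in Option
  val inventory = Map("apple" -> 3, "pear" -> 0)

  def stock(item: String): Option[Int] = inventory.get(item)
  def halve(n: Int): Option[Int]       = if (n % 2 == 0) Some(n / 2) else None

  // flatMap sequences the wrapped computations; any None short-circuits
  val result: Option[Int] = stock("apple").flatMap(n => halve(n + 1))

  // The same pipeline with for-comprehension sugar
  val sugared: Option[Int] = for {
    n <- stock("apple")
    h <- halve(n + 1)
  } yield h

  println(result)  // Some(2)
  println(sugared) // Some(2)
}
```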


CosmosDB Continuation Tokens

Hasan Savran walks us through the idea of a continuation token in CosmosDB:

In CosmosDB, the TOP option is required and its default value is 100. You can change the default value by sending a different value in the request header “x-ms-max-item-count“. If you have 40,000 rows in your Orders table and run the same query in CosmosDB, you will get 100 rows (documents) rather than 40,000. CosmosDB returns all kinds of metadata with the data; you can find this metadata in the response headers. One of those headers is “x-ms-continuation“, and it is responsible for delivering the rest of the rows of your query. If you would like to get the next set of results, you can take the “x-ms-continuation“ value from the response headers and attach it to your next request. The CosmosDB SDK does this automatically for you: the SDK checks for the x-ms-continuation value when you check the HasMoreResults property. If this property is true, that means CosmosDB returned a continuation token.
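
The loop the SDK runs on your behalf is easy to picture. Below is a sketch in Scala with a simulated fetchPage standing in for the real REST call; the real token travels in the x-ms-continuation header, and everything else here is made up:

```scala
object ContinuationDemo extends App {
  // One page of results plus the token for the next page, if any
  final case class Page[A](items: List[A], continuation: Option[String])

  // Simulated data store standing in for CosmosDB documents
  private val docs     = (1 to 250).map(i => s"order-$i").toList
  private val pageSize = 100 // CosmosDB's default x-ms-max-item-count

  // Hypothetical stand-in for the REST call: the token is just an offset here;
  // the real service sends and receives it via the x-ms-continuation header
  def fetchPage(token: Option[String]): Page[String] = {
    val offset = token.map(_.toInt).getOrElse(0)
    val items  = docs.slice(offset, offset + pageSize)
    val next   =
      if (offset + pageSize < docs.length) Some((offset + pageSize).toString)
      else None
    Page(items, next)
  }

  // Keep requesting pages until no continuation token comes back
  @annotation.tailrec
  def fetchAll(token: Option[String] = None, acc: List[String] = Nil): List[String] = {
    val page = fetchPage(token)
    page.continuation match {
      case Some(next) => fetchAll(Some(next), acc ++ page.items)
      case None       => acc ++ page.items
    }
  }

  println(s"fetched ${fetchAll().length} documents in pages of $pageSize")
}
```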

I have fanciful notions of SQL Server offering something similar—think of a grid built from a query. Get the first 50 rows from the result set and store that off in tempdb somewhere, using the “continuation token” (which might just be the full name in tempdb) and auto-trashing after a certain amount of time.
