2019-01-16 – Curated SQL

What is Machine Learning (ML), and how does it differ from Statistics (and hence, implicitly, from Econometrics)?

Those are big questions, but I think that they’re ones that econometricians should be thinking about. And if I were starting out in Econometrics today, I’d take a long, hard look at what’s going on in ML.

Click through for some quick thoughts and several resources on the topic.

Odd Behavior With Altering Columns

Published 2019-01-16 by Kevin Feasel

Solomon Rutzky points out a few things which you can unintentionally change when running an ALTER TABLE [tbl] ALTER COLUMN [col] command:

If the column is NOT NULL, then not specifying NOT NULL will cause it to become NULLable. The documentation for ALTER TABLE even states:
ANSI_NULL defaults are always on for ALTER COLUMN; if not specified, the column is nullable.
Let’s see for ourselves.

Solomon also has a couple of collation-related items, including unexpected silent truncation when working with UTF-8 collations.
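
To make the nullability point concrete, here is a minimal T-SQL sketch. The table and column names (dbo.AlterColumnDemo, Amount) are hypothetical and not taken from Solomon's post; the sketch only illustrates the documented behavior quoted above.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.AlterColumnDemo
(
    Id     INT NOT NULL,
    Amount INT NOT NULL
);

-- Widen the column without restating NOT NULL.
-- Per the ALTER TABLE documentation, the column becomes nullable.
ALTER TABLE dbo.AlterColumnDemo ALTER COLUMN Amount BIGINT;

-- Check nullability: Amount should now show is_nullable = 1,
-- while the untouched Id column still shows 0.
SELECT c.name, c.is_nullable
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.AlterColumnDemo');

-- Restating NOT NULL on the ALTER puts the constraint back.
ALTER TABLE dbo.AlterColumnDemo ALTER COLUMN Amount BIGINT NOT NULL;

-- Clean up the demo table.
DROP TABLE dbo.AlterColumnDemo;
```

The safe habit is to restate NULL or NOT NULL explicitly on every ALTER COLUMN rather than relying on the default.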

Comments closed

Auto ML With SQL Server 2019 Big Data Clusters

Published 2019-01-16 by Kevin Feasel

Marco Inchiosa has a model scenario for using Big Data Clusters to scale out a machine learning problem:

H2O provides popular open source software for data science and machine learning on big data, including Apache Spark™ integration. It provides two open source Python AutoML classes: h2o.automl.H2OAutoML and pysparkling.ml.H2OAutoML. Both APIs use the same underlying algorithm implementations; however, the latter follows the conventions of Apache Spark’s MLlib library and allows you to build machine learning pipelines that include MLlib transformers. We will focus on the latter API in this post.
H2OAutoML supports classification and regression. The ML models built and tuned by H2OAutoML include Random Forests, Gradient Boosting Machines, Deep Neural Nets, Generalized Linear Models, and Stacked Ensembles.

The post has only a few lines of code, but there are a lot of moving parts under the surface.

Deploying SQL Server To Kubernetes The Easy Way

Published 2019-01-16 by Kevin Feasel

Andrew Pruski doesn’t want to mess with a bunch of YAML files:

In previous posts I’ve run through how to deploy SQL Server to Kubernetes using YAML files. That’s a great way to deploy, but is there possibly an easier way?
Enter Helm. A package manager for Kubernetes.
Helm packages are called charts and wouldn’t you know it? There’s a chart for SQL Server!
Helm comes in two parts: Helm itself, which is the client-side tool, and Tiller, which is the server-side component. Details of what each part does can be found here.

They’re making it too easy now…

Filtered Index Trickiness

Published 2019-01-16 by Kevin Feasel

Greg Low explains some of the tricky bits behind using filtered indexes:

If you think about it, if all we’re ever going to use is one part of the index, i.e. just the unfinalized rows, having an entry in there for every single row is quite wasteful, as although the vast majority of the index will never be used, it still has to be maintained.
So in SQL Server 2008, we got the ability to create a filtered index. Now these were actually added to support sparse columns. But on their own, they’re incredibly useful anyway.

I use these on occasion but less than I want to, and a big part of the reason why is in this post, particularly around parameters.
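
As a rough illustration of the parameter problem, here is a minimal T-SQL sketch. The table, column, and index names are hypothetical and not taken from Greg's post, and the plan behavior described in the comments is what you would typically expect rather than a guarantee for every version or workload.

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE dbo.Transactions
(
    TransactionId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    TransactionDate DATE NOT NULL,
    IsFinalized     BIT NOT NULL
);

-- Index only the unfinalized rows instead of every row in the table.
CREATE NONCLUSTERED INDEX IX_Transactions_Unfinalized
ON dbo.Transactions (TransactionDate)
WHERE IsFinalized = 0;

-- A literal predicate matches the index filter,
-- so the filtered index is a candidate for this query.
SELECT TransactionId, TransactionDate
FROM dbo.Transactions
WHERE IsFinalized = 0
  AND TransactionDate >= '20190101';

-- With a variable or parameter, the cached plan must be valid for any
-- value, so the optimizer generally cannot pick the filtered index.
DECLARE @IsFinalized BIT = 0;

SELECT TransactionId, TransactionDate
FROM dbo.Transactions
WHERE IsFinalized = @IsFinalized
  AND TransactionDate >= '20190101';

-- OPTION (RECOMPILE) lets the optimizer see the runtime value,
-- at the cost of compiling the statement on every execution.
SELECT TransactionId, TransactionDate
FROM dbo.Transactions
WHERE IsFinalized = @IsFinalized
  AND TransactionDate >= '20190101'
OPTION (RECOMPILE);
```

Comparing the actual execution plans for the three SELECT statements is the quickest way to see which ones use the filtered index.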

The Difficulties Of Database Load Testing

Published 2019-01-16 by Kevin Feasel

Brent Ozar shares some of the trouble you might run into when database load testing:

Managers think that to simulate more load, they can just take the production queries and replay them multiple times, simultaneously, from the replay tool. We’ve already talked about how you can’t reliably replay deletes, but even inserts and updates cause a problem.
Say we’re load testing Stack Overflow queries, and our app does this:

UPDATE dbo.Users
SET Reputation = Reputation + 1
WHERE Id = 22656;
If we try to simulate more load by running that exact same query from 100 different sessions simultaneously, we’re just going to end up with lock contention on that particular user’s row. We’ll be troubleshooting a blocking problem, not the problem we really have when 100 different users run that same query.

Click through for several issues you can run into and Brent’s advice.
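
One way to picture the difference is a replay script in which each session updates a different user. The sketch below is an assumption for illustration only and is not Brent's recommendation; it reuses the dbo.Users table from the quoted query and picks a target row at random.

```sql
-- Sketch: each replay session picks its own user
-- instead of every session updating Id = 22656.
DECLARE @Id INT;

-- Random pick. ORDER BY NEWID() scans the whole table, which adds load
-- that production does not have, so treat this as a sketch only.
SELECT TOP (1) @Id = Id
FROM dbo.Users
ORDER BY NEWID();

-- Same update shape as the original query, but spread across rows,
-- so concurrent sessions rarely block each other on a single row lock.
UPDATE dbo.Users
SET Reputation = Reputation + 1
WHERE Id = @Id;
```

Even this changes the workload, because the random lookup is not something the real application does.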

Pobody’s Nerfect: The Andy Mallon Story

Published 2019-01-16 by Kevin Feasel

Andy Mallon shares a great story of a critical business mistake and then overcoming that self-inflicted adversity in a hotel room in Kalamazoo:

I still have vivid memories of that night. I’d ordered pizza so that I could stay back at my hotel room and finish my punch list of things before go-live the next day. It was after 2am, and I was sitting at the kitchen counter of the Residence Inn in Kalamazoo, MI, the pizza box still open next to me as I worked my way through a large pepperoni.
I got to the item on my punch list for “delete all test appointments.” The logic here was pretty simple: All the test appointments were for the same imaginary test patient. Just find all of that person’s appointments, and delete them. I decided I would do this one doctor at a time to make sure I didn’t mess it up too badly.

It’s a harrowing story with a happy ending.


Day: January 16, 2019

Where Machine Learning And Econometrics Collide
