Kevin Feasel – Page 837

By default, SQL Server will grant 20% of available memory to any R or Python scripts running. The purpose of this limit is to prevent you from hurting server performance with expensive external scripts (like, say, training large neural networks on a SQL Server).
Here’s the kicker: this affects you even if you don’t have Resource Governor enabled. If you see out-of-memory exceptions in Python or error messages about memory allocation in R, I’d recommend bumping this max memory percent up above 20, and I have scripts to help you with the job. Of course, making this change assumes that your server isn’t stressed to the breaking point; if it is, you might simply want to offload that work somewhere else.

Click through for the other issue.

Comments closed

The Either Monad in Scala

Published 2020-07-02 by Kevin Feasel

Jyoti Sachdeva explains some of the power of Either:

We use Options in scala but why do we want to go for Either?
Either is a better approach in the respect that if something fails we can track down the reason, which in Option None case is not possible.
We simply pass None but what is the reason we got None instead of Some. We will see how to tackle this scenario using Either.

This is a classic functional programming pattern and one of the easier monads to understand.

Comments closed

A Note on Distributed Network Names

Published 2020-07-02 by Kevin Feasel

Allan Hirt provides an explanation around Distributed Network Names when building Windows Server Failover Clusters on Windows Server 2019:

The new Windows Server 2019 DNN functionality does have a side effect that does affect Azure-based configurations. When creating a WSFC, Windows Server 2019 detects that the VM is running in Azure and will use a DNN for the WSFC name. This is the default behavior.

I clipped this paragraph specifically because Allan uses both “affect” and “effect” correctly, and I wanted to call that out. Do read the rest of it as well.

Comments closed

ADF.Procfwk Version 1.8

Published 2020-07-02 by Kevin Feasel

Paul Andrew has been busy:

Following more great feedback from the Data Platform community the primary goal of this release was to further improve the resilience of the framework processing. These improvements included its restart clean up capabilities and introducing better dependency chain handling between Worker pipelines when failures occur. The latter builds on the existing restart functionality first introduced in release v1.2 and supplements the logic using a new set of pipeline dependency metadata. I’ve created the below visual to conceptually show the new dependency chain behaviour, should you wish to populate and make use of the new metadata handling.

Read on for the full changelog.

Comments closed

Understanding the Identity Cache

Published 2020-07-02 by Kevin Feasel

Jignesh Raiyani explains the purpose of the identity cache and shows how you might disable it if you want to:

In the release of SQL Server 2012, Microsoft has introduced a feature of Identity Cache. The identity jump doesn’t cause any issue neither with the database nor the running tasks; however, this identity value gap is not acceptable in some of the business-oriented scenarios. This Identity counter values of the columns are stored in the system table separately, which is known as sys.identity_columns DMV. This reference object sys.identity_columns does not rely on the transaction status, and it doesn’t matter that the transaction on the user table is committed or rolled back. Now, in case this user transaction is being rolled back, we will find that the identity value has been skipped in the sys.identity_columns table. The reason being is that the value of the Identity column value does not get rolled back. Apart from the above scenario, there could be various reasons or situations where these issues of the skip of the Identity value will be seen, which will be small in number and random. However, when there is an issue of Identity value jump, the amount of the skip value will be prominent and precise in number, depending upon the column data type, which is easy to identify.

My advice is, don’t disable this unless you have a really good reason to. “I don’t want to see numbers skipped” is a bad reason because identity columns never guarantee sequential insertion. You can still have skips due to transaction rollbacks or other errors on insertion, and the identity cache helps with performance problems in cases with lots of writers inserting rows into the table.

Comments closed

Fun with LOB and Locking

Published 2020-07-02 by Kevin Feasel

Erik Darling notes a locking scenario specifically with LOB data:

To backtrack a little bit, part of what I wanted to show was that using order by can sometimes result in “more” blocking. I don’t mean more locks; what I mean is that when we need to order by Score, but we don’t have Score indexed in a useful way, the query will get hung up without showing any rows whatsoever.

Read the whole thing.

Comments closed

Data Lakes for Smaller Projects

Published 2020-07-01 by Kevin Feasel

Thomas Spicer explains that your data lake doesn’t need to be enormous to be useful:

We recently wrote an article debunking common myths about data lake architectures, data lake definitions, and data lake analytics. It is called “What is a Data Lake? Get A Leg Up Avoiding The Biggest Myths.” In that article, we framed the current conversation about data lakes and how they fit within enterprise data strategies. This topic has historically been confusing and opaque for those wanting to get value from a data lake due to conflicting advice from consultants and vendors.
One area that can be particularly confusing is the perception that lakes are only for “big data.” If you spend any time reading materials on lakes, you would think there is only one type and it would look like the Capsian Sea (it’s a lake despite “sea” in the name). People describe data lakes as massive, all-encompassing entities, designed to hold all knowledge. The good news is that lakes are not just for “big data” and you have more opportunities than ever to have them be part of your data stack.

Click through for Thomas’s argument.

Comments closed

The Basics of Autoregressive Models

Published 2020-07-01 by Kevin Feasel

Holger von Jouanne-Diedrich explains some of the principels of autoregressive models through a demonstration:

Well, this seems to be good news for the sales team: rising sales! Yet, how does this model arrive at those numbers? To understand what is going on we will now rebuild the model. Basically, everything is in the name already: auto-regressive, i.e. a (linear) regression on (a delayed copy of) itself (auto from Ancient Greek self)!
So, what we are going to do is create a delayed copy of the time series and run a linear regression on it. We will use the lm() function from base R for that (see also Learning Data Science: Modelling Basics).

Read on for some additional understanding.

Comments closed

Tips for Debugging in Visual Studio

Published 2020-07-01 by Kevin Feasel

Patrick Smacchia has 12 tips for debugging in Visual Studio:

4) Data breakpoint: Break when value changes
If you set a breakpoint to a non-static property setter it will be hit when changing the property value for all objects. The same behavior can be obtained for a single object thanks to the Locals (or Watch) window right click : Break When Value Changes menu.
This facility is illustrated with the animation above. The hit occurs only when obj2.Prop is changed, not when obj1.Prop is changed.

These go a step beyond the basics, so check them out.

Comments closed

PRINTing More Than 8000 Bytes

Published 2020-07-01 by Kevin Feasel

Richard Swinbank hits on a bugbear of mine:

A feature of T-SQL is that strings longer than 8000 bytes are truncated by PRINT. If you haven’t already discovered this, you might wonder why it’s a problem – the answer (for me at least) is dynamic SQL.

Read on for Richard’s answer. There is an easier way to do it with the paid version of SQL# by using the Util_Print function.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Author: Kevin Feasel

ML Services and Resource Governor

The Either Monad in Scala

A Note on Distributed Network Names

ADF.Procfwk Version 1.8

Understanding the Identity Cache

Fun with LOB and Locking

Data Lakes for Smaller Projects

The Basics of Autoregressive Models

Tips for Debugging in Visual Studio

PRINTing More Than 8000 Bytes