Curated SQL – Page 1033 – A Fine Slice Of SQL Server

Customizing Your Rprofile

Published 2020-01-22 by Kevin Feasel

Colin Gillespie shows how you can customize R via the .Rprofile file:

Every time R starts, it runs through a couple of R scripts. One of these scripts is the .Rprofile. This allows users to customise their particular set-up. However, some care has to be taken, as if this script is broken, this can cause R to break. If this happens, just delete the script!
Full details of how the .Rprofile works can be found in my book with Robin on Efficient R programming. However, roughly, R will look for a file called .Rprofile first in your current working directory, then in your home area. Crucially, it will only load the first file found. This means you can have a per-project .Rprofile.

Click through for a sample R profile which has a lot going on.
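
The "first file found wins" rule in the excerpt can be illustrated with a small sketch. This is Python rather than R, and it only models the rough search order quoted above (working directory, then home); it is not R's actual startup logic, and the function name first_rprofile is made up for illustration.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_rprofile(search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first .Rprofile found in search_dirs, or None.

    Assumption: this mirrors only the rough order described in the
    excerpt. Later directories are ignored once a file is found.
    """
    for directory in search_dirs:
        candidate = Path(directory) / ".Rprofile"
        if candidate.is_file():
            return candidate  # stop at the first match; nothing is merged
    return None


if __name__ == "__main__":
    # Project directory first, then the home directory.
    print(first_rprofile([Path.cwd(), Path.home()]))
```

Because the search stops at the first match, a project-level file completely shadows the one in the home directory rather than extending it.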


Simulating Feller’s Coin-Tossing Puzzle in R

Published 2020-01-22 by Kevin Feasel

David Robinson has another fun puzzle:

Mathematician William Feller posed the following problem:
If you flip a coin n times, what is the probability there are no streaks of k heads in a row?
Note that while the number of heads in a sequence is governed by the binomial distribution, the presence of consecutive heads is a bit more complicated, because the presence of a streak at various points in the sequence isn’t independent.

Click through for a solution in R.
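
As a rough companion to the puzzle, here is a minimal Python sketch (not the R solution from the linked post). It assumes a fair coin, writes n for the number of flips and k for the streak length, and compares an exact dynamic-programming answer with a simple simulation. The function names are illustrative.

```python
import random
from fractions import Fraction


def prob_no_streak(n: int, k: int) -> Fraction:
    """Exact probability that n fair flips contain no run of k heads."""
    # state[j] = probability that the flips so far end in exactly j heads
    # and no run of k heads has occurred yet (j ranges over 0..k-1).
    state = [Fraction(0)] * k
    state[0] = Fraction(1)
    half = Fraction(1, 2)
    for _ in range(n):
        nxt = [Fraction(0)] * k
        nxt[0] = half * sum(state)        # tails resets the current run
        for j in range(k - 1):
            nxt[j + 1] = half * state[j]  # heads extends the run by one
        # A head arriving in state k-1 would complete a streak, so that
        # probability mass is dropped.
        state = nxt
    return sum(state)


def simulate_no_streak(n: int, k: int, trials: int = 20000, seed: int = 0) -> float:
    """Estimate the same probability by simulating fair coin flips."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        run = 0
        streak_seen = False
        for _ in range(n):
            if rng.random() < 0.5:  # heads
                run += 1
                if run >= k:
                    streak_seen = True
                    break
            else:                   # tails
                run = 0
        if not streak_seen:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    exact = prob_no_streak(20, 4)
    print(float(exact), simulate_no_streak(20, 4))
```

The exact version tracks only the length of the current run of heads, which sidesteps the dependence between overlapping streak positions that the excerpt mentions.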


Against Citizen Data Scientists

Published 2020-01-22 by Kevin Feasel

Bill Schmarzo doesn’t like the idea of “citizen data scientists” very much:

“Hello,” he says. “My name is Dr. Payne and I am your Citizen Dentist for today.”
Citizen Dentist?! You repeat the question out loud for him to hear, wanting an answer to this looney statement. “What is a Citizen Dentist?”
Get this. He replies, “I’m a person who performs dental work, but my proficiency and expertise is outside of the field of dentistry.”

Bill’s alternative is “Citizens of Data Science.” Click through to see what that means and how it differs.


Strongly Type Table-Valued Parameters

Published 2020-01-22 by Kevin Feasel

Jonathan Kehayias shows the benefits of setting the MaxLength property on string columns when passing a table-valued parameter from .NET code:

We can see that the MaxLength for the string columns is set at -1, meaning they are being passed over TDS to SQL Server as LOBs (Large Objects) or essentially as MAX datatyped columns, and this can impact performance in a negative manner. If we change the .NET DataTable definition to be strongly-typed to the schema definition of the user-defined table type as follows and look at the MaxLength of the same column using a debug break:

This can be important, especially if you make a lot of calls or use fairly large TVP sizes.


Calculating Compound Interest in DAX

Published 2020-01-22 by Kevin Feasel

Marco Russo and Alberto Ferrari want you to watch your money grow:

Coincidentally, both debt instrument examples are what is known as “bullet” loans, where the entire principal amount ($100) is repaid in one lump sum at maturity (at the end of Year 5). In the first example the interest income payments are deferred until maturity, thereby allowing the interest to compound over the holding period. In the second example, the interest income payments are made at the end of each year, which means that the amount of debt accruing interest each year is always the same ($100).
Now let us consider a slightly more complex investment with compounding interest where the interest rate differs year-to-year. Because the interest rate varies, you can’t use the simple formula above (or its FV function equivalent in Excel). Rather, you must effectively stack each year on top of the preceding year and calculate year-by-year.

And that’s something you can do with DAX.
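
The "stack each year on top of the preceding year" idea from the excerpt can be sketched in a few lines of Python. This is not the DAX from the linked article; the function name and the rates in the usage example are made up for illustration, with the $100 principal borrowed from the excerpt.

```python
from typing import List, Sequence


def year_end_balances(principal: float, rates: Sequence[float]) -> List[float]:
    """Compound a principal year by year with a different rate each year.

    rates[i] is the interest rate for year i + 1, as a fraction
    (0.05 means 5%). Returns the balance at the end of each year.
    """
    balances = []
    value = principal
    for rate in rates:
        value *= 1 + rate  # each year builds on the previous year's balance
        balances.append(value)
    return balances


if __name__ == "__main__":
    # Hypothetical rates for a five-year holding period.
    for year, balance in enumerate(
        year_end_balances(100.0, [0.05, 0.04, 0.06, 0.03, 0.05]), start=1
    ):
        print(f"Year {year}: {balance:.2f}")
```

When every rate is the same value r, the final balance reduces to principal * (1 + r) ** years, which is the simple formula the excerpt says no longer applies once rates vary.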


T-SQL Tuesday 122 Round-Up

Published 2020-01-22 by Kevin Feasel

Jon Shaulis has a wrap-up for T-SQL Tuesday #122:

Overall, we had 27 individuals post or share their Imposter Syndrome stories and thoughts. I had a lot of great reading to do this week and weekend.

Click through for the links.


Contrasting TVPs and Memory-Optimized TVPs

Published 2020-01-22 by Kevin Feasel

Denis Gobo wants to see what memory-optimized table-valued parameters are good for:

The other day I was thinking about the blog post Faster temp table and table variable by using memory optimization I read a while back. Since you can’t believe anything on the internets (no disrespect to whoever wrote that post), I decided to take this for a test.
In this post I will be creating 2 databases: one is a plain vanilla database and the other is a database that also has a file group that contains memory optimized data.
I will also be creating a table type in each database: a plain one, and a memory optimized one in the memory optimized database.

Read on for Denis’s findings.


Deploy a Big Data Cluster to a Single-Node kubeadm Cluster

Published 2020-01-22 by Kevin Feasel

Mohammad Darab shows how to build out a single-node Big Data Cluster on-premises:

This blog post will walk you through deploying a SQL Server Big Data Cluster on a single node Kubernetes cluster. You can install a Big Data Cluster on a physical machine or a virtual machine. Whatever option you choose must have the below minimum requirements:
– 8 CPUs
– 64 GB RAM
– 100 GB disk space

Read on for instructions, or check out Mohammad’s video on the topic.


New Database-Scoped Configurations in SQL Server 2019

Published 2020-01-22 by Kevin Feasel

Niko Neugebauer looks at database-scoped configuration settings in SQL Server 2019:

Looking at the picture on the left you can see the Database Scoped Configurations available in SQL Server 2019 that I took at the end of December 2019. Mainly we can see a difference of 2 new items in the configurations since the end of the last year – the VERBOSE_TRUNCATION_WARNINGS & LAST_QUERY_PLAN_STATS.

Niko explains what these are and also takes a look at offerings in Azure SQL Database.


Streams and Tables in Apache Kafka

Published 2020-01-21 by Kevin Feasel

Michael Noll wraps up a series on Apache Kafka. First up are the fundamentals of Kafka Streams:

A table is a, well, table in the ordinary technical sense of the word, and we have already talked a bit about tables before (a table in Kafka is today more like an RDBMS materialized view than an RDBMS table, because it relies on a change being made elsewhere rather than being directly updatable itself). Seen through the lens of event streaming however, a table is also an aggregated stream. This is a reference to the stream-table duality we discussed in part 1.
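
To make "a table is also an aggregated stream" concrete, here is a minimal sketch in plain Python. It does not use Kafka, Kafka Streams, or ksqlDB; the (key, amount) event shape and the function name are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, Tuple

Event = Tuple[str, int]  # hypothetical event shape: (key, amount)


def aggregate_stream(events: Iterable[Event]) -> Dict[str, int]:
    """Fold a stream of events into a table holding a running SUM per key."""
    table: Dict[str, int] = {}
    for key, amount in events:
        # Each event changes the table's current state for its key.
        table[key] = table.get(key, 0) + amount
    return table


if __name__ == "__main__":
    stream = [("alice", 5), ("bob", 3), ("alice", 2)]
    print(aggregate_stream(stream))
```

The table here is never written to directly. It only changes because new events arrive on the stream, which matches the excerpt's comparison to a materialized view.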

In the conclusion, Michael covers a few advanced topics:

Streams and tables are always fault tolerant because their data is stored reliably and durably in Kafka. This should be relatively easy to understand for streams by now as they map to Kafka topics in a straightforward manner. If something breaks while processing a stream, then we just need to re-read the underlying topic again.
For tables, it is more complex because they must maintain additional information—their state—to allow for stateful processing such as joins and aggregations like COUNT() or SUM(). To achieve this while also ensuring high processing performance, tables (through their state stores) are materialized on local disk within a Kafka Streams application instance or a ksqlDB server. But machines and containers can be lost, along with any locally stored data. How can we make tables fault tolerant, too?

This was a nice series.

