
Author: Kevin Feasel

Calculating Future Revenue Streams With Phasing

Matt Allington walks us through a somewhat complicated scenario involving income stream projections with inconsistent monthly payout rates:

The first thing I decided to do was work out which was the first month of the policy.  I used LOOKUPVALUE to do this because the Calendar table was not connected to the policy table.  Instead of returning the actual month name, I returned the unique MonthID from my calendar table.  I ALWAYS load a unique MonthID in my calendar tables as they are REALLY useful. In short this column is an integer, starting at 1 and incrementing by 1 for each month in the table.  So starting in Jan, after 3 full years the MonthID will be 36.

Pol First Month = LOOKUPVALUE (
     'Calendar'[Month Index],
     'Calendar'[Date],
     MAX ( Policies[Pol Date] )
)

After writing this measure (and all subsequent measures), I placed the measure in the pivot table shown below.  This pivot table has the policy date from the policy table on rows and the YYMM from the Calendar table on columns.  I did this for a very important reason: I knew this would be the way I needed to see the final results, and therefore I needed to make sure all measures returned the expected outcome once this layout was used.  And don’t think I wrote all the formulas correctly to achieve this outcome the first time.  In fact I made plenty of mistakes.  But the point is I wouldn’t have known they were mistakes if I didn’t place the results in a pivot table like the one below.
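If your calendar table is built in SQL Server rather than in Power Query, a MonthID of that shape is easy to generate on that side too.  A minimal sketch, assuming a hypothetical dbo.Calendar table with a Date column:

SELECT
    c.[Date],
    -- consecutive integer per month, starting at 1 for the first month in the table
    MonthID = DATEDIFF(MONTH, m.FirstDate, c.[Date]) + 1
FROM dbo.Calendar AS c
CROSS JOIN (SELECT FirstDate = MIN([Date]) FROM dbo.Calendar) AS m;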

It’s a good example of breaking apart a fairly complicated concept and rethinking equations as set operations in DAX.

Comments closed

Creating An Inline Table-Valued Function In SQL Server

Jeanne Combrinck looks at inline table-valued functions in SQL Server:

Let’s start off with: what is a table-valued function (TVF)?  A TVF is a dynamic table produced at the time of execution, depending on parameters.  Like a view, a TVF creates a result set only when it’s executed, but, unlike a view, it can be parameterized.

There are two types of TVFs: Inline Table-Valued Functions (ITVFs) and Multi-statement Table-Valued Functions (MTVFs).  I find them easy to remember: think of the “I” in ITVF as 1 (single statement) and the “M” in MTVF as “many” (multiple statements).

As you can imagine, a TVF produces a result set that can be used as a virtual table or view. Yes, you can actually select data from a TVF, or join it with some other tables, views, or even other TVFs. In this post I will go into more detail on ITVFs.

In my experience, the best part about using multi-statement table-valued functions is the 3-9x performance improvement you get when removing them and inlining their code.  It’s a great idea but the performance cost is just too high for me.
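For reference, a minimal inline TVF looks something like this (the dbo.Sales table and its columns are hypothetical, not from Jeanne’s post):

CREATE FUNCTION dbo.SalesByCustomer (@CustomerID int)
RETURNS TABLE
AS
RETURN
(
    -- a single SELECT statement is what makes this an *inline* TVF
    SELECT s.OrderID, s.OrderDate, s.Amount
    FROM dbo.Sales AS s
    WHERE s.CustomerID = @CustomerID
);
GO

-- use it like a table: select from it or join it to other tables and views
SELECT * FROM dbo.SalesByCustomer(42);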

Comments closed

De-bracketing SSMS Scripts

Andy Mallon hates brackets and likes regular expressions:

There are cases where you need brackets, such as having objects with “illegal” characters in them. These rules apply to database names, column names, and all object names. (I’m going to simply refer to “object names” for simplicity, though I concede that “identifier” might be a more correct term.) If you want to put a space or hyphen in an object name, then you’re going to have to use [brackets] to “quote” the name every time you reference it. Similarly, you can’t start an object name with a number, even though it’s a valid character in any other position of the object name: 8Ball is an illegal object name, but AM2 is perfectly OK. There are a bunch of other scenarios, but I won’t go into them all.

If SMO tried to only use [brackets] where necessary, that would likely be a complicated and error-prone branch of code. It’s safer to always include [brackets], and there’s no time when [brackets] will break your code.
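To make those rules concrete, here are a few hypothetical identifiers and where the brackets are genuinely required:

-- contains a space: brackets required every time you reference it
CREATE TABLE dbo.[Order Details] (ID int NOT NULL);

-- starts with a number: illegal without brackets
CREATE TABLE dbo.[8Ball] (ID int NOT NULL);

-- a number elsewhere in the name is fine: no brackets needed
CREATE TABLE dbo.AM2 (ID int NOT NULL);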

Read on to see how Andy gets rid of those pesky things.

Comments closed

Monitoring Query Store Space Usage

Erin Stellato has a process which alerts her when query store space utilization reaches a defined threshold:

For those of you not familiar with the Query Store settings, please check out my post which lists each one, the defaults, and what I would recommend for values and why.  When discussing MAX_STORAGE_SIZE_MB, I mention monitoring via sys.database_query_store_options or Extended Events.  As much as I love Extended Events, there isn’t an event that fires based on a threshold being exceeded.  The event related to size is query_store_disk_size_over_limit, and it fires when the space used exceeds the value for MAX_STORAGE_SIZE_MB, which is too late.  I want to take action before the maximum storage size is hit.

Therefore, the best option I’ve found is to create an Agent job which runs on a regular basis (maybe every four or six hours initially) that checks current_storage_size_mb in sys.database_query_store_options, calculates the space used by Query Store as a percentage of the total allocated, and then, if that exceeds the threshold you set, sends an email.  The code that you can put into an Agent job is below.  Please note you want to make sure the job runs in the context of the user database with Query Store enabled (as sys.database_query_store_options is a database view), and configure the threshold to a value that makes sense for your MAX_STORAGE_SIZE_MB.  In my experience, 80% has been a good starting point, but feel free to adjust as you see fit!
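A bare-bones sketch of the underlying check (the 80% threshold here is just an example; run it in the database with Query Store enabled):

DECLARE @threshold_pct decimal(5, 2) = 80.0;

IF EXISTS (
    SELECT 1
    FROM sys.database_query_store_options
    WHERE current_storage_size_mb * 100.0 / max_storage_size_mb >= @threshold_pct
)
BEGIN
    -- alert however you prefer, e.g. msdb.dbo.sp_send_dbmail
    PRINT 'Query Store is approaching MAX_STORAGE_SIZE_MB.';
END;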

Click through for the script.

Comments closed

Hortonworks Data Platform 3.0 Released

Saumitra Buragohain, et al., announce the newest version of the Hortonworks Data Platform:

Highlighted Apache Hive features include:

  • Workload management for LLAP:  You can assign resource pools within the LLAP pool and allocate resources on a per-user or per-group basis. This enables support for large multi-tenant deployments.

  • ACID v2 and ACID on by default:  We are releasing ACID v2. With the performance improvements in both the storage format and the execution engine, we are seeing equal or better performance compared to non-ACID tables. Thus we are turning ACID on by default and enabling full support for data updates.

  • Hive Warehouse Connector for Spark:  The Hive Warehouse Connector allows you to connect Spark applications with Hive data warehouses. The connector automatically handles ACID tables. This enables data science workloads to work well with data in Hive.

  • Materialized view navigation:  Materialized views allow you to pre-aggregate and pre-compute tables used in queries. This typically works best on sub-queries or intermediate tables. The cost-based optimizer will automatically rewrite a query to use those intermediate results when they are available, drastically speeding up your queries.

  • Information schema:  Hive now exposes the metadata of the database (tables, columns, etc.) via the Hive SQL interface directly.

  • JDBC storage connector:  You can now map any JDBC database into Hive’s catalog. This means you can join data across Hive and other databases using the Hive query engine.

This looks pretty good.  So of course I learned about it two days after rebuilding my demo Hadoop cluster.

Comments closed

Executing ML Services Scripts From Jupyter Notebooks

Kyle Weller has an inception moment with Python and SQL Server Machine Learning Services:

While this example is trivial with the Iris dataset, imagine the additional scale, performance, and security capabilities that you now unlocked. You can use any of the latest open source R/Python packages to build Deep Learning and AI applications on large amounts of data in SQL Server. We also offer leading edge, high-performance algorithms in Microsoft’s RevoScaleR and RevoScalePy APIs. Using these with the latest innovations in the open source world allows you to bring unparalleled selection, performance, and scale to your applications.

Normally I see examples come straight from SQL Server or maybe C#, but it’s a bit fun to see one originate in Python in order to execute Python in SQL Server.
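One way to run Python inside SQL Server Machine Learning Services (not necessarily the exact mechanism Kyle’s notebook uses) is T-SQL’s sp_execute_external_script.  A stripped-down sketch, assuming ML Services with Python is installed and external scripts are enabled:

EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
# InputDataSet arrives as a pandas data frame; whatever is assigned
# to OutputDataSet is returned to SQL Server as a result set
OutputDataSet = InputDataSet.head(5)
',
    @input_data_1 = N'SELECT name, object_id FROM sys.objects'
WITH RESULT SETS ((name sysname, object_id int));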

Comments closed

Using ggpairs To Find Correlations Between Variables In R

Akshay Mahale shows how to use the ggpairs function in R to see the correlation between different pairs of variables:

From the above matrix for iris we can deduce the following insights:

  • Correlation between Sepal.Length and Petal.Length is strong, with densely clustered points.
  • Sepal.Length and Sepal.Width seem to show very little correlation, as the data points are spread throughout the plot area.
  • Petal.Length and Petal.Width also show a strong correlation.

Note: These insights come from interpreting the scatterplots (no coefficient of correlation has actually been calculated).  More examination will be required once significant variables are obtained for linear regression modeling (with the help of residual plots and the coefficient of determination, i.e. multiple R-squared, we can get closer to our results).

Click through to read the whole thing.

Comments closed

Changing SQL Agent Job Owners

Jason Brimhall shows us a quick method for changing SQL Agent job owners:

It is not unusual to find a server where some random user created a bunch of jobs to be run by SQL Agent.  Sometimes, the user creating the job(s) sets themselves as the owner of the job.  There are certain cases where this behavior is hard to avoid, like when creating a maintenance plan.

And of course, there are times when the user just doesn’t know any better.  There is, of course, the rare occasion when setting the job owner to be one’s self makes the most sense, but that is few and far between in the grand scheme.  Usually, you will want a non-expiring account such as a service account or a principal without “logon” permissions to be the owner.

The primary reason is simple: humans have an expiration date for every job they will ever have.  When that expiration occurs, you may end up with any number of unwanted side effects.  Unwanted side effects are exactly what we try to avoid in our jobs run via SQL Agent.
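For a single job, the change itself comes down to msdb’s sp_update_job (the job and account names below are made up):

EXEC msdb.dbo.sp_update_job
    @job_name = N'NightlyIndexMaintenance',       -- hypothetical job name
    @owner_login_name = N'DOMAIN\SQLAgentSvc';    -- hypothetical service account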

Click through for the script.  It’s also a good rule of thumb not to have your name attached to too many things; that way, your successor has no way of knowing you’re to blame for whatever awful hack is in place…

Comments closed

Updating Tables With Faked Data

Phil Factor continues his data obfuscation series:

We are taking a slow-but-steady approach. We rewrite our code from the previous blog post that assembles the string; it now uses a view to get its random numbers, and we’ll speed it up slightly by putting a bit more intelligence into the Markov table. We then put it in a slow user-defined scalar function. We want a scalar function that isn’t schema-verified and is not considered to be deterministic. The reason for this is that it has to be executed for every row despite having the same parameter.

There are many ways to store the information permanently in a Markov table, but we’ll be using table-valued parameters for our function. I’ll show how they are generated from the original information in AdventureWorks, but they could just as easily be fetched from a table of Markov entries, with each Markov set identified by a name. This could be delivered to you by the production DBA so that you wouldn’t need any access to the production server.
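The view-for-random-numbers trick Phil mentions is the standard workaround for RAND() and NEWID() not being allowed directly inside a user-defined function.  A bare-bones sketch of just that piece (not Phil’s Markov code):

CREATE OR ALTER VIEW dbo.RandomNumber
AS
SELECT RandomValue = RAND(CHECKSUM(NEWID()));
GO

CREATE OR ALTER FUNCTION dbo.GetRandomValue ()
RETURNS float
AS
BEGIN
    -- referencing the view sneaks a non-deterministic value into the function,
    -- so it gets re-evaluated per call rather than being treated as deterministic
    RETURN (SELECT RandomValue FROM dbo.RandomNumber);
END;
GO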

Next time, Phil promises to tackle dates.

Comments closed