Author: Kevin Feasel

Azure Functions Basics

Vincent-Philippe Lauzon explains the basics of Azure Functions:

In general, serverless refers to an economical model where we pay for compute resources used as opposed to “servers”.

Wait…  isn’t that what the Cloud is about?

Well, yes, on a macro-scale it is, but serverless brings it to a micro-scale.

In the cloud we can provision a VM, for example, run it for 3 hours and pay for 3 hours.  But we can’t pay for 5 seconds of compute on a VM because it won’t have time to boot.

A lot of compute services have a “server-full” model. In Azure, for instance, a Web App comes in a number of instances. Each instance has a VM associated with it. We do not manage that VM, but we pay for its compute regardless of the number of requests it processes.

In a serverless model, we pay for micro-transactions.

This is the first part in a series and is aimed at giving a conceptual explanation.
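The series stays conceptual, but if you want a feel for what a pay-per-execution unit of compute looks like, here is a minimal HTTP-triggered function sketched with the Azure Functions Python programming model. The route, names, and file layout are illustrative assumptions, not taken from the article:

```python
# function_app.py (Azure Functions Python v2 programming model; names are illustrative)
import azure.functions as func

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

# You are billed for executions and execution time of this function,
# not for an always-on VM sitting behind it.
@app.route(route="hello")
def hello(req: func.HttpRequest) -> func.HttpResponse:
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!")
```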


Conditional Formatting With Power BI Line Charts

Daniil Maslyuk shows how to perform conditional formatting on a line chart in Power BI:

Have you ever wished you could change the line colour depending on the overall trend? For example, if your sales increase over time, the line is green; if there is a decline, then the line is red. While this functionality is not yet natively available in Power BI Desktop, it does not mean this cannot be done! In this article, I am going to show you how to achieve this effect.

Read on to see how he does it.
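If it helps to see the effect in isolation first, here is the same idea sketched outside Power BI in Python with matplotlib: colour the whole line by its overall direction. This is only an analogue of the effect with made-up numbers, not Daniil's Power BI technique:

```python
import matplotlib.pyplot as plt

# Made-up monthly sales figures purely for illustration.
months = list(range(1, 13))
sales = [120, 118, 125, 130, 128, 135, 140, 138, 145, 150, 148, 155]

# Colour the line by the overall trend: green if the last point is
# above the first, red otherwise.
colour = "green" if sales[-1] >= sales[0] else "red"

plt.plot(months, sales, color=colour)
plt.xlabel("Month")
plt.ylabel("Sales")
plt.title("Sales trend (line colour driven by overall direction)")
plt.show()
```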


Source Control In SQL Operations Studio

Drew Furgiuele reminds me why I prefer Mercurial over Git:

Of course, this just scratches the surface of using source control and Git. Now that you have a working example, I encourage you to read more especially about branching and merging, and your inevitable merge conflicts. I think branching is incredibly important if you like to tweak your scripts because it gives you a separate copy of your code to work on and test, and when you’re satisfied you merge your changes back to your master branch (or trunk).

Once you get the basics down, it’s really easy to start contributing to other projects too! You could fork a project, work on it, and then submit a pull request to the owners to add features. Or, maybe someone will find code you made available and do the same. It’s amazing. Or at a very minimum, this will get you up to speed on how you can introduce an SCM practice for SQL-related scripts at your workplace, if you don’t have one, or how you can start using it if you do (and don’t already).

Click through to see how Drew integrates Git with SQL Operations Studio.  Spoilers:  it’s pretty easy, given the relationship between SqlOps and Visual Studio Code.


Against Simply Closing Connections With Rollback

Kenneth Fisher shows how to close all database connections and immediately roll them back, but then explains why you shouldn’t default to that:

But, let’s take a minute and think about this a bit closer. Should we? Should we dump everyone out of the database? Are you sure?

  • Well, it’s dev and we’ve talked to the development team. No one is doing anything right now.
  • It’s prod so we’ve been extra careful. We arranged an outage with the business and confirmed that now is a good time.

Well, in theory, we’ve been told no one is using the database. However, that’s theory, this is reality, and in fact, someone is in the database. But what are they doing? Is it just a leftover connection? Someone left a query window open and forgot about it? The application creates a connection and holds on to it 24×7 even if it’s not doing anything?

Click through for more thoughts on the topic.
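If you do want to look before you leap, a quick check of who is connected and what they last ran goes a long way. A rough sketch, here from Python over pyodbc; the server, connection string, and the database name SomeDatabase are placeholders, and the actual kick-everyone-out command is left commented out:

```python
import pyodbc

# autocommit because ALTER DATABASE cannot run inside a user transaction
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=master;Trusted_Connection=yes;",
    autocommit=True,
)
cur = conn.cursor()

# Who is in the database and what have they been doing?
cur.execute("""
    SELECT session_id, login_name, program_name, status, last_request_end_time
    FROM sys.dm_exec_sessions
    WHERE database_id = DB_ID('SomeDatabase');
""")
for row in cur.fetchall():
    print(row)

# Only once you are satisfied nothing important is running:
# cur.execute("ALTER DATABASE SomeDatabase SET SINGLE_USER WITH ROLLBACK IMMEDIATE;")
# cur.execute("ALTER DATABASE SomeDatabase SET MULTI_USER;")  # when you are done
```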


Getting Wait Stats From The Query Store

Erin Stellato shows how to retrieve wait stats per query from Query Store:

In SQL Server 2016 a new DMV is exposed, sys.dm_exec_session_wait_stats, which provides information about waits for an existing, active session. If you know the session_id, you can track waits for a query when it starts and when it completes (snapshot the information at the beginning and end of the query and then diff the information). The challenge is that you have to know the session_id for the query, and you have to set up data capture in advance – which isn’t trivial when you’re in the midst of a high priority issue.

Wait statistics information exists in an actual execution plan starting in SQL Server 2016 SP1. Only the top 10 waits are captured, and there are limitations in terms of what this data represents. For example, CXPACKET is ignored and not included in the output, but it will be included in 2016 SP2 and 2017 CU3 and up – where irrelevant parallelism waits are instead captured by CXCONSUMER (which will not be included in actual plan waits).

So how can we see what a specific query is truly waiting on? We can use Query Store! SQL Server 2017 includes the capture of wait statistics information in Query Store, and this functionality is also available in Azure SQL Database. Wait statistics are tied to a query plan, and are captured over time, just like the runtime statistics. The addition of wait statistics information in Query Store was the number one feature request after its initial release, and all that information together creates powerful troubleshooting capabilities.

Read on for a demo-filled post.  It’s hard to overstate just how useful this information is for query tuning.
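Erin demonstrates everything in T-SQL; as a rough sketch of pulling the aggregated wait categories per query back out of Query Store from client code (Python over pyodbc here, against the SQL Server 2017+/Azure SQL Database catalog views; the connection string and database name are placeholders):

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=SomeDatabase;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Aggregate Query Store wait categories per query text.
cur.execute("""
    SELECT TOP (20)
           qt.query_sql_text,
           ws.wait_category_desc,
           SUM(ws.total_query_wait_time_ms) AS total_wait_ms
    FROM sys.query_store_wait_stats AS ws
    JOIN sys.query_store_plan AS p ON p.plan_id = ws.plan_id
    JOIN sys.query_store_query AS q ON q.query_id = p.query_id
    JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
    GROUP BY qt.query_sql_text, ws.wait_category_desc
    ORDER BY total_wait_ms DESC;
""")
for query_text, wait_category, total_wait_ms in cur.fetchall():
    print(f"{wait_category:<20} {total_wait_ms:>10} ms  {query_text[:60]}")
```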


Design Concepts: Cognitive Load

Meagan Longoria explains the concept of cognitive load and how it relates to building reports:

Our first design concept is cognitive load, which comes from cognitive psychology and instructional design. Cognitive Load Theory says that when we present our audience with information, we are asking them to use brain power to process it. That brain power (aka working memory) is limited, so we need to be intentional about the information we present to them.

In order to commit information to memory and be able to recall it later, that information must go through sensory memory and working memory and then be encoded into long-term memory.

This concept drives a lot of good advice in dashboard and report design, particularly that if an element does not directly help a person learn the important information you are trying to convey, it’s not worth having on the report or dashboard.


Running PySpark In Visual Studio Code

Jenny Jiang shows how to run PySpark on HDInsight in VSCode:

We are excited to introduce the integration of HDInsight PySpark into Visual Studio Code (VSCode), which allows developers to easily edit Python scripts and submit PySpark statements to HDInsight clusters. For PySpark developers who value productivity of Python language, VSCode HDInsight Tools offer you a quick Python editor with simple getting started experiences, and enable you to submit PySpark statements to HDInsight clusters with interactive responses. This interactivity brings the best properties of Python and Spark to developers and empowers you to gain faster insights.

Click through to see how it’s done.
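If you have not written PySpark before, the statements you would submit from VSCode look like ordinary PySpark. A minimal sketch, where the storage path and column name are placeholders:

```python
from pyspark.sql import SparkSession

# On HDInsight the tooling typically hands you a session; locally you build one.
spark = SparkSession.builder.appName("vscode-sample").getOrCreate()

# Read a CSV from cluster storage and show the most common categories.
df = spark.read.csv("wasbs:///example/data/sample.csv", header=True, inferSchema=True)
df.groupBy("category").count().orderBy("count", ascending=False).show(10)
```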


A/B Testing With R

Mira Celine Klein shows how to compare two versions of a feature (or advertising campaign or whatever) to determine if one is better than the other:

In comparison to other methods, conducting an A/B test does not require extensive statistical knowledge. Nevertheless, some caveats have to be taken into account.

When making a statistical decision, there are two possible errors (see also table 1): A Type I error means that we observe a significant result although there is no real difference between our groups. A Type II error means that we do not observe a significant result although there is in fact a difference. The Type I error can be controlled and set to a fixed number in advance, e.g., at 5%, often denoted as α or the significance level. The Type II error in contrast cannot be controlled directly. It decreases with the sample size and the magnitude of the actual effect. When, for example, one of the designs performs way better than the other one, it’s more likely that the difference is actually detected by the test in comparison to a situation where there is only a small difference with respect to the target metric. Therefore, the required sample size can be computed in advance, given α and the minimum effect size you want to be able to detect (statistical power analysis). Knowing the average traffic on the website you can get a rough idea of the time you have to wait for the test to complete. Setting the rule for the end of the test in advance is often called “fixed-horizon testing”.

Click through for more, including a sample with code.  H/T R-Bloggers
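The article works in R; to show the gist of the fixed-horizon approach described above (pick α and a minimum detectable effect, compute the sample size in advance, then evaluate once), here is a rough Python equivalent using statsmodels, with made-up conversion numbers:

```python
import numpy as np
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize, proportions_ztest

# Plan: baseline conversion of 10%, smallest effect we care about is 12% (illustrative).
effect = proportion_effectsize(0.10, 0.12)
n_per_group = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"Roughly {n_per_group:.0f} visitors needed per variant")

# Fixed horizon reached: evaluate the observed counts (made-up data).
conversions = np.array([130, 162])
visitors = np.array([1300, 1295])
z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```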


R Internals: Data Sizes With Nullable Columns

Niels Berglund digs into the Binary Exchange Language (BXL) and notices something weird about data sizes:

When looking at the data sent, the size of the packages, and “drilling” into the TCP packets, we could deduce that:

  • Each column has an overhead of 32 bytes (at least for non-nullable data)

  • The size of the column in one row is the size of the data type for numeric types.

  • For decimal and numeric an extra byte is added to each column, where this byte indicates the precision.

  • Columns of alphanumeric type all had 2 bytes prepended to the bytes, except max types.

  • For char and nchar the storage size was 2 bytes plus the size the column was defined as.

  • For varchar and nvarchar the storage size was 2 bytes plus the size of the data stored.

  • For the varmax data types the number of bytes that were prepended varied depending on the data size.

Read the whole thing.
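To make those rules concrete, here is a tiny worked example. The arithmetic simply encodes the list above, so treat it as Niels's observations rather than documented behaviour:

```python
# Per-row, per-column sizes as deduced above (non-nullable columns).
def varchar_bytes(actual_len: int) -> int:
    """varchar/nvarchar: 2 length bytes plus the bytes actually stored."""
    return 2 + actual_len

def char_bytes(defined_len: int) -> int:
    """char/nchar: 2 length bytes plus the defined column size."""
    return 2 + defined_len

def decimal_bytes(type_size: int) -> int:
    """decimal/numeric: the type's storage size plus one precision byte."""
    return type_size + 1

# A varchar(50) column holding 'abc' ships 2 + 3 = 5 bytes per row,
# while a char(50) holding the same value ships 2 + 50 = 52 bytes.
print(varchar_bytes(3), char_bytes(50))  # 5 52
```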


TOP And UNIONs

Andy Levy points out that the TOP operator doesn’t always apply to each element in a UNION:

This system uses TOP every now and then trying to limit the number of records it gets back (and the developers always seem to use the arbitrary 301 – I’m guessing there’s some degree of cargo-cult programming going on). I’d prefer a well-constructed WHERE clause to limit the result set, but beggars can’t be choosers when working with legacy vendor code.

What I found odd was that the Ledger1 table didn’t get a lot of traffic – with the WHERE clause in use (omitted here for brevity), you’d only get a handful of records, maybe a dozen at most.

Click through to see the rest of the problem, as well as Andy’s solution.
