Press "Enter" to skip to content

Author: Kevin Feasel

Instant Log Initialization In Azure

Dimitri Furman shows a benefit of creating database files with Azure Blob Storage:

Recently, we were working on a performance testing exercise using a SQL Server database with files in Azure Blob Storage. After creating the database using the default 8 MB size for data and log file (as in the example above), we wanted to increase the size of all files to be sufficient for the expected workload. IFI was not yet enabled for the SQL Server instance we were working with, and growing the data file from 8 MB to 1 TB took about one minute (using Premium Storage). This was expected, since the data file had to be fully initialized. We expected that the log growth to 1 TB would take about as much time, for the same reason. It was very surprising then that the same operation on the log file completed in less than one second.

It turns out that this is due to differences in Azure Blob Storage versus traditional storage systems.

Comments closed

Starting Out With Powershell

Rob Sewell has his own introduction to Powershell:

PowerShell is another language, you are not going to be as proficient in a new language as you are in the language you spend all day working with. You will have to go through the learning curve and you will have to understand how to interpret errors. There is a learning curve just like with any language. Undoubtedly you will get frustrated at times. Reach out for help. Use twitter, use the #powershellhelp in the SQL Server Community Slack channel. There are many other places and plenty of people who will be glad to help you.

Rob has a pretty detailed introduction to the topic, so it’s well worth the read for new Powershell users.

Comments closed

Power BI Drillthrough

Reid Havens shows how to use the new drillthrough functionality in Power BI:

Report Drill Through enables users to create a report page, filtered to a single entity (E.g. Customer, Employee, Store, Product). Reports often have a summary landing page where there might be a lot of information, but not much detail about a specific item. Traditionally in Power BI we’d create a separate reporting page for further detail breakouts on an entity. However, that would require it’s own set of SLICERS that you would have to re-select if you wanted it to mirror the reporting page you were coming from…that’s too many clicks!

Selecting an item in Drill Through in a table from one reporting page will take you to another page, FILTERED to the entity you selected!This feature essentially let’s us create detail sub-pages that are linked to whatever primary reporting page the report uses. Features like this have been available for YEARS in Excel using linked cells, I’m super happy we finally have this as a feature in Power BI Desktop. It only takes a few steps to setup, but I’ll leave the instructions for that over at the Power BI September release page. With that said, let me run you quickly through how this looks like in a sample report.

Click through for an example, showing how useful drillthrough can be.

Comments closed

Getting Started With Powershell

Rob Farley gives us a basic introduction to Powershell:

I want to take a few minutes to get you started with PowerShell, in a way that I think is less overwhelming than how I found most PowerShell material back in the day. Because you’ve told me you keep hearing about PowerShell, and that you can’t tell how to get into it.

The first thing to know is that it feels like Command Prompt on steroids, but a lot more useful than Command Prompt. You’ll see what I mean as soon as you open it. Let’s do that, but let’s open it using ‘Run as Administrator’, because that just helps for some of the demos. In real life, you’ll only open it with Admin rights if you really need to.

So click on the start button and start typing PowerShell. I don’t want you to open the ISE (yet), just find the “Windows PowerShell” shortcut, and right-click on it to choose ‘Run as Administrator’.

This T-SQL Tuesday is all about Powershell, and Rob’s is one of the first posts you should read if you’re not familiar.

Comments closed

The Use And Misuse Of P Values

John Mount and Nina Zumel explain what p-values are and how people routinely misuse them:

The many things I happen to have issues with in common mis-use of p-values include:

  1. p-hacking. This includes censored data bias, repeated measurement bias, and even outright fraud.

  2. “Statsmanship” (the deliberate use of statistical terminology for obscurity, not for clarity). For example: saying p instead of saying what you are testing such as “significance of a null hypothesis”.

  3. Logical fallacies. This is the (false) claim that p being low implies that the probability that your model is good is high. At best a low-p eliminates a null hypothesis (or even a family of them). But saying such disproof “proves something” is just saying “the butler did it” because you find the cook innocent (a simple case of a fallacy of an excluded middle).

  4. Confusion of population and individual statistics. This is the use of deviation of sample means (which typically decreases as sample size goes up) when deviation of individual differences (which typically does not decrease as sample size goes up) is what is appropriate . This is one of the biggest scams in data science and marketing science: showing that you are good at predicting aggregate (say, the mean number of traffic deaths in the next week in a large city) and claiming this means your model is good at predicting per-individual risk. Some of this comes from the usual statistical word games: saying “standard error” (instead of “standard error of the mean or population”) and “standard deviation” (“instead of standard deviation of individual cases”); with some luck somebody won’t remember which is which and be too afraid to ask.

Even if you know what p-values are, this is definitely worth reading, as it’s so easy to misuse p-values (even when I’m not on my Bayesian post hurling tomatoes at frequentists).

Comments closed

Using Multiple Cosmos DB APIs

Vincent-Philippe Lauzon shows how to access graph data stored in Cosmos DB using the DocumentDB API:

Now here’s a little secret:  although we choose the “model” (e.g. Gremlin) at the Cosmos DB account level, we can use other models to query the data.

Not all combination are possible, but many are.  Specifically, we can query a Gremlin graph using DocumentDB / SQL query language.

The graph is then projected into documents.

We will explore that in this article.

Why is that interesting?  Because there are a lot of tools out there we might be familiar with to manipulate DocumentDB (or MongoDB).  Having to possibility to look at a Graph with other APIs extends our toolset from Gremlin-based ones.

That is interesting.

Comments closed

Fiscal Year Columns In A Power BI Date Dimension

Reza Rad takes a date dimension in Power BI and adds fiscal year details:

As you can see in the image above; June 2017 considered as fiscal year 2017. However, July 2017 is part of fiscal year 2018. So the simple logic can be like this:

if (calendar month >= fiscal year start)

then fiscal year = calendar year

else fiscal year = calendar year + 1

This code is pseudo code. don’t write that exactly in M! Let’s now implement it in M;

If you have to deal with multiple fiscal years (e.g., state and federal government fiscal years), the process is the same, only repeated.

Comments closed

Automatically E-Mailing R Reports

Tim Ali shows how to use SendGrid and Displayr to generate and e-mail out automated reports in R:

To automatically write and send email reports we need to have three tools:

  • A programming language. If the analysis is non-trivial, R is usually the best way forward.

  • An app that can automatically run the analyses at specified times. I’ve used Displayr. Of course, if you have the time you can avoid this commercial product and set up your own servers. (Disclaimer, I work for Displayr.) You can read this post to see how to get your data into Displayr and sign up to Displayr here if you do not already have an account.

  • An app to actually send the emails. I’ve chosen an email delivery application called SendGrid because R can work with its API.

To be honest, I’d probably use a different mechanism for deploying reports, but if you just need to put something together with a bit of chewing gum and bailing wire, the tools are there.  H/T R-Bloggers

Comments closed

DAX Is Still Important

Reid Havens explains why he teaches DAX in his Power BI courses:

The Check Formula button is an easily overlooked feature. However, before I hit ok and save my DAX Measure I ALWAYS press this button first! But what exactly does this button do? Well it’s checking your DAX syntax and making sure everything is written correctly. Now you COULD simply hit OK after writing your DAX and see if it errors, this is true. However when doing that your data model is actually attempting to calculate the DAX measure in the background as well. Not a big deal with a few thousand rows, but if you’re working with a model that has millions of rows then that could take a long time for it to calculate, and then error!

The smart thing to do is to check your DAX syntax using the Check Formula button BEFORE hitting ok. Checking your DAX syntax doesn’t run your calculation and returns a rewarding No errors in formula output if everything was written correctly. Such a simple thing that can save you SO MUCH TIME! I highly recommend as a best practice to always use this before hitting ok and saving your measures, you’ll thank me later.

It makes for interesting reading.

Comments closed