This is just a variation on the widely-used M pattern for using functions to iterate over and combine data from multiple data sources; Matt Masson has a good blog describing this pattern here. In this case I’m doing the following:
- Defining a table using #table() with three rows containing three search terms.
- Defining a function that calls the metadata API. It takes one parameter, a search term, and returns a value indicating whether the search was successful or not from the JSON document returned. What the API actually returns isn’t relevant here, though, just the fact that I’m calling it. Note the highlighted lines in the code above that show how I’m constructing the URL passed to Web.Contents() by simply concatenating the base URL with the string passed in via the custom function’s Term parameter.
- Adding a custom column to the table returned by the first step, and calling the function defined in the second step using the search term given in each row.
This query refreshes with no problems in Power BI Desktop. However, when you publish a report that uses this code to PowerBI.com and try to refresh the dataset, you’ll see that refresh fails and returns a rather unhelpful error message:
Data source error Unable to refresh the model (id=1264553) because it references an unsupported data source.
The nature of the problem makes sense, and Chris provides one method of getting around this error.
Using a remote store: This is the traditional model for building applications. Here, when an application needs to process an event, it makes a remote call to a separate SQL or No-SQL database. In this model, write operations are always remote calls, but reads can be performed on a local cache in certain scenarios. There are a large number of applications at LinkedIn that fall into this category.
Another pattern is to use a remote cache (e.g., Couchbase) that is fronting a remote database (e.g., Oracle). If the remote cache is used primarily for reading adjunct data, then applications use an Oracle change capture stream (using Databus) to populate the remote cache.
This is a must-read if you’re looking at implementing a streaming architecture and need to do any kind of data enrichment.
As you can see, I definitely have a lot of free space, but my data are so spread across the file and especially up to it’s border, that there is no way to make file size smaller.
If we zoom at the very tail we can figure out the names of tables at the very end of the file, which prevent file from shrinking:
This looks quite a bit like the old Windows 95 defrag tool. I like it.
For a while I meandered between the two approaches until the ssdt team announced that they had released a nuget package with the DacFx in and I decided that I would move over to that as it meant that I no longer had to check in the dll’s into source control which in itself is a big win. I also decided to fix the extensions thing and so figured out a (slightly hacky) way to get the DacFx dll’s in the nuget package to behave like sqlpackage and allow a sub-directory to be used to load dll’s – I fixed that using this powershell module that wraps a .net dll (https://the.agilesql.club/blogs/Ed-Elliott/DacFxed-Nugetized-DacFx-Power…). Now I have the problem of not having to check in dll’s and still being able to load contributors without having to install into program files sorted BUT I still had the problem of lots of command line args which I was sharing in powershell scripts and passing in some custom bits like server/db names etc.
I’m not very familiar with dacpacs, so this was an interesting read for me.
The general approach behind each of the examples that we’ll cover below is to:
Fit a regression model to predict variable (Y).
Obtain the predicted and residual values associated with each observation on (Y).
Plot the actual and predicted values of (Y) so that they are distinguishable, but connected.
Use the residuals to make an aesthetic adjustment (e.g. red colour when residual in very high) to highlight points which are poorly predicted by the model.
The post is about 10% understanding what residuals are and 90% showing how to visualize them and spot major discrepancies.
iv. Row Level Security
Row Level Security proved to be an effective approach for us to provide users a personalized view of their Dashboard & Reports based on the Organization they belonged to. The org hierarchy data was pulled directly from the Human Resource (HR) system, which allowed the Power BI Model to identify which user belonged to which department. In our sample data set, it looks as below.
Read the whole thing.
I’ve always found fork bombs funny because of their elegant simplicity, so I figured, why not build one in SQL Server?
In order to do it, I needed a way to spawn a self-replicating asynchronous process, so I built:
A stored procedure
That creates an Agent job
That runs the stored procedure
I didn’t think it was possible. I certainly didn’t think it would take a half-dozen lines of code.
RStudio has several ways to import data. One of the easiest ways is to import via URL. This link (https://data.montgomerycountymd.gov/api/views/6rqk-pdub/rows.csv?accessType=DOWNLOAD) gives us the salaries of all of the government employees for Montgomery County, MD in a CSV format. To import this into RStudio, copy the URL and go to Tools -> Import Dataset -> From Web URL…
R and Python are both good languages to learn for data analysis. I lean just a little bit toward R, but they’re both strong choices in this space.
So let’s get down to brass tacks and actually create an alert. To do this, we need some info first:
The Resource Group we will create the alert in.
An Azure location where the alert will live.
An Azure SQL Database server and database we are creating the alert for.
What metric we will monitor and what is the threshold we will be checking.
(optional) An email to send an alert to.
Mike follows this up with code and shows it’s not scary at all to create these alerts from within Powershell.
Before I started, I was already quite comfortable programming Python and did some R programming in the past. This turned out pretty handy, though not really needed to start off with – because starting with Azure ML, the data flow can be created much like BI specialists are used to in SSIS.
A good place to start for me was the Tutorial competition (Iris Petal Competition). It provides you with a pre-filled workspace with everything in place to train and test your first ML model:
I’d like to see Azure ML get more traction; I’m not optimistic that it will.