It turns out it’s pretty easy (even if it takes some time). So where to start? Well, the first thing we need is a place to put our database: an Azure SQL Database server. If you don’t already have one, creating a new one is fairly easy.
First, start at portal.azure.com. Log in and follow these steps.
This is the longer, manual process. It’s good to walk through it this way at least once before writing a PowerShell script, just to see what the script is doing.
At first I used Gianluca’s solution (“SSMS in High-DPI Displays: How to Stop the Madness”), but it wasn’t perfect – fonts in some places were really blurry, and some dialogs were totally unusable. He has several examples in his post if you’re curious. But I have several too.
This is the previous version of SSMS (13.0.15600.2), out of the box, which now forgoes any type of DPI scaling at all, using the old-fashioned jaggy type we’ve been suffering with for decades (except check out the smooth text on the About dialog title bar!).
Cf. Gianluca Sartori. There’s still some work to do, but more and more of us are moving to high-resolution and 4K monitors; 1080p isn’t cutting it anymore.
This is just a variation on the widely-used M pattern for using functions to iterate over and combine data from multiple data sources; Matt Masson has a good blog describing this pattern here. In this case I’m doing the following:
- Defining a table using #table() with three rows containing three search terms.
- Defining a function that calls the metadata API. It takes one parameter, a search term, and returns a value from the returned JSON document indicating whether the search was successful. What the API actually returns isn’t relevant here, though, just the fact that I’m calling it. Note the highlighted lines in the code above, which show how I’m constructing the URL passed to Web.Contents() by simply concatenating the base URL with the string passed in via the custom function’s Term parameter.
- Adding a custom column to the table returned by the first step, and calling the function defined in the second step using the search term given in each row.
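The three steps above can be sketched outside of M as well. Here is a minimal Python version of the same pattern, with a stubbed-out API call so it runs offline; the base URL, the `success` field, and the fetch function are all hypothetical stand-ins, not the actual API from the post:

```python
# A sketch of the pattern: a small table of search terms, a function that
# builds a request URL per term, and a new column holding each call's result.
BASE_URL = "https://example.com/api/search?term="  # hypothetical base URL

def search_succeeded(term, fetch):
    # fetch is injected so the sketch stays runnable without a network call;
    # in real code it would be something like requests.get(url).json().
    url = BASE_URL + term          # simple string concatenation, as in the M code
    document = fetch(url)          # parsed JSON document from the API
    return document.get("success", False)

def fake_fetch(url):
    # Stand-in for the metadata API: pretend every query succeeds.
    return {"success": True, "url": url}

# The #table() step: three rows, one search term each.
rows = [{"Term": t} for t in ["alpha", "beta", "gamma"]]

# The custom-column step: call the function once per row.
for row in rows:
    row["Succeeded"] = search_succeeded(row["Term"], fake_fetch)
```

The injected `fetch` parameter is just a convenience for testing; the structure (table, per-row function call, appended column) is what mirrors the M pattern.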
This query refreshes with no problems in Power BI Desktop. However, when you publish a report that uses this code to PowerBI.com and try to refresh the dataset, you’ll see that refresh fails and returns a rather unhelpful error message:
Data source error Unable to refresh the model (id=1264553) because it references an unsupported data source.
The nature of the problem makes sense, and Chris provides one method of getting around this error.
Using a remote store: This is the traditional model for building applications. Here, when an application needs to process an event, it makes a remote call to a separate SQL or NoSQL database. In this model, write operations are always remote calls, but reads can be performed on a local cache in certain scenarios. A large number of applications at LinkedIn fall into this category.
Another pattern is to use a remote cache (e.g., Couchbase) that is fronting a remote database (e.g., Oracle). If the remote cache is used primarily for reading adjunct data, then applications use an Oracle change capture stream (using Databus) to populate the remote cache.
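The cache-fronting-a-database pattern can be sketched as a read-through cache with a change-capture hook. This is an illustrative sketch only, with dicts standing in for Couchbase and Oracle and a plain method standing in for the Databus consumer:

```python
class ReadThroughCache:
    """Minimal read-through cache in front of a 'remote' store.

    In the pattern described above, the cache would be Couchbase and the
    store Oracle, with Databus applying change-capture updates; here both
    are plain dicts so the sketch is runnable.
    """
    def __init__(self, store):
        self.store = store     # the remote database
        self.cache = {}        # the remote cache

    def read(self, key):
        if key in self.cache:              # reads served from cache when possible
            return self.cache[key]
        value = self.store[key]            # cache miss: remote database call
        self.cache[key] = value
        return value

    def write(self, key, value):
        self.store[key] = value            # writes are always remote calls
        self.apply_change(key, value)      # change capture keeps the cache fresh

    def apply_change(self, key, value):
        # Stand-in for the change-capture (Databus) consumer.
        self.cache[key] = value

# Hypothetical usage: first read populates the cache, writes flow through it.
cache = ReadThroughCache({"member:42": {"name": "Ada"}})
profile = cache.read("member:42")
```

The point of the sketch is the division of labor: the store is the source of truth, and the cache is only ever populated from reads or from the change stream.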
This is a must-read if you’re looking at implementing a streaming architecture and need to do any kind of data enrichment.
As you can see, I definitely have a lot of free space, but my data are spread all across the file, especially up near its border, so there is no way to make the file size smaller.
If we zoom in on the very tail, we can figure out the names of the tables at the very end of the file which prevent it from shrinking:
This looks quite a bit like the old Windows 95 defrag tool. I like it.
For a while I meandered between the two approaches, until the SSDT team announced that they had released a NuGet package containing the DacFx, and I decided I would move over to that: it meant I no longer had to check the DLLs into source control, which in itself is a big win. I also decided to fix the extensions problem, and so figured out a (slightly hacky) way to get the DacFx DLLs in the NuGet package to behave like sqlpackage and allow a sub-directory to be used to load DLLs; I fixed that using this PowerShell module that wraps a .NET DLL (https://the.agilesql.club/blogs/Ed-Elliott/DacFxed-Nugetized-DacFx-Power…). Now I had sorted the problem of not having to check in DLLs while still being able to load contributors without installing into Program Files, BUT I still had the problem of lots of command-line args, which I was sharing in PowerShell scripts and passing in some custom bits like server/db names etc.
I’m not very familiar with dacpacs, so this was an interesting read for me.
The general approach behind each of the examples that we’ll cover below is to:
- Fit a regression model to predict a variable Y.
- Obtain the predicted and residual values associated with each observation on Y.
- Plot the actual and predicted values of Y so that they are distinguishable, but connected.
- Use the residuals to make an aesthetic adjustment (e.g., red colour when the residual is very high) to highlight points which are poorly predicted by the model.
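The four steps above can be sketched numerically without any plotting library. Here is a minimal Python version for simple linear regression (the linked post does this in R with ggplot2); the data and the flagging threshold are made up for illustration:

```python
# Minimal sketch of the four steps: fit a line, compute predictions and
# residuals, then flag the worst-predicted points for visual emphasis.
def fit_line(xs, ys):
    # Ordinary least squares for y = a + b*x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [1.1, 1.9, 3.2, 3.9, 7.5]        # the last point is an outlier

a, b = fit_line(xs, ys)                               # 1. fit the model
preds = [a + b * x for x in xs]                       # 2. predicted values
residuals = [y - p for y, p in zip(ys, preds)]        # 2. residual values

# 3./4. in a plot you would draw actual vs. predicted, connected per point;
# here we just mark points whose residual is large (an arbitrary threshold
# of 1.5x the mean absolute residual) for aesthetic adjustment.
threshold = 1.5 * sum(abs(r) for r in residuals) / len(residuals)
flagged = [x for x, r in zip(xs, residuals) if abs(r) > threshold]
```

With an intercept in the model, the residuals sum to zero, so the interesting signal is in their individual magnitudes, which is exactly what the flagging (or the red colour in the post) surfaces.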
The post is about 10% understanding what residuals are and 90% showing how to visualize them and spot major discrepancies.
iv. Row Level Security
Row Level Security proved to be an effective approach for us to provide users with a personalized view of their Dashboard & Reports based on the Organization they belonged to. The org hierarchy data was pulled directly from the Human Resources (HR) system, which allowed the Power BI model to identify which user belonged to which department. In our sample data set, it looks as below.
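The effect of row-level security can be sketched as a per-user filter over the fact table, driven by the HR mapping. This is only an illustration of the idea (in Power BI it’s a DAX filter on a role); the users, orgs, and amounts are invented:

```python
# Sketch of the RLS idea: each user sees only the rows for the organization
# the HR data maps them to. All names and figures here are made up.
user_to_org = {                       # mapping pulled from the HR system
    "alice@contoso.com": "Finance",
    "bob@contoso.com": "Engineering",
}

sales = [                             # the fact table behind the report
    {"org": "Finance", "amount": 100},
    {"org": "Engineering", "amount": 250},
    {"org": "Finance", "amount": 75},
]

def visible_rows(user, rows):
    # The security filter: keep only rows matching the user's organization.
    org = user_to_org[user]
    return [r for r in rows if r["org"] == org]
```

In the real model, this filter is applied automatically by the engine for every query the signed-in user runs, which is what makes the single shared report feel personalized.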
Read the whole thing.
I’ve always found fork bombs funny because of their elegant simplicity, so I figured, why not build one in SQL Server?
In order to do it, I needed a way to spawn a self-replicating asynchronous process, so I built:
A stored procedure
That creates an Agent job
That runs the stored procedure
I didn’t think it was possible. I certainly didn’t think it would take a half-dozen lines of code.
RStudio has several ways to import data. One of the easiest ways is to import via URL. This link (https://data.montgomerycountymd.gov/api/views/6rqk-pdub/rows.csv?accessType=DOWNLOAD) gives us the salaries of all of the government employees for Montgomery County, MD in a CSV format. To import this into RStudio, copy the URL and go to Tools -> Import Dataset -> From Web URL…
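The same URL-based import is easy to do in code as well. Here is a small Python equivalent using only the standard library; to keep the sketch runnable offline, a tiny made-up sample stands in for the Montgomery County file, and for the real thing you would pass `urllib.request.urlopen(url)` instead of the `StringIO` below:

```python
import csv
import io

# Stand-in for the downloaded CSV; the rows and columns here are invented.
sample = io.StringIO(
    "Employee,Department,Salary\n"
    "Doe J,Police,65000\n"
    "Roe R,Libraries,52000\n"
)

# csv.DictReader gives one dict per row, keyed by the header line,
# which is roughly what RStudio's import wizard produces as a data frame.
rows = list(csv.DictReader(sample))
salaries = [int(r["Salary"]) for r in rows]
```

For larger files you would likely reach for pandas’ `read_csv`, which accepts a URL directly, but the standard library is enough to show the shape of the data.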
R and Python are both good languages to learn for data analysis. I lean just a little bit toward R, but they’re both strong choices in this space.