Month: November 2018

It is said to be safe, reliable and proven using complex algorithms and built-in intelligence where it can do the following (see this link for more details: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning)

CREATE INDEX – identifies indexes that may improve performance of your workload, creates indexes, and automatically verifies that performance of queries has improved.

DROP INDEX – identifies redundant and duplicate indexes daily, except for unique indexes, and indexes that were not used for a long time (>90 days). Please note that at this time the option is not compatible with applications using partition switching and index hints.

FORCE LAST GOOD PLAN – identifies SQL queries using execution plan that is slower than the previous good plan, and queries using the last known good plan instead of the regressed plan.

Personally I don’t enable the option where it is allowed a “free-for-all” when creating/dropping indexes and forcing certain query plans. I like controlling the change, especially for production databases. To force this concept I wanted to use Extended Events to know when / if someone changed my settings for automatic tuning against my database.

Click through for the script.

Comments closed

Using Microsoft Flow To Find Power BI Data Sources In Use

Published 2018-11-02 by Kevin Feasel

Chris Webb continues his series on using Microsoft Flow to extend Power BI:

The problem with self-service BI is that you never quite know what your users are up to. For example, what data sources are they using? Are there hundreds of Excel files being used as data sources for reports that you don’t know about? If so, where are they? Could they and should they be replaced by a database or something else more robust? In this post I’ll show you how you can use Microsoft Flow and the Power BI REST API (see part 1 to find out how to create a Flow custom connector to call the Power BI API) to get the details of all the data sources used in all of the workspaces of your Power BI tenant.

I’ll admit that doing this turned out to be a bit trickier than I had expected. My plan was to use the GetDatasetsAsAdmin endpoint to get a list of all datasets, loop over each one and then call the (undocumented, but in the REST API’s Swagger file and therefore in my custom connector) GetDatsourcesAsAdmin endpoint to get the datasources in each dataset. Both these endpoints require administrative permissions to call, so I made sure my custom connector had the correct permissions (at least Tenant.Read.All – you can check this in the Azure Portal on the app you registered in Azure Active Directory) and I ran the Flow as a user with Power BI Admin permissions. But I kept getting 404 errors when requesting the data sources for certain datasets .

Chris explains why those 404s appear and what you can do about them.

Comments closed

Investigating UK Traffic With Principal Component Analysis

Published 2018-11-01 by Kevin Feasel

Michael Grogan shows us how to use Principal Component Analysis (PCA) to classify route segments in UK transportation data:

Specifically, let us assume that we wish to analyze traffic density for buses and coaches. The main thing we are interested in is the frequency of traffic across a particular route.

Let’s take an example. If buses cover 100 miles on a route that is 5 miles long within a certain timeframe, then the frequency will be greater than 100 miles covered on a route that is 10 miles long over the same time period.

Read on for an interesting example.

Comments closed

Checking Functional Dependencies In R Data Frames

Published 2018-11-01 by Kevin Feasel

John Mount shows us how to use the psagg function in wrapr to ensure that functional dependencies are valid:

Notice only grouping columns and columns passed through an aggregating calculation (such as max()) are passed through (the column zis not in the result). Now because y is a function of x no substantial aggregation is going on, we call this situation a “pseudo aggregation” and we have taught this before. This is also why we made the seemingly strange choice of keeping the variable name y (instead of picking a new name such as max_y), we expect the y values coming out to be the same as the one coming in- just with changes of length. Pseudo aggregation (using the projection y[[1]]) was also used in the solutions of the column indexing problem.

Our wrapr package now supplies a special case pseudo-aggregator (or in a mathematical sense: projection): psagg(). It works as follows.

In this post, John calls the act of grouping functional dependencies (where we can determine the value of y based on the value of x, for any number of columns in y or x) pseudo-aggregation.

Comments closed

Provisioning An Azure SQL Managed Instance

Published 2018-11-01 by Kevin Feasel

Frank Gill walks us through the process of provisioning an Azure SQL Managed Instance:

Once you have created the prerequisites, you are ready to create your first Managed Instance. As of now, Managed Instance is only available in the following subscription types:

Pay-As-You-Go

Enterprise Agreement

Cloud Service Provider

Information about subscription and resource limitations can be found here. I will update this with any changes.

Frank has a series of screenshots to show you the way.

Comments closed

Using datapasta To Paste Spreadsheet Data In R

Published 2018-11-01 by Kevin Feasel

Mara Averick shows us how we can use datapasta with RStudio to create good representative examples when asking questions:

So, you’ve been asked to make a reprex and you want to include a bit of data that you have in a spreadsheet. Meet {datapasta}, a package by Miles McBain that can make your life a whole lot easier. Once you’ve installed datapasta, you simply copy a selected number of rows from your spreadsheet (remember, this is a minimal reproducible example), and click the Paste as tribble option from the DATAPASTA section of the Addins dropdown

Click through for a demo.

Comments closed

Using Polybase External Tables To Connect To Oracle

Published 2018-11-01 by Kevin Feasel

Rajendra Gupta continues his Polybase series:

In part 2 of the series, we saw that the external table could be accessed similarly to a relational database table. One more advantage is that we can join them with any relational tables.

Let us see how we can join the external table with the relational DB tables. I have saved the data into a CSV file so we will import the table using my earlier article, SQL Server Data Import using SQL Operations Studio. Therefore, you can follow the article in the same way in the Azure Data Studio also. I will just give high-level steps to import data from flat file into Azure Data Studio in this article.

Click through for more.

Comments closed

Editing ArcGIS Maps In Power BI

Published 2018-11-01 by Kevin Feasel

Jason Bonello shows us the types of changes we can make to ArcGIS maps in Power BI:

Map themes – This allows a change in the style for the map and once can choose from location only, heatmaps or clustering (the last two are only available for point layers, that is when you select Points in the Location Type). Through the clustering option, one could group individual location points into larger circular clusters that fall within a cluster radius – giving a high level view and then the ability to drill down into each region. If heatmaps are chosen any values in the Size or Color will be ignored and the tooltips will not be available.

Read the whole thing.

Comments closed

“String Or Binary Data Would Be Truncated” Update In SQL Server 2017

Published 2018-11-01 by Kevin Feasel

Randolph West shows us how, in SQL Server 2017 CU 12, we can remove the scourge of “String or binary data would be truncated”:

This is how the error message looks now:

1

2

3

Msg 2628, Level 16, State 6, Procedure ProcedureName, Line Linenumber

String or binary data would be truncated in table ‘%.*ls’, column ‘%.*ls’.

Truncated value: ‘%.*ls’.

Notice how the table, column and value are all mentioned in the error message now, which makes debugging and troubleshooting much easier. Thank you Microsoft!

As of 24 October 2018, we can now get the full picture in SQL Server 2017 as well, provided we install Cumulative Update 12. I’d say this is worth the update in and of itself!

There is a trace flag involved, so check it out.

Comments closed

More Tabular Best Practices

Published 2018-11-01 by Kevin Feasel

Ginger Grant has a few more best practices for working with Analysis Services tabular models:

Modify Timestamps to Split Date and Time

When there is a field where the date and time are both needed, the values should be separated so that there is both a date field and a time field. Having date time in two fields assists in the dictionary encoding as the date and time fields can be separately sorted into columns where the values are the same, decreasing the number of dictionary entries. To further improve compression, only include the seconds if absolutely necessary, as add decreasing the cardinality will increase compression.

Click through for more tips.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30