Investigating UK Traffic With Principal Component Analysis

Michael Grogan shows us how to use Principal Component Analysis (PCA) to classify route segments in UK transportation data:

Specifically, let us assume that we wish to analyze traffic density for buses and coaches. The main thing we are interested in is the frequency of traffic across a particular route.

Let’s take an example. If buses cover 100 miles on a route that is 5 miles long within a certain timeframe, then the frequency will be greater than 100 miles covered on a route that is 10 miles long over the same time period.
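The arithmetic in that example can be sketched in a few lines. This assumes "frequency" means vehicle miles covered divided by route length, which is how the excerpt reads; the function name is hypothetical and the linked post may define the measure differently.

```python
def traffic_frequency(vehicle_miles: float, route_length_miles: float) -> float:
    """Miles covered per mile of route (assumed definition of frequency)."""
    if route_length_miles <= 0:
        raise ValueError("route length must be positive")
    return vehicle_miles / route_length_miles


# The two routes from the excerpt: same miles covered, different lengths.
short_route = traffic_frequency(100, 5)
long_route = traffic_frequency(100, 10)

print(short_route, long_route)   # 20.0 10.0
print(short_route > long_route)  # True
```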

Read on for an interesting example.

Checking Functional Dependencies In R Data Frames

John Mount shows us how to use the psagg function in wrapr to ensure that functional dependencies are valid:

Notice only grouping columns and columns passed through an aggregating calculation (such as max()) are passed through (the column z is not in the result). Now because y is a function of x no substantial aggregation is going on, we call this situation a “pseudo aggregation” and we have taught this before. This is also why we made the seemingly strange choice of keeping the variable name y (instead of picking a new name such as max_y), we expect the y values coming out to be the same as the ones coming in, just with changes of length. Pseudo aggregation (using the projection y[[1]]) was also used in the solutions of the column indexing problem.

Our wrapr package now supplies a special case pseudo-aggregator (or in a mathematical sense: projection): psagg(). It works as follows.

In this post, John calls the act of grouping functional dependencies (where we can determine the value of y based on the value of x, for any number of columns in y or x) pseudo-aggregation.
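To make the idea of a functional dependency concrete, here is a minimal Python sketch of the check itself. It is not wrapr's psagg() and does not reproduce its behavior; the helper name `violations` and the sample rows are hypothetical. It reports every x key that maps to more than one y value.

```python
from collections import defaultdict


def violations(rows, x_cols, y_cols):
    """Return {x_key: set of y values} for x keys seen with more than one y value."""
    seen = defaultdict(set)
    for row in rows:
        x_key = tuple(row[c] for c in x_cols)
        y_val = tuple(row[c] for c in y_cols)
        seen[x_key].add(y_val)
    return {k: v for k, v in seen.items() if len(v) > 1}


# Hypothetical data: y is a function of x, z is not.
rows = [
    {"x": "a", "y": 1, "z": 10},
    {"x": "a", "y": 1, "z": 11},
    {"x": "b", "y": 2, "z": 12},
]

print(violations(rows, ["x"], ["y"]))        # {} : the dependency x -> y holds
print(bool(violations(rows, ["x"], ["z"])))  # True : x -> z is violated
```

An empty result means collapsing each group to a single y value loses nothing, which is the situation the excerpt calls pseudo-aggregation.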

Provisioning An Azure SQL Managed Instance

Frank Gill walks us through the process of provisioning an Azure SQL Managed Instance:

Once you have created the prerequisites, you are ready to create your first Managed Instance.  As of now, Managed Instance is only available in the following subscription types:

  • Pay-As-You-Go
  • Enterprise Agreement
  • Cloud Service Provider

Information about subscription and resource limitations can be found here.  I will update this with any changes.

Frank has a series of screenshots to show you the way.

Using datapasta To Paste Spreadsheet Data In R

Mara Averick shows us how we can use datapasta with RStudio to create good representative examples when asking questions:

So, you’ve been asked to make a reprex and you want to include a bit of data that you have in a spreadsheet. Meet {datapasta}, a package by Miles McBain that can make your life a whole lot easier. Once you’ve installed datapasta, you simply copy a selected number of rows from your spreadsheet (remember, this is a minimal reproducible example), and click the Paste as tribble option from the DATAPASTA section of the Addins dropdown.

Click through for a demo.

Using Polybase External Tables To Connect To Oracle

Rajendra Gupta continues his Polybase series:

In part 2 of the series, we saw that the external table could be accessed similarly to a relational database table. One more advantage is that we can join them with any relational tables.

Let us see how we can join the external table with the relational DB tables. I have saved the data into a CSV file so we will import the table using my earlier article, SQL Server Data Import using SQL Operations Studio. Therefore, you can follow the article in the same way in the Azure Data Studio also. I will just give high-level steps to import data from flat file into Azure Data Studio in this article.

Click through for more.

Editing ArcGIS Maps In Power BI

Jason Bonello shows us the types of changes we can make to ArcGIS maps in Power BI:

Map themes – This allows a change in the style for the map and one can choose from location only, heatmaps or clustering (the last two are only available for point layers, that is when you select Points in the Location Type). Through the clustering option, one could group individual location points into larger circular clusters that fall within a cluster radius – giving a high level view and then the ability to drill down into each region. If heatmaps are chosen any values in the Size or Color will be ignored and the tooltips will not be available.

Read the whole thing.

“String Or Binary Data Would Be Truncated” Update In SQL Server 2017

Randolph West shows us how, in SQL Server 2017 CU 12, we can remove the scourge of “String or binary data would be truncated”:

This is how the error message looks now:

Notice how the table, column, and value are all mentioned in the error message now, which makes debugging and troubleshooting much easier. Thank you Microsoft!

As of 24 October 2018, we can now get the full picture in SQL Server 2017 as well, provided we install Cumulative Update 12. I’d say this is worth the update in and of itself!

There is a trace flag involved, so check it out.

More Tabular Best Practices

Ginger Grant has a few more best practices for working with Analysis Services tabular models:

Modify Timestamps to Split Date and Time

When there is a field where the date and time are both needed, the values should be separated so that there is both a date field and a time field. Having date time in two fields assists in the dictionary encoding as the date and time fields can be separately sorted into columns where the values are the same, decreasing the number of dictionary entries. To further improve compression, only include the seconds if absolutely necessary, as decreasing the cardinality will increase compression.
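The cardinality argument can be illustrated with a small Python sketch. The data is hypothetical (one reading every 15 minutes for 30 days, with a few seconds of jitter), and counting distinct values is only a rough proxy for dictionary size; it does not model how the tabular engine actually compresses a column.

```python
from datetime import datetime, timedelta

# Hypothetical readings: every 15 minutes for 30 days, with 0-6 seconds of jitter.
start = datetime(2018, 11, 1)
stamps = [start + timedelta(minutes=15 * i, seconds=i % 7) for i in range(96 * 30)]

# Distinct values in a single combined datetime column.
combined = len(set(stamps))

# Distinct values after splitting into a date column and a time column.
dates = len({s.date() for s in stamps})
times = len({s.time() for s in stamps})

# Distinct time values once seconds are dropped.
times_no_seconds = len({s.time().replace(second=0) for s in stamps})

print(f"combined datetime: {combined}")          # 2880
print(f"date + time:       {dates} + {times}")   # 30 + 672
print(f"date + time (no seconds): {dates} + {times_no_seconds}")  # 30 + 96
```

With this data, one column of 2,880 distinct values becomes two columns with 30 and 672 distinct values, and dropping seconds cuts the time column to 96.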

Click through for more tips.

