Press "Enter" to skip to content

Month: July 2016

Power BI June Update

Dustin Ryan talks about this month’s batch of Power BI updates:

Searchable slicers are also a new feature in the latest release of Power BI Desktop. A couple days ago I wrote about some of my favorite custom visuals, which included the Smart Filter by SQLBI. I think I still prefer the Smart Filter in many situations, but the search-ability of the native Slicer is definitely a nice feature to have right out of the box.

The headline is row-level security, but there are several interesting features here.

Stretch Databases

Tim Radney looks at Stretch Database functionality in RTM:

Prior to learning about this new billing method for DSU, I could make the argument that using Stretch Database would be a very cost effective method for storing cold data (unused data) into the cloud. By stretching this data into Azure, you could migrate a large portion of older data, which would decrease the size (and thus cost) of your local backups. In the event you had to restore a database, you would simply have to establish the connection to Azure for the stretched data, thus eliminating the need to restore it. However, with the minimal cost being nearly $1,000 per month for the low end DSU scale, many organizations will find that it is much cheaper to retain the data on a less expensive tier of storage within their data center and find other methods for HA such as mirroring, log shipping, or Availability Groups.

Read the whole thing.  Maybe V2 of stretch databases will fix some of the biggest problems (the cost, needing to pull all of your data back down before you make any schema changes, etc.) and become a viable feature, but I can’t see it being one today.
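
For reference, enabling the feature is the easy part; the pain is in the billing and management details Tim describes. A minimal sketch of turning Stretch on (the server, credential, and table names below are placeholders, not anything from Tim's post):

-- Enable Stretch Database at the instance level
EXEC sp_configure 'remote data archive', 1;
RECONFIGURE;

-- Point the database at an Azure SQL server (assumes a database-scoped credential already exists)
ALTER DATABASE SalesDB
SET REMOTE_DATA_ARCHIVE = ON
    (SERVER = N'mystretchserver.database.windows.net',
     CREDENTIAL = StretchCredential);

-- Begin migrating cold rows for one table out to Azure
ALTER TABLE dbo.OrderHistory
SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));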

HDInsight Tool For Eclipse

Xiaoyong Zhu reports that the HDInsight Tool for Eclipse is now generally available:

The HDInsight Tool for Eclipse extends Eclipse to allow you to create and develop HDInsight Spark applications and easily submit Spark jobs to Microsoft Azure HDInsight Spark clusters using the Eclipse development environment.  It integrates seamlessly with Azure, enabling you to easily navigate HDInsight Spark clusters and to view associated Azure storage accounts. To further boost productivity, the HDInsight tool for Eclipse also offers the capability to view Spark job history and display detailed job logs.

Check out the link for videos and additional resources.

Query Store Bug

Brent Ozar reports on a bug in Query Store in Standard and Express editions:

If you’re using SQL Server 2016’s awesome new feature, Query Store, there’s a new bug with automatic cleanup. Books Online explains:

Automatic data cleanup fails on editions other than Enterprise and Developer. Consequently, space used by the Query Store will grow over time until configured limit is reached, if data is not purged manually. If not mitigated, this issue will also fill up disk space allocated for the error logs, as every attempt to execute cleanup will produce a dump file.

Here’s hoping that bug gets fixed quickly.
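
In the meantime, the workaround is to watch Query Store's size and purge it manually. A minimal sketch (the database name is a placeholder):

-- How much space is Query Store using versus its configured limit?
SELECT actual_state_desc,
       current_storage_size_mb,
       max_storage_size_mb
FROM sys.database_query_store_options;

-- Wipe all Query Store data for the database
ALTER DATABASE [YourDatabase] SET QUERY_STORE CLEAR;

-- Or remove a single query along with its plans and runtime stats
EXEC sp_query_store_remove_query @query_id = 42;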

sp_help And Column Lengths

Jon Morisi looks at an oddity with the way that sp_help shows column length:

I was working with a contractor today who was having difficulty providing me back details on a table definition.  I was specifically interested in a particular column’s data type and size.  (This was related to an ETL process I was working on, and my desire to avoid any implicit conversions).

The reply I got back was, “the column you’re interested in is an nvarchar(100)”.  After continued digging and troubleshooting, I was eventually able to sort out that it was actually an nvarchar(50).

I put together this TEST table to illustrate where the confusion came from.  Can you spot what’s going on?

There’s an interesting explanation which makes me dislike sp_help just a little bit more.
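
If you don't want to click through just yet: the gotcha is almost certainly that sp_help reports Length in bytes rather than characters, so an nvarchar(50) column shows up as 100. A quick illustration (a hypothetical table, not Jon's exact repro):

CREATE TABLE dbo.TEST (SomeColumn nvarchar(50));

EXEC sp_help 'dbo.TEST';   -- the Length column shows 100 (bytes), not 50

-- CHARACTER_MAXIMUM_LENGTH reports the declared length in characters: 50
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'TEST';

-- sys.columns.max_length is also in bytes: 100
SELECT c.name, t.name AS type_name, c.max_length
FROM sys.columns AS c
JOIN sys.types AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.TEST');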

Running Compiled Code In Azure ML

Max Kaznady shows how to use R or Python scripts to call compiled code within Azure ML:

In this post, we focus on sourcing R and Python’s external dependencies, such as R libraries and Python modules, which are not already installed on Azure ML and require code compilation. Commonly the compiled code comes from a variety of other languages such as C, C++ and Fortran. One could also use this approach to wrap their compiled code with R or Python wrappers and run it on Azure ML.

To illustrate the process, we will build two MurmurHash modules from C++ for R and Python using the following two implementations on GitHub, and link them to Azure ML from a zipped folder.

Link via David Smith.  I knew it was possible to call compiled C code from Python and R, but didn’t expect to be able to do it within Azure ML, so that’s good to know.

Reading From The Data Lake

Bill Vorhies discusses technologies to analyze and use data in a data lake:

So the takeaway that many DB developers would have you believe is ‘Hadoop Good’, ‘RDBMS Bad’.

But wait.  RDBMS EDW hasn’t gone away and won’t. That’s where we keep our single version of the truth, the business data that record legal transactions with customers, suppliers, and employees.  We also get strong SLAs, strong fault tolerance, and highly curated data based on strong ETL, provenance, and governance.  Those are all things that are missing in our Data Lake.

Anybody who sells you on one technology to solve all problems is shilling snake oil.  Bill’s answer is an Adjunct Data Warehouse, which sits separate from the Enterprise Data Warehouse.  You go to the EDW when you risk getting fired or going to jail if the data’s wrong; you go to the ADW when you need data not in your EDW, or when you need larger-scale analytics in which it’s okay to be 1% off.

K-Means Clustering With Python

David Crook discusses k-means clustering and how to implement it using Python:

K-Means takes in an unlabeled data set and a whole real number, k.  K is the number of centroids, or clusters you wish to find.  If you do not know how many clusters there should be, it is possible to do some pre-processing to find that more automatically, however that is out of the scope of this article.  Once you have a data set and defined the size of k, K-Means begins its iterative process.  It starts by selecting centroids by moving them to the average of the data associated with them.  It then reshuffles all of the data into new groups based on the proximity to each centroid.

This is a big and detailed post, and worth reading in its totality.
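
The quoted paragraph compresses the two alternating steps a bit; in the usual textbook notation (not lifted from David's post), each iteration assigns points to their nearest centroid and then moves each centroid to the mean of its assigned points:

% Assignment step: each point joins the cluster of its nearest centroid
S_k = \{\, x_i : \lVert x_i - \mu_k \rVert^2 \le \lVert x_i - \mu_j \rVert^2 \;\; \forall j \,\}

% Update step: each centroid moves to the mean of its assigned points
\mu_k = \frac{1}{\lvert S_k \rvert} \sum_{x_i \in S_k} x_i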

Power BI Row-Level Security

Reza Rad takes a crack at row-level security within Power BI Desktop:

Row Level security is about applying security on a data row level. For example sales manager of united states, should only see data for United States and not for the Europe. Sales Manager of Europe won’t be able to see sales of Australia or United States. And someone from board of directors can see everything. Row Level Security is a feature that is still in preview mode, and it was available in Power BI service, here I mentioned how to use it in the service. However big limitation that I mentioned in that post was that with every update of the report or data set from Power BI Desktop, or in other words with every publish from Power BI Desktop, the whole row level security will be wiped out. The reason was that Row Level Security wasn’t part of Power BI model. Now in the new version of Power BI Desktop, the security configuration is part of the model, and will be deployed with the model.

This is a great security feature, so I’m happy to see the Power BI team take the next step and integrate RLS directly into Power BI Desktop.

Inserting Into External Tables

Paul Hernandez shows how to insert data into an external table using Polybase:

One of the most interesting use cases of Polybase is the ability to store historical data from relational databases into a Hadoop File System. The storage costs could be reduced while keeping the data accessible and still can be joined with the regular relational tables. So let’s do the first steps for our new archiving solution.

Archival is a very good use case for external table insertion, and if you don’t have a Hadoop cluster, you could insert into Azure blob storage.
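
A rough sketch of what the insert side looks like (the object names below are placeholders, and it assumes the external data source and file format already exist):

-- INSERT into external tables has to be enabled at the instance level
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

-- External table over a folder in Hadoop or Azure blob storage
CREATE EXTERNAL TABLE dbo.OrderArchive
(
    OrderID   int,
    OrderDate date,
    Amount    decimal(18, 2)
)
WITH
(
    LOCATION = '/archive/orders/',
    DATA_SOURCE = ArchiveStorage,
    FILE_FORMAT = ParquetFormat
);

-- Push cold rows out; once verified, they can be deleted from the local table
INSERT INTO dbo.OrderArchive
SELECT OrderID, OrderDate, Amount
FROM dbo.Orders
WHERE OrderDate < '2010-01-01';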
