
Curated SQL Posts

SQL Server 2016 Editions For BI

John White discusses which edition of SQL Server 2016 you should use for business intelligence work:

SQL Server 2016 removes the BI Edition as an option, leaving us with a choice between only Standard and Enterprise. The biggest news in my opinion from a licensing perspective with 2016 is that Tabular Mode will now be supported in Standard Edition. This puts the tabular model within the reach of all organizations, and closes the licensing gap in the BI stack. This is fantastic news.

There are, of course, limitations with Standard Edition. Tabular in Standard Edition is restricted to 16 GB of RAM, which may seem like a lot, but keep in mind that tabular is an in-memory technology. It’s possible to bump into this limit fairly quickly, but it’s a limit that serves the small/medium business space rather well.

It’s surprising (in a good way) that John recommends Standard Edition, at least for small and medium businesses.

Bike Rental Demand Estimation

The Revolution Analytics blog has a Microsoft-driven article on estimating bike rental demand with Microsoft R Server:

In addition to the original features in the raw data, we add the number of bikes rented in each of the previous 12 hours as features to provide better predictive power. We create a computeLagFeatures() helper function to compute the 12 lag features and use it as the transformation function in rxDataStep().

Note that rxDataStep() processes data chunk by chunk and lag feature computation requires data from previous rows. In computeLagFeatures(), we use the internal function .rxSet() to save the last n rows of a chunk to a variable lagData. When processing the next chunk, we use another internal function .rxGet() to retrieve the values stored in lagData and compute the lag features.
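
The chunk-boundary problem is easy to see outside of R. Here is a minimal Python sketch (not the article's R code, and not how RevoScaleR's .rxSet()/.rxGet() work internally): a generator that walks hourly counts chunk by chunk and carries the last n values forward as state, which is the job the article gives to lagData. The function name lag_features_by_chunk and the use of None for "no history yet" are my own assumptions.

```python
from typing import Iterable, Iterator, List, Optional


def lag_features_by_chunk(
    chunks: Iterable[List[int]], n_lags: int
) -> Iterator[List[List[Optional[int]]]]:
    """Yield, per chunk, one row of lag values for each input value.

    Row i is [value at t-1, value at t-2, ..., value at t-n_lags];
    lags that fall before the start of the series are None.
    Hypothetical helper, loosely mirroring the article's lagData idea.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    carry: List[int] = []  # tail of the previous chunk(s), like lagData
    for chunk in chunks:
        # Prepend the saved tail so early rows in this chunk can see prior values.
        history = carry + chunk
        offset = len(carry)
        rows = []
        for i in range(len(chunk)):
            pos = offset + i  # position of the current value inside history
            rows.append(
                [history[pos - k] if pos - k >= 0 else None
                 for k in range(1, n_lags + 1)]
            )
        yield rows
        carry = history[-n_lags:]  # save the last n_lags values for the next chunk
```

The point of carrying state is that splitting the series into chunks must not change the result: chunking [1, 2, 3, 4, 5] as [[1, 2], [3, 4, 5]] with two lags produces the same rows as processing it in one piece.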

This is a great article for anybody wanting to dig into analytics, because they show their work.

Distributed Unit Testing

Cloudera shows off their distributed unit testing framework:

This distributed testing infrastructure started out as a Cloudera hackathon project in 2014. Todd Lipcon and I worked on a shared backend for running test tasks on a cluster, with Todd focusing on onboarding the Apache Kudu (incubating) tests, and myself on Apache Hadoop. Our prototype implementation reduced the runtime of the 1,700+ Hadoop unit tests from 8.5 hours to 15 minutes.

Since then, we’ve spent time improving the infrastructure and on-boarding additional projects. Besides Kudu and Hadoop, our distributed testing infrastructure is also being used by our Apache Hive and Apache HBase teams. We can now run all the Hadoop unit tests in less than 10 minutes!

Finally, we’re happy to announce that both our infrastructure and code are public! You can browse the web UI at http://dist-test.cloudera.org and see all the source code (ASLv2 licensed) at the cloudera/dist_test GitHub repository. This infrastructure is already being used at upstream Apache to run the Kudu pre-commit tests.
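
The excerpt doesn't say how dist_test decides which tests run where, so treat this as a generic sketch of one common approach rather than Cloudera's scheduler: greedy longest-processing-time-first (LPT) assignment, where the longest tests are placed first, each on the currently least-loaded worker. The function names and the per-test durations are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_tests(durations: Dict[str, float], workers: int) -> List[List[str]]:
    """Greedy LPT assignment of tests to workers (a heuristic, not optimal)."""
    if workers < 1:
        raise ValueError("need at least one worker")
    # Min-heap of (current load, worker index): the least-loaded worker is on top.
    heap: List[Tuple[float, int]] = [(0.0, w) for w in range(workers)]
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(workers)]
    # Longest tests first, so the short ones fill in the gaps at the end.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, w = heapq.heappop(heap)
        buckets[w].append(name)
        heapq.heappush(heap, (load + secs, w))
    return buckets


def makespan(durations: Dict[str, float], buckets: List[List[str]]) -> float:
    """Wall-clock time of the whole run: the busiest worker's total."""
    return max(sum(durations[t] for t in b) for b in buckets)


if __name__ == "__main__":
    tests = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}  # made-up durations
    plan = assign_tests(tests, workers=2)
    print(plan, makespan(tests, plan))
```

For scale: the prototype's 8.5 hours is 510 minutes, so getting down to 15 minutes is a 34x reduction in wall-clock time, which is the kind of win you only get once tests are spread across many machines and balanced reasonably well.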

This is an interesting look at how to scale out unit tests. It’s a bit of a long read (especially with all the videos) but worth your time.

The Code Behind Power BI Parameters

Chris Webb shows us how to get to the M code used in query parameters:

From this you can see that the value returned by the parameter query is just a single piece of text – it’s the value “Monday” that is set as the Current Value, that’s to say the value returned by the parameter itself. The interesting stuff is all in the metadata record associated with the value. I blogged about metadata here, so you may want to read that post before going any further; it’s pretty clear that the fields in the metadata record correspond to the values set in the UI. All of the fields in the metadata record can be edited in the Advanced Editor if you want.
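
If the "value plus metadata record" idea is new to you, a rough Python analogy may help. This is not M and not how Power BI stores parameters internally: a ParamValue class (hypothetical) holds the plain text value alongside a dictionary standing in for the metadata record. The metadata field names are illustrative assumptions, so check what the Advanced Editor shows for your own parameter.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class ParamValue:
    """A value plus a metadata record, loosely mimicking M's `value meta [record]`."""
    value: str
    meta: Dict[str, Any] = field(default_factory=dict)


# Hypothetical parameter: the value is plain text; the UI settings live in metadata.
day_param = ParamValue(
    "Monday",
    {"IsParameterQuery": True, "Type": "Text", "IsParameterQueryRequired": True},
)


def filter_rows(rows: List[Dict[str, Any]], column: str, param: ParamValue) -> List[Dict[str, Any]]:
    """Keep rows whose column equals the parameter's value; metadata is ignored."""
    return [r for r in rows if r[column] == param.value]
```

The design point the analogy tries to capture is that a consuming query only ever sees the value ("Monday"); the metadata exists for the UI.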

When the parameter is used in another query it is referenced like any other query value. For example, if you load the DimDate table from the Adventure Works DW sample database and use the parameter above to filter the EnglishDayNameOfWeek column then the code generated in the UI looks like this:

I’m sure that by next month, there will be a half-dozen new things added to this alone, given how fast the Power BI team can push features…

Deny Everything

Kenneth Fisher goes over grant, revoke, and deny for permissions:

This means that MyUser cannot run a SELECT statement against any table, view or table valued function in the database.

That probably doesn’t sound like you are applying a permission, does it? And that is probably where a lot of the confusion comes in. If, however, we take a look at the system views where the data resides, then we can see proof that both commands, GRANT and DENY, add a permission.

Particularly interesting is exactly how the deny permission works, and that “deny” is in fact a “permission” in that you modify a permissions list.
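
Here is a toy Python model of that behavior, assuming a much-simplified world: GRANT and DENY each write a row (sys.database_permissions records this in its state column, with 'G' for grant and 'D' for deny), REVOKE deletes the row, and a deny on the user or any of its roles beats any grant. The PermissionTable class is hypothetical and ignores real-world wrinkles such as fixed server/database roles, ownership chaining, GRANT WITH GRANT OPTION, and permission hierarchies.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (principal, permission, securable)


class PermissionTable:
    """Toy permissions list: GRANT and DENY both *add* a row; REVOKE removes it."""

    def __init__(self) -> None:
        self.rows: Dict[Key, str] = {}  # value is 'G' (grant) or 'D' (deny)

    def grant(self, principal: str, permission: str, securable: str) -> None:
        self.rows[(principal, permission, securable)] = "G"

    def deny(self, principal: str, permission: str, securable: str) -> None:
        self.rows[(principal, permission, securable)] = "D"

    def revoke(self, principal: str, permission: str, securable: str) -> None:
        # REVOKE takes the row away, whether it was a grant or a deny.
        self.rows.pop((principal, permission, securable), None)

    def allowed(self, principals: Iterable[str], permission: str, securable: str) -> bool:
        """principals: the user plus every role it belongs to."""
        states = [self.rows.get((p, permission, securable)) for p in principals]
        if "D" in states:  # a deny anywhere in the user's principals wins
            return False
        return "G" in states  # otherwise a grant is required; no row means no access
```

In this model, granting SELECT to a role and then denying it to MyUser leaves MyUser locked out, and revoking the deny restores access through the role: the deny was just another row in the list.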

Key Components For A Successful Project

Ginger Grant lists five key components for a successful data analysis project:

Security is an obvious consideration which needs to be addressed up front. Data is a very valuable commodity and only people with appropriate access should be allowed to see it. What steps are going to be employed to ensure that happens? How much administration is going to be required to implement it? These questions need to be answered up front.

I want to extend special thanks to Ginger for putting security as the top item on the list. Also, this seems like a pretty good set of criteria for most projects, so definitely check it out.

Notes From A Biml User Group

There’s a Biml user group in Amsterdam and Koos van Strien took notes:

  • Historically, Varigence has always given away lots of their work for free, and they’ll continue to do so. There are few (maybe no) companies giving this percentage of their work away for free without setting the barrier at “if you really want to start working, you need our paid product.”

  • When features are introduced as free, they will stay free forever. Sometimes this means the introduction of features in the free product needs to be postponed to see the complete impact.

  • According to Scott, this is shown in the release of Biml Express: they could’ve easily dropped some features and moved them into the paid versions of Biml, but they didn’t. They only added new features.

  • The “free while in beta” announcement on BimlOnline is mainly a lawyer thing; you can expect BimlOnline to remain free too.

  • If a good SaaS-model is developed, the tools will all be free. But we’re not there yet…

This sounds like it was a pretty long discussion with Scott Currie and I’m insanely jealous that there’s a Biml user group but it’s nowhere near me…

Waits And Latches

Paul Randal has come out with his comprehensive wait and latch type library:

I present to the community a comprehensive library of all wait types and latch classes that have existed since SQL Server 2005 (yes, it includes 2016 waits and latches).

The idea is that over time, this website will have the following information about all wait types and latch classes:

  • What they mean

  • When they were added

  • How they map into Extended Events (complete for all entries already)

  • Troubleshooting information

  • Example call stacks of where they occur inside SQL Server

  • Email link for feedback and questions

It’s not complete yet, but entries are thorough.

Azure SQL Database Q&A

Julie Koesmarno has a Q&A on Azure SQL Database:

Q: Is there going to be down time when I scale up/down? What’s going to happen to my existing connections?

Extracted from Change the service tier and performance level (pricing tier) of a SQL database:

Note that changing the service tier and/or performance level of a database creates a replica of the original database at the new performance level, and then switches connections over to the replica. No data is lost during this process but during the brief moment when we switch over to the replica, connections to the database are disabled, so some transactions in flight may be rolled back. This window varies, but is on average under 4 seconds, and in more than 99% of cases is less than 30 seconds. Very infrequently, especially if there are large numbers of transactions in flight at the moment connections are disabled, this window may be longer.

The duration of the entire scale-up process depends on both the size and service tier of the database before and after the change. For example, a 250 GB database that is changing to, from, or within a Standard service tier, should complete within 6 hours. For a database of the same size that is changing performance levels within the Premium service tier, it should complete within 3 hours.
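
Because in-flight transactions may be rolled back during the switchover, client code should treat a dropped connection as retryable. Below is a minimal Python sketch of retry with exponential backoff and jitter. TransientError is a hypothetical stand-in: in practice you would map your driver's transient connection errors onto it. The attempt count and delays are arbitrary defaults, not Microsoft guidance.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for whatever your driver raises on a dropped connection."""


def run_with_retry(
    work: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `work`, retrying with exponential backoff plus jitter on TransientError.

    `work` should open its own connection and run the whole transaction, so a
    transaction rolled back during the switchover is simply re-executed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return work()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Waits of 1 s, 2 s, 4 s, ... (times base_delay) plus up to 1 s of jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise AssertionError("unreachable")
```

With the defaults, the sleeps between five attempts add up to roughly 15 to 19 seconds, which comfortably covers the "average under 4 seconds" window but not the 30-second figure quoted for 99% of cases; size attempts and base_delay to the window you actually need to ride out.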

A video by Joe Idziorek on service tiers and how to scale up and down using the Azure Portal is available here.

Read the whole thing. There are some great questions and answers in this set.

Large Sorts And Hashes

SQL Sasquatch looks at a scenario in which large sorts or hash operations can cause CPU to skew compared to page lookups per second:

The graph above has tempdb footprint (light blue) stacked on top of used query memory (dark blue) against the left vertical axis. The green period has very limited use of query memory. During the yellow period, a moderate amount of query memory was used. During the red period, a large amount of query memory was used and at a number of points operations spilled into tempdb. As query memory was used more extensively, the CPU:lookups/sec correlation was more disrupted.

Once fully considered, this makes sense: query memory is “stolen” from the database page buffer pool.  References to pages in the page pool are “page lookups”, but each time stolen query memory is poked and prodded… well, that’s not a page lookup.  But it has CPU cost.
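
One way to put a number on "the CPU:lookups/sec correlation was more disrupted" is to compute a Pearson correlation coefficient for each period, assuming you have already sampled CPU and page lookups/sec at matching intervals. This Python sketch is not SQL Sasquatch's method, and the sample numbers in it are invented purely to show the mechanics: when CPU tracks lookups exactly the coefficient is 1, and adding CPU work that does not show up as page lookups (such as sorting in stolen query memory) pulls it down.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a series with zero variance has no defined correlation")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    lookups = [100, 200, 300, 400]          # invented page lookups/sec samples
    cpu_plain = [10, 20, 30, 40]            # CPU that scales with lookups only
    cpu_with_sorts = [35, 20, 60, 40]       # same, plus sort/hash CPU in some samples
    print(pearson(lookups, cpu_plain))      # 1.0
    print(pearson(lookups, cpu_with_sorts))  # noticeably lower
```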

Check out the whole thing; this is a thoughtful look at an interesting data oddity.
