Microsoft AI School: ML Services

David Smith points out an interesting resource for learning more about Microsoft ML Services:

If you’d like to learn how you use R to develop AI applications, the Microsoft AI School now features a learning path focused on Microsoft R and SQL Server ML Services. This learning path includes eight modules, each comprising detailed tutorials and examples:

Read on for more details.

Training Convolutional Neural Networks On Satellite Image Data

Ahmet Taspinar builds a neural net which detects roads in satellite images:

Next we will determine the contents of each tile image, using data from the NWB Wegvakken (version September 2017). This is a file containing all of the roads of the Netherlands, which gets updated frequently. It is possible to download it in the form of a shapefile from this location.
Shapefiles contain shapes with geospatial data and are normally opened with GIS software like ArcGIS or QGIS. It is also possible to open it within Python, by using the pyshp library.

This is a pretty lengthy and interesting tutorial.  H/T Data Science Central

Hadoop 3.0 Is Coming

Alex Woodie reports that Hadoop 3.0 will likely drop before Christmas:

After years of work, the Apache Hadoop community is now putting the finishing touches on a release candidate for Hadoop 3.0 and, barring any unforeseen occurrences, will deliver it by the middle of December, according to Vinod Kumar Vavilapalli, a committer on the Apache Hadoop project and director of engineering at Hortonworks.

“We can’t set the dates in stone, but it’s looking like we’ll get something out by mid-December,” Vavilapalli told Datanami in an interview last week.

Read on for some of the bigger changes that come with this.

Impala Now A Top-Level Project

Greg Rahn announces that Apache Impala is now a top-level project:

Five years ago, Cloudera shared with the world our plan to transfer the lessons from decades of relational database research to the Apache Hadoop platform via a new SQL engine — Apache Impala — the first and fastest open source MPP SQL engine for Hadoop.  Impala enabled SQL users to operate on vast amounts of data in open formats, stored on HDFS originally (with Apache Kudu, Amazon S3, and Microsoft ADLS now also native storage options), and do so in an interactive and iterative manner, which was previously not possible.  Its flexibility and leading analytic database performance drove the strong adoption of Impala across a wide range of global enterprises looking to power these BI and SQL analytic workloads, and led to a constantly growing ecosystem of third-party tools integrating with Impala.

Fast forward three years, Cloudera donated Impala to the Apache Software Foundation, along with the newly announced Apache Kudu project, further solidifying its place in the open source SQL world.  Since the proposal, the Impala engineering team has worked hard to bring Impala to the new software governance model of the Apache Incubator and build an active and innovative community. That’s why we are pleased to announce that Impala has graduated to a Top-Level Apache Software Foundation Project.

Congratulations go out to Cloudera and everyone who has worked on Imapala over the years.

Using Diskspd To Test Storage Performance

Aamir Syed gives an example of Diskspd parameters to test a storage subsystem:

It’s important to test your storage performance especially prior to installing or deploying a new SQL Server.

Microsoft has provided us with a great tool called Diskspd, which was meant to replace SQLIO. Diskspd synthetically generates workloads to run against your server.  It’s pretty robust and has a lot of parameters so that you can customize your test.

Ex. In the command below, I specified -b8k, which means the block size is going to run at 8k, which is the size that SQL uses for pages.

Click through for a sample run and explanation of each parameter.

SQL Server’s Referential Integrity Operator

Joe Obbish explains the purpose of the referential integrity operator in SQL Server 2016:

What would happen if a parent table was referenced by hundreds of child tables, such as for a date dimension table? Deleting or updating a row in the parent table would create a query plan with at least one join per incoming foreign key reference. Creating a query plan for that statement is equivalent to creating a query plan for a query containing hundreds or even thousands of joins. That query plan could take a long time to compile or could even time out. For example, I created a simple query with 2500 joins and it still hadn’t finished compiling after 15 minutes. That’s why I assume a table is limited to 253 incoming foreign key references in SQL Server 2014.

That restriction won’t be hit often but could be pretty inconvenient to work around. The referential integrity operator introduced with compatibility level 130 raises the limit from 253 to 10000. All of the joins are collapsed into a single operator which can reduce compile time and avoid errors.

There’s some really good information in this post, and Joe has mixed feelings on the concept.

SQL Server 2017 CU2 Compatibility Mode Bug

Tracy Boggiano has found a bug in SQL Server 2017:

This will be a very short blog post to make you aware of a bug in CU2 for all of those who I know have eagerly installed the newest CU for 2017.  A small bug I have found is that it changes your compatibility mode on the msdb database to 130.  All our servers were set to 140 and our nice server policy check alerts fired off and sent me 58 pages the day after I installed it in my development environment.  Well, I double checked before installing on QA today and sure enough, it changed it from 140 to 130.  So have your code ready to change it back after you install.

Click through for a script to fix the compatibility level.

Thoughts On Reliability

Stuart Moore wants to rename Site Reliability Engineering:

The word “Site” in the IT domain typically refers to either a physical location (data center site) or an application (web site); however, the heart of the definition is sociotechnical, not strictly technology. From an undated (seriously, Google?) interview with Ben Traynor, the founder of the SRE movement: “… we have a bunch of rules of engagement, and principles for how SRE teams interact with their environment — not only the production environment, but also the development teams, the testing teams, the users, and so on.” While the previous paragraph of that interview specifically focuses on the type of work that’s being done by Google’s SRE team, these rules of engagement show that SRE’s should be concerned with the entire value stream of service delivery including not only operations, but development, testing, and ultimately the end user experience.  In, other words. SRE’s are concerned with the reliability of the whole service, not just the technical parts.

And Brent Ozar reviews Database Reliability Engineering:

Jump to page 189, the Data Replication section of Chapter 10. Campbell & Majors explain the differences between:

  • Single-leader replication – like Microsoft SQL Server’s Always On Availability Groups, where only one server can accept writes for a given database
  • No-leader replication – like SQL Server’s peer-to-peer replication, where any node can accept writes
  • Multiple-leader replication – like a complex replication topology where only 2-3 nodes can accept writes, but the rest can accept reads

The single-leader replication discussion covers pages 190-202 and does a phenomenal job of explaining the pros & cons of a system like Availability Groups. Those 12 pages don’t teach you how to design, implement, or troubleshoot an AG. However, when you’ve finished those 12 pages, you’ll have a much better understanding of when you should recommend a solution like that, and what kinds of gotchas you should watch out for.

That’s what a Database Reliability Engineer does. They don’t just know how to work with one database – they also know when certain features should be used, when they shouldn’t, and from a big picture perspective, how they should build automation to avoid weaknesses.

I can also recommend the Database Reliability Engineering book.  I’ve not seen the finished product yet (it’s buried in my reading list) but I do like it as a challenge for DBAs and developers to step up their games.

Bug In Older Versions Of SQL Server 2012 & 2014

Paul Randal explains a bug in versions of SQL Server 2012 and 2014:

There hasn’t been a case of it failing to reserve enough space until SQL Server 2012, when a bug was introduced. That bug was discovered by someone I was working with in 2015 (which shows just how rare the circumstances are), and at the time it was thought that the bug was confined to the log of tempdb filling up, rollback failing, and the server shutting down.

However, just last week I was contacted by someone running SQL Server 2012 SP3 who’d seen similar symptoms but for a user database this time, and the user database went into recovery.

Read on for details and make sure those SQL Servers are patched.

Keeping Report Decks Consistent

Tristan Robinson has tips for keeping your Power BI Enterprise report decks looking consistent and nice:

The next consideration is around the number of objects on a report – keep it simple.  Avoid building a giant monolithic report, the more objects you use, the slower the report will perform on PBI service, iPad’s and even to develop.  This is especially true for tables/matrices which will each need to fire off separate DAX queries to return the data elements. Too many objects also has knock on effects for exporting to PowerPoint as objects will overlap with one another more which may not be as much of a case within PBI service but will affect other apps. You can use the selection pane (in the view tab) so move objects above/below one another which will bring forward/push back the elements.

This is advice tailored toward Power BI in particular, but much of it also applies in general.

Categories

December 2017
MTWTFSS
« Nov  
 123
45678910
11121314151617
18192021222324
25262728293031