Press "Enter" to skip to content

Curated SQL Posts

Fetching U-SQL Job Input And Output Paths

Matthew Hicks shows how to retrieve information on U-SQL input and output paths using PowerShell:

Each time you submit a U-SQL job, a job folder is created in your Azure Data Lake Store account. This folder contains useful debugging information about the job, including a file called the U-SQL algebra file. This is an XML file containing information about your job graph, the list of input and output files, and other key U-SQL job metadata.

We’ve just published a sample script that reads the U-SQL algebra file for a specified job and returns the input or output files. Give it a try!

Read on for more.

Restricting Login Usage

Kenneth Fisher shows how to prevent people from using those high-power application accounts:

Any one of these would cause you to fail a security audit. All of them together? Not good.

So how do we fix it? Well, the best possible method is to not give your developers the password. Use config files containing an encrypted copy of the password and you can dramatically limit knowledge of the password. However, that isn’t necessarily a quick or easy solution (modifying the app to use a config file at all, for example). So what to do in the meantime?

The simplest thing to do is to create a logon trigger to control where this account can come from. Before we start: if you are going to use a logon trigger, make sure you know how to log in and disable it if there are any mistakes.

The logon trigger is hardly perfect, but it does help at the margin.
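
As a sketch of the idea, here is a minimal logon trigger (the login and host names are hypothetical, not from Kenneth’s post):

    CREATE TRIGGER RestrictAppLogin
    ON ALL SERVER
    FOR LOGON
    AS
    BEGIN
        -- Only allow the application account to connect from known app servers.
        IF ORIGINAL_LOGIN() = N'AppLogin'
           AND HOST_NAME() NOT IN (N'AppServer01', N'AppServer02')
        BEGIN
            ROLLBACK;  -- Deny the connection attempt.
        END;
    END;

Keep in mind that HOST_NAME() is reported by the client and can be spoofed, which is part of why a logon trigger is a mitigation rather than a hard security boundary.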

The TREATAS Function In DAX

Matt Allington explains what the TREATAS function does:

The TREATAS function can be used to detect filters from your visual (filter context) and then apply these filters to a disconnected table in your data model.

  • It takes a source table (first parameter) and applies the values from that table to columns in a target table (second and subsequent parameters).

  • You can use a function like VALUES as the first parameter to detect the initial filter context in a visual, and hence TREATAS can propagate filter context to the target table.

  • You do not need to have a physical relationship between the source table and the target table. This means that TREATAS can act as a virtual many-to-many relationship.

  • You can pass multiple filters (columns) from the source table to the target table. TREATAS can therefore be used to apply multiple relationships (i.e., on more than one column) between tables.

Read on for a good example of how this works.

Training Convolutional Neural Networks On Satellite Image Data

Ahmet Taspinar builds a neural net which detects roads in satellite images:

Next we will determine the contents of each tile image, using data from the NWB Wegvakken (version September 2017). This is a file containing all of the roads of the Netherlands, which gets updated frequently. It is possible to download it in the form of a shapefile from this location.
Shapefiles contain shapes with geospatial data and are normally opened with GIS software like ArcGIS or QGIS. It is also possible to open them within Python, using the pyshp library.

This is a pretty lengthy and interesting tutorial. H/T Data Science Central.

Hadoop 3.0 Is Coming

Alex Woodie reports that Hadoop 3.0 will likely drop before Christmas:

After years of work, the Apache Hadoop community is now putting the finishing touches on a release candidate for Hadoop 3.0 and, barring any unforeseen occurrences, will deliver it by the middle of December, according to Vinod Kumar Vavilapalli, a committer on the Apache Hadoop project and director of engineering at Hortonworks.

“We can’t set the dates in stone, but it’s looking like we’ll get something out by mid-December,” Vavilapalli told Datanami in an interview last week.

Read on for some of the bigger changes that come with this.

Impala Now A Top-Level Project

Greg Rahn announces that Apache Impala is now a top-level project:

Five years ago, Cloudera shared with the world our plan to transfer the lessons from decades of relational database research to the Apache Hadoop platform via a new SQL engine — Apache Impala — the first and fastest open source MPP SQL engine for Hadoop.  Impala enabled SQL users to operate on vast amounts of data in open formats, stored on HDFS originally (with Apache Kudu, Amazon S3, and Microsoft ADLS now also native storage options), and do so in an interactive and iterative manner, which was previously not possible.  Its flexibility and leading analytic database performance drove the strong adoption of Impala across a wide range of global enterprises looking to power these BI and SQL analytic workloads, and led to a constantly growing ecosystem of third-party tools integrating with Impala.

Fast forward three years, Cloudera donated Impala to the Apache Software Foundation, along with the newly announced Apache Kudu project, further solidifying its place in the open source SQL world.  Since the proposal, the Impala engineering team has worked hard to bring Impala to the new software governance model of the Apache Incubator and build an active and innovative community. That’s why we are pleased to announce that Impala has graduated to a Top-Level Apache Software Foundation Project.

Congratulations go out to Cloudera and everyone who has worked on Impala over the years.

Using Diskspd To Test Storage Performance

Aamir Syed gives an example of Diskspd parameters to test a storage subsystem:

It’s important to test your storage performance, especially prior to installing or deploying a new SQL Server.

Microsoft has provided us with a great tool called Diskspd, which was meant to replace SQLIO. Diskspd synthetically generates workloads to run against your server.  It’s pretty robust and has a lot of parameters so that you can customize your test.

For example: in the command below, I specified -b8k, which means the block size is going to run at 8k, the size that SQL uses for pages.

Click through for a sample run and explanation of each parameter.
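
For a sense of what such a command looks like, here is an illustrative invocation (not the one from Aamir’s post; the target file and parameter values are made up):

    diskspd -b8K -d60 -t4 -o8 -r -w25 -h -L -c10G D:\temp\testfile.dat

Here -d60 runs the test for 60 seconds, -t4 uses four threads, -o8 keeps eight outstanding I/Os per thread, -r makes the access pattern random, -w25 makes 25% of the operations writes, -h disables software and hardware caching, -L collects latency statistics, and -c10G creates a 10 GB test file.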

SQL Server’s Referential Integrity Operator

Joe Obbish explains the purpose of the referential integrity operator in SQL Server 2016:

What would happen if a parent table was referenced by hundreds of child tables, such as for a date dimension table? Deleting or updating a row in the parent table would create a query plan with at least one join per incoming foreign key reference. Creating a query plan for that statement is equivalent to creating a query plan for a query containing hundreds or even thousands of joins. That query plan could take a long time to compile or could even time out. For example, I created a simple query with 2500 joins and it still hadn’t finished compiling after 15 minutes. That’s why I assume a table is limited to 253 incoming foreign key references in SQL Server 2014.

That restriction won’t be hit often but could be pretty inconvenient to work around. The referential integrity operator introduced with compatibility level 130 raises the limit from 253 to 10000. All of the joins are collapsed into a single operator which can reduce compile time and avoid errors.

There’s some really good information in this post, and Joe has mixed feelings on the concept.
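
To make the scenario concrete, here is a stripped-down sketch with hypothetical table names:

    CREATE TABLE dbo.DimDate (DateKey INT PRIMARY KEY);
    CREATE TABLE dbo.FactSales
    (
        DateKey INT NOT NULL REFERENCES dbo.DimDate (DateKey)
    );

    -- With hundreds of child tables like FactSales, a delete against the parent
    -- would need one validation join per incoming foreign key reference; under
    -- compatibility level 130, those checks can collapse into a single
    -- referential integrity operator.
    DELETE FROM dbo.DimDate WHERE DateKey = 20171201;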

SQL Server 2017 CU2 Compatibility Mode Bug

Tracy Boggiano has found a bug in SQL Server 2017:

This will be a very short blog post to make you aware of a bug in CU2, for all of those who I know have eagerly installed the newest CU for 2017.  A small bug I have found is that it changes your compatibility mode on the msdb database to 130.  All our servers were set to 140 and our nice server policy check alerts fired off and sent me 58 pages the day after I installed it in my development environment.  Well, I double-checked before installing on QA today and sure enough, it changed it from 140 to 130.  So have your code ready to change it back after you install.

Click through for a script to fix the compatibility level.
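
The core of any such fix is a single statement (assuming, per Tracy’s report, that msdb is the affected database):

    ALTER DATABASE msdb SET COMPATIBILITY_LEVEL = 140;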
