Press "Enter" to skip to content

Curated SQL Posts

Default Schemas In SQL Server

Daniel Hutmacher looks at specifying default schemas on a database:

If your user is a database owner (i.e., is a member of the db_owner role or has CONTROL permissions on the database), the default schema will always be dbo. This is something you can’t change.

So if your legacy application needs quasi-administrative privileges in the database, you can’t make it a database owner, but you can grant those permissions on the schema instead (which is actually a better idea anyway).

What Daniel is doing is akin to the pre-2005 concept of user spaces, where Bob had a schema and Mary had a schema and Jill had a schema and so forth.
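
To make that concrete, here is a minimal sketch of the pattern Daniel describes; the login, user, and schema names are hypothetical:

-- a minimal sketch; the login, user, and schema names are hypothetical
CREATE SCHEMA LegacyApp;
GO
CREATE USER LegacyAppUser FOR LOGIN LegacyAppLogin WITH DEFAULT_SCHEMA = LegacyApp;
GO
-- quasi-administrative rights over the schema, without db_owner membership
GRANT CONTROL ON SCHEMA::LegacyApp TO LegacyAppUser;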


Automating Temporal Table Creation

Bill Fellows got an increasingly specific set of requirements about data collection:

This post is another in the continuing theme of “making things consistent.” We were voluntold to help another team get their staging environment set up. Piece of cake, SQL Compare made it trivial to snap the tables over.

Oh, we don’t want these tables in Custom schema, we want them in dbo. No problem, SQL Compare again and change owner mappings and bam, out come all the tables.

Oh, can we get this in near real-time? Say every 15 minutes. … Transaction replication to the rescue!

Oh, we don’t know what data we need yet so could you keep it all, forever? … Temporal tables to the rescue?

Yes, temporal tables are perfect. But don’t put the history table in the same schema as the table; put it in this one. And put all of that in its own filegroup.

Click through for a helpful script, and tune in next time, when the other team has Bill move their furniture around.  Maybe move the couch just a hair to the right…no, a little more, oops, too much…
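
For a rough idea of what that end state looks like, here is a sketch under those requirements; the table, schema, and filegroup names are hypothetical, and this is not Bill's script:

-- hypothetical names; assumes a History schema and a HistoryFG filegroup already exist
-- pre-create the history table on its own filegroup...
CREATE TABLE History.Customer
(
    CustomerID   int           NOT NULL,
    CustomerName nvarchar(200) NOT NULL,
    ValidFrom    datetime2     NOT NULL,
    ValidTo      datetime2     NOT NULL
) ON [HistoryFG];
GO
CREATE CLUSTERED INDEX IX_Customer_History
    ON History.Customer (CustomerID, ValidTo, ValidFrom) ON [HistoryFG];
GO
-- ...then point the system-versioned table at it
CREATE TABLE dbo.Customer
(
    CustomerID   int           NOT NULL CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED,
    CustomerName nvarchar(200) NOT NULL,
    ValidFrom    datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo      datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = History.Customer));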


Window Function Basics

Doug Kline has a new series on window functions.  First, he looks at differences between RANK, DENSE_RANK, and ROW_NUMBER:

-- Quick! What’s the difference between RANK, DENSE_RANK, and ROW_NUMBER?

-- in short, they are only different when there are ties…

-- here’s a table that will help show the difference
-- between the ranking functions

-- note the [Score] column,
-- it will be the basis of the ranking
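
To see that difference in miniature, here is a quick sketch with made-up data (not Doug's table):

-- a minimal sketch with made-up data; note the tie on Score = 90
SELECT PlayerName,
       Score,
       RANK()       OVER (ORDER BY Score DESC) AS RankValue,
       DENSE_RANK() OVER (ORDER BY Score DESC) AS DenseRankValue,
       ROW_NUMBER() OVER (ORDER BY Score DESC) AS RowNumberValue
FROM (VALUES ('Ann', 95), ('Bob', 90), ('Cat', 90), ('Dan', 80)) AS s(PlayerName, Score);
-- RANK returns 1, 2, 2, 4; DENSE_RANK returns 1, 2, 2, 3; ROW_NUMBER returns 1, 2, 3, 4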

Then, he starts looking at how to build a window function, starting with the OVER clause:

-- here’s a simple SELECT statement from the Products table

SELECT ProductName,
       UnitPrice
FROM Products
ORDER BY UnitPrice DESC

-- this shows that the highest priced product is Cote de Blaye, productID 38

-- but sometimes the *relative* price is more important than the actual price
-- in other words, we want to know how products *rank*, based on price
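
Picking up where that comment leaves off, a minimal sketch of ranking by price (mine, not Doug's) looks like this:

-- same Products query, with a rank based on price
SELECT ProductName,
       UnitPrice,
       RANK() OVER (ORDER BY UnitPrice DESC) AS PriceRank
FROM Products
ORDER BY PriceRank;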

Doug’s posts consist entirely of T-SQL scripts along with embedded videos.


Azure Data Lake Store File Management With httr

Leila Etaati shows how to make REST API calls in R using httr:

In this post, I am going to share my experiment in how to do file management in ADLS using RStudio.

To do this, you need the following items:

1. An Azure subscription

2. An Azure Data Lake Store account

3. An Azure Active Directory application (for service-to-service authentication)

4. An authorization token from the Azure Active Directory application

It’s pretty easy to do, as Leila shows.


Using R In Azure Data Lake Analytics

David Smith links to a tutorial which shows how to use R against Azure Data Lake Analytics:

The Azure Data Lake store is an Apache Hadoop file system compatible with HDFS, hosted and managed in the Azure Cloud. You can store and access the data within directly via the API, by connecting the filesystem directly to Azure HDInsight services, or via HDFS-compatible open-source applications. And for data science applications, you can also access the data directly from R, as this tutorial explains.

To interface with Azure Data Lake, you’ll use U-SQL, a SQL-like language extensible using C#. The R Extensions for U-SQL allow you to reference an R script from a U-SQL statement, and pass data from Data Lake into the R Script. There’s a 500Mb limit for the data passed to R, but the basic idea is that you perform the main data munging tasks in U-SQL, and then pass the prepared data to R for analysis. With this data you can use any function from base R or any R package. (Several common R packages are provided in the environment, or you can upload and install other packages directly, or use the checkpoint package to install everything you need.) The R engine used is R 3.2.2.

Click through for the details.


Linear Regression With Deducer

Sunil Kappal demonstrates how to use Deducer, a GUI for R, to perform a simple linear regression:

Selecting the variables in the Deducer GUI:

  • Outcome variable: Y, or the dependent variable, should be put on this list

  • As numeric: Independent variables that should be treated as covariates should be put in this section. Deducer automatically converts a factor into a numeric variable, so make sure that the order of the factor level is correct

  • As factor: Categorical independent variables (language, ethnicity, etc.).

  • Weights: This option allows the users to apply sampling weights to the regression model.

  • Subset: Helps to define if the analysis needs to be done within a subset of the whole dataset.

Deducer is open source and looks like a pretty decent way of seeing what’s available to you in R.


Testing Azure Data Lake Store Performance

Zhen Zeng and Govind Kamat stress test Azure Data Lake Store:

Now that we know the read and write throughput characteristics of a single Data Node, we would like to see how per-node performance scales when the number of Data Nodes in a cluster is increased.

The tool we use for scale testing is the Tera* suite that comes packaged with Hadoop.  This is a benchmark that combines performance testing of the HDFS and MapReduce layers of a Hadoop cluster.  The suite is comprised of three tools that are typically executed in sequence:

  • TeraGen, the tool that generates the input data.  We use it to test the write performance of HDFS and ADLS.

  • TeraSort, which sorts the input data in a distributed fashion.  This test is CPU bound and we don’t really use it to characterize the I/O performance of HDFS and ADLS, but it is included for completeness.

  • TeraValidate, the test that reads and validates the sorted data from the previous stage.  We use it to test the read performance of HDFS and ADLS.

It’s an interesting look at how well ADLS scales.  In general, my reading of this is fairly positive for Azure Data Lake Store.


New Management Studio Features

Wayne Sheffield looks at two new features in SQL Server Management Studio 17.3:

For this simple test, it worked pretty well, and it should work well for most of the requirements that you have. Time will tell how reliably this new feature does work.

The Import Flat File is available when connecting to SQL Server version 2005 or higher. I haven’t tried this on a lower version, but I don’t see any reason why it wouldn’t work there either. You can read more about this feature in Microsoft’s documentation.

You can definitely break the Import Flat File feature, but I appreciate it being smoother than the SSIS-based wizard of yore.  Wayne also shares his thoughts on the Extended Events Profiler.


Azure Database-Level Firewall Rules And Geo-Replication

Arun Sirpal explains that you don’t need to create database-level firewall rules in Azure on secondary databases when using Active Geo-Replication:

The main purpose of this post today is to discuss this point – If you have an Azure SQL Database involved in Active Geo Replication and opt to use database level firewall rules do you need to create the rules in both the primary and secondary database?

I thought so, but I was wrong. I connect to my primary database and run the following (obfuscated).

Read on for Arun’s demonstration.
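
For reference, database-level firewall rules are managed with T-SQL system procedures like these; the rule name and IP addresses below are placeholders:

-- the rule name and IP addresses are placeholders
EXECUTE sp_set_database_firewall_rule
    @name = N'AppServerRule',
    @start_ip_address = '203.0.113.10',
    @end_ip_address = '203.0.113.10';

-- list the database-level rules on the current database
SELECT name, start_ip_address, end_ip_address
FROM sys.database_firewall_rules;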
