Press "Enter" to skip to content

Author: Kevin Feasel

SSMS Keyboard Shortcuts In Azure Data Studio

Bob Pusateri reduces a bit of the mental burden of shifting to Azure Data Studio:

One thing about Azure Data Studio I’m not too keen about, though, is that many of the keyboard shortcuts are different. One keyboard shortcut that’s particularly helpful to me is using Ctrl + E to execute queries. I realize that F5 is the most common key to execute a query; however, on most laptop keyboards you now need to hold an additional key to make the function keys behave like function keys. For this reason, Ctrl + E is a wonderful and quick alternative, but it doesn’t work in Azure Data Studio. Or didn’t, until now.
Fortunately, Azure Data Studio is designed to be expanded upon with extensions from both Microsoft and the community. In the case of keyboard shortcuts, a particularly helpful one is called SSMS Keymap, which ports many popular SSMS keyboard shortcuts into Azure Data Studio. With this extension, Ctrl + E is once again an option, and I no longer have to click “Execute” with a mouse, or fumble to find my laptop’s F5 equivalent.

Click through for the demo and grab that extension.
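If you would rather not install an extension, Azure Data Studio (like the VS Code editor it is built on) also lets you remap keys yourself through a keybindings.json file. Here is a minimal sketch; the command id runQueryKeyboardAction is an assumption on my part, so verify it against the Keyboard Shortcuts editor in your build:

```json
// keybindings.json (Preferences > Keyboard Shortcuts > Open Keyboard Shortcuts (JSON))
// Assumption: "runQueryKeyboardAction" is the query-execution command in your ADS build.
[
    {
        "key": "ctrl+e",
        "command": "runQueryKeyboardAction",
        "when": "editorTextFocus"
    }
]
```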


Using IDENTITY In A SELECT Statement

Kenneth Fisher shares something he learned recently about the IDENTITY function:

Now, looking at it a bit more closely you’ll see that this is a function call, not just a property. Now, in my research for this post I did find where I’d mentioned this function briefly in my somewhat comprehensive identity post. Technically I didn’t mention it so much as it was mentioned to me in the comments, so I added it to the list. I guess I either didn’t look at it closely enough at the time or it’s just one of those cases where I forgot. Either way, it’s worth highlighting now.

Click through to learn more.
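As a quick reference, the function form takes a data type plus a seed and increment, and it is only valid in a SELECT … INTO statement, which is what separates it from the IDENTITY column property. A minimal sketch with illustrative names:

```sql
-- IDENTITY() as a function: valid only in SELECT ... INTO
SELECT IDENTITY(INT, 1, 1) AS RowID,
       name
INTO   #NumberedObjects
FROM   sys.objects;

SELECT RowID, name
FROM   #NumberedObjects;
```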


Comparing Data With CHECKSUM

David Fowler shows how to use CHECKSUM and CHECKSUM_AGG to compare data:

There are times when we need to compare two tables and figure out if the data matches. I often see a number of ways of doing this suggested; most are quite slow and inefficient. I’d quite like to share a quick and slightly dirty way of doing this using the CHECKSUM and CHECKSUM_AGG functions.

CHECKSUM()
Just a reminder that CHECKSUM() will generate a checksum for an entire row or selection of columns in the row.

CHECKSUM_AGG()
Will generate a checksum for a dataset.

David then has a couple of examples showing these in action.
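The basic pattern, with hypothetical table names, looks like this: compute a per-row CHECKSUM, then collapse the row checksums into a single value per table with CHECKSUM_AGG and compare the two results.

```sql
-- If the two aggregate values differ, the data differs.
-- Caveat: CHECKSUM is a hash, so matching values are strong but not absolute
-- evidence of equality (collisions are possible), which is why it is "slightly dirty."
SELECT CHECKSUM_AGG(CHECKSUM(*)) AS TableChecksum FROM dbo.TableA;
SELECT CHECKSUM_AGG(CHECKSUM(*)) AS TableChecksum FROM dbo.TableB;
```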


When Using SSRS Makes Sense

Eugene Meidinger lays out the scenarios in which it makes sense to use SQL Server Reporting Services over Power BI, Excel, or other tooling:

SSRS makes it easy to control who has access to your reports and data. It is possible to specify permissions on the whole server, specific folders of reports or on a single report. Permissions inherit down, like a regular file system, unless you explicitly break inheritance to specify custom permissions.
In addition to permissions, you have a central server to house and control your reports. This is critical when you need an authoritative source of truth for your reporting. Users can trust that they are reading the latest version of any given report.
In addition to the administrative side of things, SSRS provides a powerful development environment with SSDT. SQL Server Data Tools (SSDT) is based on Visual Studio, a very popular Integrated Development Environment, or IDE. SSDT makes it incredibly easy to store your reports in source control, since your reporting artefacts are just XML files. Source control makes it possible to collaborate on a team or roll back to earlier versions of a report. This is a capability that is not available with Excel or Power BI reports.

Read the whole thing.


Keeping Polybase Tables In Sync With Biml

Ben Weissman combines two of my favorite things:

If you have started playing with Polybase, you have probably figured out by now that – as awesome as it is – it’s still a bit of a pain to set up and maintain external tables. There is a wizard in Azure Data Studio but it’s still under development, especially from a usability standpoint.
So what can be done about that? Well, we’re effectively looking for an easy way to read metadata from a relational database and automate the T-SQL to mirror that metadata. HELLO?! Perfect use case for Biml – which is NOT just for SSIS.
Let’s take a look at how that can be done…

If only Ben could have used F# instead of VB and VB with curly braces…
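For context, the statement Biml would generate for each source table looks roughly like the sketch below. All names are hypothetical, and the exact WITH options vary by external data source type:

```sql
-- Sketch: a Polybase external table mirroring a remote table (hypothetical names)
CREATE EXTERNAL TABLE dbo.Orders_External
(
    OrderID   INT           NOT NULL,
    Customer  NVARCHAR(50)  NULL,
    OrderDate DATE          NULL
)
WITH
(
    DATA_SOURCE = MyExternalSource,      -- created earlier via CREATE EXTERNAL DATA SOURCE
    LOCATION    = 'SourceDb.dbo.Orders'  -- remote database.schema.table
);
```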


What’s New With Cloudera Enterprise 6.1.0

Krishna Maheshwari announces Cloudera Enterprise 6.1.0:

Platform Support & Security
Cloudera now supports deploying with OpenJDK 8 in addition to Oracle’s JDK. With this release, we also support AWS CloudHSM for HDFS encryption-at-rest.
As customers are increasingly implementing security, we are changing defaults to be secure in order to reduce setup complexity and configuration misses. As a part of this release, several defaults in Kafka, Impala, Sqoop, and Flume have been changed to be more secure, and we have added BDR replication from insecure to secure (Kerberized) clusters to ease the transition to secure clusters.

There are several improvements and new features here worth checking out.


TPC-DS Testing With HDP 3.0

Nita Dembla and Gopal Vijayaraghavan compare HDP 3.0 versus HDP 2.6.5 when running the TPC-DS query set and note performance improvements in Hive LLAP:

Hortonworks announced the general availability of HDP 3.0 this year. You may read more about it here. Bundled with HDP 3.0, Apache Hive 3 with LLAP took a significant leap as an Enterprise-ready, real-time data warehouse with transactional capabilities that continues to serve BI workloads with lower latencies. HDP 3.0 comes with exciting new capabilities – ACID support, materialized views, SQL constraints, and query result cache, to name a few. Additionally, we continued to build and improve on the performance enhancements introduced in earlier releases.
In this blog, we will provide an update on our performance benchmark blog, comparing performance of HDP 3.0 to HDP 2.6.5. The noteworthy difference in this benchmark is that all tables are by default transactional and written in ACID format, which means there are additional metadata (ROW_ID) columns to uniquely identify each row and support transactional semantics. Another key database capability used and tested here is SQL constraints. The hive-testbench schema has been enhanced to declare primary key/foreign key, not null, and unique constraints.

Their headline is that Hive 3 is up to 2x faster than Hive 2, with huge gains in a few of the queries.
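To make those two capabilities concrete, here is a sketch of an ACID table plus an informational constraint in Hive 3. The column names come from TPC-DS’s store_sales table, but treat the DDL as illustrative rather than the benchmark’s actual setup scripts:

```sql
-- Hive 3 / HDP 3.0: managed ORC tables are transactional (ACID) by default,
-- which adds hidden ROW_ID metadata to support transactional semantics.
CREATE TABLE store_sales (
    ss_item_sk       BIGINT,
    ss_ticket_number BIGINT,
    ss_net_paid      DECIMAL(7,2)
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Constraints in Hive are informational (not enforced),
-- but the optimizer can use them when planning queries.
ALTER TABLE store_sales
    ADD CONSTRAINT pk_store_sales PRIMARY KEY (ss_item_sk, ss_ticket_number)
    DISABLE NOVALIDATE;
```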


Variable Screening With vtreat

John Mount explains how you can use vtreat for determining variable importance:

Part of the vtreat philosophy is to assume after the vtreat variable processing the next step is a sophisticated supervised machine learning method. Under this assumption we assume the machine learning methodology (be it regression, tree methods, random forests, boosting, or neural nets) will handle issues of redundant variables, joint distributions of variables, overall regularization, and joint dimension reduction.
However, an important exception is: variable screening. In practice we have seen wide data warehouses with hundreds of columns overwhelm and defeat state-of-the-art machine learning algorithms due to over-fitting. We have some synthetic examples of this (here and here).
The upshot is: even in 2018 you cannot treat every column you find in a data warehouse as a variable. You must at least perform some basic screening.

Read on to see a couple quick functions which help with this screening.


Using Sqoop’s Logic To Improve Spark JDBC Performance

Avi Yehuda analyzes how Sqoop works to make relational database access from Spark faster:

Sqoop performed so much better almost instantly; all you needed to do was set the number of mappers according to the size of the data, and it worked perfectly.
Since both Spark and Sqoop are based on the Hadoop map-reduce framework, it’s clear that Spark can work at least as well as Sqoop; I only needed to find out how to do it. I decided to look closer at what Sqoop does to see if I could imitate that with Spark.
By turning on the verbose flag of Sqoop, you can get a lot more details. What I found was that Sqoop splits the input among the different mappers, which makes sense; this is map-reduce after all, and Spark does the same thing. But before doing that, Sqoop does something smart that Spark doesn’t do.

Read on to see what in particular Sqoop does, and how you can use that in your Spark code.
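Without giving away the specific trick, the general mechanism for splitting a JDBC read in Spark is to hand it a partition column plus bounds, so each task pulls only a slice of the table. A sketch using Spark SQL’s JDBC source, with connection details and names hypothetical:

```sql
-- Partitioned JDBC read: Spark issues numPartitions parallel queries, each
-- covering a slice of [lowerBound, upperBound] on partitionColumn.
CREATE TEMPORARY VIEW employees_jdbc
USING org.apache.spark.sql.jdbc
OPTIONS (
    url             'jdbc:sqlserver://myserver;databaseName=HR',
    dbtable         'dbo.Employees',
    user            'spark_reader',
    password        '********',
    partitionColumn 'EmployeeID',
    lowerBound      '1',
    upperBound      '1000000',
    numPartitions   '8'
);

SELECT COUNT(*) FROM employees_jdbc;
```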


PowerShell Core Preview 3 Fails To Start

Max Trinidad walks us through an issue with PowerShell Core Preview 3:

Just in case you haven’t tried to install PowerShell Core Preview.3 in Windows, which became available on the evening of the 10th of December: if you are doing a clean installation, meaning the previous version was manually uninstalled or this is your first installation, then you are fine.
The issue with installing PowerShell Core Preview.3 is when you do an upgrade over a previous version: either Preview.1 or Preview.2.

That’s a common occupational hazard when working with preview tools. But Max has your solution.
