T-SQL Differences in Azure SQL Database
I used to think this was the real difference between SQL Server and SQL Database. I was wrong. Really wrong. But it’s a good place to start. Now, from what I can tell, everything in Azure is a moving target. There are constant changes, so it’s important to know where the documentation is. In this particular case, here it is: Azure SQL Database Transact-SQL differences.
Check it out. The differences are smaller than in the past, but I expect that there will always be some differences—particularly on the administration side—due to the nature of Azure SQL Database as a PaaS offering.
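For a quick taste of the kind of difference the documentation covers, here is a small sketch: two statements that are routine on a boxed SQL Server instance but are not supported in Azure SQL Database (the database name is made up, and the exact behavior may shift as the service evolves).

```sql
-- Routine on an on-premises SQL Server instance, but not supported in
-- Azure SQL Database, where you connect directly to one database and
-- the service manages backups for you.

USE SomeOtherDatabase;   -- USE cannot switch database context in Azure SQL Database
GO

BACKUP DATABASE SomeOtherDatabase    -- BACKUP/RESTORE are not available in Azure SQL Database
    TO DISK = N'D:\Backups\SomeOtherDatabase.bak';
```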
The Calendar Visualization allows you to visualize a data point on each date on the calendar.
- The darker the color, the higher the value or density of values.
If you have multiple rows on the same date, they are aggregated together.
The Calendar Visualization can be used for cross filtering, meaning you can select a square in the calendar and it will filter other visuals down to the date you picked.
This is an interesting visual. It’s dense, but not difficult to understand.
If your data source is a SQL database, you might have implemented pagination by using
- LIMIT .. OFFSET, or
- OFFSET .. FETCH, or some
- ROWNUM / ROW_NUMBER() filtering (see the jOOQ manual for some syntax comparisons across RDBMS).

OFFSET is the right tool to jump to page 317, but remember, no one really wants to jump to that page, and besides, OFFSET just skips a fixed number of rows. If there are new rows in the system between the time page number 316 is displayed to a user and when the user skips to page number 317, the rows will shift, because the offsets will shift. No one wants that either, when they click on “next”.
Instead, you should be using what we refer to as “keyset pagination” (as opposed to “offset pagination”).
He also has a good explanation of the seek method.
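To make the contrast concrete, here is a minimal T-SQL sketch of the two approaches; the table and column names (dbo.Orders, OrderDate, OrderID) and the values are hypothetical.

```sql
-- Offset pagination: skips a fixed number of rows, so the page contents
-- shift whenever rows are inserted or deleted ahead of the offset.
SELECT OrderID, OrderDate
FROM dbo.Orders
ORDER BY OrderDate, OrderID
OFFSET 3160 ROWS FETCH NEXT 10 ROWS ONLY;   -- "page 317" at 10 rows per page

-- Keyset ("seek") pagination: resume after the last row the user actually saw,
-- so inserts elsewhere in the table do not shift the next page.
DECLARE @LastOrderDate date = '2016-06-30',
        @LastOrderID  int  = 31600;

SELECT TOP (10) OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate > @LastOrderDate
   OR (OrderDate = @LastOrderDate AND OrderID > @LastOrderID)
ORDER BY OrderDate, OrderID;
```

The seek version also plays nicely with an index on (OrderDate, OrderID), since the engine can seek straight to the starting point instead of counting and discarding rows.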
I will throw in one jab at Oracle (because hey, it’s been a while since I’ve lobbed a bomb at Oracle on this blog): it’d really suck to have a system where I legally wasn’t allowed to distribute relevant performance comparison benchmarks. Fortunately, I tend to work on better data stacks.
The third part of the invitation was to write this. I covered what I did, and some of what I learned. I’ll add a bit more here.
I was certainly clumsy working with XE, and despite working my way through the course, I realize I have a lot of learning to do before I’m comfortable with it. While I got a basic session going, depending on when I started it and what I was experimenting with, I sometimes found myself with events that never went away, such as a commit or rollback with no corresponding opening transaction.
This T-SQL Tuesday was a bit broader in scope, so it has been interesting watching people respond.
This thread on the Power Query forum suggests it’s something to do with lazy evaluation, but I haven’t been able to determine the situations when it does work and when it doesn’t.
Instead, it is possible to handle specific HTTP error codes using the ManualStatusHandling option in Web.Contents().
I guess this beats not being able to handle errors at all, but it seems like a fairly fragile solution if you later want to handle the entire 500 class of response codes.
Controlling the lifecycle of Spark can be cumbersome and tedious. Fortunately, the Spark Testing Base project offers us Scala traits that handle those low-level details for us. Streaming adds an extra bit of complexity, as we need to produce data for ingestion in a timely way. At the same time, Spark’s internal clock needs to tick in a controlled way if we want to test timed operations such as sliding windows.
This is part one of a series. I’m interested in seeing where this goes.
The basic steps can be described as follows:
- When a Spark job starts, it will generate encryption keys and store them in the current user’s credentials, which are shared with all executors.
- When a shuffle happens, the shuffle writer will first compress the plaintext if compression is enabled. Spark will use the randomly generated initialization vector (IV) and the keys obtained from the credentials to encrypt the plaintext using CryptoOutputStream, which encrypts the shuffle data and writes it to disk as it arrives. The first 16 bytes of the encrypted output file are reserved to store the initialization vector.
- For the read path, the first 16 bytes are used to initialize the IV, which is provided to CryptoInputStream along with the user’s credentials. The decrypted data is then provided to Spark’s shuffle mechanism for further processing.
Once you have things optimized, the performance hit is surprisingly small.
So how do we handle the scenario where the server is rebooted?
- Option 1: always remember to restart the trace after server reboots
- Option 2: create a SQL Agent job to poll for the SSAS service status and start the xEvent trace if it’s not already running
- Option 3: write a custom .NET watchdog service to poll for the SSAS service status and start the xEvent trace if it’s not already running
Those are the options I’ve used or seen used in the past… and to be sure, all of them have their drawbacks in reliability and/or complexity.
…which is why I was so excited when it was brought to my attention that there is an “AutoRestart” option for SSAS xEvents!
Do read the whole thing.
Once the cluster is created, you can connect to the edge node, where MRS is pre-installed, by SSHing to r-server.YOURCLUSTERNAME-ssh.azurehdinsight.net with the credentials you supplied during the cluster creation process. To do this in MobaXterm, you can go to Sessions, then New Sessions, and then SSH.
The default installation of an HDI Spark on Linux cluster does not come with RStudio Server installed on the edge node. RStudio Server is a popular open-source integrated development environment (IDE) for R that provides a browser-based interface for remote clients. This tool allows you to benefit from all the power of R, Spark, and your Microsoft HDInsight cluster through your browser. To install RStudio, you can follow the steps detailed in the guide, which boil down to running a script on the edge node.
If you’ve been meaning to get further into Spark & R, this is a great article to follow along with on your own.
“Thanks to SQL Threat Detection, we were able to detect and fix code vulnerabilities to SQL injection attacks and prevent potential threats to our database. I was extremely impressed by how simple it was to enable threat detection policy using the Azure portal, which required no modifications to our SQL client applications. A while after enabling SQL Threat Detection, we received an email notification about ‘An application error that may indicate a vulnerability to SQL injection attacks’. The notification provided details of the suspicious activity and recommended concrete actions to further investigate and remediate the threat. The alert helped me track down the source of my error and pointed me to the Microsoft documentation that thoroughly explained how to fix my code. As the head of IT for an information technology and services company, I now guide my team to turn on SQL Auditing and Threat Detection on all our projects, because it gives us another layer of protection and is like having a free security expert on our team.”
Anything which helps kill SQL injection for good makes me happy.
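For completeness, the fix these alerts typically point you toward is parameterization. Here is a minimal T-SQL before-and-after sketch; the table, columns, and input value are made up.

```sql
-- Vulnerable pattern: user input concatenated directly into the statement text,
-- so input like  Smith'; DROP TABLE dbo.Customers; --  gets parsed as SQL.
DECLARE @UserInput nvarchar(100) = N'Smith';
DECLARE @UnsafeSql nvarchar(max) =
    N'SELECT CustomerID, LastName FROM dbo.Customers WHERE LastName = N''' + @UserInput + N'''';
-- EXEC (@UnsafeSql);   -- injectable; left commented out on purpose

-- Safer pattern: pass the value as a parameter with sp_executesql,
-- so it is always treated as data, never as executable SQL.
EXEC sys.sp_executesql
    N'SELECT CustomerID, LastName FROM dbo.Customers WHERE LastName = @LastName;',
    N'@LastName nvarchar(100)',
    @LastName = @UserInput;
```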