Press "Enter" to skip to content

Category: Cloud

Comparing Data Lake Job Runs

Yanan Cai shows how to compare stats on different executions of a job:

Troubleshooting issues in a recurring job is a time-consuming task. It starts with searching through the Job Browser to find instances of a recurring job and identifying both baseline and anomalous performance. This is followed by multi-way comparisons between job instances to figure out what has changed in the query, data, or environment, and then by analysis to discover which changes may have a performance impact. While this is happening, production workloads continue to under-perform or go down.

Azure Data Lake Tools for Visual Studio now makes it easy to spot anomalies and quickly trace the key characteristics across recurring job instances, allowing for an efficient debugging experience. The Pipeline Browser automatically groups recurring jobs to simplify discovery of all runs. The Related Job View collects data about inputs, outputs, and execution across multiple runs into a single visualization.

Read on for more.

SQL Server Backups To Azure Blob Storage

Kevin Hill shows how to configure SQL Server to back up a database to Azure blob storage:

Note the “no blobs found” in the container.  After a successful backup, you will see it here.

Click on ‘Container Properties’ to get the URL for this specific container…this will be used in Backup and Restore statements.  Click the button next to the URL to copy it.  For now, just remember where this is or copy it to Notepad, a query window, etc.  When we start to build our T-SQL statements, we will need both the Access key from earlier and the URL.

Kevin gives clear, step-by-step instructions on the process.
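For reference, the finished statements end up looking something like the minimal sketch below. The credential name, storage account, container URL, and database name are placeholders rather than Kevin's exact values; this is the access-key style of backup to URL that the excerpt describes.

```sql
-- Store the storage account access key as a SQL Server credential.
CREATE CREDENTIAL [AzureBackupCredential]
WITH IDENTITY = 'mystorageaccount',              -- storage account name
     SECRET   = '<storage account access key>';  -- access key copied from the portal

-- Back up to the container URL copied from Container Properties.
BACKUP DATABASE [SampleDB]
TO URL = 'https://mystorageaccount.blob.core.windows.net/sqlbackups/SampleDB_Full.bak'
WITH CREDENTIAL = 'AzureBackupCredential',
     COMPRESSION, STATS = 10;

-- Restoring works the same way, pointing at the same URL.
RESTORE DATABASE [SampleDB]
FROM URL = 'https://mystorageaccount.blob.core.windows.net/sqlbackups/SampleDB_Full.bak'
WITH CREDENTIAL = 'AzureBackupCredential';
```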

Specifying IP Address On A Point To Site VPN

John Morehouse shows how to force a particular IP address when building an Azure point-to-site VPN:

Recently, I got to work with a client on something interesting. We implemented transactional replication to send data to an Azure virtual machine.  This was being done to perform some testing for a project.

Given that the two machines were NOT within the same Active Directory domain, we wanted to make sure our client’s data was protected, so we utilized a Point-to-Site VPN to facilitate this.  Using a VPN connection helps to ensure that any data transmitted to the virtual machine is encrypted and secured.  Note: the process of configuring and implementing the VPN connection is for another blog post.

SQL Server replication requires the use of a server name rather than just an IP address. This meant that the virtual machine in Azure had to use an entry in the local host file that pointed back to the client’s machine.

The downside?  When the VPN connection drops (it happens), the client machine obtains a new IP address upon reconnecting. The host file could then hold an incorrect IP address and would need to be updated.

Read on to see how John was able to solve this.
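To make the server-name requirement concrete, the host file entry on the Azure virtual machine looks something like the snippet below; the server name and address are made up for illustration. It is that hard-coded address which goes stale whenever the point-to-site connection is re-established with a new client IP.

```
# C:\Windows\System32\drivers\etc\hosts on the Azure VM
# Maps the on-prem publisher's server name to its current VPN-assigned address.
# This line must be updated if the client reconnects and receives a new IP.
172.16.1.25    ONPREMSQL01
```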

Time Series Forecasting With DeepAR

Tim Januschowski, et al., introduce DeepAR on AWS:

Today we are launching Amazon SageMaker DeepAR as the latest built-in algorithm for Amazon SageMaker. DeepAR is a supervised learning algorithm for time series forecasting that uses recurrent neural networks (RNN) to produce both point and probabilistic forecasts. We’re excited to give developers access to this scalable, highly accurate forecasting algorithm that drives mission-critical decisions within Amazon. Just as with other Amazon SageMaker built-in algorithms, the DeepAR algorithm can be used without the need to set up and maintain infrastructure for training and inference.

Click through for a product demonstration.

The Premise Of Cloud Data Warehousing

Derik Hammer explains how cloud data warehouses differ from their on-prem cousins:

Given the data processing needs of a data warehouse, they tend to be implemented on massively parallel processing (MPP) systems. The MPP architecture relies upon a shared-nothing concept for distributing data across various slices. Compute nodes are layered on top of the storage and process queries for data residing in their local slices. The control node is responsible for taking a query and dividing it up into smaller queries to be run in parallel on the compute nodes.

Read the whole thing.

Azure SQL Analytics

Arun Sirpal gives an introduction to Azure SQL Analytics:

Please see the prerequisites section within this document – YOU MUST do this, or else you will not be able to use this feature. https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-azure-sql#prerequisites

Once set up, it should take approximately 15 minutes to start capturing and rendering back some data. Don’t be surprised if it takes a little longer, as was the case for me.

My biggest complaint is about the visuals; otherwise, this looks like the beginning of a solid monitoring solution within Azure SQL Database.

Row Counts From Statistics In Azure DW

Derik Hammer has a script to estimate row counts in an Azure SQL Data Warehouse table:

Azure SQL Data Warehouse is a massively parallel processing (MPP) architecture designed for large-scale data warehouses. An MPP system creates logical / physical slices of the data. In SQL Data Warehouse’s case, the data has 60 logical slices, at all performance tiers. This means that a single table can have up to 60 different object_ids. This is why, in SQL Data Warehouse, there is the concept of physical and logical object_ids along with physical names.

Below is a query for finding row counts of tables in SQL Data Warehouse which accounts for the differences in architecture between my earlier script, written for SQL Server, and SQL Data Warehouse.

Click through for the script.
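If you just want the general shape of such a query, the sketch below follows the commonly documented pattern of joining the logical table metadata to the per-distribution partition statistics and summing the row counts. It is an illustration of the idea, not Derik's script; the catalog view names are assumed to follow the Azure SQL Data Warehouse system views.

```sql
-- Approximate row counts per table, summed across all 60 distributions.
SELECT
    s.name            AS [schema_name],
    t.name            AS [table_name],
    SUM(ps.row_count) AS [approx_row_count]
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON t.schema_id = s.schema_id
JOIN sys.pdw_table_mappings AS m            -- logical object_id -> physical name
    ON t.object_id = m.object_id
JOIN sys.pdw_nodes_tables AS nt             -- one row per distribution's physical table
    ON m.physical_name = nt.name
JOIN sys.dm_pdw_nodes_db_partition_stats AS ps
    ON nt.object_id = ps.object_id
   AND nt.pdw_node_id = ps.pdw_node_id
   AND nt.distribution_id = ps.distribution_id
WHERE ps.index_id IN (0, 1)                 -- heap or clustered index only
GROUP BY s.name, t.name
ORDER BY [approx_row_count] DESC;
```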

Failover Groups In Azure SQL Database

Jim Donahoe shows off Failover Groups in Azure SQL Database.  Part 1 involves setting up a Failover Group:

In my former company, we had 22 web applications that all had connections to various databases.  We had all of our databases configured for Geo-Replication already, but still, if we had to fail over, we had to update each connection string for the web apps along with others, which became a tedious process. In came Failover Groups to the rescue!  With a Failover Group, I was able to create two endpoints that stayed the same no matter which server was primary/secondary.  I liked to think of these as my Availability Group Listeners as they kinda serve the same functionality: route traffic to a node depending on whether it’s read-only or not.  Best part?  It’s configured through the Azure Portal SO EASILY!  You can use PowerShell as well, but for this blog post, I will walk through the creation via the Portal.  I will make a separate post or attach a script at some point for the PowerShell deployment.

Before we start the configuration portion of this though, let’s take a look at how Microsoft defines what a Failover Group is.  I found this definition here:  “Azure SQL Database auto-failover groups (in-preview) is a SQL Database feature designed to automatically manage geo-replication relationship, connectivity, and failover at scale.”  Sounds pretty interesting, right? Let’s make one!

In Part 2, Jim shows how to connect to SQL Server using the Failover Group listener:

Well, now that the easy stuff is out of the way, let’s talk about how you connect to these groups via SSMS.  This is where some of the confusion happens.  When I first configured a Failover Group, the first thing I tried to do was connect to the Primary server via SSMS thinking it will work just like an Always On Listener in traditional SQL Server…NEWP!

If you’re running a production database on Azure SQL Database, you might want to look at Failover Groups.
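The practical upshot is that applications and SSMS connect to the failover group's listener endpoints rather than to either server by name. The group and database names below are hypothetical, but the endpoint naming convention is the documented one:

```
Read-write listener (always routes to the current primary):
    myfog.database.windows.net

Read-only listener (routes to the readable secondary):
    myfog.secondary.database.windows.net

Example connection string using the read-write listener:
    Server=tcp:myfog.database.windows.net,1433;Database=AppDb;ApplicationIntent=ReadWrite;
```

After a failover, those endpoints stay the same; only the server behind them changes, which is what removes the connection-string churn Jim describes.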

Creating A Linked Server To Amazon Athena

Maria Zakourdaev shows that you can create a linked server connection in SQL Server to query data using Amazon Athena:

I will show you today how you can use Management Studio or any stored procedure to query the data, stored in a CSV file, located on S3 storage. I am using the CSV file format as an example here; columnar Parquet gives much better performance.

I am going to:

1. Put a simple CSV file on S3 storage

2. Create an external table in Athena over the data file bucket

3. Create a linked server to Athena inside SQL Server

4. Use OPENQUERY to query the data.

The Athena service is built on top of Presto, a distributed SQL engine, and also uses Apache Hive to create, alter, and drop tables. You can run ANSI SQL statements in the Athena query editor, launching it from the AWS web services UI. You can use complex joins, window functions, and many other great SQL language features. Using Athena eliminates the need for ETL because it projects your schema onto the data files at the time of the query.

Standard linked server warnings apply, but sometimes you need to bridge a couple of technologies.
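As a rough sketch of where the SQL Server side ends up, the linked server sits on top of an ODBC DSN for the Athena driver and is queried through OPENQUERY. The DSN, linked server, database, and table names below are placeholders rather than Maria's exact setup, and the AWS credentials are assumed to be configured in the DSN itself.

```sql
-- Create a linked server over an ODBC system DSN ('AthenaDSN' is a placeholder)
-- that points at the Athena ODBC driver.
EXEC master.dbo.sp_addlinkedserver
    @server     = N'ATHENA',
    @srvproduct = N'',
    @provider   = N'MSDASQL',
    @datasrc    = N'AthenaDSN';

-- Pass the query through to Athena, which reads the CSV files in the S3 bucket
-- backing the external table.
SELECT *
FROM OPENQUERY(ATHENA, 'SELECT * FROM sampledb.my_csv_table LIMIT 10');
```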

Creating An Azure Chat Bot

Dustin Ryan shows how to build a QnA bot:

After you’ve created your knowledge base, you can then edit and update it. There are a few different ways to update your knowledge base.

a. Manually edit the knowledge base directly within QnAMaker.ai. You can do this by editing the text of the questions directly in your knowledge base.

b. Edit the source of your knowledge base. Click the Settings tab on the left to edit the URL of your FAQs or upload a new document.

Building a bot is pretty easy, and Dustin shows you just how to do it.
