Category: Cloud

Open Source ML With Azure

Published 2018-05-09 by Kevin Feasel

David Smith shares his Build conference slides:

The topic for my talk at the Microsoft Build conference yesterday was “Migrating Existing Open Source Machine Learning to Azure”. The idea behind the talk was to show how you can take the open-source tools and workflows you already use for machine learning and data science, and easily transition them to the Azure cloud to take advantage of its capacity and scale. The theme for the talk was “no surprises”, and other than the Azure-specific elements I tried to stick to standard OSS tools rather than Microsoft-specific things, to make the process as familiar as possible.

Click through for the slides and additional resources.

Comments closed

Writing Extended Events To Blob Storage

Published 2018-05-09 by Kevin Feasel

Arun Sirpal shows how to write an Azure SQL Database extended events session to blob storage:

Last year I wrote about Azure SQL Database extended events (https://blobeater.blog/2017/02/06/using-extended-events-in-azure/) and gave an example where I was capturing deadlocks via the ring buffer. Ever since then I wanted to do a follow-up post but using Azure storage as the target for my XEL files.

This is more complicated than using the ring buffer as the target and requires a couple of things:

Azure storage account where you create a dedicated container for the files.
SAS key.
Database master key.
Database scoped credential.

Also, fantastic obfuscation in the post. Hello, Shane!

Comments closed

Determining Compute Resources From Managed Instances

Published 2018-05-08 by Kevin Feasel

Dimitri Furman explains Azure SQL Database Managed Instance resource allocations:

In the current Azure SQL Database Managed Instance (MI) preview, when customers create a new instance, they can allocate a certain number of CPU vCores and a certain amount of disk storage space for the instance. However, there is no explicit configuration option for the amount of memory allocated to the instance, because on MI, memory allocation is proportional to the number of vCores used.

How can a customer determine the actual amount of memory their MI instance can use, in GB? The answer is less obvious than it may seem. Using the traditional SQL Server methods will not provide the right answer on MI. In this article, we will go over the technical details of CPU and memory allocation on MI, and describe the correct way to answer this question.

The information and behavior described in this article are as of the time of writing (April 2018). Some aspects of MI behavior, including the visibility of certain compute resource allocations, may be temporary and will likely change as MI progresses from the current preview to general availability and beyond. Nevertheless, customers using MI in preview will find that this article answers some of the common questions about MI resource allocation.

What you see in SQL Server Management Studio is true, but it is also not the whole picture.

Comments closed

Reading AWS Aurora Error Logs With Powershell

Published 2018-05-08 by Kevin Feasel

Michael Bourgon has a Powershell script which reads error logs from AWS Aurora:

Been working on monitoring. For some reason, when you tell Aurora to send errorlogs to Cloudwatch, all it sends are the Audit Logs, which will tell you that code had changed, etc, but doesn’t (!?!?!??!!!) actually put your logs in Cloudwatch. I don’t understand it, so I built this process to look through logs and return the data. The next step would be to format it and either upload to Cloudwatch manually, or log it, or send email.

Click through for the script.

Comments closed

Exposing Azure Data Lake Store Data With Power BI

Published 2018-05-08 by Kevin Feasel

Melissa Coates shows how you can use Power BI to access data in Azure Data Lake Store:

What can you query from ADLS?

You can connect to the data stored in Azure Data Lake Store. What you *cannot* connect to currently is the data stored in the Catalog tables/views/stored procedures within Azure Data Lake Analytics (hopefully connectivity to the ADLA Catalog objects from tools other than U-SQL is available soon).

You’re not sending a U-SQL query here. Rather, we’re sending a web API request to an endpoint.

With an ADLS data source, you have to import the data into Power BI Desktop. There is no option for DirectQuery.

In other words, data that you’ve already prepped using U-SQL and want to display to the outside world. Click through for a demonstration as well as additional helpful information.

Comments closed

Azure SQL Data Warehouse Generation 2

Published 2018-05-03 by Kevin Feasel

James Serra announces changes to Azure SQL Data Warehouse:

The changes in Azure SQL DW Compute Optimized Gen2 tier are:

5x query performance via a adaptive caching technology. which takes a blended approach of using remote storage in combination with a fast SSD cache layer (using NVMes) that places data next to compute based on user access patterns and frequency
Significant improvement in serving concurrent queries (32 to 128 queries/cluster)
Removes limits on columnar data volume to enable unlimited columnar data volume
5 times higher computing power compared to the current generation by leveraging the latest hardware innovations that Azure offers via additional Service Level Objectives (DW7500c, DW10000c, DW15000c and DW30000c)
Added Transparent Data Encryption with customer-managed keys

Those are some good improvements. #2 in particular makes it possible for Azure SQL DW to be useful in a much larger number of environments.

Comments closed

Notes On Automating Automatic Indexing

Published 2018-05-01 by Kevin Feasel

Grant Fritchey shares with us some of his findings with automatic indexing on Azure SQL Database:

What you’ll notice is that several of the queries are filtering on the FirstName column. There’s no good index there. If you look at the execution plans for those queries you’ll also note the Missing Index suggestion. That suggestion is a necessary part of the automatic indexing. Yeah, missing indexes. I know. They’re not always accurate. It’s just a suggestion. Blah, blah, blah. I hear you.

The magic is not supplied by missing indexes. The magic is supplied by lots of data. Microsoft can take advantage of three things. Yes, missing index suggestions is first. Then, they can use the query metrics gathered in Query Store to see the behavior of your queries over time. Finally, they can use machine learning algorithms to determine if indexes will be helpful and measure how helpful they’ve been if one gets added. It’s great stuff. Go and read on it.

Click through for more notes, as well as a Powershell script you can use to replicate his findings.

Comments closed

Demos Using Amazon QuickSight

Published 2018-04-30 by Kevin Feasel

Karthik Kumar Odapally and Pranabesh Mandal have several example visuals that you can generate using Amazon QuickSight:

Typical Amazon QuickSight workflow

When you create an analysis, the typical workflow is as follows:

Connect to a data source, and then create a new dataset or choose an existing dataset.
(Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).
Create a new analysis.
Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.
(Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).
(Optional) Add more visuals to the analysis.
(Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.
(Optional) Publish the analysis as a dashboard to share insights with other users.

It’s interesting to see how Amazon is trying to move this functionality from third-party tools (Power BI, Tableau, etc.) and notebooks right into the set of AWS offerings. Contrast this with the way that Microsoft is building in Jupyter with Azure Notebooks.

Comments closed

Introducing Azure Notebooks

Published 2018-04-30 by Kevin Feasel

Zach Stagers has an introductory post to Azure Notebooks:

No installation, no maintenance

As with any PaaS solution, Azure Notebooks makes it far quicker and easier to get up and running, as there’s no download or installation required. Microsoft handles all the maintenance for you too!

I’m working on a fairly big project using Azure Notebooks. It’s very helpful getting 1GB of space, so I can include all of my data, images, etc. from a fairly large number of notebooks. The big downside is that the server running these notebooks is pretty slow—even for a fairly simple ARIMA model, I had it sitting there for 10 minutes at 100% CPU. So don’t expect to run a heavy workload against it.

Comments closed

Azure Data Lake Alerting

Published 2018-04-25 by Kevin Feasel

Jose Lara shows how to send alerts if you hit a utilization threshold:

If you want to see the step-by-step guide to create a new Log Analytics alert, check out our recent blog post on creating Log Analytics Alerts.

For the alert signal logic, use the following values:

Use the query from the previous step
Set the sum of AUs to 50 as the threshold (you can use any number that reflects your own threshold)
Set the trigger to 0: whenever the threshold is breached
Set the period and frequency for 24 hours.

Read the whole thing if you use Azure Data Lake Analytics; an unexpectedly large bill is a tough thing to swallow.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31