Category: Cloud

Exposing Azure Data Lake Store Data With Power BI

Melissa Coates shows how you can use Power BI to access data in Azure Data Lake Store:

What can you query from ADLS?

You can connect to the data stored in Azure Data Lake Store. What you *cannot* connect to currently is the data stored in the Catalog tables/views/stored procedures within Azure Data Lake Analytics (hopefully connectivity to the ADLA Catalog objects from tools other than U-SQL is available soon).

You’re not sending a U-SQL query here. Rather, we’re sending a web API request to an endpoint.

With an ADLS data source, you have to import the data into Power BI Desktop. There is no option for DirectQuery.

In other words, this approach suits data that you’ve already prepped using U-SQL and want to display to the outside world. Click through for a demonstration as well as additional helpful information.
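To make the "web API request to an endpoint" point concrete, here is a minimal sketch of what such a request URL could look like. It assumes the endpoint is the store's WebHDFS-compatible REST interface on the public azuredatalakestore.net suffix; the account name and folder path are hypothetical placeholders, and the code only builds the URL rather than calling the service.

```python
from urllib.parse import quote, urlencode


def adls_webhdfs_url(account: str, path: str, operation: str = "LISTSTATUS") -> str:
    """Build a WebHDFS-style URL for a path in an Azure Data Lake Store account."""
    # Assumption: the account lives under the public azuredatalakestore.net suffix.
    host = f"{account}.azuredatalakestore.net"
    # Keep "/" as the path separator and percent-encode everything else.
    clean_path = quote(path.strip("/"), safe="/")
    return f"https://{host}/webhdfs/v1/{clean_path}?{urlencode({'op': operation})}"


if __name__ == "__main__":
    # "myadlsaccount" and the folder name are made-up values for illustration.
    print(adls_webhdfs_url("myadlsaccount", "/curated/sales 2018"))
```

A real call would also need an Azure Active Directory bearer token in the Authorization header, which is why there is no query text to tune here: the client asks for files and folders, not for rows.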

Comments closed

Azure SQL Data Warehouse Generation 2

James Serra announces changes to Azure SQL Data Warehouse:

The changes in Azure SQL DW Compute Optimized Gen2 tier are:

  • 5x query performance via an adaptive caching technology, which takes a blended approach of using remote storage in combination with a fast SSD cache layer (using NVMes) that places data next to compute based on user access patterns and frequency

  • Significant improvement in serving concurrent queries (32 to 128 queries/cluster)

  • Removes limits on columnar data volume, enabling unlimited columnar storage

  • 5 times higher computing power compared to the current generation by leveraging the latest hardware innovations that Azure offers via additional Service Level Objectives (DW7500c, DW10000c, DW15000c and DW30000c)

  • Added Transparent Data Encryption with customer-managed keys

Those are some good improvements. The second item in particular (the jump from 32 to 128 concurrent queries) makes it possible for Azure SQL DW to be useful in a much larger number of environments.

Notes On Automating Automatic Indexing

Grant Fritchey shares with us some of his findings with automatic indexing on Azure SQL Database:

What you’ll notice is that several of the queries are filtering on the FirstName column. There’s no good index there. If you look at the execution plans for those queries you’ll also note the Missing Index suggestion. That suggestion is a necessary part of the automatic indexing. Yeah, missing indexes. I know. They’re not always accurate. It’s just a suggestion. Blah, blah, blah. I hear you.

The magic is not supplied by missing indexes. The magic is supplied by lots of data. Microsoft can take advantage of three things. Yes, missing index suggestions is first. Then, they can use the query metrics gathered in Query Store to see the behavior of your queries over time. Finally, they can use machine learning algorithms to determine if indexes will be helpful and measure how helpful they’ve been if one gets added. It’s great stuff. Go and read on it.

Click through for more notes, as well as a PowerShell script you can use to replicate his findings.

Demos Using Amazon QuickSight

Karthik Kumar Odapally and Pranabesh Mandal have several example visuals that you can generate using Amazon QuickSight:

Typical Amazon QuickSight workflow

When you create an analysis, the typical workflow is as follows:

  1. Connect to a data source, and then create a new dataset or choose an existing dataset.

  2. (Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).

  3. Create a new analysis.

  4. Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.

  5. (Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).

  6. (Optional) Add more visuals to the analysis.

  7. (Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.

  8. (Optional) Publish the analysis as a dashboard to share insights with other users.

It’s interesting to see how Amazon is trying to move this functionality from third-party tools (Power BI, Tableau, etc.) and notebooks right into the set of AWS offerings.  Contrast this with the way that Microsoft is building in Jupyter with Azure Notebooks.

Introducing Azure Notebooks

Zach Stagers has an introductory post to Azure Notebooks:

No installation, no maintenance

As with any PaaS solution, Azure Notebooks makes it far quicker and easier to get up and running, as there’s no download or installation required. Microsoft handles all the maintenance for you too!

I’m working on a fairly big project using Azure Notebooks.  It’s very helpful getting 1GB of space, so I can include all of my data, images, etc. from a fairly large number of notebooks.  The big downside is that the server running these notebooks is pretty slow—even for a fairly simple ARIMA model, I had it sitting there for 10 minutes at 100% CPU.  So don’t expect to run a heavy workload against it.

Azure Data Lake Alerting

Jose Lara shows how to send alerts if you hit a utilization threshold:

If you want to see the step-by-step guide to create a new Log Analytics alert, check out our recent blog post on creating Log Analytics Alerts.

For the alert signal logic, use the following values:

  • Use the query from the previous step

  • Set the sum of AUs to 50 as the threshold (you can use any number that reflects your own threshold)

  • Set the trigger to 0: whenever the threshold is breached

  • Set the period and frequency for 24 hours.
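The signal logic above can be sketched in a few lines. This is only an illustration of the rule (sum the AUs over a 24-hour look-back and fire when the sum passes the threshold), not how Log Analytics evaluates alerts internally; the function name and the (end time, AUs) record shape are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def au_alert_fires(
    jobs: Iterable[Tuple[datetime, float]],
    now: datetime,
    threshold: float = 50.0,
    period: timedelta = timedelta(hours=24),
) -> bool:
    """Return True when AUs summed over the look-back period exceed the threshold."""
    window_start = now - period
    # Only jobs that ended inside the look-back window count toward the sum.
    total = sum(aus for ended_at, aus in jobs if window_start < ended_at <= now)
    return total > threshold


if __name__ == "__main__":
    now = datetime(2018, 5, 2, 12, 0)
    # Hypothetical job records: (end time, AUs used).
    jobs = [(now - timedelta(hours=1), 20), (now - timedelta(hours=5), 35)]
    print(au_alert_fires(jobs, now))
```

Swapping the default of 50 for your own number mirrors the advice in the second bullet.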

Read the whole thing if you use Azure Data Lake Analytics; an unexpectedly large bill is a tough thing to swallow.

Running The Azure DTU Calculator On An Older Server

Jim Donahoe shows us how to get the Azure DTU calculator running on an older server without PowerShell:

I recently had to do an analysis of a client’s database workload using the Azure DTU Calculator (DTU Calculator) and thought it might be interesting to share just how I did that. I have run this tool numerous times for other clients via the PowerShell method and the Command Line method; however, this client’s environment was Windows Server 2008R2 with SQL Server 2008R2 SP3, so the analysis had to be done differently.

Now, from the DTU Calculator page itself, it tells you how the process works.  It essentially runs a perfmon trace for an hour with the following counters:

  • Processor – % Processor Time
  • Logical Disk – Disk Reads/sec
  • Logical Disk – Disk Writes/sec
  • Database – Log Bytes Flushed/sec

My client unfortunately did not have PowerShell accessible for me to use. I normally prefer the PowerShell script; however, in this case I had to use the Command Line Interface. They both return the same results.

Click through to see how Jim did it.

Copying Azure SQL Databases

Arun Sirpal noticed a problem when he tried to copy an Azure SQL Database:

Now, I was looking at the following code.

CREATE DATABASE CodeDBP1
  AS COPY OF CodeDB ( SERVICE_OBJECTIVE = 'P1' )  ;

You would think this is okay? I did, especially given that it parsed and was executing. I was thinking a copy of the CodeDB database would be created as a premium P1 database regardless of what the source database service tier was. This source database is 0.5GB in size under the basic tier and 40 minutes later the copy was still executing. It just didn’t seem right.

Click through for the solution.  If this is going to be normal behavior, I’d really like to see an error message.

Async Processing With Azure Analysis Services

Teo Lachev notes that you can process Azure Analysis Services cubes without maintaining an HTTP connection:

AAS supports processing tasks asynchronously with REST APIs. The difference is that the service component (REST API) maintains the connectivity to the server – thus reducing the chances of HTTP disconnections from the external application. Microsoft has provided a RestAPISample console app to help you get started. As with any REST API invocation, you’d need to register the app in the Azure Portal so that you can authenticate successfully. Other than that, it’s simple to invoke the REST API and Microsoft has provided step-by-step instructions.

Another, although synchronous, option is to run a PowerShell script in the Azure Cloud Shell environment. You can upload the script as a file. The script can ask you to provide credentials interactively (the Get-Credential cmdlet) or you can hardcode the credentials. Here is an example of a PowerShell script that processes a specific table.

Click through to check out how to do this.
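For a rough idea of what the asynchronous call involves, here is a minimal Python sketch that prepares (but does not send) a refresh request. The URL shape and body fields follow my reading of Microsoft's asynchronous refresh documentation and should be verified against it; the region, server, model, table, and token values are hypothetical placeholders, and this is not Teo's script or the RestAPISample app.

```python
import json
from urllib.request import Request


def build_refresh_request(region: str, server: str, model: str,
                          table: str, access_token: str) -> Request:
    """Prepare a POST that asks Azure Analysis Services to refresh one table."""
    # Assumed URL shape: https://<region>.asazure.windows.net/servers/<server>/models/<model>/refreshes
    url = f"https://{region}.asazure.windows.net/servers/{server}/models/{model}/refreshes"
    body = {
        "Type": "Full",                 # full process of the listed objects
        "CommitMode": "transactional",
        "MaxParallelism": 2,
        "RetryCount": 2,
        "Objects": [{"table": table}],  # limit the refresh to a single table
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # token from the registered app
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_refresh_request("westus", "myserver", "SalesModel", "Sales", "<token>")
    print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen would submit it. Because the service accepts the refresh and carries on by itself, the caller can disconnect and check the refresh status later instead of holding the connection open.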

Azure Data Factory v2 And Decompression

Ben Jarvis notes a file naming bug with Azure Data Factory v2 when decompressing files:

ADF V2 natively supports decompression of files as documented at https://docs.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#compression-support. With this functionality, ADF should change the extension of the file when it is decompressed, so 1234_567.csv.gz would become 1234_567.csv. However, I’ve noticed that this doesn’t happen in all cases.

In our particular case, the file names and extensions of the source files are all uppercase, and when ADF uploads them it doesn’t alter the file extension. For example, if I upload 1234_567.CSV.GZ I get 1234_567.CSV.GZ in blob storage rather than 1234_567.CSV.

Click through for more details and be sure to vote on his Azure Feedback bug if this affects you.
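The expected renaming is easy to state in code. The sketch below is purely illustrative: the function is hypothetical and is not ADF's implementation. The case_sensitive switch shows one way the reported symptom could arise (a case-sensitive suffix check), which is a guess rather than anything the post confirms.

```python
def decompressed_name(file_name: str, case_sensitive: bool = False) -> str:
    """Return the name a file should have once its .gz suffix is stripped."""
    suffix = ".gz"
    # Compare against a lowercased copy unless a case-sensitive match is requested.
    candidate = file_name if case_sensitive else file_name.lower()
    if candidate.endswith(suffix):
        return file_name[: -len(suffix)]
    return file_name


if __name__ == "__main__":
    print(decompressed_name("1234_567.csv.gz"))
    print(decompressed_name("1234_567.CSV.GZ"))
    print(decompressed_name("1234_567.CSV.GZ", case_sensitive=True))
```

Until the bug is fixed, a similar case-insensitive rename applied downstream would be one possible workaround.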
