Press "Enter" to skip to content

Category: Cloud

Azure SQL Analytics

Arun Sirpal gives an introduction to Azure SQL Analytics:

Please see the prerequisites section within this document – YOU MUST do this or else you will not be able to use this feature. https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-azure-sql#prerequisites

Once set up, it should take approximately 15 minutes to start capturing and rendering back some data. Don't be surprised if it takes a little longer, as was the case for me.

My biggest complaint is about the visuals; otherwise, this looks like the beginning of a solid monitoring solution within Azure SQL Database.
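
The prerequisite work boils down to having a Log Analytics workspace and turning on diagnostics for each database you want monitored. As a rough sketch only (AzureRM-era cmdlets; the resource group, workspace, server, and database names are placeholders, and this is my assumption of the simplest route, not Arun's exact steps):

# Log in with the AzureRM module
Login-AzureRmAccount

# The Log Analytics workspace that will back Azure SQL Analytics (placeholder names)
$workspace = Get-AzureRmOperationalInsightsWorkspace -ResourceGroupName "rg-monitoring" -Name "la-sqlanalytics"

# Full resource ID of the Azure SQL Database to monitor (placeholder subscription/server/database)
$dbResourceId = "/subscriptions/<subscription-id>/resourceGroups/rg-data/providers/Microsoft.Sql/servers/myserver/databases/mydb"

# Route the database's diagnostic telemetry to the workspace
Set-AzureRmDiagnosticSetting -ResourceId $dbResourceId -WorkspaceId $workspace.ResourceId -Enabled $true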

Row Counts From Statistics In Azure DW

Derik Hammer has a script to estimate row counts in an Azure SQL Data Warehouse table:

Azure SQL Data Warehouse is a massively parallel processing (MPP) architecture designed for large-scale data warehouses. An MPP system creates logical / physical slices of the data. In SQL Data Warehouse’s case, the data has 60 logical slices, at all performance tiers. This means that a single table can have up to 60 different object_ids. This is why, in SQL Data Warehouse, there is the concept of physical and logical object_ids along with physical names.

Below is a query for finding row counts of tables in SQL Data Warehouse which accounts for the differences in architecture between my earlier script, written for SQL Server, and SQL Data Warehouse.

Click through for the script.
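
Derik's script is the one to use, but for a feel of the mechanics, a minimal sketch which sums the per-distribution counts from the pdw DMVs might look like this (the server, database, and credential values are placeholders):

# Estimate row counts per table in Azure SQL Data Warehouse by summing across all 60 distributions
$query = @"
SELECT  t.name            AS table_name,
        SUM(ps.row_count) AS estimated_rows
FROM    sys.tables AS t
JOIN    sys.pdw_table_mappings AS m
        ON t.object_id = m.object_id
JOIN    sys.pdw_nodes_tables AS nt
        ON nt.name = m.physical_name
JOIN    sys.dm_pdw_nodes_db_partition_stats AS ps
        ON  ps.object_id       = nt.object_id
        AND ps.pdw_node_id     = nt.pdw_node_id
        AND ps.distribution_id = nt.distribution_id
WHERE   ps.index_id < 2   -- heap or clustered index only, so rows are not double-counted
GROUP BY t.name;
"@

Invoke-Sqlcmd -ServerInstance "mydw.database.windows.net" -Database "MyDataWarehouse" `
    -Username "sqladmin" -Password "<password>" -Query $query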

Failover Groups In Azure SQL Database

Jim Donahoe shows off Failover Groups in Azure SQL Database.  Part 1 involves setting up a Failover Group:

In my former company, we had 22 web applications that all had connections to various databases.  We had all of our databases configured for Geo-Replication already, but still if we had to failover, we had to update each connection string for the web apps along with others which became a tedious process. In came Failover Groups to the rescue!  With a Failover Group, I was able to create two endpoints that stayed the same no matter which server was primary/secondary.  I liked to think of these as my Availability Group Listeners as they kinda serve the same functionality: Route traffic to a node depending on whether it's read-only or not.  Best part?  It’s configured through the Azure Portal SO EASILY!  You can use PowerShell as well, but for this blog post, I will walk through the creation via the Portal.  I will make a separate post or attach a script at some point for the PowerShell deployment.

Before we start the configuration portion of this though, let’s take a look at how Microsoft defines what a Failover Group is.  I found this definition here:  “Azure SQL Database auto-failover groups (in-preview) is a SQL Database feature designed to automatically manage geo-replication relationship, connectivity, and failover at scale.”  Sounds pretty interesting, right? Let’s make one!

In Part 2, Jim shows how to connect to SQL Server using the Failover Group listener:

Well, now that the easy stuff is out of the way, let’s talk about how you connect to these groups via SSMS.  This is where some of the confusion happens.  When I first configured a Failover Group, the first thing I tried to do was connect to the Primary server via SSMS thinking it will work just like an Always On Listener in traditional SQL Server…NEWP!

If you’re running a production database on Azure SQL Database, you might want to look at Failover Groups.
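
Jim's walkthrough uses the portal; for the PowerShell route he mentions, the shape of it is roughly this (AzureRM cmdlets; all names are placeholders and the parameter choices are my best guess rather than his script):

# Create the failover group between two existing Azure SQL servers
New-AzureRmSqlDatabaseFailoverGroup -ResourceGroupName "rg-data" `
    -ServerName "sql-primary" -PartnerServerName "sql-secondary" `
    -FailoverGroupName "fg-myapp" -FailoverPolicy Automatic

# Add an existing database (already on the primary) to the group
$db = Get-AzureRmSqlDatabase -ResourceGroupName "rg-data" -ServerName "sql-primary" -DatabaseName "mydb"
Add-AzureRmSqlDatabaseToFailoverGroup -ResourceGroupName "rg-data" -ServerName "sql-primary" `
    -FailoverGroupName "fg-myapp" -Database $db

# Applications then connect to the stable listener endpoints:
#   read-write: fg-myapp.database.windows.net
#   read-only:  fg-myapp.secondary.database.windows.net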

Creating A Linked Server To Amazon Athena

Maria Zakourdaev shows that you can create a linked server connection in SQL Server to query data using Amazon Athena:

I will show you today how you can use Management Studio or any stored procedure to query data stored in a CSV file located in S3 storage. I am using the CSV file format as an example here; columnar Parquet gives much better performance.

I am going to:

1. Put a simple CSV file on S3 storage

2. Create an external table in the Athena service over the data file bucket

3. Create a linked server to Athena inside SQL Server

4. Use OPENQUERY to query the data.

The Athena service is built on top of Presto, a distributed SQL engine, and also uses Apache Hive to create, alter, and drop tables. You can run ANSI SQL statements in the Athena query editor, launching it from the AWS web services UI. You can use complex joins, window functions, and many other great SQL language features. Using Athena eliminates the need for ETL because it projects your schema onto the data files at the time of the query.

Standard linked server warnings apply, but sometimes you need to bridge a couple of technologies.
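
Maria's post has the full setup (the Athena ODBC driver plus a system DSN); the SQL Server side of it is essentially a linked server over MSDASQL and an OPENQUERY call, something like this sketch (the DSN, schema, and table names are invented for illustration):

# T-SQL run against the SQL Server that hosts the linked server (here via Invoke-Sqlcmd)
$setup = @"
EXEC master.dbo.sp_addlinkedserver
    @server     = N'ATHENA',
    @srvproduct = N'',
    @provider   = N'MSDASQL',     -- OLE DB provider for ODBC
    @datasrc    = N'AthenaDSN';   -- ODBC DSN configured with the Athena driver
"@

$query = @"
SELECT *
FROM OPENQUERY(ATHENA, 'SELECT order_id, order_total FROM sampledb.orders LIMIT 10');
"@

Invoke-Sqlcmd -ServerInstance "localhost" -Query $setup
Invoke-Sqlcmd -ServerInstance "localhost" -Query $query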

Creating An Azure Chat Bot

Dustin Ryan shows how to build a QnA bot:

After you’ve created your knowledge base, you can then edit and update it. There are a few different ways to update your knowledge base.

a. Manually edit the knowledge base directly within QnAMaker.ai. You can do this by editing the questions and modifying the text of your knowledge base.

b. Edit the source of your knowledge base. Click the Settings tab on the left to edit the URL of your FAQs or upload a new document.

Building a bot is pretty easy, and Dustin shows you just how to do it.

Data Lake Archive Tier

Ust Oldfield looks at an important part of a data lake:

The Archive access tier in blob storage was made generally available today (13th December 2017) and with it comes the final piece in the puzzle to archiving data from the data lake.

Where Hot and Cool access tiers can be applied at a storage account level, the Archive access tier can only be applied to a blob storage container. To understand why the Archive access tier can only be applied to a container, you need to understand the features of the Archive access tier. It is intended for data that has no or low SLAs for availability within an organisation and the data is stored offline (Hot and Cool access tiers are online). Therefore, it can take up to 15 hours for data to be made online and available. Bringing Archive data online is a process called rehydration (fitting for the data lake). If you have lots of blob containers in a storage account, you can archive them and rehydrate them as required, rather than having to rehydrate the entire storage account.

Read on for more details, including a pattern for archiving data lake data.
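
In practice the tier is flipped blob by blob, so archiving a whole container means looping over its contents. A hedged sketch with the AzureRM-era storage cmdlets (account, key, container, and blob names are placeholders):

# Storage context for the blob storage account (placeholders)
$ctx = New-AzureStorageContext -StorageAccountName "mydatalakearchive" -StorageAccountKey "<storage-key>"

# Push every blob in the container down to the Archive tier
Get-AzureStorageBlob -Container "lake-archive" -Context $ctx |
    ForEach-Object { $_.ICloudBlob.SetStandardBlobTier("Archive") }

# Rehydration later is the same call with "Hot" or "Cool" -- and can take hours to complete
# Get-AzureStorageBlob -Container "lake-archive" -Blob "2017/12/sales.csv" -Context $ctx |
#     ForEach-Object { $_.ICloudBlob.SetStandardBlobTier("Hot") }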

When Data Factory Flows Don’t

Emma Stewart points out an issue that might vex newcomers to Azure Data Factory:

The data within the Data Lake store was organised into a Year and Month hierarchy for the folders, and each day's transactions were stored in a file which was named after the day within the relevant month folder. The task then was to create a pipeline which copies the dataset in the Data Lake Store over to the dbo.Orders table in Azure SQL DB every day within the scheduled period (Q1 2016).

After creating all the json scripts and deploying them (with no errors), I clicked on the ‘Monitor and Manage’ tile to monitor the activities, check everything was working as it should be and monitor the progress. After waiting for at least 10 minutes, I started to get frustrated.

Click through for the fix and an explanation.

Using The Command Line To Migrate To Azure SQL Database

Arun Sirpal shows how to use SqlPackage.exe to migrate a database to Azure SQL Database:

I have moved many databases to Azure via different methods, but I recently came across a new way. Well, technically it's not new; I should say, newly found. The migration was done via the command line, which is not exactly groundbreaking, but it's nice to have another option.

The idea behind this is simple. Create the bacpac via command line using sqlpackage.exe with the action as export then do an import action into Azure.

Read on for the demo.
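
The two commands look roughly like this (server names, paths, and credentials are placeholders; adjust the SqlPackage.exe location to wherever it lives on your machine):

# Export the source database to a bacpac
& "C:\Program Files\Microsoft SQL Server\140\DAC\bin\SqlPackage.exe" `
    "/Action:Export" `
    "/SourceServerName:localhost" `
    "/SourceDatabaseName:MyDatabase" `
    "/TargetFile:C:\temp\MyDatabase.bacpac"

# Import the bacpac into Azure SQL Database, picking an edition and service objective
& "C:\Program Files\Microsoft SQL Server\140\DAC\bin\SqlPackage.exe" `
    "/Action:Import" `
    "/SourceFile:C:\temp\MyDatabase.bacpac" `
    "/TargetServerName:myazureserver.database.windows.net" `
    "/TargetDatabaseName:MyDatabase" `
    "/TargetUser:sqladmin" `
    "/TargetPassword:<password>" `
    "/p:DatabaseEdition=Standard" `
    "/p:DatabaseServiceObjective=S2"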

Fetching U-SQL Job Input And Output Paths

Matthew Hicks shows how to retrieve information on U-SQL input and output paths using PowerShell:

Each time you submit a U-SQL job, a job folder is created in your Azure Data Lake Store account. This folder contains useful debugging information about the job, including a file called the U-SQL algebra file. This is an XML file containing information about your job graph, the list of input and output files, and other key U-SQL job metadata.

We’ve just published a sample script that reads the U-SQL algebra file for a specified job and returns the input or output files. Give it a try!

Read on for more.

Larger Azure SQL Database Standard Tier Sizes

Tim Radney reports on a new Standard tier preview for Azure SQL Database:

Previously, the Standard tier only offered 4 levels: 15, 30, 50, and 100 DTUs, with a database size limit of 250GB on standard disk. If you had a database that was larger than 250GB but did not need more than 100 DTUs for CPU, memory, or I/O, you were stuck paying a Premium price just for database size. With the new changes, you can now have up to a 1TB database in the Standard tier; you just have to pay for the extra storage. Currently, storage is billed at $0.085/GB during the preview. Increasing from the included size of 250GB to 1TB adds 774GB at a cost of $65.79 per month.

The new Standard preview DTU sizes support 200, 400, 800, 1,600, and 3,000 DTU options. If you have a SQL Server database workload that is more CPU-bound than I/O, these Standard tier options have the potential to save you a lot of money; however, if your workload is I/O bound, the Premium tier is going to outperform the Standard tier.

Tim follows this up with a couple of quick demos.
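
If you'd rather try the preview sizes from PowerShell than the portal, the scale operation is a single cmdlet call. A sketch, assuming the S4 (200 DTU) service objective name, a region with the preview enabled, and placeholder resource names:

# Scale an existing database to the new Standard preview objective and raise the size cap to 1TB
Set-AzureRmSqlDatabase -ResourceGroupName "rg-data" -ServerName "myserver" -DatabaseName "mydb" `
    -Edition "Standard" -RequestedServiceObjectiveName "S4" -MaxSizeBytes 1099511627776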
