Press "Enter" to skip to content

Category: Cloud

Thoughts On The Evolution Of Big Data

Praveen Sripati shares an opinion on where the various Hadoop and Big Data platforms are headed:

The different Cloud vendors have been offering Big Data as a service for quite some time. Athena, EMR, Redshift, and Kinesis are a few of the services from AWS. There are similar offerings from Google Cloud, Microsoft Azure, and other Cloud vendors as well. All these services are native to the Cloud (built for the Cloud) and provide tight integration with the other services from the Cloud vendor.

In the case of Cloudera, MapR, and Hortonworks, the Big Data platforms were not designed with the Cloud in mind from the beginning; the platforms were later plugged or force-fitted into the Cloud. The Open Hybrid Architecture Initiative is an initiative by Hortonworks to make their Big Data platform more and more Cloud native.

It’ll be interesting to see where this goes.


Databricks Cluster-Scoped Init Scripts

Aayush Bhasin shares some background on a Databricks intern project, adding cluster-scoped initialization scripts to Databricks clusters:

One of the biggest pain points for customers used to be that init scripts for a cluster were not part of the cluster configuration and did not show up in the User Interface. Because of this, applying init scripts to a cluster was unintuitive, and editing or cloning a cluster would not preserve the init script configuration. Cluster-scoped init scripts addressed this issue by including an ‘Init Scripts’ panel in the UI of the cluster configuration page, and adding an ‘init_scripts’ field to the public API. This also allows init scripts to take advantage of cluster access control.
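As a rough illustration of that API surface, here is a hedged sketch of creating a cluster with an init script attached through the public Clusters API; the workspace URL, token, runtime label, node type, and DBFS path are all placeholders, not values from the post.

```python
# Hedged sketch: attach a cluster-scoped init script via the "init_scripts"
# field of the Clusters API.  All names, paths, and settings are placeholders.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

cluster_spec = {
    "cluster_name": "init-script-demo",
    "spark_version": "4.3.x-scala2.11",   # illustrative runtime label
    "node_type_id": "Standard_DS3_v2",    # illustrative node type
    "num_workers": 2,
    # Because the init script is part of the cluster spec, it survives edits
    # and clones and is visible in the cluster configuration UI.
    "init_scripts": [
        {"dbfs": {"destination": "dbfs:/databricks/init-scripts/install-libs.sh"}}
    ],
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json())  # returns the new cluster_id
```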

Read on to see how Aayush & co. solved this issue.


Databricks UDF Performance Testing

Tristan Robinson shares some performance comps for different Azure Databricks scenarios:

I’ve recently been spending quite a bit of time on the Azure Databricks platform, and while learning decided it was worth using it to experiment with some common data warehousing tasks in the form of data cleansing. As Databricks provides us with a platform to run a Spark environment on, it offers options to use cross-platform APIs that allow us to write code in Scala, Python, R, and SQL within the same notebook. As with most things in life, not everything is equal and there are potential differences in performance between them. In this blog, I will explain the tests I produced with the aim of outlining best practice for Databricks implementations for UDFs of this nature.

Scala is the native language for Spark – and without going into too much detail here, it will compile down faster to the JVM for processing. Under the hood, Python, on the other hand, provides a wrapper around the code, but in reality it is a Scala program telling the cluster what to do, being transformed by Scala code. Converting these objects into a form Python can read is called serialisation/deserialisation, and it's expensive, especially over time and across a distributed dataset. The most expensive scenario occurs with UDFs (functions) – the runtime process for which can be seen below. The overhead here is in steps (4) and (5): reading the data and writing it into JVM memory.
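To make that serialisation overhead concrete, here is a minimal PySpark sketch (mine, not from the post) contrasting a Python UDF, which ships every value across the JVM/Python boundary, with the equivalent built-in function that stays inside the JVM; the session, data, and column names are illustrative.

```python
# Minimal sketch: Python UDF vs. built-in Spark SQL function.
# The DataFrame and column names are placeholders for illustration.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf, upper
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-overhead-demo").getOrCreate()
df = spark.createDataFrame([("alice",), ("bob",)], ["name"])

# Python UDF: each value is serialised out to a Python worker and back (steps 4 and 5 above).
to_upper_udf = udf(lambda s: s.upper() if s else None, StringType())
df.withColumn("name_upper", to_upper_udf(col("name"))).show()

# Built-in function: the same transformation runs entirely inside the JVM.
df.withColumn("name_upper", upper(col("name"))).show()
```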

Click through for the results.  Looks like Python barely beat out Scala for the #1 position, but Scala was a little faster than Python in-class (e.g., the Scala program with a Scala SQL UDF was a little bit faster than the Python equivalent).


Migrating Azure SQL Databases Between Resource Groups

Arun Sirpal shows us a method for moving Azure SQL Databases between resource groups:

In my mind there are a couple of ways to move a database across resource groups. They vary from scripting to just using the Azure portal. I am going to use the Azure portal and do the following.

  1. Export a database in resource group X to a storage account Z.
  2. Import the file from the storage account Z into a database that is in resource group Y.

It’s just like a “backup and restore” strategy, all with the assumption that you are working within the same subscription ID.
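Arun uses the portal, but the same export/import flow can be scripted. Here is a hedged sketch that shells out to the Azure CLI from Python; the server, database, credential, and storage values are placeholders, and it assumes an empty target database already exists to import into.

```python
# Hedged sketch: export a database to a BACPAC in storage account Z, then
# import it into a database in resource group Y, via the Azure CLI.
# Every name, password, and key below is a placeholder.
import subprocess

def az(*args):
    """Run an az CLI command and fail loudly on error."""
    cmd = ["az", *args]
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)

bacpac_uri = "https://storagez.blob.core.windows.net/backups/mydb.bacpac"

# 1. Export the database in resource group X to storage account Z.
az("sql", "db", "export",
   "--resource-group", "resource-group-x",
   "--server", "source-server",
   "--name", "mydb",
   "--admin-user", "sqladmin",
   "--admin-password", "<password>",
   "--storage-key-type", "StorageAccessKey",
   "--storage-key", "<storage-account-key>",
   "--storage-uri", bacpac_uri)

# 2. Import the BACPAC into an empty database in resource group Y.
az("sql", "db", "import",
   "--resource-group", "resource-group-y",
   "--server", "target-server",
   "--name", "mydb",
   "--admin-user", "sqladmin",
   "--admin-password", "<password>",
   "--storage-key-type", "StorageAccessKey",
   "--storage-key", "<storage-account-key>",
   "--storage-uri", bacpac_uri)
```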

Read on for a step-by-step demonstration on how to do this.


Using JSON In Azure Data Lake Analytics

Jeffrey Verheul shows how to register .NET assemblies in Azure Data Lake Analytics:

The power of Azure Data Lake is that you can use a variety of different file types to process data (from Azure Data Lake Analytics). But in order to use JSON, you need to register some assemblies first.

Downloading assemblies
The assemblies are available on GitHub for download. Unfortunately, you need to download the solution and compile it on your machine, so I've also made the two DLLs you need available via direct download:

Click through for links to the assemblies and instructions on how to register them. And to continue my long-running joke: every .NET project has Newtonsoft.Json as a core requirement.


Databricks Runtime 4.3 Released

Todd Greenstein announces Databricks Runtime 4.3:

In addition to the performance improvements, we’ve also added new functionality to Databricks Delta:

  • Truncate Table: with Delta you can delete all rows in a table using truncate. It's important to note we do not support deleting specific partitions. Refer to the documentation for more information: Truncate Table

  • Alter Table Replace Columns: replace columns in a Databricks Delta table, including changing the comment of a column; we also support reordering of multiple columns. Refer to the documentation for more information: Alter Table

  • FSCK Repair Table: this command allows you to remove the file entries from the transaction log of a Databricks Delta table that can no longer be found in the underlying file system. This can happen when these files have been manually deleted. Refer to the documentation for more information: Repair Table

  • Scaling "Merge" Operations: this release comes with experimental support for larger source tables with "Merge" operations. Please contact support if you would like to try out this feature.

Looks like a nice set of reasons to upgrade.
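For reference, here is a hedged sketch of what these commands look like when issued from a notebook through spark.sql(); the table name and column definitions are placeholders, and it assumes an existing Delta table on Runtime 4.3 or later.

```python
# Hedged sketch of the Delta commands called out in the release notes,
# run via spark.sql().  "events" and its columns are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Delete every row in a Delta table (specific partitions cannot be targeted).
spark.sql("TRUNCATE TABLE events")

# Replace the column list, e.g. to change a column comment or reorder columns.
spark.sql("""
    ALTER TABLE events
    REPLACE COLUMNS (event_id STRING COMMENT 'unique identifier', event_time TIMESTAMP)
""")

# Remove transaction-log entries for files no longer present in storage.
spark.sql("FSCK REPAIR TABLE events")
```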


Reserved Capacity With Azure SQL Database

Chris Seferlis explains the concept of Azure SQL Database Reserved Capacity:

Last week I posted about the Azure Reserved VM Instance, where you could save some money in Azure. Another, similar way to save is with Azure SQL Database Reserved Capacity. With this you can save 33% compared to license-included pricing by pre-buying SQL Database vCores for a one- or three-year term.

This can be applied to a single subscription or shared across your enrollments, so you can control how many subscriptions can use the benefit, as well as how the reservation is applied to the specific subscriptions you choose.

Scoping the reservation to a single subscription allows you to apply it to the SQL Database resource(s) within the selected subscription. A reservation with a shared scope can be shared across subscriptions in the enrollment, and there's some flexibility involved, such as with Managed Instances, where you can scale up/down.

Read on for more. AWS has been offering discounts for reserved capacity for a while, but now we're seeing Microsoft play the game too.


Azure SQL Database Elastic Job False Successes

Arun Sirpal walks us through a case where an Azure SQL Database elastic job appears to be running correctly but isn’t:

I use elastic pools. They are a fabulous way of saving money when running many Azure SQL Databases, assuming, that is, that you understand the resource utilization patterns of the databases involved.

I recently created a couple of elastic jobs for my pool and went through a successful setup (or so I thought), then proceeded to execute the query. The query, being elastic in nature, basically runs across all my databases from a single point. This is where it got puzzling: if you look at the below screenshot, you can clearly see that the status is RUNNING, but drilling into it, it says RETRYING. This obviously is not right.

Click through to see the root cause of this problem.


Moving Data Between Data Lakes

Jeffrey Verheul shows us how to use AdlCopy to migrate data from one Azure Data Lake to another:

Migrating data from one Data Lake to the other
We started out with a test version of a Data Lake, and this week I needed to migrate data to the production version of our Data Lake. After a lot of trial and error I couldn’t find a good way to migrate data. In the end I found a tool called AdlCopy. This is a command-line tool that copies files for you. Let me show you how easy it is.

Download & Install
AdlCopy needs to be installed on your machine. You can find the download here. By default the tool will install the files in "C:\Users\<username>\Documents\AdlCopy\", but this can be changed in the setup wizard.

Click through to see how to use this tool.
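For a rough idea of the shape of an account-to-account copy, here is a hedged sketch (not from Jeffrey's post) that invokes AdlCopy from Python; the executable path, store names, and folders are placeholders, and AdlCopy prompts for Azure AD sign-in when run this way.

```python
# Hedged sketch: shell out to AdlCopy.exe to copy a folder from a test
# Data Lake Store to a production one.  All paths and account names are placeholders.
import subprocess

ADLCOPY = r"C:\Users\<username>\Documents\AdlCopy\AdlCopy.exe"

subprocess.run(
    [
        ADLCOPY,
        "/Source", "adl://test-datalake.azuredatalakestore.net/data/",
        "/Dest", "adl://prod-datalake.azuredatalakestore.net/data/",
    ],
    check=True,
)
```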


Azure SQL Database SLAs

Arun Sirpal ponders the Azure SQL Database service level agreement:

Let's just get straight to the point: Azure SQL Database, across all service tiers, gives you (the customer) an SLA of 99.99% uptime. This means the potential unavailability periods shown below.

Good or bad, you decide. The point is that even in the cloud we could potentially encounter downtime. Can you improve on 99.99%? That was the question I asked Microsoft, and I was given a wishy-washy answer: yes, you can, by using failover groups (I'm guessing the read/write endpoint is key here) to improve the uptime. I then pressed on what sort of figure, in terms of nines, this provides, to no avail.

So what happens if uptime is less than 99.99%, or even worse, 99% (ouch)? Service credits are available, as shown below.
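For a sense of scale, here is a quick back-of-the-envelope calculation (mine, not Arun's) of how much downtime each SLA level actually permits, using a 30-day month and a 365-day year:

```python
# How much unavailability a given SLA percentage allows over a period.
def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    return period_hours * 60 * (1 - sla_percent / 100)

for sla in (99.99, 99.9, 99.0):
    per_month = allowed_downtime_minutes(sla, 30 * 24)    # ~4.3 min at 99.99%
    per_year = allowed_downtime_minutes(sla, 365 * 24)    # ~52.6 min at 99.99%
    print(f"{sla}%: ~{per_month:.1f} min/month, ~{per_year / 60:.1f} h/year")
```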

Arun also includes some of the exceptions Microsoft has.  Most of these are “you messed up” types of exceptions, but not all of them.
