Press "Enter" to skip to content

Category: Cloud

Running Spark MLlib to Feed Power BI

Brad Llewellyn shows how you can take Spark MLlib results and feed them into Power BI:

MLlib is one of the primary extensions of Spark, along with Spark SQL, Spark Streaming and GraphX.  It is a machine learning framework built from the ground up to be massively scalable and operate within Spark.  This makes it an excellent choice for machine learning applications that need to crunch extremely large amounts of data.  You can read more about Spark MLlib here.

In order to leverage Spark MLlib, we obviously need a way to execute Spark code.  In our minds, there’s no better tool for this than Azure Databricks.  In the previous post, we covered the creation of an Azure Databricks environment.  We’re going to reuse that environment for this post as well.  We’ll also use the same dataset that we’ve been using, which contains information about individual customers.  This dataset was originally designed to predict Income based on a number of factors.  However, we left the income out of this dataset a few posts back for reasons that were important then.  So, we’re actually going to use this dataset to predict “Hours Per Week” instead.

Check it out. And Brad’s not joking when he says the resulting model is terrible. But that’s okay, because it was never about the model.
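
For a sense of what this looks like in practice, here is a minimal PySpark sketch of an MLlib regression along these lines. This is not Brad's exact code; the table and column names are assumptions based on the Census-style dataset described above.

```python
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# In a Databricks notebook, `spark` is already defined.
df = spark.table("customers")  # hypothetical table name

# Fold the assumed feature columns into a single vector column
assembler = VectorAssembler(
    inputCols=["age", "education_num", "capital_gain"],  # assumed features
    outputCol="features"
)
train, test = assembler.transform(df).randomSplit([0.7, 0.3], seed=42)

# Predict "Hours Per Week", as in the post
lr = LinearRegression(featuresCol="features", labelCol="hours_per_week")
model = lr.fit(train)

# Evaluate on the held-out set; don't expect much, as Brad warns
print(model.evaluate(test).rootMeanSquaredError)
```

From there, the predictions can be written out as a table that Power BI reads through the Databricks connector, which is the real point of the exercise.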


Building an AKS Cluster with Azure DevOps

Rob Sewell shows how to use Azure DevOps to build an AKS cluster with Terraform:

In the last few posts, I have moved from building an Azure SQL DB with Terraform using VS Code, to automating the build process for the Azure SQL DB using Azure DevOps Build Pipelines, to using Task Groups in Azure DevOps to reuse the same Build Process and build an Azure Linux SQL VM and Network Security Group. This evolution is fantastic, but Task Groups can only be used in the same Azure DevOps repository. It would be brilliant if I could use Configuration as Code for the Azure Build Pipeline and store that in a separate source control repository which can be used from any Azure DevOps Project.

Luckily, you can.

And Rob shows us how it’s done.


Query Editing with Azure SQL Database

Dave Bland shows off the Azure SQL Database query editor built into the Azure portal:

We have all been using SQL Server Management Studio to query and manipulate data, even for an Azure SQL database.  There is also an option to do this same thing built into the SQL Azure database interface in the Azure portal.  Although there have been a number of posts related to this topic dating back a few years, this feature is still marked as “preview” in the Azure portal.

Click through to see how it works, what you can do with it, and some of its limitations.


Backing Up Databases to Azure Blob Storage

Jamie Wick shows us how we can back up databases directly to Azure Blob Storage:

Azure storage, as a backup destination for SQL backups, is a great option for organizations that are contemplating replacing older on-prem NAS appliances or improving their Disaster Recovery functionality. The tiered storage pricing, along with local and global redundancy options, can be much more cost-effective than many traditional backup options.

In this post, we’re going to look at some of the key concepts and restrictions, along with how to back up an SQL database to an Azure storage location.

Click through for the demo.
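
If you just want the shape of it, here is a rough sketch of the two key steps (storing a SAS token as a credential, then backing up to a URL), driven from Python via pyodbc. The server, storage account, and token values are all placeholders.

```python
import pyodbc

# autocommit because BACKUP cannot run inside a user transaction
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=master;Trusted_Connection=yes;",
    autocommit=True,
)
cur = conn.cursor()

# One-time setup: a credential named after the container URL,
# holding a shared access signature (placeholder secret)
cur.execute("""
CREATE CREDENTIAL [https://myacct.blob.core.windows.net/backups]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=...placeholder-sas-token...';
""")

# Back up straight to blob storage
cur.execute("""
BACKUP DATABASE [SalesDb]
TO URL = 'https://myacct.blob.core.windows.net/backups/SalesDb.bak'
WITH COMPRESSION, CHECKSUM;
""")
```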


Scaling Power BI Premium Capacity

Matt Allington gives us instructions on how to scale Power BI Premium capacity:

This is the third article in my series about how to make Power BI Premium more affordable for small to medium sized enterprises (SMEs).  In my first article I explained the problem and the logic behind how to configure a workable solution. In my second article I provided step by step instructions on how to configure Flow to start/stop Power BI Premium capacities.  In the article today I am covering a way to scale the capacity up/down either on demand, or on a timed schedule.

The cloud is generally more expensive than on-prem, though it can potentially become cheaper if you are smart about scaling and have more scaling-friendly workloads. Matt even provides a really cool cost analysis to help you figure out what (if anything) you end up saving using this technique.
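
For reference, the scale operation itself boils down to one call against the Azure management API. A hedged Python sketch follows; this is not Matt's Flow solution, the resource names are placeholders, and the api-version is worth double-checking against current docs.

```python
import requests

token = "..."  # an AAD bearer token for https://management.azure.com/

url = (
    "https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}"
    "/providers/Microsoft.PowerBIDedicated/capacities/{name}"
    "?api-version=2017-10-01"
).format(sub="...", rg="...", name="...")

# Bump the capacity from, say, A1 up to A2
resp = requests.patch(
    url,
    headers={"Authorization": f"Bearer {token}"},
    json={"sku": {"name": "A2", "tier": "PBIE_Azure"}},
)
resp.raise_for_status()
```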


Exactly-Once Writes From Kafka To S3

Konstantine Karantasis takes us through writing from a Kafka topic into S3:

When customers were asking for an S3 connector, there were already several Kafka-to-S3 solutions out there at the time, so we had to decide whether to adopt an existing S3 connector, modify the Kafka Connect HDFS connector (as some developers attempted to do) or write a new connector from scratch.

We knew that our users needed three things from the connector:
1. Integration with the Kafka Connect API: Connect’s scaling and fault tolerance capabilities were important to have, and users didn’t want yet another system that they’d need to learn how to use, deploy and monitor.
2. Exactly once: Users didn’t want to waste expensive compute cycles on deduplicating their data. And no one likes missing events.
3. No extra dependencies: Especially dependencies on additional datastores. Kafka clients and the S3 SDK libraries should be all you need to get events from Kafka to S3. Simplicity rules, especially in a distributed systems world where simple is often the key to being reliable.

When we considered the existing connectors, we noticed that none of them delivered the reliability and exactly once capabilities we wanted. They treat S3 like it’s another file system—though it isn’t really. For example, S3 lacks file appends, it is eventually consistent, and listing a bucket is often a very slow operation.

Click through for a dive into what Confluent did and how it works.
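
To make that concrete, registering the connector is a single call to the Kafka Connect REST API. A sketch, with topic, bucket, and host names as placeholders:

```python
import requests

config = {
    "name": "s3-sink",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "tasks.max": "2",
        "topics": "events",
        "s3.bucket.name": "my-event-archive",
        "s3.region": "us-east-1",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        # Deterministic partitioning plus a fixed flush size is what
        # enables exactly-once: a retried write produces an identical
        # file, and overwriting the same S3 key is idempotent.
        "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
        "flush.size": "1000",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=config)
resp.raise_for_status()
```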


Azure SQL Managed Instance Public Endpoints

Danimir Ljepava announces public endpoints for Azure SQL Managed Instances:

Public endpoint, the ability to connect to Azure SQL Database Managed Instance from the Internet without a VPN, has reached global availability today. The release of this feature will help support many new integration scenarios.

The public endpoint for Managed Instance can today be enabled/disabled via a PowerShell script. Support for the Azure portal will be coming within the next week or so, as soon as all updates are rolled out.

Click through to learn how to enable it with PowerShell.
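
Under the covers the cmdlet flips an ARM property, so the equivalent REST call should look roughly like the sketch below. The property name and api-version are my assumptions from the preview documentation, so verify before relying on it; note also that you still need an NSG rule allowing traffic to the public endpoint's port, 3342.

```python
import requests

token = "..."  # an AAD bearer token for https://management.azure.com/

url = (
    "https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}"
    "/providers/Microsoft.Sql/managedInstances/{mi}"
    "?api-version=2018-06-01-preview"  # assumed preview api-version
).format(sub="...", rg="...", mi="...")

resp = requests.patch(
    url,
    headers={"Authorization": f"Bearer {token}"},
    json={"properties": {"publicDataEndpointEnabled": True}},
)
resp.raise_for_status()
```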


The Benefits of Partitioning in CosmosDB

Hasan Savran explains why partitioning data in CosmosDB is so important:

In any introduction-level CosmosDB talk, the presenter will suggest that you pay extra attention to the partitioning part of the talk. Microsoft wants you to create great solutions using Azure CosmosDB, and there are a lot of resources out there for developers. The challenge in Azure CosmosDB is that you don't need a DBA, so developers may not pay attention to details like partitioning when they create databases in Azure CosmosDB.

It is crucial to pick the right partition key for your databases in CosmosDB, simply because you cannot repartition your databases. If you pick the wrong partition key for your data model, it will simply be too late to change it later. You might have a good amount of data in your databases, which means you spent a good amount of RUs to insert that data, and the only way to fix the problem will be to start from scratch and upload all the data back into CosmosDB with the right partition key.

Read on for a couple of analogies and why this is so important.
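
The practical upshot is that the partition key is a create-time decision. A minimal sketch with the Python SDK, where the account, database, and the /customerId key are just examples:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(
    "https://myaccount.documents.azure.com:443/",  # placeholder account
    credential="...",  # placeholder key
)
db = client.create_database_if_not_exists("sales")

# This choice is permanent for the container: pick a property with
# high cardinality and an even spread of reads and writes.
container = db.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
)
```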


Blob Storage for Database Backups

Randolph West has a couple of tools to help upload and download database backup files:

I wrote it because AzCopy was weak and inconsistent. It was fragile, needing constant attention and monitoring in case a journalling file got stuck. Also, AzCopy didn’t keep files in sync. If a file was deleted locally (as part of a cleanup to delete old backups), AzCopy was unable to delete files remotely, so it was messy to maintain files in Blob Storage containers. The uploader was written to keep files in sync, and not have to fuss with AzCopy.

The real value of this tool though, is being able to recover the latest backup files (full, differential and transaction logs where available) which are needed to recover from a catastrophic failure. Without any knowledge of the backups, just knowing the database name, it can parse the list of files in Azure, download the necessary ones to recover, and build a T-SQL script to restore them. Literally all you need to do is run the downloader, then run the restore script.

Randolph talks about how the state of AzCopy has changed and offers up some new guidance as well as tooling updates.
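
This isn't Randolph's tool, but the core of the download side looks something like the following with the Azure blob SDK. The container layout and blob naming convention are assumptions.

```python
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;",  # placeholder
    container_name="backups",
)

db_name = "SalesDb"
# Assume blobs are named like SalesDb_20190601_1200.bak
backups = [b for b in container.list_blobs(name_starts_with=db_name)
           if b.name.endswith(".bak")]
latest = max(backups, key=lambda b: b.last_modified)

with open(latest.name, "wb") as f:
    f.write(container.download_blob(latest.name).readall())
```

The real tool goes further, pulling the differentials and log backups taken after the full and generating the RESTORE script, but the listing-and-filtering pattern is the same.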


Getting Started with Azure Databricks

Brad Llewellyn has a tutorial for Azure Databricks:

Databricks is a managed Spark framework, similar to what we saw with HDInsight in the previous post.  The major difference between the two technologies is that HDInsight is more of a managed provisioning service for Hadoop, while Databricks is more like a managed Spark platform.  In other words, HDInsight is a good choice if we need the ability to manage the cluster ourselves, but don’t want to deal with provisioning, while Databricks is a good choice when we simply want to have a Spark environment for running our code with little need for maintenance or management.

Azure Databricks is not a Microsoft product.  It is owned and managed by the company Databricks and available in Azure and AWS.  However, Databricks is a “first party offering” in Azure.  This means that Microsoft offers the same level of support, functionality and integration as it would with any of its own products.  You can read more about Azure Databricks here, here, and here.

Click through for a demonstration of the product.
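
Once the workspace and cluster exist, a first notebook can be as simple as this. The file path is a placeholder for wherever you upload data; in Databricks, `spark` and `display` are provided for you.

```python
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/FileStore/tables/customers.csv"))  # hypothetical upload path

df.printSchema()
display(df.limit(10))  # display() is Databricks-specific
```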
