Press "Enter" to skip to content

Category: Cloud

Handling Spatial Data in Cosmos DB

Hasan Savran gives us some tips on storing spatial data in Cosmos DB:

Importing Spatial Data into CosmosDB can be a challenge. CosmosDB is not a relational database and you may need to change your data model structure to add spatial data. You cannot create a new container for spatial data and plan to join this container to your other containers. There are free tools which might help with GeoJson conversion, but you may still need to add converted geoJson data into your data models. Spatial data becomes very powerful when you find a way to join it with your application’s data.

    In the following example, I am going to download the hurricanes from NOAA website. Data is in CSV format so we may need to transform data to create a good data model for CosmosDB. I downloaded the all hurricane data for 2005 which was the year of Katrina hurricane. First thing I did, was to change the name of columns and make them more user-friendly. I have used the following names for columns. Here is a sample row from the CSV file.

Click through for the example.

Comments closed

Executing Azure SSIS Packages from Blob Storage

Andy Leonard cranks it to the next level:

I confess: I have been waiting for this feature since I first learned of Azure-SSIS.

When I first saw Azure-SSIS – which creates an Azure Data Factory Integration Runtime and SSIS Catalog in the cloud, my first thought was a paraphrase Ferris Bueller’s question about dying the river green: “If we can execute SSIS packages from the SSIS Catalog in Azure Data Factory, why can’t we execute SSIS packages from Azure Blob Storage?” Today, we can.

Read on to see how you can do it.

Comments closed

Building a Big Data Cluster

Mohammad Darab continues a series on SQL Server Big Data Clusters in Azure Kubernetes Service:

To kick off the Big Data Cluster “Default configuration” creation, we will execute the following Powershell command:

mssqlctl cluster create

That will first prompt us to accept the license terms. Type y and Enter. 

Mohammad takes us through the default installation, which requires only a few parameters before it can go on its merry way.

Comments closed

Using the Cosmos DB Data Migration Tool

Hasan Savran shows how you can use the Cosmos DB Data Migration Tool to move data from various sources into Cosmos DB:

All you need is the connection string of your database or location of your source files and your CosmosDB keys. If you are using a database as source, you can format the data model pretty easy. You might have issues if the source you use does not have a JSON data type. JSON Array might look like string in CosmosDB because of data type mapping problems.

I am going to use CSV file as source in the following example. You can’t define a column as JSON array in excel. I have the following two columns in my CSV file. If I import these values into CosmosDB as they are, I see coordinates field’s data type will turn into a text field in CosmosDB. You can go back and try to update them in CosmosDB but that will be an expensive solution.

The one downside to this tool is that it doesn’t work with collections defined using the Mongo API.

Comments closed

Creating an Azure Databricks Cluster

Brad Llewellyn shows how you can create an Azure Databricks cluster:

There are three major concepts for us to understand about Azure Databricks, Clusters, Code and Data.  We will dig into each of these in due time.  For this post, we’re going to talk about Clusters.  Clusters are where the work is done.  Clusters themselves do not store any code or data.  Instead, they operate the physical resources that are used to perform the computations.  So, it’s possible (and even advised) to develop code against small development clusters, then leverage the same code against larger production-grade clusters for deployment.  Let’s start by creating a small cluster.

Read on for an example.

Comments closed

Databricks Runtime 5.4

Todd Greenstein announces Databricks Runtime 5.4:

We’ve partnered with the Data Services team at Amazon to bring the Glue Catalog to Databricks.   Databricks Runtime can now use Glue as a drop-in replacement for the Hive metastore. This provides several immediate benefits:
– Simplifies manageability by using the same glue catalog across multiple Databricks workspaces.
– Simplifies integrated security by using IAM Role Passthrough for metadata in Glue.
– Provides easier access to metadata across the Amazon stack and access to data catalogued in Glue.

There are some interesting changes in here.

Comments closed

Building an AKS Cluster

Mohammad Darab continues a series on Big Data Clusters by creating a Kubernetes pod in Azure Kubernetes Service:

Next, we will create a resource group by executing the following command:
az group create –name nameOfMyresourceGroup –location eastus2

Once you execute the above command, you can go into the Azure portal and refresh your resource group pane and see the newly created resource group.

Once that is setup, it’s time to create the actual Kubernetes cluster.

Click through for the full set of instructions.

Comments closed

Warning on Azure Consumption

Daniel Hutmacher doesn’t want you to have any Azure billing surprises:

I wrote this quick-and-dirty script to let me know if I happen to forget to turn off a P15 instance, or if I configure a service with a super-expensive performance tier without realizing. Maxing out your free Azure credits may be depressing enough, but emptying your credit card could really put you in the hurt locker.

So, here’s a Powershell script that warns me before any of this happens. It uses the Azure Consumption API to check how much money we’ve racked up on a subscription so far, and if any single instance exceeds, say, 50% of that total cost, it sends a notification to a Slack channel.

The wallet you save may be your own.

Comments closed

DBAs in the Cloud

Brent Ozar argues that production DBAs will still be important even at cloud-only companies:

One of my favorite recent examples was a company who came to me saying, “We’re spending about $2M per year in the cloud just on our databases alone. Can you help us reduce those costs?” Absolutely: with just a couple of days spent query & index tuning, we chopped their biggest database expenses in half while increasing performance.

At the end of that engagement, the CTO told me, “I thought I’d save money in the cloud by not having a DBA, but what I’m learning is that in the cloud, I actually get a return on my DBA investments.

I completely agree with this post. The exact tools DBAs use will change, but the role will still be around decades from now. And that’s at the companies which move quickly.

Comments closed

Comparing On-Prem To Managed Instance Performance

Jovan Popovic has an article explaining how you can compare your current on-premises SQL Server’s performance to an Azure SQL Managed Instance’s performance:

In this post you will see some recommended tools and best practices that you should apply while doing performance comparison. The recommended performance comparison process has three stages:

1. Compare the environment settings on SQL Server and Managed Instance. 
2. Create performance baseline on source SQL Server
3. Compare performance on Managed Instance with the baseline

In the following sections will be described the best practices and the recommended approaches 

This is a good bit more involved than installing some product, clicking a few buttons, and comparing numbers.

Comments closed