Press "Enter" to skip to content

Category: Cloud

Flotilla: Containerized Task Execution

Akshay Wadia, et al, introduce a new open source tool:

Data scientists are not always equipped with the requisite engineering skills to deploy robust code to a production job execution and scheduling system. Yet, forcing reliance on data platform engineers will impede the scientists’ autonomy. If only there was another way.

Today we’re excited to introduce Flotilla, our latest open source project. Flotilla is a human friendly service for task execution. It allows you to focus on the work you’re doing rather than how to do it. In other words, Flotilla takes the struggle out of defining and running containerized jobs.

It looks like an interesting service.

Comments closed

Temp Tables In Redshift

Derik Hammer has some notes on temporary tables in Amazon Redshift:

One difference between regular tables and temporary tables is how they are typically used. Temporary tables are session scoped which means that adding them into a process or report will probably cause them to be created multiple times. Temporary tables might be very similar to regular tables but most regular tables are not re-written into, every time they are queried.

The disk writes involved in populating the temporary table might be more expensive than the reads would be if you were to modify your query to include the logic into one, larger, query. The frequency of the report or process will be a factor into how much of a performance hit you get by using the temporary tables. If you are using temporary tables to make debugging a procedure easier or to enhance readability, make sure you understand the IO cost of performing writes and then reading that data back into a subsequent query.

Read on for more.

Comments closed

Analyzing Spatial Data With Cosmos DB

Ben Jarvis shows how to query spatial data from Cosmos DB:

The above code connects to Cosmos DB and retrieves the details for the base airfield that was specified, it then calculates the range of the aircraft in meters by multiplying the endurance (in hours) by the true airspeed in knots (nautical miles per hour) and then multiplying that my 1852 (number of meters in a nautical mile). A Linq query is then run against Cosmos DB using the built-in spatial functions to find airfields within the specified distance. The result is then converted into a JSON array that can be understood by the Google Maps API that is being used on the client side.

The client side uses the Google Maps API to plot the airfields on a map, giving us a view like the one below when given a base airfield of Blackbushe (EGLK), a true airspeed of 100kts and an endurance of 4.5 hours

Click through for .NET code to load and analyze the data.

Comments closed

Azure SQL Database Managed Instances

Tim Radney explains the idea behind Azure SQL Database Managed Instances:

Managed Instances were first introduced at the Microsoft Build conference in Spring 2017, and so far the preview has been limited to a small number of customers and consultants. Managed Instances can be considered a hybrid between a full version of SQL Server and Azure SQL Database. Single and elastic databases are built upon a database-scoped programming model and Managed Instances are built on an instance-scoped programming model. This makes Managed Instances more compatible with on-premises SQL Server.

Managed Instances provide much more of a feel of an on-premises SQL Server, yet are built on the same infrastructure as Azure SQL Database. What sets it apart from Azure SQL Database is that it presents an entire SQL Server instance to the customer. In Azure SQL Database, you configure a server, which is really a container, and then can have multiple databases in that server, but they can’t easily talk to one another. With Managed Instances, all databases within the instance are on the same actual SQL Server, so you have full support for cross-database queries. This is a huge feature for many applications that otherwise were not a good fit for Azure SQL Database, and I think that’s going to allow many more SQL Server applications to move into the cloud.

Managed Instances has the potential to make an Azure migration realistic for many companies sticking to the on-prem product or managing a bunch of IaaS VMs.

Comments closed

Data Migration And Visualization With Data Factory And Data Lake

Matt Basile has a video which shows him taking raw data in S3, moving it to Azure Data Lake Storage using Azure Data Factory, and then visualizing it with Power BI:

While this seems like a lot of parts just to copy a few files, it’s important to note I only scratched the surface of what ADF can do.  Think of ADF as an airline company that manages and enables cargo (data) movement between cities (data sources).  A pipeline represents the overall goal of moving certain cargo from one city to another. The linked service is the airport, which provides a landing point and access control for the cities. The dataset is the list of cargo to move, the activity is the flight itself, while the integration runtime is the airport infrastructure that makes the cargo movement possible.  A single pipeline requires all these objects to run successfully; however, many pipelines can use these same objects to complete different tasks.  Once you’ve created these data factory objects, it is straightforward to layer on additional functionality or more pipelines. ADF also has visual tools that make building these objects a breeze – to build my pipeline, all I had to do was click on “Copy data” in the visual tools start menu and follow the steps provided.

Matt has a video demonstrating the process as well.

Comments closed

Using The Cosmos DB Emulator

Cedric Charlier shows off how to use the Azure Cosmos DB emulator to create and query a graph database:

When developing on Azure Cosmos DB, Microsoft’s globally distributed, horizontally partitioned, multi-model database service, it’s useful to use the local emulator. At the moment the Web Interface of the data explorer is oriented for the SQL API and it sounds impossible to create a graph query on the emulator. This is false, the UI is not aware of these features but the engine supporting the Graph API.

You can access the emulator engine by the means of the .Net librairies. Create a new console project in Visual Studio and add the package Microsoft.Azure.Graphs. This package is in preview, you’ll need to specify to the package manager that you’re accepting the previews.

Read on to learn more.

Comments closed

SQL Server Backups On Azure VM

Rolf Tesmer shows us various options available for backing up SQL Server on Azure VMs:

Recently I had a requirement to collate and briefly compare some of the various methods to perform SQL Server backup for databases deployed onto Azure IaaS machines.  The purpose was to provide a few options to cater for the different types(OLTP, DW, etc) and sizes (small to big) of databases that could be deployed there.

Up front, I am NOT saying that these are the ONLY options to perform standard SQL backups!  I am sure there are others – however – the below are both supported and well documented – which when it comes to something as critical as backups is pretty important.

So the purpose of this blog is to provide a quick and brief list of various SQL backup methods!

Read on for the options.

Comments closed

Azure And The Kappa Architecture

Jared Zagelbaum describes the Kappa architecture and points out how there’s limited built-in support in Azure for it:

You can’t support kappa architecture using native cloud services. Cloud providers, including Azure, didn’t design streaming services with kappa in mind. The cost of running streams with TTL greater than 24 hours is more expensive, and generally, the max TTL tops out around 7 days. If you want to run kappa, you’re going to have to run Platform as a Service (PaaS) or Infrastructure as a Service (IaaS), which adds more administration to your architecture. So, what might this look like in Azure?

Read the whole thing.

Comments closed

The Whys Of Azure ML Workbench

Ginger Grant explains why Azure Machine Learning Workbench exists:

Microsoft is looking for Azure Machine Learning Workbench for more than a tool to use for Machine Learning analysis. It is part of a system to manage and monitor the deployment of machine learning solutions with Azure Machine Learning Model Management. The management aspects are part of the application installation.  To install the Azure Machine Learning Workbench, the application download is available only by creating an account in Microsoft’s Azure environment, where a Machine Learning Model Management resource will be created as part of the install. Within this resource, you will be directed to create a virtual environment in Azure where you will be deploying and managing Machine Learning models.

This migration into management of machine learning components is part of a pattern first seen on the on-premises version of data science functionality.  First Microsoft helped companies manage the deployment of R code with SQL Server 2016 which includes the ability to move R code into SQL Server.  Providing this capability decreased the time it took to implement a data science solution by providing a means for the code can be deployed easily without the need for the R code to be re-written or included in another application. SQL Server 2017 expanded on this idea by allowing Python code to be deployed into SQL Server as well.  With the cloud service Model Management, Microsoft is hoping to centralize the implementation so that all Machine Learning services created can be managed in one place.

Read on for more.

Comments closed