Cloud – Page 180 – Curated SQL

Using Sparklyr To Analyze Flight Data

Using sparklyr enables you to smoothly analyze big data on Amazon S3 with R. You can easily build a Spark cluster with Cloudera Director. sparklyr lets Spark act as a backend for dplyr. You can create tidy data from huge, messy data, plot complex maps from this big data the same way you would with small data, and build predictive models from big data with MLlib. I believe sparklyr helps all R users perform exploratory data analysis on large-scale data faster and more easily. Let’s try!

You can see the R Markdown of this analysis on RPubs. With RStudio, you can easily publish R Markdown documents to RPubs.

Sparklyr is an exciting technology for distributed data analysis.

Azure Price Cuts

Published 2017-02-07 by Kevin Feasel

Brad Sams reports that Azure VM and Azure Blob Storage prices are going down:

Microsoft, Amazon and now Google are in a heated cloud race to grab as much market share as they can, because they know that once a company starts using their service, it is unlikely to switch platforms. With more services being offered by cloud vendors and more companies diving into these platforms, Microsoft and Amazon are frequently cutting prices to gain a competitive advantage.

In this edition of ‘cloud cuts’, Microsoft is slashing prices on some of its Azure Virtual Machines and its Blob storage. The company is dropping prices on its compute-optimized F-series instances and its general-purpose A1 instances; it says price cuts on its D-series general-purpose instances will follow in the near future.

Blob storage is down to 2 cents per GB per month for hot storage. That’s slightly below S3’s 2.3 cents per GB per month.
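To put the per-GB difference in concrete terms, here is a small Python sketch that uses only the two list prices quoted above. It is a deliberate simplification: it treats each price as a flat rate (real price sheets are tiered by volume, region, and redundancy option) and ignores transaction and egress charges. The 10,000 GB dataset size is hypothetical.

```python
from decimal import Decimal

# Per-GB monthly list prices as quoted in the post (February 2017).
# Assumption: flat rates; actual pricing is tiered by volume and region.
PRICE_PER_GB_MONTH = {
    "Azure Blob (hot)": Decimal("0.020"),
    "Amazon S3": Decimal("0.023"),
}


def monthly_storage_cost(gb: int, price_per_gb: Decimal) -> Decimal:
    """Capacity charge only: ignores transactions, egress and redundancy options."""
    return gb * price_per_gb


if __name__ == "__main__":
    gb = 10_000  # hypothetical dataset size in GB
    for name, price in PRICE_PER_GB_MONTH.items():
        print(f"{name}: ${monthly_storage_cost(gb, price):.2f}/month")
```

At those rates, 10,000 GB comes to $200.00 per month on Azure hot Blob storage versus $230.00 on S3, roughly 13% less. `Decimal` is used instead of floats so that sub-cent prices multiply out to exact cents.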

Integrating Data Lake Storage With SQL Data Warehouse

Published 2017-02-07 by Kevin Feasel

Sachin Sheth alerts us to a new integration point between Azure Data Lake Store and Azure SQL Data Warehouse via PolyBase:

Most common patterns using Azure Data Lake Store (ADLS) involve customers ingesting and storing raw data in ADLS. This data is then cooked and prepared by analytic workloads like Azure Data Lake Analytics and HDInsight. Once cooked, this data is then explored using engines like Azure SQL Data Warehouse. One key pain point for customers is having to wait a substantial time after the data was cooked to be able to explore it and gather insights. This was because the data stored in ADLS would have to be loaded into SQL Data Warehouse using tools that performed row-by-row insertion. But now, you don’t have to wait that long anymore. With the new SQL Data Warehouse PolyBase support for ADLS, you will now be able to load and access the cooked data rapidly and shorten the time before you can start performing interactive analytics. PolyBase support will allow you to access unstructured/semi-structured files in ADLS faster because of a highly scalable loading design. You can load the files stored in ADLS into SQL Data Warehouse to perform analytics with fast response times, or you can use the files in ADLS as external tables. So get ready to unlock the value in the petabytes of data stored in ADLS.

I’ve been waiting for this support, and I’m happy that they were able to integrate the two products.

Azure SQL Database Extended Events

Published 2017-02-07 by Kevin Feasel

Arun Sirpal compares on-prem extended events to what’s available in Azure SQL Database:

There are 22 actions and 261 events. That is naturally fewer than on a local SQL Server; for example, on my local SQL Server 2014 machine, running the above query returned 50 actions and 284 events.

There are a few subtle differences and a couple not-so-subtle differences, so it’s worth digging into if you plan to spin up an Azure SQL Database database.

Local Azure Data Lake

Published 2017-02-07 by Kevin Feasel

Julie Koesmarno shows how to set up Azure Data Lake for local testing:

Late last year, I presented a Cognitive Intelligence demo using Azure Data Lake (ADL) at the PASS Summit keynote. It was a fun and quick demo!

In case you’re new to ADL, you can now (since Dec 2015) develop, compile and run ADL locally in Visual Studio. This is huge, because you don’t have to worry about your ADL Analytics Unit (AU) consumption. Plus, this lets you try it before you buy it too!

Click through for the step-by-step installation instructions.

Azure VM Encryption

Published 2017-02-06 by Kevin Feasel

Melissa Coates looks at different encryption methods available for Azure Virtual Machines:

Initially I opted for Storage Service Encryption due to its sheer simplicity. This is done by enabling encryption when you initially provision the storage account. After setting it up, I moved on to other configuration items, one of which was setting up backups via the Azure Recovery Vault. It turns out that encrypted backups in the Recovery Vault are not (yet?) supported for VMs encrypted with Storage Service Encryption (as of Feb 2017).

Next I decided to investigate Disk Encryption because it supports encrypted backups in the Recovery Vault. It’s more complex to set up because you need a Service Principal in AAD, as well as Azure Key Vault integration. (More details on that in my next post.)

Click through for a point-by-point comparison between the two methods.

Encryption In ElasticMapReduce

Published 2017-02-03 by Kevin Feasel

Sai Sriparasa shows how to enable encryption in an Amazon Elastic MapReduce (EMR) cluster:

In this post, I go through the process of setting up the encryption of data at multiple levels using security configurations with EMR. Before I dive deep into encryption, here are the different phases where data needs to be encrypted.

Data at rest

Data residing on Amazon S3: S3 client-side encryption with EMR
Data residing on disk: the Amazon EC2 instance store volumes (except boot volumes) and the attached Amazon EBS volumes of cluster instances are encrypted using Linux Unified Key Setup (LUKS)

Data in transit

Data in transit from EMR to S3, or vice versa: S3 client-side encryption with EMR
Data in transit between nodes in a cluster: in-transit encryption via Secure Sockets Layer (SSL) for MapReduce and Simple Authentication and Security Layer (SASL) for Spark shuffle encryption
Data being spilled to disk or cached during a shuffle phase: Spark shuffle encryption or LUKS encryption

Turns out this is rather straightforward.
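As a rough illustration of how those phases map onto an EMR security configuration, here is a Python sketch that builds the JSON document EMR accepts for one. The KMS key ARN and the S3 path to the PEM certificate zip are hypothetical placeholders, and choosing client-side KMS encryption (`CSE-KMS`) for S3 data is an assumption chosen to match the "S3 client-side encryption" item above; check the EMR documentation for the options your release supports.

```python
import json

# Hypothetical placeholders: substitute your own KMS key ARN and certificate bundle.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
CERT_ZIP = "s3://example-bucket/emr-certs.zip"


def build_security_configuration(kms_key_arn: str, cert_zip: str) -> str:
    """Return an EMR security configuration JSON string enabling at-rest and in-transit encryption."""
    config = {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # Data on S3: client-side encryption with a KMS-managed key (assumed mode).
                "S3EncryptionConfiguration": {
                    "EncryptionMode": "CSE-KMS",
                    "AwsKmsKey": kms_key_arn,
                },
                # Instance store and EBS volumes (LUKS, per the post), keyed from KMS.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            # Node-to-node traffic: TLS certificates supplied as a PEM zip on S3.
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": cert_zip,
                }
            },
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(build_security_configuration(KMS_KEY_ARN, CERT_ZIP))
```

The resulting string could then be passed as the `SecurityConfiguration` argument of boto3's EMR client `create_security_configuration` call, and the configuration referenced by name when launching the cluster.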

File Snapshot Backups

Published 2017-02-02 by Kevin Feasel

Raul Gonzalez digs into file snapshot backups in Azure:

One of the limitations of these ‘File Snapshot Backups’ (and probably the most important one) is that all of our database files must be stored in the cloud, so my previous post can be treated as preparation for what comes now.

To move our files to the cloud, we have several options; one is the typical approach, where some downtime is allowed.

Check it out; you might want to give file snapshot backups a try.

SQL Server VMs In Google Compute Engine

Published 2017-02-02 by Kevin Feasel

Brent Ozar reports on Google cloud improvements:

Google Compute Engine is infrastructure-as-a-service (IaaS), selling virtual machines by the hour like Azure VMs and AWS EC2. You can run whatever you like in these VMs, and Google has long supported running SQL Server in GCE. You could build your own SQL Servers, or use pre-built (and licensed) instances of SQL Server 2012, 2014, or 2016 – but only Standard or Web Editions.

Today, GCE supports Enterprise Edition AND Always On Availability Groups.

We’ve got a white paper coming soon on how to build and test it, plus more cool stuff in the pipeline that DBAs will love.

We live in interesting times.

Interactive Queries From Azure

Published 2017-01-27 by Kevin Feasel

Arun Sirpal shows off the query editor now available for Azure SQL Database in the Azure portal:

This is in public preview and you can do the following:

Query dynamic management views for real-time workload insights (which is what I will be doing).
Issue ad-hoc queries.
Manage your user authentication.

Read on for more information.
