Press "Enter" to skip to content

Category: Cloud

Local Azure Data Lake

Julie Koesmarno shows how to set up Azure Data Lake for local testing:

Late last year, I presented a Cognitive Intelligence demo using Azure Data Lake (ADL) at PASS Summit keynote. It was a fun and quick demo! Watch it here :)

In case you’re new to ADL, you can now (since Dec 2015) develop, compile and run ADL locally in Visual Studio. This is huge! Because you don’t have to worry about your ADL Analytics Unit (AU) consumptions. Plus, this allows you to try it before you buy it too!

Click through for the step-by-step installation instructions.

Comments closed

Azure VM Encryption

Melissa Coates looks at different encryption methods available for Azure Virtual Machines:

Initially I opted for Storage Service Encryption due to its sheer simplicity. This is done by enabling encryption when you initially provision the storage account. After having set it up, I had proceeded onto other configuration items, one of which is setting up backups via the Azure Recovery Vault. Turns out that encrypted backups in the Recovery Vault are not (yet?) supported for VMs encrypted with Storage Service Encryption (as of Feb 2017).

Next I decided to investigate Disk Encryption because it supports encrypted backups in the Recovery Vault. It’s more complex to set up because you need a Service Principal in AAD, as well as Azure Key Vault integration. (More details on that in my next post.)

Click through for a point-by-point comparison between the two methods.

Comments closed

Encryption In ElasticMapReduce

Sai Sriparasa shows how to enable encryption in an ElasticMapReduce cluster:

In this post, I go through the process of setting up the encryption of data at multiple levels using security configurations with EMR. Before I dive deep into encryption, here are the different phases where data needs to be encrypted.

Data at rest

  • Data residing on Amazon S3—S3 client-side encryption with EMR
  • Data residing on disk—the Amazon EC2 instance store volumes (except boot volumes) and the attached Amazon EBS volumes of cluster instances are encrypted using Linux Unified Key System (LUKS)

Data in transit

  • Data in transit from EMR to S3, or vice versa—S3 client side encryption with EMR

  • Data in transit between nodes in a cluster—in-transit encryption via Secure Sockets Layer (SSL) for MapReduce and Simple Authentication and Security Layer (SASL) for Spark shuffle encryption

  • Data being spilled to disk or cached during a shuffle phase—Spark shuffle encryption or LUKS encryption

Turns out this is rather straightforward.

Comments closed

File Snapshot Backups

Raul Gonzalez digs into file snapshot backups in Azure:

One of the limitations for these ‘File Snapshot Backups’ (and probably the most important) is that all our databases files must be stored in the cloud, so we can take my previous post just as the preparation for what is coming now.

In order to move our files to the cloud we have different possibilities, one might be the typical approach where we’re allowed for some down time.

Check it out; you might want to give file snapshot backups a try.

Comments closed

SQL Server VMs In Google Compute Engine

Brent Ozar reports on Google cloud improvements:

Google Compute Engine is infrastructure-as-a-service (IaaS), selling virtual machines by the hour like Azure VMs and AWS EC2. You can run whatever you like in these VMs, and Google has long supported running SQL Server in GCE. You could build your own SQL Servers, or use pre-built (and licensed) instances of SQL Server 2012, 2014, or 2016 – but only Standard or Web Editions.

Today, GCE supports Enterprise Edition AND Always On Availability Groups.

We’ve got a white paper coming soon on how to build and test it, plus more cool stuff in the pipeline that DBAs will love.

We live in interesting times.

Comments closed

Sampling Data Lake Data

Alex Whittles shows how to use U-SQL to sample data to read in Power BI:

The answer is sampling, we don’t bring in 100% of the data, but maybe 10%, or 1%, or even 0.01%, it depends how much you need to reduce your dataset. It is however critical to know how to sample data correctly in order to maintain a level of accuracy of data in your reports.

Option 1: Take the top x rows of data
Don’t do it. Ever. Just no.
What if the source data you’ve been given is pre-sorted by product or region, you’d end up with only data from products starting with ‘a’, which would give you some wildly unpredictable results.

Option 2: Take a random % sample
Now we’re talking. This option will take, for example 1 in every 100 rows of data, so it’s picking up an even distribution of data throughout the dataset. This seems a much better option, so how do we do it?

Read on for a couple of sampling methods.

Comments closed

Backing Up To Azure Storage

Neil Gelder shows how to back up directly to Azure blob storage:

The URL is the one from the container we made a note of and the credential is the one we created in the last step.

Now if we return to the container screen in the Azure Console and refresh the screen you’ll see your backup file like below

My personal preference here would be to back up locally and then have a job migrate backups to Azure or S3.  That storage is 1-3 cents per GB per month (and even cheaper if you’re willing to store the data in Glacier), so for more small to mid-sized organizations running databases in the tens of gigs, it’s a great way of getting around only being able to store a week or two worth of backups on-site.

Comments closed

Beginning With Amazon Athena

Jen Underwood looks at the basics behind Amazon Athena:

Today early adopters of Amazon Athena are using it for big data analytics pipeline projects along with Kinesis streaming data and other Amazon data sources.

Athena is serverless parallel query pay-per-use service. There is no infrastructure to set up or manage. It scales automatically and can handle large datasets or complex distributed queries.

The easy way of thinking about Athena is that it’s ElasticMapReduce (a pay-as-you-go Hadoop cluster) without the ceremony of administering or spinning up the cluster.

Comments closed