Category: Cloud

Self-Analysis Of SQL Server Dump Files

Published 2017-06-28 by Kevin Feasel

Arun Sirpal walks through the SQL Server Diagnostics preview:

Notice the region to upload. If you are using a work machine, I would suggest getting authorisation. The great thing here is that this is GDPR compliant.

Once you're ready, hit the upload button. It goes through three phases: upload, analysis, and a recommendation.

It sends your dump files to an external service, which is important enough to point out. If you want more details on the product, Rony Chatterjee has a FAQ.


Using AWS Database Migration Service

Published 2017-06-27 by Kevin Feasel

Derik Hammer shows how to move a SQL Server instance into AWS Relational Database Service (RDS):

Like most wizards in the world, this one begins with a welcome page. One important piece of information that is provided, however, is the tip about using the AWS Schema Conversion Tool. The conversion tool is most useful for heterogeneous migrations, such as Oracle to RDS Microsoft SQL Server. This demonstration is using homogeneous data platforms; therefore, it is not needed.

It does seem pretty easy to do.


Automatic Processing Of Azure Analysis Services Models

Published 2017-06-22 by Kevin Feasel

Dustin Ryan shows how to use Azure Functions to refresh Azure Analysis Services models:

Download the latest client libraries for Analysis Services. This needs to be done on your local machine so you can then copy these files to your Azure Function App.

After you’ve downloaded the client libraries, the DLLs can be found in C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies. The two files you need are:

C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies\Microsoft.AnalysisServices.Core.DLL
C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies\Microsoft.AnalysisServices.Tabular.DLL

This step is important because the documentation in Azure references the 130 assemblies, which will not work. You need the 140 assemblies; otherwise, you’ll get errors.
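The copy step itself is easy to script. Below is a minimal Python sketch, assuming you stage the function app's files in a local folder before uploading them; the function name `copy_as_assemblies` and the destination folder are hypothetical and not part of Dustin's walkthrough. It refuses any source folder that is not under `140`, following the warning about the 130 assemblies.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# The two assemblies the post says the function needs.
REQUIRED_DLLS = (
    "Microsoft.AnalysisServices.Core.DLL",
    "Microsoft.AnalysisServices.Tabular.DLL",
)


def copy_as_assemblies(sdk_dir: Path, function_bin: Path) -> list[Path]:
    """Copy the Analysis Services client DLLs into a local staging folder.

    sdk_dir should be the ...\\140\\SDK\\Assemblies folder; function_bin is a
    hypothetical local folder whose contents you later upload to the Function App.
    """
    # The 130 assemblies referenced in the Azure docs will not work, so insist on 140.
    if "140" not in sdk_dir.parts:
        raise ValueError(f"expected the 140 SDK assemblies, got {sdk_dir}")
    function_bin.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in REQUIRED_DLLS:
        src = sdk_dir / name
        if not src.is_file():
            raise FileNotFoundError(src)
        # copy2 also preserves file metadata such as timestamps.
        copied.append(Path(shutil.copy2(src, function_bin / name)))
    return copied


# Example (Windows, hypothetical destination):
# sdk = Path(r"C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies")
# copy_as_assemblies(sdk, Path("MyFunctionApp") / "bin")
```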

Dustin walks through the whole process of setting up an Azure Function step by step.


Scripting Azure Resources With ARM Templates

Published 2017-06-19 by Kevin Feasel

Melissa Coates has a detailed explanation of how to script out the creation and configuration of services in Azure using Azure Resource Manager (ARM) templates:

In many cases, you can easily provision resources in the web-based Azure portal. If you’re never going to repeat the deployment process, then by all means use the interface in the Azure portal. It doesn’t always make sense to invest the time in automated deployments. However, ARM templates are really helpful if you’re interested in achieving repeatability, improving accuracy, achieving consistency between environments, and reducing manual effort.

Use ARM templates if you intend to:

Include the configuration of Azure resources in source control (“Infrastructure as Code”), and/or
Repeat the deployment process numerous times, and/or
Automate deployments, and/or
Employ continuous integration techniques, and/or
Utilize DevOps principles and practices, and/or
Repeatedly utilize testing infrastructure then de-provision it when finished
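For a sense of what ends up in source control, here is a minimal sketch that builds an ARM template in Python and serializes it to JSON. It deploys a single storage account rather than Melissa's website-plus-database example; the `storageAccountName` parameter, the `Standard_LRS` SKU, and the `2016-01-01` API version are illustrative assumptions, so check the current resource provider documentation before deploying.

```python
import json

# Schema URI for ARM deployment templates.
SCHEMA = "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#"


def storage_account_template() -> dict:
    """Return a minimal ARM template that deploys one storage account."""
    return {
        "$schema": SCHEMA,
        "contentVersion": "1.0.0.0",
        # Parameters let the same template serve dev, test, and production.
        "parameters": {
            "storageAccountName": {"type": "string"},
        },
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2016-01-01",  # illustrative API version
                "name": "[parameters('storageAccountName')]",
                # ARM template expression: reuse the resource group's region.
                "location": "[resourceGroup().location]",
                "sku": {"name": "Standard_LRS"},
                "kind": "Storage",
                "properties": {},
            }
        ],
    }


def render() -> str:
    """Serialize the template as JSON suitable for committing to source control."""
    return json.dumps(storage_account_template(), indent=2)
```

Saving the output of `render()` as a file such as `azuredeploy.json` gives you one artifact you can version and deploy repeatedly with different parameter values.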

Melissa walks through an example of deploying a website with a backing database, along with various configuration changes.


Managing Data Lake Analytics Compute

Published 2017-06-12 by Kevin Feasel

Yan Li has a three-part series on managing Azure Data Lake Analytics compute. First, an overview:

Scenario 2: Set One Specific Group to Different Limits

New members are joining and sharing the same ADLA account. To prevent any new members, who are just learning ADLA, from mistakenly submitting a job that consumes too much compute resource (increasing cost and blocking other jobs), customers want to set the maximum AU per job for new employees at 30 AUs while others can submit jobs with up to 100 AUs.

Default Policy:

Job AU limit: 100
Priority limit: 1

Exception Policy: New Employee Policy

Job AU limit: 30
Priority limit: 200
Group: New Employee Group
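To make the default-plus-exception logic concrete, here is a small Python model of this scenario. It is not the ADLA API (the service enforces these policies itself), the class and function names are hypothetical, and it assumes two things: ADLA's convention that a lower priority number means higher priority, and that the first matching exception group wins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class JobPolicy:
    max_aus: int         # most AUs a single job may request
    priority_limit: int  # smallest priority number allowed (lower number = higher priority)


# Values from the scenario above; the names are hypothetical.
DEFAULT_POLICY = JobPolicy(max_aus=100, priority_limit=1)
EXCEPTIONS = {"New Employee Group": JobPolicy(max_aus=30, priority_limit=200)}


def effective_policy(user_groups: set[str]) -> JobPolicy:
    """Pick the exception policy for the user's group, else the default (assumed rule)."""
    for group, policy in EXCEPTIONS.items():
        if group in user_groups:
            return policy
    return DEFAULT_POLICY


def check_job(user_groups: set[str], aus: int, priority: int) -> list[str]:
    """Return policy violations for a submission; an empty list means it is allowed."""
    policy = effective_policy(user_groups)
    problems = []
    if aus > policy.max_aus:
        problems.append(f"requested {aus} AUs, limit is {policy.max_aus}")
    if priority < policy.priority_limit:
        problems.append(f"priority {priority} is above the allowed limit {policy.priority_limit}")
    return problems
```

With these values, a new employee asking for 50 AUs at priority 1 trips both limits, while other users can submit jobs of up to 100 AUs at priority 1.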

Next up is a look at job-level policies:

With job-level policies, you can control the maximum AUs and the maximum priority that individual users (or members of security groups) can set on the jobs that they submit. This allows you to not only control the costs incurred by your users but also control the impact they might have on high priority production jobs running in the same ADLA account.

There are two parts to a job-level policy:

Default Policy: This is the policy that is applied to all users of the service.

Exceptions: The set of “exception” policies that apply to specific users.

Submitted jobs that do not violate the job-level policies are still subject to the account-level policies as described in Azure Data Lake Analytics Account Level Policy.

Finally, account-level policies:

ADLA supports three types of account-level policies:

Maximum AUs: Controls the maximum number of AUs that can be used by running jobs.
Maximum Number of Running Jobs: Controls the maximum number of concurrently running jobs.
Days to Retain Job Queries: Controls how long detailed information about jobs is retained in the user's ADLS account.
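Account-level limits apply on top of the job-level checks. Here is a rough Python model of the first two policies, with hypothetical names and limit values; retention is not modeled, and treating a job that does not fit as one that waits in the queue is a simplification rather than a description of ADLA's scheduler.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AccountLimits:
    max_aus: int           # AUs that all running jobs may use in total
    max_running_jobs: int  # jobs that may run concurrently


def can_start(limits: AccountLimits, running_job_aus: list[int], new_job_aus: int) -> bool:
    """Return True if a new job fits under both account-level limits right now.

    In this simplified model, a job that does not fit would wait in the queue.
    """
    if len(running_job_aus) >= limits.max_running_jobs:
        return False
    return sum(running_job_aus) + new_job_aus <= limits.max_aus
```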

There’s a good amount of information here.


Running DoAzureParallel On The Cheap

Published 2017-06-09 by Kevin Feasel

David Smith reports an update on the doAzureParallel R package:

At the EARL conference in San Francisco this week, JS Tan from Microsoft gave an update (PDF slides here) on the doAzureParallel package. As we’ve noted here before, this package allows you to easily distribute parallel R computations to an Azure cluster. The package was recently updated to support using automatically-scaling Azure Batch clusters with low-priority nodes, which can be used at a discount of up to 80% compared to the price of regular high-availability VMs.
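The savings are simple arithmetic. Here is a sketch assuming a hypothetical price of $0.10 per VM-hour; real prices vary by VM size and region, and the low-priority discount is "up to" 80%, not a fixed figure.

```python
def cluster_cost(nodes: int, hours: float, hourly_price: float, discount: float = 0.0) -> float:
    """Total cost of `nodes` VMs running for `hours`.

    `discount` is a fraction (0.8 means 80% off the regular hourly price).
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return nodes * hours * hourly_price * (1.0 - discount)


HYPOTHETICAL_PRICE = 0.10  # dollars per VM-hour; an assumption, not a real rate
regular = cluster_cost(20, 1, HYPOTHETICAL_PRICE)
low_priority = cluster_cost(20, 1, HYPOTHETICAL_PRICE, discount=0.8)
```

At that assumed price, 20 nodes for one hour cost 20 × 1 × 0.10 = $2.00 at the regular rate and 2.00 × (1 − 0.8) = $0.40 with the full 80% discount.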

JS Tan using doAzureParallel #rstats package to run simulation on a cluster of 20 low-priority Azure VMs. Total cost: $0.02 #EARLConf2017 pic.twitter.com/Mpl3IUa9zY

— David Smith (@revodavid) June 7, 2017

That lowers the barrier to usage significantly, so it’s a very welcome update.


Azure Private Virtual Networks

Published 2017-06-08 by Kevin Feasel

The Tech Junkie shows how to create a private virtual network in Azure:

In the previous blog post we created an Azure cloud service. Now we are going to create a private virtual Azure network. This matters because when you create a virtual machine in Azure, you will use this virtual network to connect to it.
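Before creating the network in the portal, it helps to sketch the address plan. Here is a minimal example using Python's standard `ipaddress` module; the `10.0.0.0/16` address space and `/24` subnets are assumptions for illustration, not values from the post, and the private-range check is a choice of this sketch.

```python
from __future__ import annotations

import ipaddress
from itertools import islice


def plan_subnets(address_space: str, new_prefix: int, count: int) -> list[str]:
    """Carve the first `count` subnets of size /new_prefix out of a VNet address space."""
    network = ipaddress.ip_network(address_space)
    # This sketch insists on a private (e.g. RFC 1918) range for a private network.
    if not network.is_private:
        raise ValueError(f"{address_space} is not a private range")
    # subnets() yields lazily; islice avoids enumerating huge ranges.
    subnets = [str(s) for s in islice(network.subnets(new_prefix=new_prefix), count)]
    if len(subnets) < count:
        raise ValueError(f"{address_space} cannot hold {count} /{new_prefix} subnets")
    return subnets
```

For instance, `plan_subnets("10.0.0.0/16", 24, 2)` returns `["10.0.0.0/24", "10.0.1.0/24"]`, leaving the rest of the space free for later subnets.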

This is a screenshot-driven, step-by-step post that makes setting these up easy.


S3 And HDFS Data Migration

Published 2017-06-06 by Kevin Feasel

Ilya Yalovyy looks at S3DistCp, which allows you to migrate data efficiently back and forth between HDFS and S3:

Raw files often land in S3 or HDFS in an uncompressed text format. This format is suboptimal both for the cost of storage and for running analytics on that data. S3DistCp can help you efficiently store data and compress files on the fly with the --outputCodec option:
$ s3-dist-cp --src s3://my-tables/incoming/hourly_table_filtered --dest s3://my-tables/incoming/hourly_table_gz --outputCodec=gz
The current version of S3DistCp supports the codecs gzip, gz, lzo, lzop, and snappy, and the keywords none and keep (the default). These keywords have the following meaning:

“none” – Save files uncompressed. If the files are compressed, then S3DistCp decompresses them.
“keep” – Don’t change the compression of the files but copy them as-is.
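If you launch S3DistCp from a script, a small helper can validate the codec before building the command line. Here is a sketch assuming you run it where the `s3-dist-cp` command is available (for example, an EMR master node); the helper name is hypothetical, and it uses only the `--src`, `--dest`, and `--outputCodec` options shown above.

```python
from __future__ import annotations

# Codecs and keywords the post lists for --outputCodec.
SUPPORTED_CODECS = {"gzip", "gz", "lzo", "lzop", "snappy", "none", "keep"}


def s3_dist_cp_args(src: str, dest: str, codec: str = "keep") -> list[str]:
    """Build an s3-dist-cp argument list for copying src to dest.

    "none" writes uncompressed output, "keep" (the tool's default) copies files
    as-is, and any other supported value compresses the output with that codec.
    """
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {codec!r}")
    args = ["s3-dist-cp", "--src", src, "--dest", dest]
    # "keep" is the default, so the flag can be left off in that case.
    if codec != "keep":
        args.append(f"--outputCodec={codec}")
    return args
```

Passing the result to `subprocess.run(..., check=True)` would run the same command as the gzip example, with the codec checked up front.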

This is an important article if you’ve got a Hadoop cluster running on EC2 nodes.


Jupyter And Kubernetes

Published 2017-06-05 by Kevin Feasel

David Crook shows how to use Jupyter notebooks inside Kubernetes:

We start with a 16.04 image, run some upgrades, install Python, upgrade pip, install our requirements, and expose port 8888 (Jupyter’s default port).

Here is our requirements.txt file:

numpy
pandas
scipy
jupyter
azure_common
azure-storage
scikit-learn
nltk
plotly

Notice that Jupyter is in there. I also added a few other packages that I use very commonly, including numpy, pandas, plotly, scikit-learn, and some Azure libraries.

The big benefit of doing this is that your installation of Jupyter can exist independently from your notebooks, so if you accidentally mess up Jupyter, you can kill the container and reload from the image in a couple of commands.


Using Hive As A Power BI Data Source

Published 2017-06-05 by Kevin Feasel

Ust Oldfield shows how to use Hive via Azure HDInsight as a data source for Power BI:

As Hive is part of the Azure HDInsight stack, it would be tempting to select the HDInsight or Hadoop connector when you’re getting data. However, note the “HDFS” in brackets beside the Azure HDInsight and Hadoop File options: it means that you’ll be connecting to the underlying data store, which can be Azure Data Lake Store or Azure Blob Storage, both of which use HDFS architectures.

But this doesn’t help when you want to access a Hive table. In order to access a Hive table, you will first need to install the Hive ODBC driver from Microsoft. Once you’ve downloaded and installed the driver, you’ll be able to make your connection to Hive using the ODBC connector in Power BI.

Read the whole thing. Connecting to Hive is pretty easy.
