Press "Enter" to skip to content

Category: Cloud

The Cloud DBA

Kendra Little thinks about the evolution of the DBA role:

Lots of things have been reported to kill the DBA over the years:

SQL Server 2005 was said to be “self-tuning”! Who needs a DBA when the instance tunes itself? (Apparently everyone.)

Outsourcing: All the DBA jobs are going to X location, then Y location, then Z location. Then back to X. DBA jobs have become more global, but “outsourcing” hasn’t gotten rid of DBA jobs in the United States. It has been part of the trend to make working remotely more normal and easy, which is generally good for DBAs.

DevOps! All the developers will manage everything. And somehow know how to do so. I love DevOps, and I have seen it wipe out some QA departments, but I haven’t seen it wipe out DBAs. I think it’s fun to be a DBA working with a DevOps team.

Consider this in contrast to Dave Mason’s concern. My perspective is a lot closer to Kendra’s, but both posts make a good point: IT roles are ever-shifting.


Analyzing Flight Data With Sparklyr

Aki Ariga continues his sparklyr series with some analysis of US flight data:

In this post, we will show you a visualization and build a predictive model of US flights with sparklyr. Flight visualization code is based on this article.

This post assumes you already have the following tables:

You should make these tables available through Apache Hive or Apache Impala (incubating) with Hue.

There’s some setup work to get this going, but getting a handle on sparklyr looks to be a good idea if you’re in the analytics space.


Azure Container Service Supports Kubernetes

Serdar Yegulalp reports that Azure Container Service now supports the Kubernetes container management system:

Microsoft emphasized “choice” when it originally introduced Azure Container Service. Although it launched without Kubernetes, Azure initially supported Mesosphere DC/OS and Docker Swarm because the majority of Microsoft’s customers used them and the company believed they would be well served by the support.

Since then, Kubernetes has emerged as a clear leader among container orchestration solutions. It is used as an underpinning for deep learning frameworks and the basis for an open source serverless/“lambda” app framework, as well as offered as a managed on-premise service by one company.

Kubernetes on Azure is strictly focused on running Kubernetes within Azure, not providing it as a service elsewhere. But the GA release includes additions meant to appeal to a broad audience of both Linux and Windows Server users, such as support for the latest version of DC/OS (1.8.8).

It’s an interesting world out there.


Getting Started With Azure Cognitive Services

Rolf Tesmer has a demo app showing what Azure Cognitive Services Text Analytics can do:

Each execution of the application on any input file will generate 3 text output files with the results of the assessment.  The application runs at a rate of about 1-2 calls per second (the max send rate cannot exceed 100/min as this is the API limit).

  • File 1 [AzureTextAPI_SentimentText_YYYYMMDDHHMMSS.txt] – the sentiment score between 0 and 1 for each individual line in the Source Text File.  The entire line in the file is graded as a single data point.  0 is negative, 1 is positive.

  • File 2 [AzureTextAPI_SentenceText_YYYYMMDDHHMMSS.txt] – if the “Split Document into Sentences” option was selected then this contains each individual sentence in each individual line with the sentiment score of that sentence between 0 and 1.  0 is negative, 1 is positive.  RegEx is used to split the line into sentences.

  • File 3 [AzureTextAPI_KeyPhrasesText_YYYYMMDDHHMMSS.txt] – the key phrases identified within the text on each individual line in the Source Text File.

Rolf has also put his code on GitHub, so read on and check out his repo.
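To see what the app is doing under the covers, here’s a minimal sketch of a direct call to the sentiment endpoint. The region, key, and v2.0 API version are my assumptions, so substitute your own:

```python
# A sketch of the sentiment call an app like Rolf's makes; the region (westus),
# the v2.0 endpoint, and the key are assumptions -- substitute your own values.
import requests

ENDPOINT = "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0"
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-key>",
           "Content-Type": "application/json"}

def score_sentiment(lines):
    # Each line becomes one document, graded 0 (negative) to 1 (positive)
    docs = [{"id": str(i), "language": "en", "text": t}
            for i, t in enumerate(lines)]
    resp = requests.post(ENDPOINT + "/sentiment",
                         headers=HEADERS, json={"documents": docs})
    resp.raise_for_status()
    return {d["id"]: d["score"] for d in resp.json()["documents"]}

print(score_sentiment(["I love this product.", "This was a terrible experience."]))
```

Packing many lines into one documents array per request is also an easy way to stay under the rate limit Rolf mentions.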


Using Azure Data Lake Store With Hadoop

Amit Kulkarni shows how to make Azure Data Lake Store the default file system for a Hadoop cluster:

So to give a concrete example, if the default file system was hdfs://123.23.12.4344:9000 then the /user/filename.txt would resolve to hdfs://123.23.12.4344:9000/user/filename.txt.

Why does the default file system matter? The first answer to this is purely convenience. It is a heck of a lot easier to simply say /events/sensor1/ than adl://amitadls.azuredatalakestore.net/ in code and configurations. Secondly, many components in Hadoop use relative paths by default. For instance, there is a fixed set of places, specified by relative paths, where various applications generate their log files. Finally, many ISV applications running on Hadoop specify important locations by relative paths.

Read on to see how.
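For a sense of what the change looks like, the heart of it is a handful of core-site.xml properties. This is a sketch rather than Amit’s exact configuration; the account name comes from his example, and the OAuth entries are placeholders for your own Azure AD service principal:

```xml
<!-- A sketch of the relevant core-site.xml entries. The account name comes
     from the quoted example; the OAuth property values are placeholders. -->
<property>
  <name>fs.defaultFS</name>
  <value>adl://amitadls.azuredatalakestore.net</value>
</property>
<property>
  <name>dfs.adls.oauth2.access.token.provider.type</name>
  <value>ClientCredential</value>
</property>
<property>
  <name>dfs.adls.oauth2.client.id</name>
  <value>&lt;application-id&gt;</value>
</property>
<property>
  <name>dfs.adls.oauth2.credential</name>
  <value>&lt;service-principal-key&gt;</value>
</property>
<property>
  <name>dfs.adls.oauth2.refresh.url</name>
  <value>https://login.microsoftonline.com/&lt;tenant-id&gt;/oauth2/token</value>
</property>
```

With fs.defaultFS set this way, a relative path like /events/sensor1/ resolves against the Data Lake Store account automatically.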


Hadoop In The Cloud

Peter Coates talks about pros and cons to Hadoop in the cloud:

Hadoop was developed for deployment over Linux running on bare metal. Cloud deployment implies virtual machines, and for Hadoop it’s a huge difference.

As detailed in other articles (for instance, Your Cluster Is an Appliance or Understanding Hadoop Hardware Requirements), bare-metal deployments have an inherent advantage over virtual machine deployments. The biggest of these is that they can use direct attached storage, i.e., local disks.

Not every Hadoop workload is storage I/O bound, but most are, and even when Hadoop seems to be CPU bound, much of the CPU activity is often either directly in service of I/O, i.e., marshaling, unmarshaling, compression, etc., or in service of avoiding I/O, i.e., building in-memory tables for map-side joins.

Read the whole thing.


Managing Azure SQL Database Firewall Rules

Cedric Charlier shows how to manage Azure SQL Database firewall rules from within Management Studio:

When you create a new Azure database, you usually need to open the firewall to remotely administer or query this database with SSMS. One option is to create rules from the Azure Portal. It’s surely a convenient way to do it when you create a database, but I prefer to keep a minimum of tools, and when the Azure Portal is not already open, I’d rather not open it just to define a few firewall rules.

Opening the firewall with SSMS is a kind of chicken-and-egg problem: to connect to your database/server, you need to open the firewall. Fortunately, SSMS has a great suite of screens to call the underlying API of the Azure Portal and open the firewall for the computer running SSMS.

Cedric shows off sp_delete_firewall_rule, but there’s also a corresponding sp_set_firewall_rule.
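Both procedures live in the master database of the logical server, so you can call them from any client that can already connect, not just SSMS. Here’s a quick sketch in Python with pyodbc; the server name, credentials, and driver string are placeholders:

```python
# A sketch of calling the firewall procedures from code; server name,
# credentials, and ODBC driver version are placeholders. Both procedures
# live in the master database of the logical server.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=master;"
    "UID=serveradmin;PWD=<password>",
    autocommit=True)
cur = conn.cursor()

# Open the firewall for a single workstation (use a range for a subnet)
cur.execute("EXECUTE sp_set_firewall_rule @name = ?, "
            "@start_ip_address = ?, @end_ip_address = ?",
            "HomeOffice", "203.0.113.42", "203.0.113.42")

# ...and drop the rule once it's no longer needed
cur.execute("EXECUTE sp_delete_firewall_rule @name = ?", "HomeOffice")
```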


Bulk Loading HDInsight Using Phoenix

Anunay Tiwari uses Phoenix to bulk load data into HBase on HDInsight:

Apache HBase is an open-source NoSQL Hadoop database: a distributed, scalable big data store. It provides real-time read/write access to large datasets. HDInsight HBase is offered as a managed cluster that is integrated into the Azure environment. HBase provides many features as a big data store, but in order to use it, customers first have to load their data.

There are multiple ways to get data into HBase, such as using client APIs, a MapReduce job with TableOutputFormat, or entering the data manually through the HBase shell. Many customers are interested in using Apache Phoenix, a SQL layer over HBase, for its ease of use. This post describes how to use Phoenix bulk load with HDInsight clusters.

Phoenix provides two methods for loading CSV data into Phoenix tables – a single-threaded client loading tool via the psql command, and a MapReduce-based bulk load tool.

Anunay explains both methods, allowing you to choose based on your data needs.
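To make the two options concrete, here’s a rough sketch of the invocations, wrapped in Python for consistency with the other examples here; the jar path, ZooKeeper quorum, table, and file locations are placeholders for whatever your cluster actually uses:

```python
# A sketch of the two Phoenix loading paths, wrapped in Python for
# illustration; all paths and names below are placeholders.
import subprocess

ZK_QUORUM = "zk0-myhbase:2181"

# Option 1: single-threaded client load via psql.py (fine for smaller CSVs)
subprocess.run(
    ["/usr/hdp/current/phoenix-client/bin/psql.py",
     "-t", "CUSTOMERS", ZK_QUORUM, "/tmp/customers.csv"],
    check=True)

# Option 2: MapReduce-based bulk load (scales out for large files in HDFS)
subprocess.run(
    ["hadoop", "jar", "/usr/hdp/current/phoenix-client/phoenix-client.jar",
     "org.apache.phoenix.mapreduce.CsvBulkLoadTool",
     "--table", "CUSTOMERS", "--input", "/example/data/customers.csv"],
    check=True)
```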


Azure Disk Encryption

Melissa Coates configures Azure Disk Encryption for an already-existing Azure VM:

As I discussed in my previous blog post, I opted to use Azure Disk Encryption for my virtual machines in Azure, rather than Storage Service Encryption. Azure Disk Encryption utilizes BitLocker inside the VM. Enabling Azure Disk Encryption involves these Azure services:

  • Azure Active Directory for a service principal
  • Azure Key Vault for a KEK (key encryption key), which wraps around the BEK (BitLocker encryption key)
  • Azure Virtual Machine (IaaS)

Following are four scripts which configure encryption for an existing VM. I initially had it all as one single script, but I purposely separated them. Now that they are modular, if you already have a Service Principal and/or a Key Vault, you can skip those steps. I have my ‘real’ version of these scripts stored in an ARM Visual Studio project (same logic, just with actual names for the Azure services). These PowerShell templates go along with other ARM templates to serve as source control for our Azure infrastructure.

The PowerShell scripts are straightforward and clear, so check them out.
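For comparison, once the service principal and Key Vault exist, the Azure CLI can enable the same encryption in one call. This is a hedged equivalent rather than Melissa’s code; every name below is a placeholder, and CLI versions of this era also required the service principal’s id and secret to be passed explicitly:

```python
# Not Melissa's scripts -- a rough Azure CLI equivalent, wrapped in Python
# for consistency with the other examples. All names are placeholders.
import subprocess

subprocess.run(
    ["az", "vm", "encryption", "enable",
     "--resource-group", "my-rg",
     "--name", "my-vm",
     "--aad-client-id", "<service-principal-app-id>",
     "--aad-client-secret", "<service-principal-secret>",
     "--disk-encryption-keyvault", "my-keyvault",
     "--key-encryption-key", "my-kek",   # the KEK that wraps the BitLocker key
     "--volume-type", "ALL"],
    check=True)
```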
