Azure SQL Data Warehouse Security

Grant Fritchey looks at what security measures are available within Azure SQL Data Warehouse:

Login Security

You have two core choices for logins. First, you have to create a SQL login at the server level for both Azure SQL Database and Azure SQL Data Warehouse. You can’t remove this or disable it (to my knowledge, and I’ve tried), so make the password a good one (and don’t lose it). You can then create other SQL logins, but this is not a recommended best practice. In fact, I wouldn’t do it at all unless I were forced to because of some third-party product (few of which currently support Azure anyway).

The next choice, the preferred choice, is to set up Azure Active Directory. With Azure AD you get all the functionality you’re used to with your local AD. Further, you can federate Azure AD with your local AD to control and manage the logins from within your network. You also get multi-factor authentication with Azure AD. We are talking real security here. Read through the documentation on setting up authentication to get it right. You can do the whole thing using PowerShell commands, so there’s no excuse not to automate it.

There aren’t as many security-related toggles as in an on-prem product, but Grant demonstrates what is available.
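To make that concrete, here’s a minimal T-SQL sketch of both options; the login, user, and Azure AD account names below are placeholders, not anything from Grant’s post.

  -- In master: a server-level SQL login (make the password a strong one).
  CREATE LOGIN AppLogin WITH PASSWORD = 'UseALongRandomPassphraseHere!1';

  -- In the user database: map a database user to that login.
  CREATE USER AppUser FOR LOGIN AppLogin;

  -- The preferred route: a contained user backed by Azure Active Directory.
  CREATE USER [jane.doe@contoso.com] FROM EXTERNAL PROVIDER;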

Power BI Premium

James Serra explains the Power BI Premium tier:

For costs, it allows an unlimited number of users since it is priced by aggregate capacity (see Power BI Premium calculator).  Users who need to create content in Power BI will still require a $10/month Power BI Pro seat, but there is no per-seat charge for consumption.

For scale, it runs on dedicated hardware giving capacity exclusively allocated to an organization for increased performance (no noisy neighbors).  Organizations can choose to apply their dedicated capacity broadly, or allocate it to assigned workspaces based on the number of users, workload needs or other factors—and scale up or down as requirements change.

They’re throttling down Power BI Free, making it really just for personal use, but I think the Premium tier’s pricing will help with adoption.

Storm In .NET

Ravi Peri explains how to use Apache Storm in .NET code on HDInsight:

Topology submissions can fail for many reasons:

  • JDK is not installed or is not in the PATH
  • Required Java dependencies are not included
  • Incompatible Java jar dependencies. Example: Storm-eventhub-spouts-9.jar is incompatible with Storm 1.0.1. If you submit a jar with that dependency, topology submission will fail.
  • Duplicate names for topologies

The /var/log/hdinsight-scpwebapi/hdinsight-scpwebapi.out file on the active headnode will contain the error details.

At one point, I was big on Storm and really wanted a .NET client for Storm to take off.  Nowadays, I’d rather use Spark Streaming or Kafka Streams for the same kind of streaming data work.

Twitter Campaign/Brand Management In Power BI

Mindy Curnutt looks at a Power BI solution template for managing Twitter campaigns:

Now you can start poking around and seeing what’s in the Dashboard. Since I opted not to put any handles in for analysis of FROM and TO, the first two tabs in the workbook (Outbound Tweets and Inbound Tweets) will not have any information; this is normal.

But then we get to tab #3 – Author Hashtag Graph.  The gray dots are hashtags and the green dots are accounts that have tweeted. You can see that I made a tweet that had 2 hashtags – #osmf2017 and #mvpbuzz. And boy was @TexasMusicDude busy tweeting up a storm – and using lots of other hashtags in conjunction with his tweets. Other hashtags that were popular appear to be #CampGround, #ShinyRibs, #TexasMusic, #DreamFolk and #Strings. Along the bottom you can see the day/timeline and the quantity of tweets at what time of day. If you click on any of the nodes, the information about what time the tweet(s) took place is highlighted in the timeline. It’s very interactive.

It does require an Azure subscription, but it looks very useful as a model for an advanced set of dashboards as well as a campaign management tool.

Missing JRE, Or Maybe C++

Meagan Longoria went through a frustrating scenario:

On a recent project I used Azure Data Factory (ADF) to retrieve data from an on-premises SQL Server 2014 instance and land it in Azure Data Lake Store (ADLS) as ORC files. This required the use of the Data Management Gateway (DMG). Setup was quick and easy in our development environment. We installed the DMG for development on a separate server in the client’s network, where we also installed SQL Server Management Studio (SSMS) for query development and data validation. We set up resource groups in Azure for development and production, and made sure the settings for development and production were the same.  Then we set up a separate server for the production DMG.

Deployment and execution went well in the dev environment. Testing was completed, so we deployed to prod. Deployment went fine, but the pipelines failed execution and returned the following error on the output data sets.

Weird solution, but I’m going to guess that it makes perfect sense if you are able to look at the code.

SQL Data Warehouse Distribution Keys

Simon Whiteley explains the different distribution key options available in Azure SQL Data Warehouse and SQL Server APS:

Each record that is inserted goes onto the next available distribution. This guarantees that you will have a smooth, even distribution of data, but it means you have no way of telling which data is on which distribution. This isn’t always a problem!

If I want to perform a count of records, grouped by a particular field, I can perform this on a round-robin table. Each distribution will run the query in parallel and return its grouped results. The results can be simply added together as a second part of the query, and adding together 60 smaller datasets shouldn’t be a large overhead. For this kind of single-table aggregation, round-robin distribution is perfectly adequate!

However, the issues arise when we have multiple tables in our query and need to join them. Let’s take a very simple join between a fact table and a dimension. I’ve shown 6 distributions for simplicity, but this would be happening across all 60.

Figuring out which distribution key to use can make a huge difference in performance.
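For reference, here’s roughly what the two options look like as Azure SQL Data Warehouse DDL; the table and column names are invented for illustration.

  -- Round-robin: rows go to the next available distribution in turn.
  -- Evenly spread, but you can't predict which distribution holds which rows.
  CREATE TABLE dbo.FactSales_RR
  (
      SaleID    BIGINT NOT NULL,
      ProductID INT NOT NULL,
      Amount    DECIMAL(18, 2) NOT NULL
  )
  WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX);

  -- Hash-distributed: every row with the same ProductID lands on the same
  -- distribution, so joins and aggregations on ProductID avoid data movement.
  CREATE TABLE dbo.FactSales
  (
      SaleID    BIGINT NOT NULL,
      ProductID INT NOT NULL,
      Amount    DECIMAL(18, 2) NOT NULL
  )
  WITH (DISTRIBUTION = HASH(ProductID), CLUSTERED COLUMNSTORE INDEX);

A single-table aggregate such as SELECT ProductID, COUNT(*) FROM dbo.FactSales_RR GROUP BY ProductID parallelizes cleanly on the round-robin table; it’s the fact-to-dimension join in Simon’s example where the hash distribution starts to pay off.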

Re-Provisioning Volumes In Azure

Robert Bishop shows how to create data disks in Azure and attach them to VMs:

Recently a client found this article on “Best Practices for SQL Server in Azure Virtual Machines” and wanted to re-provision his volumes to adhere to them.

Now, my first thought was: wait, I’m a DBA, not a System Admin; that’s not my role! But thinking more about it, I realized the client views this as a SQL Server issue, I am the SQL Server Consultant, and it is my job to remedy this problem.

Not being 100% confident in Azure, I spun up a SQL Server VM and attempted to add some volumes.  To my surprise, this was way too easy.

Click through for the steps.

Securing Azure Storage

Christos Matskas has an article on securing Azure blobs and containers:

All communication with Azure Storage via connection strings and BLOB URLs enforces the use of HTTPS, which provides Encryption in Transit. You can enforce the use of “Always HTTPS” by setting the connection string like this: “DefaultEndpointsProtocol=https;AccountName=myblob1…” or in SAS signatures, as in the example below:

https://myblob1.blob.core.windows.net/?sv=2015-04-05&ss=bfqt&srt=sco&sp=rwdlacup&se=2016-09-22T02:21:41Z&st=2016-09-21T18:21:41Z&spr=https&sig=hxInpKBYAxvwdI9kbBglbrgcl1EJjHqDRTF2lVGeSUU%3D

To protect data at rest, the service provides an option to encrypt the data as it is stored in the account. There’s no additional cost associated with encrypting the data at rest, and it’s a good idea to switch it on as soon as the account is created. There is a one-click setting at the Storage Account level to enable it, and the encryption is applied to both new and existing storage accounts. The data is encrypted with the AES-256 cipher and it’s now generally available in all Azure regions and Azure clouds (public, government, etc.).

There’s some good information here, making it worth the read.
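One place a SQL Server practitioner meets SAS tokens directly is BACKUP TO URL. As a quick, hedged sketch (the container URL and token below are placeholders, not from Christos’s article), the token drops into a credential like this:

  -- The credential name must match the container URL, and the secret is the
  -- SAS token without the leading '?'.
  CREATE CREDENTIAL [https://myblob1.blob.core.windows.net/backups]
  WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
       SECRET = 'sv=2015-04-05&ss=bfqt&srt=sco&sp=rwdlacup&spr=https&sig=<signature>';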

Cloudera Accessing Azure Data Lake Store

The Azure Data Lake team has announced that you can now access Azure Data Lake Store using a Cloudera cluster:

The Azure Data Lake (ADL) vision from the beginning has been to transform business data into intelligence by providing analytics on any data at cloud scale. ADL enterprise customers gain insights on their business data using a wide range of tools and platforms. Today’s release of Cloudera Enterprise 5.11 brings another very valuable and widely-used Hadoop computation platform to the set of platforms that can leverage ADLS. No matter what big data analytics platform you choose, Azure Data Lake Store provides a single high throughput enterprise-scale hierarchical file system data lake repository for big data.

Anyone with an Azure subscription can now deploy Cloudera clusters with ADLS. To get started, you can use the Cloudera Enterprise Data Hub template or the Cloudera Director template on Azure Marketplace to create a Cloudera cluster. Once the cluster is up, see here for more information on how to set up your Cloudera cluster with ADLS today!

That’s an interesting development.

Spark Deep Learning On AWS

Joseph Spisak, et al., show how to configure and use BigDL in Amazon Web Services’ Elastic MapReduce:

Classify text using BigDL

In this tutorial, we demonstrate how to solve a text classification problem based on the example found here. This example uses a convolutional neural network to classify posts in the 20 Newsgroup dataset into 20 categories.

We’ve provided a companion Jupyter notebook example on GitHub that you can open in the Jupyter dashboard to execute the code sections.

There’s a lot to this tutorial.
