Category: Cloud

Data Classification In Power BI

Published 2017-04-14 by Kevin Feasel

Steve Hughes describes how Power BI data classification works:

Power BI Privacy Levels “specify an isolation level that defines the degree that one data source will be isolated from other data sources”. After working through some testing scenarios and trying to discover the real impact on data security, I was unable to effectively show how this might have any bearing on data security in Power BI. During one test I was shown a warning about using data from a website together with data I had marked Organizational and Private. In all cases, I was able to merge the data in the query and in the relationships with no warning or filtering. All of the documentation makes the same statement, and most bloggers are restating what is found in the Power BI documentation, so they were not helpful. My takeaway after reviewing this for a significant amount of time is to not consider these settings when evaluating data security in Power BI. I welcome comments or additional references which demonstrate how this isolation actually works in practice. In most cases, we are using organizational data within our Power BI solutions, will not be impacted by this setting, and may find improved performance when disabling it.

As Steve notes, this is not really a security feature. Instead, it’s intended more as a warning to users about which data is confidential and which is publicly sharable.

Using h2o.ai On HDInsight

Published 2017-04-13 by Kevin Feasel

Xiaoyong Zhu shows how to set up h2o.ai on Azure HDInsight:

H2O Flow is an interactive web-based computational user interface where you can combine code execution, text, mathematics, plots and rich media into a single document, much like Jupyter Notebooks. With H2O Flow, you can capture, rerun, annotate, present, and share your workflow. H2O Flow allows you to use H2O interactively to import files, build models, and iteratively improve them. Based on your models, you can make predictions and add rich text to create vignettes of your work – all within Flow’s browser-based environment. In this blog, we will only focus on its visualization part.

The H2O Flow web service lives in the Spark driver and is routed through the HDInsight gateway, so it can only be accessed while the Spark application/notebook is running.

You can click the available link in the Jupyter Notebook, or you can directly access this URL:

https://yourclustername-h2o.apps.azurehdinsight.net/flow/index.html
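
The URL above follows a fixed pattern built from the cluster name. A minimal Python sketch, assuming only that this pattern holds; the helper name and the cluster name `mycluster` are hypothetical, and the page will only respond while the Spark application is running:

```python
def h2o_flow_url(cluster_name: str) -> str:
    """Build the H2O Flow URL for an HDInsight cluster using the
    https://<clustername>-h2o.apps.azurehdinsight.net/flow/index.html pattern."""
    # DNS host names are case-insensitive, so normalizing to lowercase is safe.
    name = cluster_name.strip().lower()
    if not name:
        raise ValueError("cluster_name must not be empty")
    return f"https://{name}-h2o.apps.azurehdinsight.net/flow/index.html"


print(h2o_flow_url("mycluster"))
# https://mycluster-h2o.apps.azurehdinsight.net/flow/index.html
```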

Setup is pretty easy.

SQL Server Backup To Azure Tool Causing Restore Errors

Published 2017-04-13 by Kevin Feasel

Jack Li diagnoses an issue in which the Microsoft SQL Server Backup to Microsoft Azure Tool causes errors when trying to restore a database on an Azure VM with SQL Server 2008 R2:

I worked on an interesting issue today where a user couldn’t restore a backup. Here is what this customer did:

backed up a database from an on-premises server (2008 R2)

copied the file to an Azure VM

tried to restore the backup on the Azure VM (2008 R2 with exact same build#)

But he got the following error:

Msg 3241, Level 16, State 0, Line 4
The media family on device ‘c:\temp\test.bak’ is incorrectly formed. SQL Server cannot process this media family.
Msg 3013, Level 16, State 1, Line 4
RESTORE HEADERONLY is terminating abnormally.

We verified that he could restore the same backup on the local machine (on-premises). Initially I thought the file must have been corrupted during the transfer. We used a different method to transfer the file and also zipped it. The behavior was the same. When we backed up a database from the same Azure VM and tried to restore it, the restore was successful.

Click through for Jack’s findings as well as a couple of workarounds.

On-Prem Power BI Gateway

Published 2017-04-13 by Kevin Feasel

Steve Hughes shows how to set up a data gateway for Power BI:

First, I will not be discussing the personal gateway in this post. If you have chosen to use the personal gateway, you have limited functionality and should consider using the on-premises data gateway for corporate use.

The on-premises data gateway (referred to as gateway throughout this post) “acts as a bridge, providing quick and secure data transfer between on-premises data and the Power BI, Microsoft Flow, Logic Apps, and PowerApps services.” (ref) Much of what is discussed here will apply to all of the services referenced above, but our primary concern is related to Power BI. Please refer to references at the end of this post for details about data sources supported within the gateway.

Click through for more information.

Thinking About Automation

Published 2017-04-12 by Kevin Feasel

Chrissy LeMaire has a series of thoughts on this month’s T-SQL Tuesday, and it was worth separating out from the rest of today’s batch:

Y’all know what I’m gonna say here! I love automation and PowerShell. I know for a fact that PowerShell and T-SQL together are the future of SQL Server administration. As someone who often presents about dbatools, the popular SQL PowerShell community project, I’ve seen the excitement and relief that PowerShell automation brings to SQL Server Database Pros.

From making it way easier to migrate entire instances to automating backup testing and verification, PowerShell makes it straight up more enjoyable to be a DBA.

There’s a lot of well-deserved plugging of dbatools. Hint, hint.

Moving Files In Azure Data Factory

Published 2017-04-12 by Kevin Feasel

Meagan Longoria has a workaround for the fact that you cannot move a file using Azure Data Factory:

But at this time ADF doesn’t support that. You can copy a file with a copy activity, but you cannot actually move (i.e., copy and delete).

Luckily, we had a workaround for our situation. If you tell ADF to copy data to a file that already exists in the specified location in the data lake, it will overwrite the existing file. We made sure the file name is always the same for each table in the staging area so there is always only one file per table.
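
The key idea is that the destination name depends only on the table, so every copy lands on the same path and replaces the previous file. The sketch below simulates that on a local filesystem in Python; it is not ADF itself, and the `stage_table` helper and the `<table>.csv` naming convention are hypothetical:

```python
import shutil
import tempfile
from pathlib import Path


def stage_table(source_file: Path, staging_dir: Path, table: str) -> Path:
    """Copy source_file into staging_dir under a fixed, per-table file name.

    Because the target name depends only on the table, each run overwrites
    the previously staged file instead of adding another one.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    target = staging_dir / f"{table}.csv"  # hypothetical naming convention
    shutil.copyfile(source_file, target)   # replaces target if it already exists
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        staging = root / "staging"
        # Two daily extracts for the same table, staged one after the other.
        for day, rows in [("2017-04-11", "id\n1\n"), ("2017-04-12", "id\n1\n2\n")]:
            src = root / f"customers_{day}.csv"
            src.write_text(rows)
            stage_table(src, staging, "customers")
        print(sorted(p.name for p in staging.iterdir()))  # ['customers.csv']
```

Only the latest extract survives in the staging area, which is the "one file per table" property the workaround relies on.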

Read on for the full details on this workaround. Also, vote on this feedback item if you want the ability to move files instead of just copying them.

Azure Data Lake Store Best Practices

Published 2017-04-11 by Kevin Feasel

Ust Oldfield provides recommendations on how to size and lay out files in Azure Data Lake Store:

The format of the file has a huge implication for the storage and parallelisation. Splittable formats (files which are row oriented, such as CSV) are parallelizable, as data does not span extents. Non-splittable formats, however (files that are not row oriented, where data is often delivered in blocks, such as XML or JSON), cannot be parallelized, as data spans extents and can only be processed by a single vertex.
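
Why row orientation matters can be shown locally: a row-oriented file can be cut at arbitrary byte offsets (nudged forward to the next newline) and each piece parsed independently, while a single JSON document has no safe cut points. This Python sketch only illustrates the idea; it does not model Data Lake Store extents or U-SQL vertices, the `split_row_aligned` helper is hypothetical, and it assumes no quoted fields contain embedded newlines:

```python
import csv
import io
import json
from typing import List


def split_row_aligned(data: bytes, n_chunks: int) -> List[bytes]:
    """Cut data into about n_chunks pieces, moving each cut point forward
    to just after the next newline so that no row spans two pieces."""
    size = len(data)
    bounds = [0]
    for i in range(1, n_chunks):
        cut = max(size * i // n_chunks, bounds[-1])
        newline = data.find(b"\n", cut)
        bounds.append(size if newline == -1 else newline + 1)
    bounds.append(size)
    return [data[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


rows = [[str(i), f"name{i}"] for i in range(10)]
csv_bytes = "".join(f"{i},{name}\n" for i, name in rows).encode()

# Row-oriented data: each piece parses on its own, as an independent worker would.
parsed = []
for piece in split_row_aligned(csv_bytes, 3):
    parsed.extend(csv.reader(io.StringIO(piece.decode())))
print(parsed == rows)  # True

# One JSON document: neither half is valid JSON by itself.
json_bytes = json.dumps([{"id": i, "name": name} for i, name in rows]).encode()
half = len(json_bytes) // 2
for fragment in (json_bytes[:half], json_bytes[half:]):
    try:
        json.loads(fragment)
        print("fragment parsed")
    except json.JSONDecodeError:
        print("fragment is not valid JSON")
```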

In addition to the storage of unstructured data, Azure Data Lake Store also stores structured data in the form of row-oriented, distributed clustered index storage, which can also be partitioned. The data itself is held within the “Catalog” folder of the Data Lake Store, but the metadata is contained in Data Lake Analytics. For many, working with the structured data in the data lake is very similar to working with SQL databases.

This is the type of thing that you can easily forget about, but it makes a huge difference down the line.

HDInsight 3.6 Available

Published 2017-04-10 by Kevin Feasel

Ashish Thapliyal points out some Hive improvements in HDInsight 3.6:

2. Create a new Hive table from scratch or alter a table

Create a new table by clicking the ‘+’ icon, which opens the Create Table wizard. Enter the table name and column names, and choose a data type from the dropdown. You can pick the following advanced Hive settings directly from the UI:

Transactional: Turn on transaction support in Hive by checking this flag. Note that the table must be bucketed and stored using an ACID-compliant format (such as ORC).
Location: Hive stores the table data for managed tables in the Hive warehouse directory in HDFS, which is configured in hive-site.xml with the property hive.metastore.warehouse.dir. The default location is /apps/hive/warehouse. The location can be changed using the Location text field.
File Format: The default file format for the CREATE TABLE statement is ORC. Choose a format from the File Format dropdown.
Row Format: Select row format settings such as the field terminator, line terminator, and stored file type.
A table can be altered to add new columns or to change a column’s name or data type.
Tables can also be renamed and altered.
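
The Transactional rule above (bucketed, stored as ORC) maps onto a handful of HiveQL clauses. The following Python sketch builds such a CREATE TABLE statement and rejects settings that break that rule; it is a hypothetical helper, not what the Hive View wizard actually generates, and the cluster must also have Hive transaction support enabled for the table to be usable:

```python
def create_table_ddl(table, columns, transactional=False, bucket_column=None,
                     num_buckets=0, file_format="ORC", location=None):
    """Return a HiveQL CREATE TABLE statement as a string.

    columns is a list of (name, hive_type) pairs. A transactional table
    must be bucketed and stored as ORC, so other combinations are rejected.
    """
    file_format = file_format.upper()
    if transactional:
        if file_format != "ORC":
            raise ValueError("transactional tables must be stored as ORC")
        if not bucket_column or num_buckets < 1:
            raise ValueError("transactional tables must be bucketed")
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    parts = [f"CREATE TABLE {table} (\n  {cols}\n)"]
    if bucket_column and num_buckets > 0:
        parts.append(f"CLUSTERED BY ({bucket_column}) INTO {num_buckets} BUCKETS")
    parts.append(f"STORED AS {file_format}")
    if location:
        # Optional override of the default warehouse directory.
        parts.append(f"LOCATION '{location}'")
    if transactional:
        parts.append("TBLPROPERTIES ('transactional'='true')")
    return "\n".join(parts) + ";"


print(create_table_ddl("sales", [("id", "INT"), ("amount", "DECIMAL(10,2)")],
                       transactional=True, bucket_column="id", num_buckets=4))
```

The clauses are emitted in the order Hive’s CREATE TABLE grammar expects: bucketing, then storage format, then location, then table properties.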

Read on for more improvements, including a graphical plan viewer and improved autocomplete.

Taking Advantage Of Azure Elasticity

Published 2017-04-05 by Kevin Feasel

Arun Sirpal migrated a number of Azure SQL Databases into an elastic pool and configured a series of elastic jobs to support them:

I want to show you how I went from having multiple single SQL databases in Azure to a database elastic pool within a new dedicated SQL Server. Once it is set up, I create and use elastic jobs. This post is long, but I am sure you will find it useful.

APPROACH TAKEN

Create a new “logical” SQL Server.
Create a new elastic pool within this logical SQL Server.
Move the data from the old single SQL databases to the above elastic pool (there are a couple of ways to do this, but I used built-in backups).
Confirm application connection.
Decommission single SQL databases.
Create / setup an elastic job.
Check the controller database.

Definitely worth reading if you are looking at hosting multiple databases in Azure.

Handling Runbook Alerts

Published 2017-04-05 by Kevin Feasel

Grant Fritchey shows how to set up alerting when an Azure automation job fails:

Believe it or not, there’s not an immediately obvious “Oh, you had an error in your Automation script, here’s how you alert someone” setting in the Azure portal. Now, you could simply put error handling in your PowerShell script. In fact, it’s probably not at all a bad idea to do that as well. However, what you would not get by setting things up that way is a mechanism for managing the alerts, history, and additional possible responses (like firing off another Runbook, although there is a way to do that from PowerShell too). Instead, what I want is a way to manage alerts through the Azure fabric.

If you do a search, there is an Azure Alert service. However, it didn’t seem to be really what I was looking for. Further, I found it extremely difficult (OK, I couldn’t make it work) to connect the alerts directly to the Jobs related to my Runbooks. Instead, after quite a bit of research, what I found is a combination of Azure Log Analytics with the Operations Management Suite (OMS) will do exactly what I’m looking for.

Click through to read how to set this up.