Press "Enter" to skip to content

Category: Cloud

Power BI Dataflow Use Cases

Reza Rad gives us the value proposition behind Power BI Dataflow:

If you don't have an account in Azure, or you don't have a subscription that you can use for Azure Data Lake, no need to worry! You can still use Dataflow. The whole process of storing data into Azure Data Lake is managed internally by Dataflow. You won't even need to log in to the Azure portal or anywhere else. From your point of view, in the Power BI website, you create a dataflow, and that dataflow manages the whole storage configuration. You won't need any other accounts, and you won't pay anything beyond what you are already paying for your Power BI subscription.

Click through for use cases and some tips.

Cloud Risk: Service Obsolescence

Joy George Kunjikkur takes us through a risk scenario using an example of the Azure chat bot service:

At the beginning of last year, we started to develop a chat bot demo. The idea was to integrate the chat bot into one of the big applications as a replacement for the FAQ. Users can ask questions to the bot, thus avoiding obvious support tickets in the future.

Things went well. We got appreciation for the demo and started moving to production. About halfway through, things started turning south. The demo chat bot application used Bot SDK V3. It had voice recognition enabled, which allowed users to talk to it and get the response back in voice. During the demo we used the Azure Bing Speech API. But later, before production, we got notice that the service was obsolete and would be retired in mid-2019. Another surprise was the introduction of Bot SDK V4, which is entirely different from Bot SDK V3. Something like AngularJS vs. Angular.

The major services tend to give you some time to switch over—in this case, they had 10 months to make a move. But when dealing with online services versus locally installed products, there’s always a risk that the service you’re calling won’t be there, and depending upon how critical that service is, it can have a major effect on your ability to function if it disappears one day. That’s definitely not a reason to ignore these services; it’s a reason to have a backup plan in place.
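
That backup plan is easier to execute if every external call goes through a thin interface you own, so retiring one provider means rewriting one adapter rather than the whole bot. A hypothetical Python sketch of the idea (all class and function names here are illustrative, not from the post):

    from abc import ABC, abstractmethod

    class SpeechToText(ABC):
        """The bot depends on this interface, never on a vendor SDK directly."""
        @abstractmethod
        def transcribe(self, audio: bytes) -> str: ...

    class BingSpeechToText(SpeechToText):
        """Adapter for the retired Bing Speech API, kept only for reference."""
        def transcribe(self, audio: bytes) -> str:
            raise NotImplementedError("Service retired mid-2019")

    class CognitiveSpeechToText(SpeechToText):
        """Adapter for the replacement speech service."""
        def transcribe(self, audio: bytes) -> str:
            return call_speech_service(audio)  # hypothetical helper for the successor API

    def handle_utterance(stt: SpeechToText, audio: bytes) -> str:
        # Swapping providers is one constructor change at startup, not a rewrite.
        return stt.transcribe(audio)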

Azure Databricks And Active Directory

Tristan Robinson wraps up a two-parter on Azure Databricks security:

With the addition of Databricks runtime 5.1, which was released in December 2018, comes the ability to use Azure AD credential passthrough. This is a huge step forward, since there is no longer a need to control user permissions through Databricks groups / Bash and then assign those groups access to secrets in order to reach Data Lake at runtime. As mentioned previously, with the current lack of support for AAD within Databricks, ACL activities were done on an individual basis, which was not ideal. By using this feature, you can now pass the authentication on to Data Lake, and as we know, one of the advantages of Data Lake is its tight integration with Active Directory, so this simplifies things. It's worth noting that this feature is currently in public preview, but having tested it thoroughly, I am happy with the implementation and its limitations. The feature also requires a premium workspace and only works with high-concurrency clusters, both of which you'd expect to use in this scenario.

It looks like this is the way to go forward with securing Azure Databricks. Read the whole thing.
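
For a sense of what this buys you, here is a minimal PySpark sketch of reading Data Lake once passthrough is on, assuming a premium workspace and a high-concurrency cluster with the feature enabled (the account name and path are placeholders):

    # Runs as the signed-in AAD user; no service principal, no secrets.
    # The cluster has credential passthrough enabled, which corresponds to
    # the spark.databricks.passthrough.enabled setting.
    df = (spark.read
          .option("header", "true")
          .csv("adl://mydatalake.azuredatalakestore.net/raw/sales/"))

    # Whether this read succeeds is decided by the user's own ACLs on Data Lake.
    df.groupBy("region").count().show()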

Recreating Dropped Azure SQL Managed Instance DBs

Jovan Popovic has a script to re-create an Azure SQL Managed Instance database which you might accidentally have dropped:

Azure SQL Database Managed Instance is a fully managed PaaS service that provides advanced disaster-recovery capabilities. Even if you accidentally drop a database, or someone drops your database as part of a security attack, Managed Instance will enable you to easily recover it.
Azure SQL Managed Instance performs automatic backups of your database every 5-10 minutes. If anything happens to your database, even if someone drops it, your data is not lost. Managed Instance enables you to easily re-create the dropped database from the automatic backups.

Click through for the PowerShell script.
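
The linked script is PowerShell; as a rough sketch of the same flow in Python with the azure-mgmt-sql package (all resource names are placeholders, and the exact method names vary a bit across SDK versions):

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.sql import SqlManagementClient

    client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Find the automatic backup chain left behind by the dropped database.
    dropped = next(
        d for d in client.restorable_dropped_managed_databases.list_by_instance(
            "my-resource-group", "my-managed-instance")
        if d.database_name == "ImportantDb")

    # Re-create the database from those backups, restored to the moment of deletion.
    client.managed_databases.begin_create_or_update(
        "my-resource-group", "my-managed-instance", "ImportantDb",
        {
            "location": dropped.location,
            "create_mode": "PointInTimeRestore",
            "restorable_dropped_database_id": dropped.id,
            "restore_point_in_time": dropped.deletion_date,
        },
    ).result()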

Azure Databricks Security

Tristan Robinson looks at what’s currently available in terms of security on Azure Databricks:

You'll notice that as part of this I'm retrieving the secrets/GUIDs I need for the connection from somewhere else, namely the Databricks-backed secrets store. This avoids exposing those secrets in plain text in your notebook; again, that would not be ideal. Secret access is then based on an ACL (access control list), so I can only connect to Data Lake if I'm granted access to the secrets. While it is also possible to connect Databricks to Azure Key Vault and use that as the secrets store instead, when I tried to configure this I was denied based on permissions, and after research I was unable to overcome the issue. Key Vault would be the more ideal choice, but unfortunately support is currently limited, and the fact that the error message contained spelling mistakes suggests to me that the functionality is not yet mature.

To be charitable, there appears to be room for implementation improvement.
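
As an illustration of the pattern Tristan describes, a notebook might pull its Data Lake Store (Gen1) credentials from a Databricks-backed secret scope like this; the scope and key names are made up for the example:

    # Access to the "datalake" scope is governed by a secret ACL, so only
    # users granted READ on the scope can make this connection, and the
    # secret values never appear in plain text in the notebook.
    client_id  = dbutils.secrets.get(scope="datalake", key="client-id")
    credential = dbutils.secrets.get(scope="datalake", key="client-secret")
    tenant_id  = dbutils.secrets.get(scope="datalake", key="tenant-id")

    # OAuth service-principal settings for Azure Data Lake Store Gen1.
    spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
    spark.conf.set("dfs.adls.oauth2.client.id", client_id)
    spark.conf.set("dfs.adls.oauth2.credential", credential)
    spark.conf.set("dfs.adls.oauth2.refresh.url",
                   "https://login.microsoftonline.com/%s/oauth2/token" % tenant_id)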

Data Lake Organization Tips

Melissa Coates has some great advice for people working with data lakes:

Q: Partitioning by date is common. Where should the dates go in the folder hierarchy?

Almost always, you will want the dates to be at the end of the folder path. This is because we often need to set security at specific folder levels (such as by subject area), but we rarely set up security based on time elements.

Optimal for folder security: \SubjectArea\DataSource\YYYY\MM\DD\FileData_YYYY_MM_DD.csv

Tedious for folder security: \YYYY\MM\DD\SubjectArea\DataSource\FileData_YYYY_MM_DD.csv
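
The convention is easy to encode once so that every pipeline writes the same layout. A small illustrative helper:

    from datetime import date

    def lake_path(subject_area: str, data_source: str, d: date) -> str:
        # Dates sit at the END of the hierarchy, so security can be granted
        # once at the subject-area or data-source folder rather than per day.
        return (f"/{subject_area}/{data_source}/{d:%Y}/{d:%m}/{d:%d}"
                f"/FileData_{d:%Y_%m_%d}.csv")

    print(lake_path("Sales", "CRM", date(2019, 1, 25)))
    # /Sales/CRM/2019/01/25/FileData_2019_01_25.csv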

Click through for all of Melissa’s advice in FAQ form.

CosmosDB Time To Live Support

Hasan Savran explains the Time To Live (TTL) counter in CosmosDB:

Another great feature of Cosmos DB is TTL (Time To Live) support. This is a great option to have if you need a database system with a caching option, or if you need to purge your data and don't want to write a function to remove it from your dataset. Cosmos DB's TTL feature is pretty simple: all you need to do is turn it on and declare when data should be removed from your dataset. The best part about TTL in Cosmos DB is that it does not charge you when it removes data from your containers, so you can use your Request Units for other transactions!

There are two ways to set the Time To Live value: you can set it on a container, or you can set it on a specific item by using the Cosmos DB SDK. The TTL value must be in seconds, and the TTL timer resets if the data gets modified for any reason.
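
A short sketch with the Python SDK shows both levels; the account URI, key, and names are placeholders:

    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient("https://myaccount.documents.azure.com:443/",
                          credential="<account-key>")
    db = client.create_database_if_not_exists("demo")

    # Container-level default: items expire 24 hours after their last write.
    container = db.create_container_if_not_exists(
        id="events",
        partition_key=PartitionKey(path="/pk"),
        default_ttl=86400,   # in seconds; -1 turns TTL on with no default expiry
    )

    # Item-level override: this document expires after 60 seconds instead,
    # and the clock restarts whenever the item is modified.
    container.upsert_item({"id": "1", "pk": "a", "note": "short-lived", "ttl": 60})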

Click through for an example of it in action.

The Forgotten Infrastructure Below Azure BI Architecture Diagrams

Meagan Longoria reminds us that there are several products which Azure BI projects need but which we tend to forget when building architectural diagrams:

Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.

Meagan has several of these, so check it out.

AzureR Packages On CRAN

David Smith points out that the AzureR packages are now available on CRAN:

The suite of AzureR packages for interfacing with Azure services from R is now available on CRAN. If you missed the earlier announcements, this means you can now use the install.packages function in R to install these packages, rather than having to install from the GitHub repositories. Updated versions of these packages will also be posted to CRAN, so you can get the latest versions simply by running update.packages.

Read on for a summary of those packages.

Data Transformation Tools In The Azure Space

James Serra gives us an overview of the major tools you would use for ETL and ELT in Azure:

If you are building a big data solution in the cloud, you will likely be landing most of the source data into a data lake. And much of this data will need to be transformed (i.e., cleaned and joined together – the "T" in ETL). Since the data lake is just storage (i.e., Azure Data Lake Storage Gen2 or Azure Blob Storage), you need to pick a product that will be the compute and will do the transformation of the data. There is good news and bad news when it comes to which product to use. The good news is there are a lot of products to choose from. The bad news is there are a lot of products to choose from :-). I'll try to help your decision-making by talking briefly about most of the Azure choices and the best use cases for each when it comes to transforming data (although some of these products also do the Extract and Load parts).

The only surprise is the non-mention of Azure Data Lake Analytics, and there is a good conversation in the comments section explaining why.
