

EMR Studio Now Generally Available

Shuang Li announces that Amazon EMR Studio is now in GA:

EMR Studio provides fully managed Jupyter notebooks, and tools like Spark UI and YARN Timeline Service to simplify debugging. EMR Studio uses AWS Single Sign-On and allows you to log in directly with your corporate credentials without signing in to the AWS Management Console. You can install custom kernels and libraries, collaborate with peers using code repositories such as GitHub and Bitbucket, and run parameterized notebooks as part of scheduled workflows using orchestration services like Apache Airflow and Amazon Managed Workflows for Apache Airflow (Amazon MWAA).

With EMR Studio, you can run notebook code on Amazon EMR on Amazon Elastic Compute Cloud (Amazon EC2) or Amazon EMR on Amazon Elastic Kubernetes Service (Amazon EKS), and take advantage of the performance-optimized EMR runtime for Apache Spark. You can set up EMR Studio to run applications on existing EMR clusters or create new clusters using AWS CloudFormation templates for Amazon EMR.

Click through for more information.
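To give a rough flavor of the scheduled-workflow scenario, here is a minimal boto3 sketch of mine (not from the announcement; the editor ID, cluster ID, notebook path, and role are all placeholders) that kicks off a parameterized notebook execution:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Run a parameterized notebook against an existing EMR cluster.
# All identifiers below are hypothetical placeholders.
response = emr.start_notebook_execution(
    EditorId="e-ABC123DEF456",                  # the notebook/editor in EMR
    RelativePath="notebooks/daily_report.ipynb",
    NotebookExecutionName="daily-report",
    NotebookParams='{"run_date": "2021-03-01"}',  # parameters for the notebook
    ExecutionEngine={"Id": "j-XYZ789", "Type": "EMR"},
    ServiceRole="EMR_Notebooks_DefaultRole",
)
print(response["NotebookExecutionId"])
```

An Airflow or MWAA task would wrap a call like this to make the notebook part of a scheduled workflow.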


JSON Patching in Cosmos DB

Hasan Savran does some JSON modification:

JSON is quite a common format for data these days. NoSQL databases save data in JSON format. Even relational databases like SQL Server give you the option to save or retrieve data in JSON format. When we need to update a property in a JSON document, we need to update the whole JSON document. This can be a problem, especially if the document is large. This is where the JSON Patch feature comes in. JSON Patch is a format that lets you update a JSON document partially.

You can do more than update a property. You can add, remove, replace, move, or test a value by using this format. Let's look at its details by using the following JSON document as the source.

Read on to see how patching works with Cosmos DB in particular.
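For a sense of what this looks like from code, here's a minimal sketch of mine (assuming a version of the azure-cosmos Python SDK that supports partial document update; the account, container, item, and property names are made up):

```python
from azure.cosmos import CosmosClient

# Hypothetical account, database, and container names.
client = CosmosClient(
    url="https://myaccount.documents.azure.com:443/", credential="<key>"
)
container = client.get_database_client("SalesDb").get_container_client("Orders")

# Apply a partial update: replace one property and add another,
# without sending the whole document back to the server.
container.patch_item(
    item="order-1001",
    partition_key="customer-42",
    patch_operations=[
        {"op": "replace", "path": "/status", "value": "shipped"},
        {"op": "add", "path": "/shippedDate", "value": "2021-03-01"},
    ],
)
```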


Extended Events in Azure SQL Database

Grant Fritchey compares and contrasts extended events on-premises to extended events in Azure SQL Database:

I have long advocated for the use of Extended Events. I've been posting all sorts of blog posts on how to implement them, how they present unique opportunities for new and interesting data, and how they do so much more than the old trace events yet put less of a load on the system. All of that is true, until we hit Azure SQL Database.

Now, don’t get me wrong, Extended Events are still awesome, amazing and wonderful. It’s just that, Azure SQL Database is going to force us to hop through a few hoops. I want to be up front with these… I’m trying to find the right word here, challenges? Maybe. Frustrations? Yeah, kind of. Limitations? Again, sort of, but not quite. We’ll settle on as neutral a term as possible: differences. For the moment.

Read on for the first part in an ongoing series.
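One concrete difference worth knowing going in: event sessions in Azure SQL Database are database-scoped rather than server-scoped. A minimal sketch (mine, not Grant's, with placeholder server and session names) of creating and starting one via pyodbc:

```python
import pyodbc

# Placeholder server, database, and credentials.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=myserver.database.windows.net;Database=MyDb;"
    "Uid=my_user;Pwd=<password>;Encrypt=yes;"
)
conn.autocommit = True

# Note ON DATABASE: Azure SQL Database has no server-level sessions.
conn.execute("""
CREATE EVENT SESSION QueryCompletions ON DATABASE
ADD EVENT sqlserver.sql_batch_completed
ADD TARGET package0.ring_buffer;
""")
conn.execute("ALTER EVENT SESSION QueryCompletions ON DATABASE STATE = START;")
```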


Querying Serverless SQL Pools from Spark Notebooks in Scala

Jovan Popovic shows off one integration point between the data services in Azure Synapse Analytics:

Azure Synapse Analytics provides multiple query runtimes that you can use to query in-database or external data. You have the choice of running T-SQL queries using a serverless Synapse SQL pool or using notebooks in Apache Spark for Synapse Analytics to analyze your data.

You can also connect these runtimes and run the queries from Spark notebooks on a dedicated SQL pool.

In this post, you will see how to create Scala code in a Spark notebook that executes a T-SQL query on a serverless SQL pool.

Read on to see how. You can also query Spark pool and dedicated SQL pool tables from serverless SQL pools.
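Jovan's example is in Scala; a rough PySpark analogue might look like the sketch below (mine, with a hypothetical workspace name and SQL login, using the serverless pool's -ondemand endpoint from a Synapse notebook cell where `spark` is predefined):

```python
# Run a T-SQL query on the serverless SQL pool over JDBC from a Spark cell.
df = (
    spark.read.format("jdbc")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option(
        "url",
        "jdbc:sqlserver://myworkspace-ondemand.sql.azuresynapse.net:1433;"
        "database=demo;encrypt=true;",
    )
    .option("query", "SELECT TOP 10 name, create_date FROM sys.objects")
    .option("user", "sql_login")
    .option("password", "<password>")
    .load()
)
df.show()
```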


Automating Workflows in Azure DevOps with Logic Apps

Elie Bou Issa does some no-code automation:

Azure Logic Apps is a cloud service to help you schedule, automate, and orchestrate tasks and workflows between apps and across enterprises and organizations. A Logic App can be built using the Azure portal, or infrastructure as code.

By the end of this article, you will have a good understanding of leveraging a Logic App for Azure DevOps to automate the creation of work items, in addition to creating an automated approval-based workflow using Office 365.

Click through for the demo. This is useful on its own, especially with non-technical product managers, but you can extend the use of Logic Apps quite a bit and automate more work without writing much code.
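Under the covers, the work-item-creation step rides on the Azure DevOps REST API, which expects a JSON Patch body. If you ever need the same result outside a Logic App, here is a sketch of mine (organization, project, and personal access token are placeholders):

```python
import requests

# Hypothetical organization and project; the PAT needs Work Items (Read & Write).
org, project, pat = "myorg", "MyProject", "<personal-access-token>"
url = f"https://dev.azure.com/{org}/{project}/_apis/wit/workitems/$Task?api-version=6.0"

# Work item creation uses a JSON Patch body, not a plain JSON document.
body = [
    {"op": "add", "path": "/fields/System.Title", "value": "Follow up on approval"},
    {"op": "add", "path": "/fields/System.Description", "value": "Created by automation"},
]
resp = requests.post(
    url,
    json=body,
    headers={"Content-Type": "application/json-patch+json"},
    auth=("", pat),  # basic auth with an empty username and the PAT
)
resp.raise_for_status()
print(resp.json()["id"])
```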


Updating or Creating a Measure for Power BI PPU without Full Publication

Gilbert Quevauvilliers follows up:

Following on from my previous blog post How to complete granular deployment of Power BI Desktop changes to the Power BI Service (Using PPU), I also want to show how to update or create a measure in my dataset, which I can deploy via the ALM Toolkit.

This saves me from the following tasks I previously had to do:

– Refreshing the PBIX file so that the data is up to date.
– Re-uploading my PBIX.
– If configured, re-creating the incremental refresh.
– The time and effort to upload and wait for the dataset refresh.

I will no longer have to worry about saving my PBIX file, re-uploading it, or, if configured, re-creating the incremental refresh, and quick updates to my dataset become possible. This saves me a lot of time and effort.

Read on to see how.


Using Notebooks in Azure Machine Learning Studio

Lina Kovacheva takes us through the process of working with notebooks in Azure Machine Learning Studio:

I discovered Jupyter notebooks not that long ago, but the more I use them, the more I see how powerful they can be. For those of you who are not familiar with Jupyter Notebook: it is an open-source web application where you can combine code, output, visualizations, and explanatory text all in one document, allowing you to write code that tells a story. Now that you have an idea of what a Jupyter notebook is, I will walk you through how you can use one in Azure Machine Learning Studio.

Click through for the process. One advantage to notebooks in an environment like Azure ML over Azure Data Studio is that you have a much wider variety of languages, although Azure Data Studio has a SQL Server kernel, which other platforms currently do not have.
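Once you're in a notebook on an Azure ML compute instance, the azureml-core SDK is available; a minimal sketch of mine for getting a handle on the workspace from a cell:

```python
# On an Azure ML compute instance, a config.json for the workspace is
# already present, so from_config() resolves without arguments.
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location)

# List registered datasets by name.
for name in ws.datasets:
    print(name)
```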


HammerDB CLI for Oracle Running on Azure

Kellyn Pot’vin-Gorman goes through a rough experience:

Disclaimer: I’m not a big fan of benchmark data. I find it doesn’t provide us as much value in the real world as we’d like to think it does. As Cary Millsap says, “You can’t hardware your way out of a software problem,” and I find that many folks think that if they just get the fastest hardware, their software problems will go away, and this just isn’t true. Sooner or later, it’s going to catch up with you, and it rarely tells you what your real database workload needs to run most efficiently or what might be running in your database that could easily be optimized to save the business time and money.

The second issue is that when comparing different workloads or, even worse, different platforms or applications, using the same configuration can be detrimental to the benchmarks collected, which is what we’ll discover in this post.

That said, Kellyn dives into the problem and documents several of the issues in building out this test.


Deploying an Azure Arc Enabled Data Services Controller

Chris Adkin continues a series:

If you have been following this series, you should have:

– a basic understanding of Terraform
– a Kubernetes cluster that you can connect to using kubectl
– a basic understanding of Kubernetes services
– a working MetalLB load balancer
– a basic understanding of how storage works in the world of Kubernetes
– a Kubernetes storage solution in the form of PX Store; alternatively, you can use any solution (for the purposes of this series) that supports persistent volumes, though to use the backup solution in part 9 of the series you will need something that supports CSI

From here, Chris explains the importance of the data controller and then deploys one.
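If you want a quick sanity check of that storage prerequisite before deploying, here's a small sketch (mine, not from the series) using the official Kubernetes Python client to list storage classes and their provisioners:

```python
# Verify the storage prerequisite: list storage classes and provisioners,
# using the same kubeconfig context that kubectl uses.
from kubernetes import client, config

config.load_kube_config()
storage = client.StorageV1Api()

for sc in storage.list_storage_class().items:
    print(sc.metadata.name, "->", sc.provisioner)
```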


Auto-Failover Groups and Grace Periods

Taiob Ali clears up some misunderstanding:

The auto-failover groups feature for Azure SQL Database can be configured with an automatic failover policy. Azure triggers failover after the failure is detected and the grace period has expired. The grace period is determined by a setting called ‘GracePeriodWithDataLossHours’, which cannot be set below one hour. Why are you not allowed to set a time of less than an hour? Can your business tolerate the application being down for that period? Should you turn off auto-failover and set it to manual?

I noticed a lot of confusion around this setting, including my own. Some of the confusion is due to a lack of clarity in the documentation. I checked with the Microsoft Azure SQL team, and they are actively working on clarifying some of the questions I raised.

I want to thank Dimitri Furman and Roberto Bustos from the Azure SQL Team for clarifying some of my confusion that I will share here.

Read on for a Q&A style explanation of auto-failover and grace periods.
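For reference, the same grace period surfaces in the management API in minutes rather than hours. A sketch of mine (assuming a recent azure-mgmt-sql package; subscription, server, and resource IDs are placeholders) of configuring an automatic failover policy:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import (
    FailoverGroup,
    FailoverGroupReadWriteEndpoint,
    PartnerInfo,
)

client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

fg = FailoverGroup(
    read_write_endpoint=FailoverGroupReadWriteEndpoint(
        failover_policy="Automatic",
        # Expressed in minutes at the API level; cannot go below 60 (one hour).
        failover_with_data_loss_grace_period_minutes=60,
    ),
    partner_servers=[PartnerInfo(id="<secondary-server-resource-id>")],
    databases=["<database-resource-id>"],
)

poller = client.failover_groups.begin_create_or_update(
    "my-resource-group", "primary-server", "my-failover-group", fg
)
poller.result()
```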
