Category: Cloud

Automatic Backups on a Data Lake or Lakehouse

Dave Ruijter backs that thing up:

Out of the box, Azure Data Lake Storage Gen2 provides redundant storage. Therefore, the data in your Data Lake(house) is resilient to transient hardware failures within a datacenter through automated replicas. This ensures durability and high availability. In this blog post, I provide a backup strategy on how to further protect your data from accidental deletions, data corruption, or any other data failures. This strategy works for Data Lake as well as Data Lakehouse implementations. It uses native Azure services; no additional tools, software, or licenses are required.

Read on for a detailed strategy.

Optimizing Blob Storage Query Performance

Dennes Torres compares several strategies for querying data stored in Azure Blob Storage:

In the third part of the series Querying Blob Storage with SQL, I will focus on the performance behaviour of queries: What makes them faster, slower, and some syntax beyond the basics.

The performance tests in this article are repeated, and the best time of the queries is recorded. This doesn’t mean you will always achieve the same timing. Many architectural details will affect the timing, such as cache, first execution, and so on. The timing exposed on each query is only a reference pointing to the differences of the query methods that can affect the time and the usual result for better or worse performance.
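The "repeat the test and keep the best time" approach described above can be sketched in a few lines of Python. This is only an illustration under assumptions: `best_of` and `fake_query` are hypothetical names, and `fake_query` stands in for whatever call actually runs a query against Blob Storage.

```python
import time


def best_of(run_query, repeats=5):
    """Run `run_query` several times; return the fastest time and all timings (seconds)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return min(timings), timings


def fake_query():
    # Hypothetical stand-in for a real query; it just burns a little CPU.
    sum(range(100_000))


best, all_runs = best_of(fake_query, repeats=5)
print(f"best: {best:.6f}s over {len(all_runs)} runs")
```

Taking the minimum filters out slow first executions and cold caches, which is why the recorded numbers are a reference point rather than a promise.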

Click through to see which patterns perform well and which don’t.

Scaling an Azure SQL Managed Instance

Arun Sirpal wants more power:

No doubt there will be times where you need to scale up the actual instance in terms of vCores, but you may also want to move across tiers (for example, General Purpose to Business Critical). If you remember, a few blog posts ago I said it was really important to plan for these activities during the build phase, more specifically to get the subnet range right. If you have done that, then you will be fine.

Click through for the process, though do note the amount of time it takes. One of the early ideals of cloud processing was that you could seamlessly scale up and down with no effects on the end user. In some services (especially things like function apps, web apps, and VMs in a Kubernetes pod), you get that experience. When it comes to almost anything data-related, though, immediate scaling is a hard no, to the point where I’d assume you can’t afford the downtime to do it until proven otherwise.

Contrasting Kafka with Azure Service Bus

Ritam Das explains the differences between Apache Kafka and Azure Service Bus:

It is important to note that Azure Service Bus is a traditional message broker and tailored to somewhat different use cases when compared to Kafka. Simply transferring between these two technologies is not an easy feat and would require overhauling your entire application. The comparison stops at both technologies being message brokers as under the hood they are fundamentally different.

At a high level, ASB has high processing overhead per message, stronger guarantees around delivery and processing, and typically a “process once” model. Kafka has low overhead processing per message, fewer guarantees around delivery and processing, and typically a “publish once, process multiple times” model. To provide an explicit comparison, it would be best to understand the intended use case and proceed from there. 
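To make the "process once" versus "publish once, process multiple times" distinction concrete, here is a deliberately simplified in-memory sketch in Python. Both classes are hypothetical toys and do not reflect the real Azure Service Bus or Kafka client APIs; they only model who gets to see a message.

```python
from collections import deque


class ProcessOnceQueue:
    """Toy queue: each message goes to exactly one receiver, then it is gone."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None


class ReplayableLog:
    """Toy log: records are retained and each consumer group keeps its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}

    def publish(self, record):
        self._records.append(record)

    def poll(self, group):
        offset = self._offsets.get(group, 0)
        if offset >= len(self._records):
            return None
        self._offsets[group] = offset + 1
        return self._records[offset]


queue = ProcessOnceQueue()
queue.send("order-1")
queue_reads = [queue.receive(), queue.receive()]  # the second read finds nothing

log = ReplayableLog()
log.publish("order-1")
log_reads = [log.poll("billing"), log.poll("shipping")]  # both groups see the record
```

In the queue model, the first receiver consumes the message. In the log model, the record stays put and every consumer group reads it at its own pace.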

Read on to understand the best uses for each technology, as well as sample calls using Python.

Creating Delta Lake Tables in Azure Databricks

Gauri Mahajan takes us through creating new tables in a Delta Lake using Azure Databricks:

Delta Lake is an open-source data format that provides ACID transactions, data reliability, query performance, data caching and indexing, and many other benefits. Delta Lake can be thought of as an extension of existing data lakes and can be configured per the data requirements. Azure Databricks has a Delta engine as one of the core components that facilitates the Delta Lake format for data engineering and performance. The Delta Lake format is used to create modern data lake or lakehouse architectures. It is also used to build a combined streaming and batch architecture popularly known as lambda architecture.

Click through for the process.

Performing a Restore to SQL Managed Instance

Arun Sirpal shows us how to perform a backup and restoration from an on-premises SQL Server to Azure SQL Managed Instance:

So in the last blog we confirmed that we could move to SQL MI via some analysis; it is now time to actually do a backup and restore via URLs to move data.

Quite simply you need to BACKUP to URL (Azure Storage container) and the setup requirement is that you need to create a SQL credential that holds the SAS token – this is what allows authentication to the container to take place. 
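As a rough sketch of what that setup looks like, the Python helper below builds the two T-SQL statements involved: a credential named after the container URL that holds the SAS token, and a `BACKUP DATABASE ... TO URL`. The helper name, storage account, container, database, and token are all hypothetical placeholders, not values from the article.

```python
def build_backup_to_url_sql(container_url, sas_token, database, backup_file):
    """Return (CREATE CREDENTIAL, BACKUP DATABASE) T-SQL statements as strings."""
    container_url = container_url.rstrip("/")
    # The secret is the SAS token without its leading '?'; escape single quotes for T-SQL.
    secret = sas_token.lstrip("?").replace("'", "''")
    create_credential = (
        f"CREATE CREDENTIAL [{container_url}]\n"
        f"WITH IDENTITY = 'SHARED ACCESS SIGNATURE',\n"
        f"     SECRET = '{secret}';"
    )
    backup = (
        f"BACKUP DATABASE [{database}]\n"
        f"TO URL = '{container_url}/{backup_file}';"
    )
    return create_credential, backup


# Hypothetical placeholder values for illustration only.
credential_sql, backup_sql = build_backup_to_url_sql(
    container_url="https://examplestorage.blob.core.windows.net/backups",
    sas_token="?sv=EXAMPLE&sig=EXAMPLE",
    database="SalesDb",
    backup_file="SalesDb.bak",
)
print(credential_sql)
print(backup_sql)
```

The credential name has to match the container URL, which is how SQL Server finds the right SAS token when it sees the `TO URL` target.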

Click through for the process.

Time Series Insights in Azure

Aveek Das explains the notion of Azure Time Series Insights:

In this article, we are going to learn in detail about Azure Time Series Insights. Microsoft Azure is one of the leading cloud providers these days. With a lot of companies adopting or migrating to the cloud, it has become a usual trend to convert existing technologies into cloud-based services and consume them. This not only helps the companies to reduce their cost but also in turn allows them to focus on more business-related problems rather than concentrating on infrastructure costs.

Azure Time Series Insights is one of the cloud services that users can use to integrate with their data that is constantly changing with time such as data from various sensors or machines, data from satellites, airlines etc. Any data that can be generated on a high scale and needs to be analysed, can be used through Azure Time Series Insights. In this article, we will focus on a high-level introduction of this service along with some use cases in detail.

Read on for the article.

Defect Detection with AWS Lookout and Sagemaker

Matthew Rhodes, et al., take us through an interesting case study:

According to a recent study, defective products cost industries over $2 billion from 2012–2017. Defect detection within manufacturing is an important business use case, especially in high-value product industries like the automotive industry. This allows for early diagnosis of anomalies to improve production line efficacy and product quality, and saves capital costs. Although advanced anomaly detection systems employ sensors as well as Internet of Things (IoT) devices to collect multimodal data to improve performance, computer vision continues to be a common approach. Detecting anomalies in automotive parts and components using computer vision can be done using normal images, and even X-Ray based images for structural damages. Recent advances in deep learning and computer vision have allowed scientists and manufacturers to develop enhanced anomaly detection systems, including surface defect detection on automotive body panels and dent detection in vehicles.

Read on for case notes.
