
Category: Cloud

An Intro to Key Word Analysis

Lewis Prince continues a series on natural language processing:

Here we are with part 2 of this blog series on web scraping and natural language processing (NLP). In the first part I discussed what web scraping is, why it’s done and how it can be done. In this part I will give you details on what NLP is at a high level, and then go into detail on an application of NLP called key word analysis (KWA).

Read on for a high-level overview of the topic and how to do it in Cognitive Services. But not the topic model—that’d be a different post.
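Cognitive Services does key word analysis with a hosted model. To show the underlying idea locally, here is a minimal frequency-based sketch in Python. It is not the Cognitive Services key phrase API. The stop-word list and the `top_keywords` function are hypothetical illustrations.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical, deliberately tiny stop-word list; real systems use much larger ones.
STOP_WORDS = {
    "a", "an", "and", "are", "can", "for", "in", "is", "it",
    "of", "on", "the", "this", "to", "was", "we", "with",
}


def top_keywords(text: str, n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stop-word tokens and their counts."""
    # Lowercase and keep only alphabetic runs, so "Text," and "text" match.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count every token that is not a stop word.
    counts = Counter(token for token in tokens if token not in STOP_WORDS)
    # most_common orders tokens by count, highest first.
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "Web scraping collects text. Text analysis finds key words in scraped text."
    print(top_keywords(sample, 1))  # [('text', 3)]
```

Ranking by raw frequency is the simplest possible choice. A hosted key phrase service, or a weighting such as TF-IDF, would also down-rank words that are common across every document.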

Resolving tempdb Issues in Azure SQL DB

Holger Linke troubleshoots some problems:

The tempdb system database is a global resource available to users who are connected to Azure SQL Database or any instance of SQL Server. It holds temporary user objects that are explicitly created by a user or application, and internal objects that are created by the SQL Server database engine itself. The most common tempdb issue is running out of space, either regarding tempdb’s overall size quota or the transaction log.

The available tempdb space in Azure SQL Database depends on two factors: the service tier (pricing tier) that the database is configured with, and the type of workload that is executed against the database. These are also the main factors to control if you are running out of tempdb space.

Click through for several error cases and how we can resolve them.

Azure Functions and Azure Database Options

Sarah Dutkiewicz continues a series on learning Azure. First up, Azure Functions:

Azure Functions are not something you’ll see rendered on a front-end somewhere. They’re a serverless solution used for doing things in the back-end and the middle tier. 

After that, Sarah touches on database options:

There are many databases on Azure – including relational data in Azure SQL, NoSQL with Azure Cosmos DB, and even some popular databases in the open source realm such as MySQL and PostgreSQL. These are just a few of the data stores available. Check this page of Azure Databases for a matrix of the databases available compared by their features.

Click through for quite a few links and information on when to use what.

Serverless Compute for Databricks SQL

Nikhil Jethava and Shankar Sivadasan make an announcement:

We are excited to announce the preview of Serverless compute for Databricks SQL (DBSQL) on Azure Databricks. DBSQL Serverless makes it easy to get started with data warehousing on the lakehouse. Serverless compute for DBSQL helps address challenges customers face with cluster startup time, capacity management, and infrastructure costs:

Click through for more details and a short video. Azure Synapse Analytics and Databricks are definitely going head-to-head in the modern data warehousing space and I’m fine with that—hopefully it makes both products better as a result.

Updates to AzureDevOps-AzureSQLDatabase Repo

Kevin Chant updates a repo:

In this post I want to cover some significant updates to an Azure SQL Database repository that I have been doing for one of the public GitHub repositories that I share.

I have updated the AzureDevOps-AzureSQLDatabase repository, which contains an example of a SQL Server database project that you can use to perform CI/CD on an Azure SQL Database using Azure DevOps.

It does this by using the popular state-based migration method of creating a dacpac file based on the contents of a database project. From there, the dacpac file can be used to update one or more databases.

Click through for those updates.
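A toy illustration of the state-based idea may help. It is not how a dacpac or its deployment tooling works internally. The project declares the desired end state, and deployment compares that state with what the target database currently has to work out the changes. The `plan_changes` function and the dictionary-of-definitions model are hypothetical simplifications.

```python
from typing import Dict, List


def plan_changes(desired: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Diff desired object definitions against a target's current ones.

    Keys are object names; values stand in for full object definitions.
    """
    actions: List[str] = []
    # Objects declared in the project: create if missing, alter if different.
    for name in sorted(desired):
        if name not in current:
            actions.append(f"CREATE {name}")
        elif desired[name] != current[name]:
            actions.append(f"ALTER {name}")
    # Objects that exist only in the target: this sketch always drops them.
    for name in sorted(current):
        if name not in desired:
            actions.append(f"DROP {name}")
    return actions


if __name__ == "__main__":
    project = {"dbo.Customer": "Id INT, Name NVARCHAR(100)", "dbo.Orders": "Id INT, CustomerId INT"}
    target = {"dbo.Customer": "Id INT", "dbo.Legacy": "Id INT"}
    print(plan_changes(project, target))
    # ['ALTER dbo.Customer', 'CREATE dbo.Orders', 'DROP dbo.Legacy']
```

The plan comes from the difference between the two states. Deploying the same project to a database that is already up to date therefore produces no actions. The same project can also bring several databases that have drifted in different ways to one state. Real tooling also deals with dependency ordering and data-loss risk. In real tools, dropping objects that exist only in the target is usually configurable rather than automatic.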

Preventing Data Exfiltration from Managed Instances

Niko Neugebauer wants to hang on to that data:

Data exfiltration is a technique that is also sometimes described as data theft or data extrusion, that describes the unauthorized extraction of data from the original source. This unauthorized extraction can be executed either manually or automatically by the malicious attacker.

As part of your Network Infrastructure, you might have tightened your security to make sure you have all the bells and whistles to lock down your Azure SQL Managed Instance to be accessed only by your application and not exposed to the Internet or any other traffic. However, this doesn’t stop a malicious admin from taking a backup or creating a linked server to another resource outside your enterprise subscription for extracting the data. This action would be data exfiltration. In a typical on-premises infrastructure, you can lock down network access completely to make sure that the data never leaves your network. However, in a cloud setup, there is a possibility that someone with elevated privileges can export data or perform some other malicious activity targeting their own resources outside your organization, compromising your enterprise data. Hence, it is very important to understand the different data exfiltration scenarios and make sure that you are taking the right steps to monitor for and prevent such activities.

Click through for a table which shows common exfiltration scenarios and things you can do to reduce the risk of exfiltration. With access, though, there’s always going to be a risk of exfiltration: even in a SCIF, you can get away with shoving records into your pants if you’re famous enough.

Feeding Synapse Spark Info to On-Prem Kafka Clusters

Bhadreshkumar Shiyal finds a solution:

Microsoft’s official documentation for Azure Data Factory contains a tutorial which explains how to access an On-Premises SQL Server from Azure Data Factory which is inside a Managed Vnet. You can go through that article here: Access on-premises SQL Server from Data Factory Managed Vnet using Private Endpoint – Azure Data Fac….

Although based upon the article’s solution, to meet our requirements we needed to substitute On-Prem Apache Kafka for On-Prem SQL Server and instead of an Azure Data Factory inside a Managed Vnet, we used a Synapse Workspace inside a Managed Vnet. The “Forwarding Vnet” concept explained in the above tutorial remains as-is in our approach.

As soon as you turn on Data Exfiltration Protection (DEP), the lockdown is real. Click through to see what the process of exfiltrating data through an approved mechanism looks like.

From Azure Data Explorer to Excel

Dany Hoter views data in Excel:

In a previous article, Direct Query from Excel to Azure Data Explorer (microsoft.com), I described a way to mimic Direct Query access à la Power BI in Excel.

The method used in that article allows the user to filter the imported data using values entered into cells in the grid.

In this article I would like to describe a way to really query Kusto data in real time without importing any data and without any volume limitations.

Read on to see how, though there’s a pretty big intermediate step.

Organizing Data Domains in a Data Mesh

Paul Andrew continues a series on data mesh architecture:

Defining an organisation hierarchy is always hard, even more so for large enterprises with massive amounts of interlock between business functions. In the context of data analytics, we attempt to tackle the problem by creating an organisation dimension as part of our star schema data model. This could include things like region, operating company, branch, department, team etc.

So, my friends, how do we go about handling this when considering a data mesh architecture and the de-centralised domains that support the natural scalability we crave? For me, it feels like we are just frontloading the dimensional modelling problem. Tackling it from the beginning in the very foundations of our data platform. But, with a twist.

Read on for that twist and for some solid guidance on data domains in practice compared to the theory.

Azure Synapse Analytics July 2022 Updates

Ryan Majidimehr notes that the Azure Synapse Analytics team has been busy:

Azure Synapse Link for SQL is an automated system for replicating data from your transactional databases into a dedicated SQL pool in Azure Synapse Analytics. Starting this month, you can make trade-offs between cost and latency in Synapse Link for SQL by selecting the continuous or batch mode to replicate your data.  

By selecting “continuous mode”, the runtime will be running continuously so that any changes applied to the SQL database or SQL Server will be replicated to Synapse with low latency. Alternatively, when you select “batch mode” with a specified interval, the changes applied to the SQL database or SQL Server will be accumulated and replicated to Synapse in batch mode with the specified interval. This can save cost as you are only charged for the time the runtime is required to replicate data. After each batch of data is replicated, the runtime will be shut down automatically. 

Click through for the complete list.
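The continuous-versus-batch trade-off comes down to how many hours the replication runtime is billed. Here is a rough sketch in Python. It rests on several assumptions: cost is proportional to runtime, each batch takes a hypothetical fixed number of minutes, and startup time is ignored. The functions, rates, and durations are illustrative only. They do not come from the announcement or from Azure pricing.

```python
def runtime_hours_per_day(mode: str, interval_minutes: float = 60.0,
                          minutes_per_batch: float = 10.0) -> float:
    """Estimate daily billed runtime hours for a replication mode.

    Assumes billing is proportional to runtime and ignores startup overhead.
    """
    if mode == "continuous":
        # The runtime never shuts down, so it is billed around the clock.
        return 24.0
    if mode == "batch":
        batches_per_day = 24 * 60 / interval_minutes
        # A batch longer than its interval would effectively be continuous.
        busy_minutes = min(minutes_per_batch, interval_minutes)
        return batches_per_day * busy_minutes / 60
    raise ValueError(f"unknown mode: {mode!r}")


def daily_cost(mode: str, rate_per_hour: float, **kwargs: float) -> float:
    """Multiply estimated runtime hours by a hypothetical hourly rate."""
    return runtime_hours_per_day(mode, **kwargs) * rate_per_hour


if __name__ == "__main__":
    print(runtime_hours_per_day("continuous"))  # 24.0
    print(runtime_hours_per_day("batch", interval_minutes=60, minutes_per_batch=10))  # 4.0
```

Take a one-hour interval with a hypothetical ten minutes of work per batch. Under these assumptions the runtime is billed for about 4 hours a day instead of 24. The cost of that saving is latency: a change can wait roughly one interval before it is replicated to the dedicated SQL pool.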
