Press "Enter" to skip to content

Month: January 2020

Diagnosing and Resolving CPU Issues

Jeff Iannucci has a story to tell:

Anyhow, [Erik] Darling’s tool is designed to provide five data sets showing a quick view of memory and processor usage based on several dynamic management views (DMVs). It gives you a quick look into what’s going on when the CPU and/or memory of your SQL Server instance are feeling discomfort.

Why do I mention this? Because little did I know the very next day I would be presented with a fine opportunity to use this tool. And by opportunity I mean a server on fire. (Well, not literally, but still…)

Click through for a description of sp_pressure_detector and a testimonial of its utility.
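
If you want to try it yourself, the call is trivial; here is a minimal sketch of running the procedure from Python with pyodbc and walking through the result sets it returns. The connection details are placeholders, and I’m assuming the procedure is already installed in the target database.

```python
import pyodbc

# Placeholder connection string; assumes sp_pressure_detector is installed here
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=localhost;Database=master;Trusted_Connection=yes;"
)

cursor = conn.cursor()
cursor.execute("EXEC dbo.sp_pressure_detector;")

# The procedure returns several result sets; walk through all of them
while True:
    if cursor.description:  # skip batches that return no rows
        print([col[0] for col in cursor.description])
        for row in cursor.fetchall():
            print(row)
    if not cursor.nextset():
        break
```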


ACID Transactions with Cosmos DB

Hasan Savran shows how you can use the Cosmos DB SDK to create ACID transactions:

What about Azure Cosmos DB? It’s a NoSQL database, so you probably can’t do ACID transactions, right? WRONG! Azure Cosmos DB has been supporting ACID transactions for some time now. We were able to create ACID transactions by using the stored procedures of Cosmos DB. Last year (2019), the Cosmos DB team introduced ACID transactions to the Cosmos DB SDK. Now, we can create transactions by using C#, just like writing transactions by using the SqlClient class for SQL Server!

To create an ACID transaction with the Cosmos DB SDK, we need to use the TransactionalBatch object. You need to add all of the operations in the transaction to the TransactionalBatch object. All the operations attached to the TransactionalBatch object must share the same partition key. In the following example, I created three objects and attached them to the TransactionalBatch object. To start the transaction, I ran the ExecuteAsync() function. This function runs the transaction and returns the responses for each operation.
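
Hasan’s example is in C#. For a rough sense of the same workflow in Python, newer versions of the azure-cosmos package expose an execute_item_batch method that fills the role of TransactionalBatch; treat the following as a sketch under that assumption, with placeholder account, database, and container names:

```python
from azure.cosmos import CosmosClient

# Placeholder endpoint and key
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("SalesDb").get_container_client("Orders")

# Every operation in the batch must share the same partition key value
operations = [
    ("create", ({"id": "1", "customerId": "c-100", "total": 25.00},)),
    ("create", ({"id": "2", "customerId": "c-100", "total": 10.50},)),
    ("upsert", ({"id": "3", "customerId": "c-100", "total": 99.99},)),
]

# Executes as a single transaction: either every operation commits or none do
results = container.execute_item_batch(
    batch_operations=operations,
    partition_key="c-100",
)
print(len(results), "operations committed")
```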

I’d think you would need to set a strong consistency level as well.


Data Lake Storage and Data Processing

Ashish Kumar has started a series on data lake essentials:

Data Lake architecture is all about storing large amounts of data, which can be structured, semi-structured, or unstructured, e.g., web server logs, RDBMS data, NoSQL data, social media, sensors, IoT data, and third-party data. A data lake can store the data in the same format as its source systems or transform it before storing.

The main purpose of a data lake is to make organizational data from different sources accessible to a variety of end users like business analysts, data engineers, data scientists, product managers, executives, etc., in order to enable these personas to leverage insights in a cost-effective manner for improved business performance. Today, many forms of advanced analytics are only possible on data lakes.

Click through for more information on what a data lake should provide—whether that be in-house or a cloud provider.


Fixing Screen Repainting Issues in SSMS

Greg Low has a workaround for an annoying problem:

Once again, I’m seeing lots of customers reporting screen repainting issues in SQL Server Management Studio (SSMS). It mostly seems to affect version 18 but I’ve also seen it in version 17. And it’s most prevalent on Windows 10.

The typical issue is that you click on another open tab, and the contents of the tab don’t repaint. You are still seeing the previous tab. If you click into the tab, you start to see bits from both tabs.

Click through to see the fix. I’ve seen this issue pop up though I don’t remember seeing it on the latest version of SSMS 18…though now that I say that, I’m guaranteed to have the problem hit me today.


Managed Identity with Azure Functions

Taiob Ali shows how you can safely store credentials which your Azure Function apps need:

With the announcement of PowerShell support in Azure Functions, it has become easier for data professionals to use functions to manage cloud resources such as Azure SQL Database and Managed Instances. A common challenge when using functions is how to manage the credentials in function code for authenticating to databases. Keeping the credentials secure is an important task. Ideally, the credentials should never appear in the code or in source control.

Managed Identity can solve this problem, as Azure SQL Database and Managed Instance both support Azure AD authentication. You can read more about Managed Identity here.

In this article, I will show how to set up an Azure Function App to use Managed Identity to authenticate functions against Azure SQL Database.

The example connects to Azure SQL DB, but this is a general-purpose solution.
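
Taiob’s walkthrough uses PowerShell inside the Function App. As a rough illustration of the same idea in Python, the managed identity can hand you a token scoped to Azure SQL, which you then pass to the ODBC driver so that no password ever appears in code (server and database names below are placeholders):

```python
import struct

import pyodbc
from azure.identity import ManagedIdentityCredential

# ODBC connection attribute for passing an Azure AD access token
SQL_COPT_SS_ACCESS_TOKEN = 1256

# Inside a Function App with a managed identity enabled, this acquires a
# token for Azure SQL without any stored credentials
credential = ManagedIdentityCredential()
token = credential.get_token("https://database.windows.net/.default").token

# The driver expects a length-prefixed, UTF-16-LE encoded token
token_bytes = token.encode("utf-16-le")
token_struct = struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"  # placeholder server
    "Database=mydb;",                                 # placeholder database
    attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct},
)
print(conn.execute("SELECT SUSER_SNAME();").fetchval())
```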


Financial Statements in Power BI

Joseph Yeates has started a series on creating financial statements in Power BI:

This post is part 1 in my series on creating financial statements in Power BI! I’m starting with creating an Income Statement. The source data and Power BI file used in the example below can be found here.

I loaded the source data into the Power BI report. It consisted of three tables:

– Fact table: contains dollar amount of transactions
– GL table: contains categorization of transactions
– Calendar table: contains date information for the data model

This will be interesting to watch, especially because this kind of task is generally handled in a tool like Reporting Services instead of Power BI.
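
To make the shape of that model concrete, here is a minimal pandas sketch of a fact/GL/calendar structure rolled up into income statement lines; the column names and values are my own invention rather than anything from Joseph’s dataset.

```python
import pandas as pd

# Fact table: dollar amounts of transactions (hypothetical columns)
fact = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-15", "2019-01-20", "2019-02-03"]),
    "Account": [4000, 5000, 4000],
    "Amount": [1200.00, -450.00, 980.00],
})

# GL table: categorization of each account into income statement lines
gl = pd.DataFrame({
    "Account": [4000, 5000],
    "Category": ["Revenue", "Cost of Goods Sold"],
})

# Calendar table: date attributes for the data model
calendar = pd.DataFrame({"Date": pd.date_range("2019-01-01", "2019-12-31")})
calendar["Month"] = calendar["Date"].dt.to_period("M")

# Join facts to their GL category and calendar month, then roll up
income_statement = (
    fact.merge(gl, on="Account")
        .merge(calendar, on="Date")
        .groupby(["Month", "Category"], as_index=False)["Amount"]
        .sum()
)
print(income_statement)
```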


Deploying a Big Data Cluster to a Multi-Node kubeadm Cluster

Mohammad Darab shows how we can deploy a SQL Server Big Data Cluster on a multi-node kubeadm cluster:

There are a few assumptions before we get started:

1. You have at least 3 virtual machines running with the minimum hardware requirements
2. All your virtual machines are running Ubuntu Server 16.04 and have OpenSSH installed
3. All the virtual machines have static IPs and are on the same subnet
4. All the virtual machines are updated and have been rebooted

Mohammad shows how to set up the cluster, configure Kubernetes, and then install Big Data Clusters. Definitely worth the read if you’re interested in building a Big Data Cluster on-premises.


Updates in Confluent Platform 5.4

Tim Berglund takes us through what has changed in Confluent Platform 5.4:

Role-Based Access Control (RBAC)

Back in July, we announced the preview for RBAC as part of the Confluent Platform 5.3 release. After gathering feedback and learning from everyone who tried it out, we are now pleased to announce the availability of RBAC in Confluent Platform 5.4. You can now make use of this feature in production environments with Confluent’s full support.

RBAC offers a centralized security implementation for enabling access to resources across the entire Confluent Platform with just the right level of granularity. You can control the permissions you grant to users and groups on specific platform resources, starting at the cluster level and moving all the way down to individual topics, consumer groups, or even individual connectors. You do this by assigning users or groups to roles. This gets you out of the game of managing the individual permissions of a huge number of principals—a real problem for large enterprise deployments.

RBAC delivers comprehensive authorization enforced via all user interfaces (Confluent Control Center UI, CLI, and APIs), and across all Confluent Platform components (Control Center, Schema Registry, REST Proxy, MQTT Proxy, Kafka Connect, and KSQL). Given the distributed architecture not only of Apache Kafka but also of other platform components like Connect and KSQL, having a single framework to centrally manage and enforce security authorizations across all the components is, in a word, essential for managing security at scale.
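
The core idea, binding a principal to a role on a resource rather than managing each principal’s permissions one by one, can be sketched in a few lines of Python. This is purely conceptual and not Confluent’s API, though the role names loosely mirror some of Confluent’s predefined roles.

```python
# Conceptual sketch of role-based access control; not Confluent's API.
# A role bundles permissions; a binding grants a principal that role on a resource.
ROLE_PERMISSIONS = {
    "DeveloperRead": {"read"},
    "DeveloperWrite": {"read", "write"},
    "ResourceOwner": {"read", "write", "alter", "grant"},
}

# (principal, resource) -> role, instead of per-principal permission lists
role_bindings = {
    ("group:analytics", "topic:orders"): "DeveloperRead",
    ("user:alice", "connector:s3-sink"): "ResourceOwner",
}

def is_allowed(principal: str, resource: str, action: str) -> bool:
    role = role_bindings.get((principal, resource))
    return role is not None and action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("group:analytics", "topic:orders", "read"))   # True
print(is_allowed("group:analytics", "topic:orders", "write"))  # False
```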

Click through for several more features and where you can try it out, either on-premises or in a major cloud host.


Spark is Not ACID Compliant

Kundan Kumarr explains how it is that Apache Spark is not ACID compliant:

Atomicity states that it should either write the full data or nothing to the data source when using the Spark DataFrame writer. Consistency, on the other hand, ensures that the data is always in a valid state.

As is evident from the Spark documentation below, while saving a DataFrame to a data source, the existing data will be deleted before the new data is written. But in the case of a job failure, the original data will be lost or corrupted and no new data will be written.
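
For context, the write in question is an ordinary DataFrame save in overwrite mode; a minimal PySpark sketch (the path is a placeholder):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("overwrite-demo").getOrCreate()

df = spark.range(1_000_000).withColumnRenamed("id", "order_id")

# With overwrite mode, Spark deletes the existing contents of the target path
# before it starts writing the new files. If the job dies partway through,
# the old data is already gone and the new data is incomplete.
df.write.mode("overwrite").parquet("/tmp/orders")  # placeholder path
```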

Click through for an explanation of these two along with a demo, and then an explanation of how Spark Datasets don’t follow the Isolation or Durability properties either. I don’t think any of this is earth-shattering to people, but it is a good reminder that Spark doesn’t fit all use cases.
