Press "Enter" to skip to content

Author: Kevin Feasel

Using the Cosmos DB Analytics Storage Engine

Hasan Savran explains the purpose of the Cosmos DB Analytics Storage Engine:

Analytics storage uses the Column Store format to save your data. This means data is written to disk column by column rather than row by row. This makes all aggregation functions run fast because the disk does not need to work hard to find data row by row anymore. Cosmos DB takes responsibility for moving data from the Transaction Store to the Analytical Store, too. You do not need to write any ETL packages to accomplish this. That means you do not need to figure out which data needs to be updated and which data should be deleted. Azure Cosmos DB figures all of that out for you and syncs the data between these two storage engines. This gives us the isolation we have been looking for between transactional and analytical environments. Data written to transactional storage will be available in Analytical Storage in less than 5 minutes. In my experience, it really depends on the size of the database; if you have a smaller database, data usually becomes available in Analytical Storage in less than a minute.

This makes the data easy to ingest into Azure Synapse Analytics, for example.
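To give a sense of what that ingestion path looks like, here is a minimal Synapse Spark sketch in Scala that reads the analytical store through Azure Synapse Link. The linked service, container, and column names are placeholders, and it assumes a Synapse notebook where spark is already defined.

// Read the Cosmos DB analytical store via Azure Synapse Link
// (reads here do not consume request units against the transactional store)
// "CosmosDbLinkedService", "Sales", and "productCategory" are hypothetical names
val salesDf = spark.read
  .format("cosmos.olap")
  .option("spark.synapse.linkedService", "CosmosDbLinkedService")
  .option("spark.cosmos.container", "Sales")
  .load()

// The column-store layout keeps aggregations like this cheap
salesDf.groupBy("productCategory").count().show()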


Linking between Notebooks in Azure Data Studio

Julie Koesmarno shows us the rules of linking notebooks in Azure Data Studio:

When writing a notebook, it can be very handy to be able to refer to a specific part of a notebook and allow readers to jump to that part, i.e., linking or anchoring. Using this technique, you can also create an index list or a table of contents, or cross-reference parts of other notebooks. Check out my demo notebook for this linking topic in the MsSQLGirl GitHub repo.

Read on for those rules.


Integrating Power BI with Azure Synapse Analytics

Santosh Balasubramanian walks us through the process of querying Azure Synapse Analytics data with Power BI:

In this guide, you will be integrating an already-existing Power BI workspace with Azure Synapse Analytics so that you can quickly access datasets, edit reports directly in Synapse Studio, and automatically see updates to the report in the Power BI workspace. We will be using a Power BI report developed against the Movie Analytics dataset from the previous guide to show the functionalities of the Power BI integration in Azure Synapse.

Click through for the demo.


Internal and External Azure Data Factory Pipeline Activities

Paul Andrew differentiates two forms of pipeline activity:

Firstly, you might be thinking, why do I need to know this? Well, in my opinion, there are three main reasons for having an understanding of internal vs external activities:

1. Microsoft cryptically charges you a different rate of execution hours depending on the activity category when the pipeline is triggered. See the Azure Price Calculator.

2. Different resource limitations are enforced per subscription and region (not per Data Factory instance) depending on the activity category. See Azure Data Factory Resource Limitations.

3. I would suggest that understanding what compute is used for a given pipeline is good practice when building out complex control flows. For example, this relates to things like Hosted IR job concurrency, what resources can connect to what infrastructure, and when activities might become queued.

Paul warns that this is a dry topic, but these are important reasons to know the difference.


Multiple Slicers and AND Logic

Stephanie Bruno embraces the healing power of AND:

When using slicers in Power BI reports, multiple selections filter data with OR logic. For example, if you have a slicer with products and your visuals are displaying the total number of invoices, then when “bicycles” and “helmets” are selected in the products slicer, your visual will show the number of invoices that include bicycles OR helmets. But what if you need it to instead show only the number of invoices that include bicycles AND helmets? Read on to find out how you can do just that with DAX.

Read on for the solution.
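The solution in the post is pure DAX; purely to make the AND semantics concrete, here is a rough Spark sketch in Scala, assuming a hypothetical invoiceLines DataFrame with invoiceId and product columns.

import org.apache.spark.sql.functions._

// Hypothetical slicer selection
val selected = Seq("bicycles", "helmets")

// Keep only invoices that contain every selected product (AND),
// rather than invoices that contain any of them (OR)
val invoicesWithAll = invoiceLines
  .filter(col("product").isin(selected: _*))
  .groupBy("invoiceId")
  .agg(countDistinct("product").as("matchedProducts"))
  .filter(col("matchedProducts") === selected.size)

println(invoicesWithAll.count())  // invoices that include bicycles AND helmets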


Working with Serverless and Dedicated SQL Pools in Azure Synapse Analytics

Igor Stanko takes us through both dedicated and serverless SQL Pools in Azure Synapse Analytics:

Both serverless and dedicated SQL pools can be used within the same Synapse workspace, providing the flexibility to choose one or both options to cost-effectively manage your SQL analytics workloads. With Azure Synapse, you can use T-SQL to directly query data within a data lake for rapid data exploration and take advantage of the full capabilities of a data warehouse for more predictable and mission-critical workloads. With both query options available, you can choose the most cost-effective option for each of your use cases, resulting in cost savings across your business.

This post explores 2 consumption choices when exercising analytics using Synapse SQL (serverless and dedicated SQL pools) and examines the power and flexibility provided by Azure Synapse when both are used to execute T-SQL workloads. In addition, we will explore options to control cost when using both models.

Click through for details, including hints on minimizing costs.
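As a rough illustration of the serverless side of that choice, here is a hedged Scala sketch that runs a T-SQL OPENROWSET query against a serverless SQL pool endpoint over JDBC. The workspace, storage account, container, path, and login are all placeholders, and it assumes the Microsoft SQL Server JDBC driver is on the classpath.

import java.sql.DriverManager

// All names below (workspace, storage account, container, login) are placeholders
val url = "jdbc:sqlserver://myworkspace-ondemand.sql.azuresynapse.net:1433;database=master;encrypt=true"
val conn = DriverManager.getConnection(url, "sqladminuser", sys.env("SYNAPSE_SQL_PASSWORD"))

// Serverless SQL pool: query Parquet files in the data lake directly with T-SQL
val sql =
  """SELECT TOP 10 *
    |FROM OPENROWSET(
    |    BULK 'https://mystorageaccount.dfs.core.windows.net/mycontainer/sales/*.parquet',
    |    FORMAT = 'PARQUET'
    |) AS rows""".stripMargin

val rs = conn.createStatement().executeQuery(sql)
while (rs.next()) println(rs.getString(1))
conn.close()

With a dedicated SQL pool the same client pattern applies; you point at the dedicated endpoint instead and query tables loaded into the warehouse, paying for provisioned capacity rather than per query.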


Using Scala in a Databricks Notebook

Tomaz Kastrun takes a look at the original Spark language:

Let us start with the Databricks datasets that are available within every workspace, mainly for test purposes. This is nothing new; both Python and R come with sample datasets. For example, the Iris dataset is available with the base R engine and with the Seaborn Python package. The same goes for Databricks, and sample datasets can be found in the /databricks-datasets folder.

Click through for the walkthrough and introduction to Scala as it relates to Apache Spark.
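To give a sense of the starting point, here is a tiny Scala sketch along the same lines; it assumes a Databricks notebook, where spark and dbutils are predefined, and that the usual README.md sits at the root of the sample-data folder.

// List the built-in sample datasets mentioned above
dbutils.fs.ls("/databricks-datasets").foreach(info => println(info.path))

// Read one of the files into a DataFrame (any path from the listing above works;
// the README.md here is just an assumed example)
val readme = spark.read.text("/databricks-datasets/README.md")
readme.show(5, truncate = false)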
