Press "Enter" to skip to content

Month: October 2023

Enabling Python and R Support for VS Code Polyglot Notebooks

Joy George Kunjikkur enables a preview option:

Obviously, we should have Polyglot notebooks up and running. The first step to enable Python preview is that we need to install Jupyter on the machine and make sure the Python kernel spec is available. Run the below command to make sure it is there.

It looks like what the preview is doing is shelling out to Jupyter notebooks, so I’d imagine variables won’t cross over between languages.

Comments closed

Row- and Column-Level Security in Microsoft Fabric Warehouses

Maheswaran Arunachalam gives us an update on data security in Microsoft Fabric:

Row-Level Security (RLS) and Column-Level Security (CLS) are methods that simplify the design and coding of security in applications by imposing restrictions on data access. CLS allows for specific users to access only certain columns of a table relevant to their department, protecting sensitive data. RLS, on the other hand, restricts users to accessing only data rows pertinent to their role or department. Both methods locate the access restriction logic in the database tier, applying the restrictions every time data access is attempted from any tier, making the security system more reliable and robust.

Click through for the process. This is pretty much the same as any SQL Server-based system.

Comments closed

Building a Bland-Altman Plot in R

Steven Sanderson performs a comparison:

Before we dive into the code, let’s briefly understand what a Bland-Altman plot is. It’s a graphical method to visualize the agreement between two measurement techniques, often used in fields like medicine or any domain with comparative measurements. The plot displays the differences between two measurements (Y-axis) against their means (X-axis).

Click through to see how this works and how you can interpret the results.

Comments closed

Trying out Batch Mode on Rowstore

Etienne Lopes has some fun with a feature:

Before 2012, creating analytical queries (that usually scan many rows and have lots of aggregations) from big OLTP databases to feed real-time based reports used in decision making processes, could be quite challenging. Then ColumnStore Indexes arrived and they’ve been enhanced overtime, offering amazing gains both in performance and storage.

Unfortunately, regarding pure OLTP databases, there are many situations in which ColumnStore Indexes can’t (or won’t) be used. There are some great performance enhancements present in columnstore that’s for sure and today I’m going to speak about one that became automatically available since SQL Server 2019 for “traditional” RowStore tables. It’s called “Batch Mode on Rowstore” and it can really boost some of our analytical queries over the “traditional tables” without any effort from our side!

There are a series of specific rules you need to hit but if you hit them, I’ve noticed about a 3x performance gain with you doing nothing at all.

Comments closed

Analyzing Dependencies in Microsoft Fabric Measures

Sandeep Pawar has an interesting library for us:

In my previous blog post, I introduced Semantic-Link, discussing its use cases and explained how it enables us to create solutions that were either not possible or not easily achievable before. In this blog post, I would like to present another powerful use case that, although possible in the past, could not be created and used seamlessly in Power BI. Allow me to introduce the MeasureMaze Python library, which helps uncover insights from a complex maze of dependencies in a Power BI semantic model using Semantic-Link and the power of network analysis.

Sandeep dives into the approach and what you get out of this library. It’s very interesting.

Comments closed

Automating SQL Dump Analysis

Sean Gallardy has been busy:

One of the things that is lacking in the community is for some simple dump analysis for SQL Server, well no longer!

Introducing the Beta version of SQL Dump Analysis… sure it’s not super pretty, sure it’s not 100% yet… but it’ll give you the basic goods and get you on your way.

This is a website you upload SQL dumps to, rather than an executable you’d download. Still, check it out.

Comments closed

Skipping through Rows in a Large File with Powershell

Jay Robinson has a one-liner for us:

It came from working with enormous text files, and the gotchas that come with them. In this case, I had a 50gb data file that had something wrong with it, buried about 25% in. The file was being processed just fine, until it hit this unexpected data. And because the ETL app was written for performance first, there wasn’t a lot of data validation being done. So it’d just go boom when it hit that batch.

So what was wrong with the file? Well, in order to determine that, I had to see what was in that batch. But you can’t just open a 50gb file in Notepad. Yes, there are other ways to get around this, but here’s the one I chose:

Click through for Jay’s solution to the problem.

Comments closed

SSIS Catalog Error 27150

Andy Leonard diagnoses an error:

In case you’ve never encountered SSIS Catalog Error 27150, the error message reads:

The version of the project has changed since the instance of the execution has been created. Create a new execution instance and try again.

A customer recently encountered this error.
So… what happened?

Click through for the answer. I’ve had this happen as well for the same reason as Andy.

Comments closed

Packed/Batched Remote Procedure Calls in SQL Server

Bob Dorr digs in:

The SQL Server TDS protocol provides two main paths for query execution (Language and RPC events.)  You can trace these events using the Batch::Starting/Completed (Language) and RPC:Starting/Completed (RPC) XEvents.

Language events are text streams requiring full parsing and may be susceptible to injection attacks.  Language events also require educated guesses.  For example, should select 1 return a smallint, bigint?

Bob goes on to show an example of a simple call being susceptible to SQL injection, explains why Remote Procedure Calls (RPCs) are superior, and what packing (or batching) RPCs does for you.

Comments closed

Functional Programming and R

Anirban Shaw ties functional programming to R:

Functional Programming‘s relevance in the R programming language, a language primarily known for its prowess in data analysis and statistical computing, is particularly noteworthy. By leveraging functional programming, organizations can improve operational efficiency and gain a competitive edge

R’s ecosystem is enriched by functional programming paradigms, which enable developers and data scientists to write concise and expressive code for tasks such as data manipulation, transformation, and visualization.

In this article, we take a deep dive into the fundamental characteristics of R, the advantages of adopting functional programming within it and the essential concepts ingrained in the core of R. 

Read on to see how the two fit together. H/T R-Bloggers.

Comments closed