Press "Enter" to skip to content

Day: December 15, 2022

Synapse Runtime for Spark 3.3 Now in Public Preview

Estera Kot has an announcement:

We are excited to announce the preview availability of Apache Spark™ 3.3 on Synapse Analytics. The essential changes include features which come from upgrading Apache Spark to version 3.3.1 and upgrading Delta Lake to version 2.1.0.

Check out the official release notes for Apache Spark 3.3.0 and Apache Spark 3.3.1 for the complete list of fixes and features. In addition, review the migration guidelines between Spark 3.2 and 3.3 to assess potential changes to your applications, jobs and notebooks.

There’s a lot in there, though I did snicker a bit at log4j 2 being more secure than log4j v1 given what we saw last year, though that gaping hole was fixed.

Comments closed

Bringing Order to a Columnstore Index

Tibor Karaszi puts columnstore ducks in a row:

Data for a columnstore index is divded in groups of approximate 1 million rows, rowgroups. Each rowgroup has a set of pages for each column. The set of pages for a column in a rowgroup is called a segment. SQL Server has meta-data for the lowest and highest value for a segment. There are no SEEKs in a columnstore index. But, SQL Server can use this meta-data to skip reading segments, with the knowledge that “this segment cannot contain any data that I need based on my predicates in my WHERE clause”.

Also, you might want to do these operations using MAXDOP 1, so we don’t have several threads muddling our neat segment alignment.

I’m not sure I actually set the ORDER BY clause on columnstore indexes all that often—a quick mental survey says maybe once, though that could be my own failing rather than a statement on the utility of ordered columnstore indexes.

Comments closed

Window Functions in DAX

Jeffrey Wang is speaking my language:

The December 2022 release of Power BI Desktop includes three new DAX functions: OFFSETINDEX, and WINDOW. They are collectively called window functions because they are closely related to SQL window functions, a powerful feature of the SQL language that allows users to perform calculations on a set of rows that are related to the current row. Because these functions are often used for data analysis, they are sometimes called analytical functions. In contrast, DAX, a language invented specifically for data analysis, had been missing similar functionalities. As a result, users found it hard to write cross-row calculations, such as calculating the difference of the values of a column between two rows or the moving average of the values of a column over a set of rows.

Read on to learn more about how these functions work and how they differ from their SQL Server counterparts.

Comments closed

Encryption Recommendations for SQL Server

Matthew McGiffen shares some thoughts:

We looked earlier at the various places our data lives – on disk, in memory, in-transit across the network. A good strategy will protect all of these locations – sometimes with multiple layers. Exactly how you go about doing that for your applications may vary – but as long as you are on SQL Server 2016 or higher there is a default strategy that you should consider. This combines a number of the available SQL Server features to provide the best protection.

This story might look a bit different for Azure SQL Database (though you can still use Always Encrypted there) and will look very different on Azure Synapse Analytics dedicated SQL pools, which don’t have Always Encrypted at all.

Comments closed

Paginated Reports in Power BI Pro

Andy Jones has some good news for us:

Microsoft recently announced that Power BI Paginated Reports is now available to users with a Pro licence. Previously, you required a Premium licence to take advantage of Paginated Reports in Power BI.

Read on to learn how paginated reports work in Power BI (hint: almost like SSRS). Even with Power BI Premium Per User being a fairly reasonable $20 per user per month, it’s nice to see Pro get a bit more value.

Comments closed