Curated SQL – Page 757 – A Fine Slice Of SQL Server

Hierarchical Business Terms in Azure Purview

Published 2021-03-04 by Kevin Feasel

Wolfgang Strasser points out a new enhancement in Azure Purview:

During Spring Ignite 2021, many new features for Azure products have been announced. And Azure Purview was included in the list of enhancements.
Today, I want to highlight one of these enhancements – the ability to create hierarchical business glossary terms in the Azure Purview data catalog.

Click through to see how it works.

Comments closed

Including and Resizing External Images in knitr

Published 2021-03-03 by Kevin Feasel

The folks at Jumping Rivers continue a series on knitr and rmarkdown:

In this third post, we’ll look at including eternal images, such as figures and logos in HTML documents. This is relevant for all R markdown files, including fancy things like {bookdown}, {distill} and {pkgdown}. The main difference with the images discussed in this post, is that the image isn’t generated by R. Instead, we’re thinking of something like a photograph. When including an image in your web-page, the two key points are
– What size is your image?
– What’s the size of your HTML/CSS container on your web-page?

Read the whole thing.

Comments closed

Semantic Search in Azure Cognitive Search

Published 2021-03-03 by Kevin Feasel

Rangan Majumder, et al, have an article on how semantic search works in Azure Cognitive Search:

As part of our AI at Scale effort, we lean heavily on recent developments in large Transformer-based language models to improve the relevance quality of Microsoft Bing. These improvements allow a search engine to go beyond keyword matching to searching using the semantic meaning behind words and content. We call this transformational ability semantic search—a major showcase of what AI at Scale can deliver for customers.
Semantic search has significantly advanced the quality of Bing search results, and it has been a companywide effort: top applied scientists and engineers from Bing leverage the latest technology from Microsoft Research and Microsoft Azure. Maximizing the power of AI at Scale requires a lot of sophistication. One needs to pretrain large Transformer-based models, perform multi-task fine-tuning across various tasks, and distill big models to a servable form with very minimal loss of quality. We recognize that it takes a large group of specialized talent to integrate and deploy AI at Scale products for customers, and many companies can’t afford these types of teams. To empower every person and every organization on the planet, we need to significantly lower the bar for everyone to use AI at Scale technology.

Click through to learn more about the technology.

Comments closed

The Logging Costs of DROP TABLE and TRUNCATE

Published 2021-03-03 by Kevin Feasel

Paul Randal explains that DROP TABLE and TRUNCATE TABLE are logged operations:

Hopefully you all know that it’s a myth that DROP TABLE and TRUNCATE TABLE are non-logged operations. If you didn’t know that, read my blog post on sqlperformance.com that explains about the deferred drop mechanism. Both operations are fully logged, and will generate quite a bit of transaction log.
The bulk of the log that’s generated comes from having to log the deallocation of extents and the pages within them. For each extent, a bit must be cleared in the corresponding GAM page and IAM page, and all 8 pages in the extent must be marked as deallocated in the corresponding PFS page (turning off the 0x40 bit in each PFS byte). So that’s three log records per allocated extent.

To get a feeling for how much that is, Paul provides an example of a 20TB table being dropped.

Comments closed

Using Terraform to Tag Created Date

Published 2021-03-03 by Kevin Feasel

John Martin has an interesting use case for tagging in Terraform:

One of the key properties missing from Azure resources, in my opinion anyway, is a CreatedDate. This can be largely overcomes with Azure policy, but what if you don’t have access to create one that applies a timestamp tag at resource creation?
It is possible to use Terraform to tag the resource and set the value for when the resource is created. There is a little more work that needs to go into it to ensure that once it is set that Terraform does not overwrite it on subsequent deployments. But, it is achievable and brings this into your control if needed.

Click through to see how.

Comments closed

Azure Synapse Pathway

Published 2021-03-03 by Kevin Feasel

John Macintyre announces a new product:

Azure Synapse Pathway connects to the source system and inspects details about your database objects. An assessment report captures further details on the database objects that can be translated into Azure Synapse Analytics. With Azure Synapse Pathway the source database objects are automatically converted and optimized to T-SQL code on Azure Synapse Analytics. This means your existing code, whether a thousand or million lines of code, will be converted by Azure Synapse Pathway.
As a result of these capabilities, the traditional process of manual code conversion can now be automated in a fraction of the time; all while cutting out manual errors and reducing the total cost of the migration.

They’re starting with a few data sources (including Snowflake), but it’s an interesting product. I could see it useful for getting 80-85% of the migration done, though I don’t trust auto-generated code to be optimal.

Comments closed

Batch Mode on Row Store in SQL Server 2019

Published 2021-03-03 by Kevin Feasel

Deepthi Goguri looks at a nice performance improvement in SQL Server 2019:

In the previous post, we learned about Table variable deferred compilation. In this blog, lets focus on the batch mode on rowstore feature introduced in SQL Server 2019. This feature improves the performance of the analytical queries using the batch mode query processing. This feature is for CPU optimization helping analytical queries to run faster. We do not have to specify this option if the database compatibility is 150.
This feature is especially for the analytical queries for CPU bound analytic workloads without needing the columnstore indexes. We can specifically mention the hints in the query for using the batch mode or not.

There are specific rules which must be met before it kicks in, but the performance benefit can be significant. If you’re running SQL Server 2017, you needed a columnstore index on a table to get batch mode, though there is a trick around this: you can create a filtered, nonclustered columnstore index WHERE 1=0 so that it doesn’t have any rows. Then, any queries which hit that table are potentially eligible for batch mode processing, even though none of them use the columnstore index.

Comments closed

Creating a Database Publish Profile in Visual Studio

Published 2021-03-03 by Kevin Feasel

Elizabeth Noble shows us how to create a database publish profile using Visual Studio:

One of our fears was always how to prevent losing data and critical data code. Here were publish profiles to our rescue. We also found that some of our database code had specific values depending on the environment or contained references to other databases. Once again, publish could solve these problems!
While I’d love to say that you could use ADS to manage your database projects, that just isn’t true right now. However, we have a way to help you get a publish profile created. If you don’t want to use Visual Studio yourself, you might want to ask your Developer friends real nice and see if they’d be willing to help you out.

Click through for a video and a sample of what a publish profile looks like.

Comments closed

Applied ML Prototypes

Published 2021-03-02 by Kevin Feasel

Alex Bleakley and Santiago Giraldo announce Applied ML Prototypes:

To directly address these challenges, we’ve released Applied ML Prototypes (AMPs) — a revolutionary new way of developing and shipping enterprise ML use cases — which provide complete ML projects that can be deployed with one click directly from Cloudera Machine Learning. AMPs enable data scientists to go from an idea to a fully working ML use case in a fraction of the time, with an end-to-end framework for building, deploying, and monitoring business-ready ML applications instantly.
AMPs move the starting line for any ML project by enabling data scientists to start with a full end-to-end project developed for a similar use case, including a trained and deployed ML model, as well as prebuilt predictive business applications, out of the box. This means that ML development teams can tackle their own ML business use cases more quickly, from those involving churn modeling, to sentiment analysis, to anomaly detection and beyond.

Getting past the marketing fluff, there are some interesting ideas here.

Comments closed

Installing Spark on Windows Subsystem for Linux

Published 2021-03-02 by Kevin Feasel

David Alcock wants Spark, but not Windows Spark:

The post won’t cover any instructions for installing Ubuntu and instead I’ll assume you’ve installed already and downloaded the tgz file from the Apache Spark download page (Step 3 in the above link).
Let’s go straight into the terminal window and get going! I’ve put the commands in bold text (don’t include the $) just so anyone can see a bit easier and who also prefers to ignore my jibberish!

Click through for the instructions.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Curated SQL Posts