2022-04-01 – Curated SQL

An Overview of Automatic Text Summarization

Automatic text summarization comes in two flavours: extractive summarization and abstractive summarization. Extractive summarization models take exact phrases from the reference documents and use them as a summary. One of the very first research papers on (extractive) text summarization is the work of Luhn [1]. TextRank [2] (based on the concepts used by the PageRank algorithm) is another widely used extractive summarization model.
In the era of deep learning, abstractive summarization became a reality. With abstractive summarization, a model generates a text instead of using literal phrases of the reference documents. One of the more recent works on abstractive summarization is PEGASUS [3] (a demo is available at HuggingFace).

Click through for a couple of contemporary examples, as well as a few pain points you may run into with the current set of libraries and algorithms.
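
To make the extractive idea concrete, here is a minimal sketch in Python. It is a simplified word-frequency scorer loosely inspired by Luhn's approach, not a reproduction of Luhn's algorithm or of TextRank. The stopword list, the naive sentence splitter, and every function name are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
             "on", "for", "with", "as", "are", "that", "this"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text, n_sentences=1):
    """Return the n highest-scoring sentences, verbatim and in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average document-level frequency of the sentence's words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore original order
    return [sentences[i] for i in chosen]
```

The key property is that every sentence in the output is copied unchanged from the input, which is what separates extractive from abstractive summarization.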

Comments closed

An Overview of the Microsoft Defender Ecosystem

Published 2022-04-01 by Kevin Feasel

Alan La Pietra looks at all the Defenders you can get your hands on:

Microsoft Defender Antivirus is available in Windows 10 and Windows 11, and in versions of Windows Server.
Microsoft Defender Antivirus is a major component of your next-generation protection in Microsoft Defender for Endpoint.
Microsoft Defender Antivirus is built into Windows, and it works with Microsoft Defender for Endpoint to provide protection on your device and in the cloud.

I see the hand of marketing in this. Which means they’ll probably all have different names nine months from now.


Building S3 Data Pipelines — The Tools

Published 2022-04-01 by Kevin Feasel

Chris Adkin continues a series:

In my last post I outlined a number of architectural options for solutions that could be implemented in light of Microsoft retiring SQL Server 2019 Big Data Clusters, one of which was data pipelines that leverage Python and Boto 3. Before diving into these things in greater detail, let’s recap what S3 is.

Click through for a simple data pipeline example.
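
For a rough idea of what a Python and Boto 3 pipeline step can look like, here is a hedged sketch. It is not taken from Chris's post. The function names, the newline-delimited JSON layout, and the bucket and key names in the usage note are all assumptions. The only Boto 3 calls used are the S3 client's `put_object` and `get_object`.

```python
import json


def write_json_records(s3_client, bucket, key, records):
    """Store records as one newline-delimited JSON object in the bucket."""
    body = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)


def read_json_records(s3_client, bucket, key):
    """Read an object written by write_json_records back into a list of dicts."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


def copy_filtered(s3_client, bucket, source_key, target_key, predicate):
    """One pipeline step: read, keep rows matching predicate, write back."""
    kept = [r for r in read_json_records(s3_client, bucket, source_key)
            if predicate(r)]
    write_json_records(s3_client, bucket, target_key, kept)
    return len(kept)


# Usage sketch (needs boto3 plus credentials for AWS or an S3-compatible store;
# bucket and key names below are hypothetical):
# import boto3
# s3 = boto3.client("s3")
# copy_filtered(s3, "my-bucket", "raw/orders.jsonl", "clean/orders.jsonl",
#               lambda row: row["qty"] > 0)
```

Passing the client in as a parameter keeps the functions easy to test against a stand-in object, and lets the same code target any S3-compatible endpoint the client is configured for.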


Lists in Power Query

Published 2022-04-01 by Kevin Feasel

Ed Hansberry makes a list and checks it twice:

Lists in Power Query are something many people know nothing about. Power Query uses them all the time even though you may not realize it, so if you add some List knowledge to your quiver, you’ll be able to kick your Power Query skills up a notch.
In my work, I often see the need for counting words, especially today with so much online data. Perhaps you want to ensure your Amazon product listings have a maximum number of words in the descriptions or you want to count the words in a podcast. The method I’m going to show you will count anything in your data, so you can apply this pattern to any of your datasets.

Also known as map and reduce (but not quite MapReduce).
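
For readers who think in general-purpose languages, the same pattern can be sketched in Python. This is an analogy rather than Ed's Power Query code: the function names and the whitespace-based definition of a word are assumptions.

```python
from functools import reduce


def count_words(text):
    """Count whitespace-separated words in one description."""
    return len(text.split())


def total_words(descriptions):
    """Map each description to its word count, then reduce to a grand total."""
    counts = map(count_words, descriptions)            # map step
    return reduce(lambda acc, n: acc + n, counts, 0)   # reduce step


def over_limit(descriptions, max_words):
    """Return the descriptions that exceed the word limit."""
    return [d for d in descriptions if count_words(d) > max_words]
```

Swapping `count_words` for any other per-item function gives the "count anything" flexibility the excerpt describes.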


Performance Gains with LAG and LEAD

Published 2022-04-01 by Kevin Feasel

Ronen Ariely provides a solution:

However, the answer in this specific case was not optimal. Unfortunately, most people who come to the forums to ask a question do not care about learning, only about the solution. Even so, in my opinion the road is just as important as the end point. The road (the learning) is what will help the person solve the next issue and not just the current one: teach a man to fish and you feed him for a lifetime…
The OP marked the answer he got, and I assume that from his point of view the discussion ended, but I wanted to present a solution which might be tens of times better in some cases, which is what I will do in this post. So let’s start.

I won’t dive too deeply into Ronen’s philosophical argument—you can definitely read about that in the post. I will say I am sympathetic to the argument at the margin and believe it’s worthwhile to know the superior solution.
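
As a generic illustration of the kind of rewrite the title points at, here is a small sketch that fetches the previous row's value two ways: with a self-join and with LAG. It is not Ronen's query. The table, columns, and data are hypothetical, and it runs on SQLite (3.25 or later, for window function support) through Python rather than on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO readings (id, value) VALUES (?, ?)",
    [(1, 10), (2, 15), (3, 12), (4, 20)],
)

# Self-join: touches the table twice and assumes ids have no gaps.
SELF_JOIN = """
SELECT cur.id, cur.value - prev.value AS delta
FROM readings AS cur
LEFT JOIN readings AS prev ON prev.id = cur.id - 1
ORDER BY cur.id
"""

# Window function: one pass, and ordering alone defines "previous".
WINDOW = """
SELECT id, value - LAG(value) OVER (ORDER BY id) AS delta
FROM readings
ORDER BY id
"""

self_join_rows = conn.execute(SELF_JOIN).fetchall()
window_rows = conn.execute(WINDOW).fetchall()
print(window_rows)
```

Both queries return the same rows here. Whether the window version is actually faster depends on the engine, the indexes, and the data volume, so it is worth comparing plans on your own system.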


Error Calling SQLSetDescRec via PolyBase

Published 2022-04-01 by Kevin Feasel

Nathan Schoenack troubleshoots an error:

When trying to query an external table created for a generic ODBC external data source, the following error can be observed:
Message 7320, level 16, state 110, line 87
Unable to execute query “Remote Query” against OLE DB provider “MSOLEDBSQL” on link server “(null)”. 105082; Generic ODBC error: OdbcBufferReader.ReadBuffer, error in OdbcReadBuffer: SqlState: IM001, NativeError: 0, ‘Error calling: SQLSetDescRec(this->GetHdesc(), (SQLSMALLINT)column->idxServerCol, (SQLSMALLINT)column->odbcReadType, 0, column->valueLength, (SQLSMALLINT)column->precision, (SQLSMALLINT)column->scale, (SQLPOINTER)(pBuffer + column->valueOffset), (SQLLEN *)indPtr, (SQLLEN *)indPtr), SQL return code: -1 | SQL Error Info: Error <1>: ErrorMsg: [Microsoft][ODBC Driver Manager] The driver does not support this function. | Error calling: pReadConn->ReadBuffer(pBuffer, bufferOffset, bufferLength, pBytesRead, pRowsRead) | state: FFFF, number: 239, active connections: 9’, Connection String: Dsn={DSN Name};Driver={Driver Name};uid=root;server=xxxxx;port=xxxx;database=xxxx.

Read on for a viable workaround.



Day: April 1, 2022

An Overview of Automatic Text Summarization

An Overview of the Microsoft Defender Ecosystem

Building S3 Data Pipelines — The Tools

Lists in Power Query

Performance Gains with LAG and LEAD

Error Calling SQLSetDescRec via PolyBase