Press "Enter" to skip to content

Curated SQL Posts

Plan Hashes and Ad-Hoc Workloads

Erin Stellato has an experiment for us:

Borrowing and adapting code from a previous post, Examining the Performance Impact of an Adhoc Workload, we will first create two stored procedures. The first, dbo.RandomSelects, generates and executes an ad hoc statement, and the second, dbo.SPRandomSelects, generates and executes a parameterized query.

Erin then shows how to review query stats and group together executions which are the same save for a change in literals. Read the whole thing.
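
As a sketch of the shape of that analysis (not Erin's exact code), sys.dm_exec_query_stats exposes a query_hash column which is identical for statements that differ only in their literal values, so grouping on it rolls the ad hoc variants together:

-- Statements sharing a query_hash but compiled as many distinct texts
-- are a telltale sign of an ad hoc (non-parameterized) workload.
SELECT
    qs.query_hash,
    COUNT(DISTINCT qs.sql_handle) AS distinct_statements,
    SUM(qs.execution_count) AS total_executions,
    SUM(qs.total_worker_time) AS total_cpu_time
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
ORDER BY distinct_statements DESC;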

Being a SQL Server Product Owner

Kevin Chant has an interesting role:

Now, I have had a few people ask me what a Product Owner actually does. Some say that it sounds like an architect role.

In reality, the role is one that’s mainly related to newer working practices like Scrum.

A Product Owner's responsibilities include talking to all the stakeholders for your team in the business and organising the priorities on your backlog board.

The concept makes sense, though this is the first time I’ve heard of such a role for a tool the engineers use rather than a product offered for sale.

Dealing with NULLs in Java with SQL Server 2019

Niels Berglund covers changes in SQL Server Machine Learning Services around Java code execution:

In the null values post mentioned above, I mentioned that there are differences between SQL Server and Java in how they handle null. So, when we call into Java from SQL Server, we may want to treat null values the same way as we do in SQL Server.

I wrote about this in the SQL Server 2019 Extensibility Framework & Java – Null Values post mentioned above. However, that post was written before SQL Server 2019 CTP 2.5. In CTP 2.5 Microsoft introduced the Java SDK, and certain things changed. Amongst the things that changed is the way we handle nulls when we receive datasets from SQL Server in our Java code.

Read on to learn how it works today.
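
For context, the T-SQL half of such a call looks like the sketch below; the class name is hypothetical, and the interesting null handling happens inside the Java class itself, which is what Niels's post covers:

-- Invoke a Java class from SQL Server 2019 (Java SDK era).
-- pkg.JavaNullDemo is a hypothetical class built against the
-- Microsoft Extensibility SDK; the NULL in nullable_col is what
-- the Java side must decide how to represent.
EXEC sp_execute_external_script
    @language = N'Java',
    @script = N'pkg.JavaNullDemo',
    @input_data_1 = N'SELECT 1 AS id, CAST(NULL AS int) AS nullable_col'
WITH RESULT SETS ((id int, nullable_col int));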

Plotting Three-Dimensional Linear Models

Sebastian Sauer shows a few techniques for visualizing linear models with two predictors:

Linear models are a standard way of predicting or explaining some data. Visualizing data is not only of didactic value but provides heuristic value too, as demonstrated by Anscombe's Quartet.

Visualizing linear models in 2D is straightforward, but visualizing linear models with more than one predictor is much less so. The aim of this post is to demonstrate some ways to visualize linear models with more than one predictor, using popular R packages. We will focus on 3D examples, that is, two predictors.

I have a strong bias against 3D visuals because they tend to be so difficult to see clearly. There are times when they’re necessary, though.

Joining RDDs in Spark

Brad Llewellyn takes us through more Spark RDD and DataFrame exercises, including joins:

We can make use of the built-in .join() function for RDDs.  Similar to the .aggregateByKey() function we saw in the previous post, the .join() function for RDDs requires a 2-element tuple, with the first element being the key and the second element being the value.  So, we need to use the .map() function to restructure our RDDs to store the keys in the first element and the original array/tuple in the second element.  After the join, we end up with an awkward nested structure of arrays and tuples that we need to restructure using another .map() function, leading to a lengthy code snippet.

This is a place where DataFrames make so much more sense.

Cardinality Estimation of Table Variables with Nullable Columns

Milos Radivojevic takes us through a quick demonstration of a change in SQL Server 2019:

By using the same formula, the estimated number of rows is:

SELECT 0.001992032*1000000
--1992.032000000

This is exactly what we see in the execution plan. OK, that was CL 140, let’s see how SQL Server 2019 handles this simple case.

When we switch to CL 150, the plan and estimations for the c1 column (non-nullable) are the same. However, the estimation for the nullable column is changed!

Read the whole thing.
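
A minimal sketch of the experiment's shape (not Milos's exact code): populate a table variable that has a nullable column, then compare the estimated row counts for the same predicate under the two compatibility levels:

-- Table variable with a non-nullable and a nullable column.
DECLARE @t TABLE (c1 int NOT NULL, c2 int NULL);

INSERT INTO @t (c1, c2)
SELECT n, CASE WHEN n % 502 = 0 THEN NULL ELSE n END  -- sprinkle in NULLs
FROM (SELECT TOP (1000000)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b) AS x;

-- Same query under each CE; compare estimated rows in the two plans.
SELECT COUNT(*) FROM @t WHERE c2 IS NULL
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));

SELECT COUNT(*) FROM @t WHERE c2 IS NULL
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'));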

Simulating ON DELETE CASCADE

Aaron Bertrand has put together a procedure which simulates what a cascading delete would look like given your existing foreign keys:

A user recently asked about an interesting scenario involving tables with foreign key relationships. They wanted to generate the DELETE statements that would allow them to manually delete from child tables first (perhaps in stages), based on criteria they define, and tell them – before performing the deletes – how many rows the operation would affect in each table. They wanted output like this:

DELETE dbo.ChildTable1 WHERE ParentID < <some constant>; -- This would delete 47 row(s).

DELETE dbo.ChildTable2 WHERE ParentID < <some constant>; -- This would delete 14 row(s).
...
DELETE dbo.ParentTable WHERE ID < <some constant>; -- This would delete 11 row(s).

Click through for the solution as well as several caveats.
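
As a taste of what's involved, the metadata walk starts at sys.foreign_keys, whose naming is famously backwards. A minimal sketch, assuming a dbo.ParentTable like the one in Aaron's example:

-- Find the immediate child tables of a given parent table.
-- In sys.foreign_keys, parent_object_id is the referencing (child)
-- table and referenced_object_id is the table being referenced.
SELECT
    child_table = QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id))
                + N'.' + QUOTENAME(OBJECT_NAME(fk.parent_object_id)),
    foreign_key = fk.name
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.ParentTable');

The real procedure also has to recurse through grandchildren, cope with multi-column keys and self-references, and count affected rows before anything is deleted, hence the caveats.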

Improving Post-Join Aggregation Performance in Power Query

Imke Feldmann finds some nice performance improvements with aggregating data after a join using Power Query:

When you join a table to another table in Power Query, the UI gives you the option to either expand the columns (default) or aggregate the contents of the joined tables. That's useful if multiple rows are returned for the rows of the table that has been joined to (left table).

But this method is extremely slow. Compared to "simply" expanding all values to new rows (which took around 5 seconds), the aggregation took around 50 seconds. The automatically generated code uses the Table.AggregateTableColumn function.

Read on to see two separate attempts to speed things up.

Scripting Out Linked Servers with Actual Passwords

Ajay Dwivedi shows how you can script out a linked server creation statement which includes actual passwords:

For moving Logins/Users, Microsoft provided the revlogin script, which made it easy to migrate logins without needing to know their passwords. But there is no easy approach for migrating Linked Servers with the actual password. This is where the dbatools cmdlet Copy-DbaLinkedServer becomes very handy. But what about the situation where we have to script out a Linked Server beforehand?

For this reason, based on a blog post by Antti Rantasaari, and using his code as the base script, I have created a cmdlet Get-LinkedServer in the SQLDBATools module which accepts a SqlInstance name as a parameter along with a -ScriptOut switch, and gives Drop/Create statements for the linked servers present on that local/remote SqlInstance.

As a quick note, SQLDBATools is not the same as dbatools.
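
For reference, the Create half of the generated output is ordinary T-SQL along these lines (the server name and remote login are hypothetical); the whole point of the Rantasaari technique is recovering the value that goes into @rmtpassword:

-- Recreate a linked server and its remote login mapping.
EXEC master.dbo.sp_addlinkedserver
    @server = N'RemoteServer',      -- hypothetical name
    @srvproduct = N'',
    @provider = N'SQLNCLI',
    @datasrc = N'RemoteServer';

EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname = N'RemoteServer',
    @useself = N'False',
    @locallogin = NULL,
    @rmtuser = N'linked_login',     -- hypothetical remote login
    @rmtpassword = N'<recovered password>';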

Incremental Data Migration to Blob Storage

Ginger Daniel has started a series on data migration into Azure Blob Storage:

Part 1 of this article demonstrates how to upload multiple tables from an on-premises SQL Server to an Azure Blob Storage account as CSV files. I covered the basic steps to get data from one place to the other using Azure Data Factory; however, there are many alternative ways to accomplish this, and many details in these steps that were not covered. For a deep dive into the details, you can start here: https://docs.microsoft.com/en-us/azure/data-factory/introduction and https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal#create-a-pipeline

Part 1 was chock full of information, and it looks like Part 2 will be as well.
