
Month: April 2018

Scalar Function Blocking

Erik Darling notes that scalar functions can cause multi-table blocking:

Someone had tried to be clever. If you’ve been practicing SQL Server for a while, a look at the code that was running usually means one thing.

A Scalar Valued Function was running!

In this case, someone had added that function as a computed column to the Users table.

Spoilers: this was a bad idea.
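Erik’s actual function is in the post; as a rough sketch of the pattern (the table and function names here are invented, not his code), it amounts to a scalar UDF that reads from a second table, bound to Users as a computed column. Any query touching that column runs the function row by row, takes shared locks on the second table along the way, and inhibits parallelism:

-- Hypothetical sketch of the anti-pattern; not Erik's actual code.
-- A scalar UDF that reads from a second table...
CREATE FUNCTION dbo.GetBadgeCount (@UserId INT)
RETURNS INT
AS
BEGIN
    DECLARE @BadgeCount INT;

    SELECT @BadgeCount = COUNT(*)
    FROM dbo.Badges
    WHERE UserId = @UserId;

    RETURN @BadgeCount;
END;
GO

-- ...added to the Users table as a computed column, so queries against Users
-- now execute the function once per row and touch Badges as well.
ALTER TABLE dbo.Users
    ADD BadgeCount AS dbo.GetBadgeCount(Id);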


Accessing SQL Server From Scala

Sidharth Khattri shows how to use Slick, Scala’s functional relational mapping library, to connect to SQL Server:

Now, moving on to our FRM (Functional Relational Mapping) and repository setup, the following import will be used for the MS SQL Server Slick driver’s API:

import slick.jdbc.SQLServerProfile.api._

After that, the FRM will look the same as the FRMs described in the official Slick documentation. For the example on this blog, let’s use the following table structure:

CREATE TABLE user_profiles (
    id INT IDENTITY (1, 1) PRIMARY KEY,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL
)

whose functional relational mapping will look like this:

class UserProfiles(tag: Tag) extends Table[UserProfile](tag, "user_profiles") {
  def id: Rep[Int] = column[Int]("id", O.PrimaryKey, O.AutoInc)
  def firstName: Rep[String] = column[String]("first_name")
  def lastName: Rep[String] = column[String]("last_name")
  def * : ProvenShape[UserProfile] = (id, firstName, lastName) <> (UserProfile.tupled, UserProfile.unapply) // scalastyle:ignore
}
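The mapping above references a UserProfile case class that isn’t shown in the excerpt. A minimal sketch of that class plus a round-trip query might look like this (the "sqlserver" config key and the sample data are my assumptions, not from the post):

case class UserProfile(id: Int, firstName: String, lastName: String)

import slick.jdbc.SQLServerProfile.api._
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

val db = Database.forConfig("sqlserver") // assumed connection config in application.conf
val userProfiles = TableQuery[UserProfiles]

val action = for {
  _   <- userProfiles += UserProfile(0, "Ada", "Lovelace") // AutoInc id is generated server-side
  all <- userProfiles.result
} yield all

println(Await.result(db.run(action), 10.seconds))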

I’m definitely going to need to learn more about this.


PowerShell: Validating Parameters Using Private Functions

Mike Robbins shows how to split out validation from your primary function within PowerShell:

They responded by asking if it was possible to move the custom message that Throw returns to the private function. At first, I didn’t think this would be possible, but decided to try the code to make an accurate determination instead of just assuming it wasn’t possible.

I’ve now learned something else which makes the whole process of moving the validation from the ValidateScript block to a private function much more user-friendly, which is what I think the person who asked the question was trying to accomplish.

If you have several parameters with somewhat complex validation logic, this makes maintenance a lot easier.
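As a sketch of the pattern (the function names here are mine, not Mike’s exact code), the ValidateScript block shrinks to a single call, and the private helper owns both the test and the friendly error message:

# Private helper: does the validation and throws the custom message itself.
function Test-ValidComputerName {
    [CmdletBinding()]
    param (
        [string]$ComputerName
    )

    if ($ComputerName -match '^[a-zA-Z0-9-]{1,15}$') {
        $true
    }
    else {
        throw "'$ComputerName' is not a valid NetBIOS computer name."
    }
}

# Public function: ValidateScript simply delegates to the helper.
function Get-Widget {
    [CmdletBinding()]
    param (
        [ValidateScript({ Test-ValidComputerName -ComputerName $_ })]
        [string]$ComputerName
    )

    "Querying $ComputerName"
}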


Running dbachecks In Parallel

Cláudio Silva shows how you can run multiple instances of dbachecks concurrently:

Imagine that I want to check for databases in Full Recovery Model on the production environment, and I want to start (in parallel) a new check for Simple Recovery Model on the development environment. If this setting is not changed in the correct time frame, we can end up checking for Full Recovery Model on the development environment, where we want Simple Recovery Model.

The first time I tried to run tests in parallel for environments that needed different configs, I didn’t realise this detail, so I ended up with many more failed tests than expected! The bell rang when the majority of the failed tests came from one specific test…the one whose value I had changed.

Read the whole thing before you start running Task.Parallel or even running multiple copies of dbachecks in separate PowerShell windows.
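To make the gotcha concrete, here’s a sketch with placeholder instance names (Set-DbcConfig and Invoke-DbcCheck are real dbachecks commands; verify the config name against Get-DbcConfig on your version). The configuration store is shared, so a config change meant for one environment rewrites what a concurrently running check is validating:

# Both runs share one dbachecks configuration store. If the second
# Set-DbcConfig executes before the production check finishes, the
# production run silently starts validating the development policy.
Set-DbcConfig -Name policy.recoverymodel.type -Value Full
Invoke-DbcCheck -SqlInstance 'prod-sql01' -Check RecoveryModel

Set-DbcConfig -Name policy.recoverymodel.type -Value Simple
Invoke-DbcCheck -SqlInstance 'dev-sql01' -Check RecoveryModel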


The Evolution Of Hadoop

Holden Ackerman has an interesting analysis of Qubole customers’ adoption of Hadoop 2:

In Qubole’s 2018 Data Activation Report, we did a deep-dive analysis of how companies are adopting and using different big data engines. As part of this research, we found some fascinating details about Hadoop that we will detail in the rest of this blog.

A common misconception in the market is that Hadoop is dying. However, when you hear people refer to this, they often mean “MapReduce” as a standalone resource manager and “HDFS” as being the primary storage component that is dying. Beyond this, Hadoop as a framework is a core base for the entire big data ecosystem (Apache Airflow, Apache Oozie, Apache HBase, Apache Spark, Apache Storm, Apache Flink, Apache Pig, Apache Hive, Apache NiFi, Apache Kafka, Apache Sqoop…the list goes on).

I clipped this portion rather than the direct analysis because I think it’s an important point: the Hadoop ecosystem is thriving as the matter of primary importance shifts from what mattered a decade ago (batch processing of large amounts of data on servers with direct-attached storage) to what matters today (a combination of batch and streaming processing of large amounts of data on virtualized, often cloud-based, servers with network-attached flash storage).


How Spark Works: RDDs And DAGs

Shubham Agarwal gets into the way that Spark translates operations on Resilient Distributed Datasets into execution plans:

When we do a transformation on any RDD, it gives us a new RDD. But it does not start the execution of those transformations. The execution is performed only when an action is performed on the new RDD and gives us a final result.

So once you perform any action on an RDD, the Spark context gives your program to the driver.

The driver creates the DAG (directed acyclic graph) or execution plan (job) for your program. Once the DAG is created, the driver divides this DAG into a number of stages. These stages are then divided into smaller tasks and all the tasks are given to the executors for execution.

Click through for more details.
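As a quick illustration of that lazy-evaluation point, here is a minimal Scala sketch of my own (not from the post): the filter and map below only record lineage, and nothing reaches the executors until count forces the driver to build and submit the DAG.

import org.apache.spark.sql.SparkSession

object LazyDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("lazy-demo").master("local[*]").getOrCreate()
    val numbers = spark.sparkContext.parallelize(1 to 1000000)

    // Transformations: nothing executes yet; Spark just records the lineage.
    val evens   = numbers.filter(_ % 2 == 0)
    val doubled = evens.map(_ * 2L)

    // Action: the driver builds the DAG, splits it into stages and tasks,
    // and hands the tasks to executors.
    println(s"count = ${doubled.count()}")

    spark.stop()
  }
}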


Tracking Down Long-Running xp_cmdshell Processes

Thomas Rushton investigates what’s taking so long with an xp_cmdshell call:

I wanted to know what he was up to, but the sql_text field only gives “xp_cmdshell”, not anything useful that might help to identify what went wrong.

So we have to go to Task Manager on the server. On the “Process Details” page, you can select which detail columns you want to see. We want to see the Command Line, as that’ll tell us if it’s some manually-launched batch job that’s failed or something else going wrong.

An alternative to using the Task Manager is to open ProcMon, part of the Sysinternals toolset. It takes a bit of getting used to, but is quite powerful once you know its ins and outs.
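If you want to confirm which sessions are stuck in xp_cmdshell before reaching for the OS tools, a quick DMV query (my sketch, not from Thomas’s post) gives you the session and the elapsed time, though still not the command line:

-- Find sessions currently executing xp_cmdshell and how long they've run.
SELECT r.session_id,
       r.start_time,
       DATEDIFF(SECOND, r.start_time, SYSDATETIME()) AS seconds_running,
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE t.text LIKE '%xp_cmdshell%'
  AND r.session_id <> @@SPID;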


Creating Seaborn Plots With R

Abdul Majed Raja shows how to call Python from R and build plots using the Seaborn Python package:

The reticulate package provides a comprehensive set of tools for interoperability between Python and R. The package includes facilities for:

  • Calling Python from R in a variety of ways including R Markdown, sourcing Python scripts, importing Python modules, and using Python interactively within an R session.
  • Translation between R and Python objects (for example, between R and Pandas data frames, or between R matrices and NumPy arrays).
  • Flexible binding to different versions of Python including virtual environments and Conda environments.

Reticulate embeds a Python session within your R session, enabling seamless, high-performance interoperability.

The more common use of reticulate I’ve seen is running TensorFlow neural networks from R.
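For a taste of what this looks like in practice, here is a minimal sketch of my own (not from the post), which imports seaborn through reticulate and hands it an R data frame:

library(reticulate)

# Import Python modules into the R session.
sns <- import("seaborn")
plt <- import("matplotlib.pyplot")

# reticulate converts the R data frame to a pandas DataFrame on the way in.
sns$regplot(x = "wt", y = "mpg", data = mtcars)
plt$show()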


Getting The Current Date And Time In SQL Server

Randolph West shows a few functions which can retrieve current date and time information:

What do we mean by local date and time?

As discussed previously, SQL Server is not time zone aware, nor does it have to be. This is because the operating system that SQL Server runs on can have multiple custom regional settings depending on which user is logged into the server.

This holds true for the SQL Server service account as well, which is just another user on the operating system. When any of these functions is called, it is asking for the date and time from the operating system.

If you’re going to use DATETIME2 (which you generally should), take advantage of the precision that SYSUTCDATETIME() gives you over GETUTCDATE().
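The precision difference is easy to see side by side: GETUTCDATE() returns DATETIME, accurate to roughly 3.33 milliseconds, while SYSUTCDATETIME() returns DATETIME2(7), accurate to 100 nanoseconds.

SELECT GETUTCDATE()     AS utc_as_datetime,
       SYSUTCDATETIME() AS utc_as_datetime2;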


SQL Operations Studio April Release

Alan Yu announces the April release of SQL Operations Studio:

Highlights for this build include the following.

  • Public preview release of SQL Agent extension

  • Added new extensions and improved existing extensions

    • Improvements to Server Reports Extension
    • Release of SSMS Keymap extension
    • Release of AlwaysOn Insights extension
    • Release of MSSQL Instance Insights
    • Release of MSSQL Db Insights
  • Added Visual Studio Code 1.21 platform source code refresh

    • Improved large and protected file support for saving Admin protected and >256M files within SQL Ops Studio
    • Integrated Terminal splitting to work with multiple open terminals at once
    • Reduced installation on-disk file count footprint for faster installs and startup times
  • Continue to fix GitHub issues

There’s a lot in here.
