Author: Kevin Feasel

Basics Of Spark

Jen Underwood gives a quick explanation of Spark as well as an introduction to SparkSQL and PySpark:

Spark’s distributed data-sharing concept is called “Resilient Distributed Datasets,” or RDD. RDDs are fault-tolerant collections of objects partitioned across a cluster that can be queried in parallel and used in a variety of workload types. RDDs are created by applying operations called “transformations” with map, filter, and groupBy clauses. They can persist in memory for rapid reuse. If an RDD’s data does not fit in memory, Spark will overflow it to disk.

If you’re not familiar with Spark, now’s as good a time as any to learn.

Comments closed

Building Metadata With Biml

Ben Weissman provides a set of Biml scripts to load a metadata table based off of existing tables and columns:

For each member of that collection, we follow some simple rules:

– Our table’s original name is the name of the table in the staging area without our connectionname prefix
– If our tablename still includes an underscore, we will split the name and assign the table- and schemaname respectively. Otherwise, our schema will be DBO.
– Create a DELETE statement towards our metadata store
– Create an INSERT statement towards our metadata store

Admittedly, I would have seen this as a one-time process and would have just written some scripts against sys.tables and sys.columns to generate this metadata, but “one-time processes” tend to happen over and over.
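
The naming rules in that list are simple enough to sketch outside of Biml. The Python below is only an illustration, not Ben's script: it assumes the connection name is joined to the staging table name with an underscore, that the schema comes before the table in what remains, and that the metadata store is a hypothetical table called meta.SourceTables.

```python
from typing import List, Tuple


def split_staging_name(staging_name: str, connection_name: str) -> Tuple[str, str]:
    """Return (schema, table) for a staging table name."""
    # Rule 1: drop the connection-name prefix (assumed to be "<connection>_").
    prefix = connection_name + "_"
    if staging_name.startswith(prefix):
        original = staging_name[len(prefix):]
    else:
        original = staging_name
    # Rule 2: an underscore that is still present separates schema and table
    # (schema-first order is an assumption); otherwise the schema is dbo.
    if "_" in original:
        schema, table = original.split("_", 1)
        return schema, table
    return "dbo", original


def metadata_statements(staging_name: str, connection_name: str) -> List[Tuple[str, tuple]]:
    """Build the DELETE and INSERT statements as (sql, parameters) pairs."""
    schema, table = split_staging_name(staging_name, connection_name)
    params = (connection_name, schema, table)
    # meta.SourceTables and its columns are hypothetical names.
    delete_sql = (
        "DELETE FROM meta.SourceTables "
        "WHERE ConnectionName = ? AND SchemaName = ? AND TableName = ?;"
    )
    insert_sql = (
        "INSERT INTO meta.SourceTables (ConnectionName, SchemaName, TableName) "
        "VALUES (?, ?, ?);"
    )
    return [(delete_sql, params), (insert_sql, params)]
```

Running the DELETE before the INSERT means the load can be repeated without piling up duplicate rows, which matters if the "one-time process" turns out not to be.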

Alert On SQL Jobs Missing Schedules

Brian Hansen wraps up a three-part series on scheduled job alerts:

The first two parts of this series addressed the general approach that I use in an SSIS script task to discover and alert on missed SQL Agent jobs. With apologies for the delay in producing this final post in the series, here I bring these approaches together and present the complete package.

To build it, start with an empty SSIS package and add a data flow task. In the task, add the following transformations.

Regardless of how you do it, knowing when jobs fail is important enough to build some infrastructure around answering this question.

CISL 1.4.0

Niko Neugebauer has released the latest version of his Columnstore Index Scripts Library:

Another happy release of the CISL (Columnstore Indexes Script Library) is live – this time it is 1.4.0!

This release focuses on the addition of Extended Events, so that a user of CISL can easily set up the events for each of the SQL Server versions (2012, 2014, 2016) or for Azure SQL Database.

This is an open source library which I recommend if you deal with columnstore indexes in any fashion.

Always Encrypted And Memory-Optimized Tables

Joey D’Antoni tests whether Always Encrypted works on memory-optimized tables in SQL Server 2016:

Last week was the PASS Summit, which is the biggest confab of SQL Server professionals on the planet (and educational as ever). Denny Cherry and I ran into Bob Ward of Microsoft and of 500-level internals presentations. And for the first time ever, Bob asked us a question about SQL Server. Of course we didn’t know the answer off the top of our heads, but we felt obligated to research it like we’ve made Bob do so many times. Anyway, the question came up at Bob’s internals session on Hekaton (In-Memory OLTP): does it support the new Always Encrypted feature in SQL Server 2016? I checked Books Online, but could not find a clear answer, so I fired up SSMS and set up a quick demo.

Click through for scripts and the answer.

The Halloween Problem

Kenneth Fisher explains the Halloween Problem:

What is The Halloween Problem?
This is a bit more complicated. Let’s say you are trying to give a 10% raise to everyone who makes less than $25k.

Couple of quick notes here. This is a common example because this is in fact the problem that exposed the issue. Also, while UPDATEs are probably the easiest way to explain what’s going on, it can affect any type of write.

So back to our update statement. There are several ways this could be implemented. I’m going to use pseudo T-SQL to demonstrate a couple and explain each.

This has certain implications, as you can see in the linked Paul White series. Protecting against the problem typically means slower performance (e.g., by forcing spooling), but in exchange you get rid of a potentially nasty problem.
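
To make the problem concrete, here is a small Python simulation. It is a sketch and not how SQL Server is implemented: the "index" is just a sorted list of (salary, row) pairs, the function names are made up for illustration, and salaries are whole numbers so the arithmetic stays exact. The first function updates rows while scanning the same salary-ordered index it is modifying; the second separates the read phase from the write phase, which is roughly what a spool buys you.

```python
import bisect
from typing import List


def give_raise(salary: int, pct: int = 10) -> int:
    """Apply a percentage raise using integer arithmetic."""
    return salary * (100 + pct) // 100


def naive_index_scan_update(salaries: List[int], threshold: int = 25_000) -> List[int]:
    """Update rows while scanning the salary index that the update modifies."""
    # Index entries are (salary, row_id) pairs kept in salary order.
    index = sorted((s, row) for row, s in enumerate(salaries))
    result = list(salaries)
    last_key = None  # the index key the scan visited most recently
    while True:
        # Seek to the first index entry after the last key visited.
        pos = 0 if last_key is None else bisect.bisect_right(index, last_key)
        if pos == len(index) or index[pos][0] >= threshold:
            break
        last_key = index.pop(pos)
        salary, row = last_key
        result[row] = give_raise(salary)
        # The updated row re-enters the index ahead of the scan position,
        # so the scan meets the same row again later.
        bisect.insort(index, (result[row], row))
    return result


def spooled_update(salaries: List[int], threshold: int = 25_000) -> List[int]:
    """Read all qualifying rows first, then write: each row is raised once."""
    to_update = [row for row, s in enumerate(salaries) if s < threshold]  # read phase
    result = list(salaries)
    for row in to_update:  # write phase
        result[row] = give_raise(result[row])
    return result
```

With salaries of 10,000, 20,000, and 30,000, the spooled version gives one raise each to the first two rows (11,000 and 22,000) and leaves the third alone. The naive scan keeps finding the rows it just updated and raises them until nobody is left under the threshold.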

On-Prem Power BI

Koen Verbeeck looks at the preview of Power BI integration inside Reporting Services:

  • one thing that I am missing is an “edit report” button, available while you are rendering the report, that takes you to Power BI Desktop. A bit like in PowerBI.com, where you can also go to edit mode if you have the correct permissions.

  • by the way, if you truly want to test it locally, you can download the .vhd file (the virtual hard disk) and run it in your own HyperV environment.

All in all it looks very nice for a first preview. Currently only SSAS is supported and custom visualizations are not, but I guess the SSRS team will surprise us with more features soon. Great job SSRS team!

Lots of interesting thoughts here, so check it out.

UDL Files To Test Connectivity

Marek Masko shows how to test a database connection without having any database tools:

The UDL extension stands for Universal Data Link. These files are used by the Data Link API, which exposes a user interface to create and manage OLE DB connections. This functionality has been part of Windows since at least Windows 95, maybe even earlier. That means you can use it on every Windows machine you work on. You no longer need to worry about additional tools.

Sometimes you need a creative solution to a policy-induced problem.

Cached Azure Analysis Services Logins

Chris Webb shows how to log into Azure Analysis Services from Management Studio as a different user:

When Azure Analysis Services was announced I had to try it out right away. Of course I didn’t read the instructions properly so when I tried to log in to my Azure Analysis Services instance from SQL Server Management Studio, like an idiot I logged in with the wrong username. The problem is that once you’ve done this, with current versions of SQL Server Management Studio there’s no way of logging out and logging in as a different user. Luckily Igor Uzhviev of Microsoft had a solution for me and I thought I’d share it for anyone else who’s made the same mistake. Here’s what you need to do:

This seems a bit much, but should just be a temporary workaround.

Benford’s Law

Tomaz Kastrun begins a series on fraud analysis with a look at Benford’s Law:

One of the samples Microsoft provided with the release of SQL Server 2016 uses the simple logic of Benford’s law. This law works well with naturally occurring numbers and can be applied across a wide range of problems. A naturally occurring number is one that is not generated mechanically, such as a page number in a book, an incremented number in your SQL table, or a sequence number of any kind, but one that occurs independently of the others: measurements in nature (lengths or widths of trees, mountains, rivers), lengths of roads in cities, addresses in your home town, city or country populations, and so on. The law describes the logarithmic distribution of leading digits from 1 to 9 and stipulates that the digit 1 will lead about 30% of the time, the digit 2 about 17.6% of the time, the digit 3 about 12.5% of the time, and so on. Randomly generated numbers will most likely produce a uniform distribution, with each leading digit from 1 to 9 having a probability of 1/9. The law might also not hold for data with a restricted range; for example, height expressed in inches will surely not follow the Benford distribution. My height is 188 cm, which is 74 inches or 6 ft 2 in. None of these three numbers will produce the correct distribution, even though height is a natural phenomenon.

Tomaz includes SQL Server R Services code, so check it out.
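
If you want to play with the idea before opening R, the expected share for a leading digit d under Benford's Law is log10(1 + 1/d). The Python below is a minimal sketch of that formula plus a helper that tallies leading digits in a data set. The function names are my own, not anything from Tomaz's post, and the powers-of-two sample is just a convenient sequence that is known to track the law closely.

```python
import math
from collections import Counter
from decimal import Decimal
from typing import Dict, Iterable


def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1 through 9)."""
    return math.log10(1 + 1 / digit)


def leading_digit(value) -> int:
    """First non-zero digit of a number, ignoring sign and decimal point."""
    for d in Decimal(str(value)).as_tuple().digits:
        if d != 0:
            return d
    raise ValueError("value has no non-zero digit")


def leading_digit_shares(values: Iterable) -> Dict[int, float]:
    """Observed share of each leading digit 1 through 9 in `values`."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no values supplied")
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


if __name__ == "__main__":
    # Powers of two: a sample sequence that follows the law closely.
    observed = leading_digit_shares(2 ** n for n in range(1, 1001))
    for d in range(1, 10):
        print(d, round(observed[d], 3), round(benford_expected(d), 3))
```

For fraud analysis, you would swap the sample for real figures such as invoice amounts and look for digits whose observed share strays far from the expected one.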
