Press "Enter" to skip to content

Author: Kevin Feasel

From JSON to SQL Server

Phil Factor has some helper functions for us when working with JSON data:

If you know the structure and contents of a JSON document, then it is possible to turn this into one or more relational tables, but even then I dare you to claim that it is easy to tap in a good OpenJSON SELECT statement to do it. If you don’t know what’s in that JSON file, then you’re faced with sweating over a text editor trying to work it all out. You long to just get the contents into a relational table and take it on from there. Even then, you’ve got several struggles before that table appears in the result pane. You must get the path to the tabular data correct, you have to work out the SQL datatypes, and you need to list the full panoply of keys. Let’s face it: it is a chore. Hopefully, all that is in the past with these helper functions.

Click through for those functions.
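For a sense of the chore Phil describes, here is the sort of statement you otherwise write by hand. This is a minimal sketch against an invented JSON document; the path, every key, and every SQL datatype have to be worked out up front:

DECLARE @json nvarchar(max) = N'{
  "orders": [
    { "id": 1, "customer": "Alice", "total": 19.99 },
    { "id": 2, "customer": "Bob",   "total": 42.50 }
  ]
}';

-- The path ($.orders), the key list, and the datatypes all must be correct.
SELECT id, customer, total
FROM OPENJSON(@json, '$.orders')
WITH (
    id       int           '$.id',
    customer nvarchar(100) '$.customer',
    total    decimal(10,2) '$.total'
);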

Automating Hadoop Workflows with Spark and Oozie

Prashanth Jayaram walks us through automating a sample data transfer with tools like Sqoop, Spark, and Oozie:

In the process of building a data product, one ends up applying many resource-intensive analytical operations to a medium-to-large dataset in an efficient way. Apache Spark is the best bet in this scenario, performing faster job execution by caching data in memory and enabling parallelism in distributed data environments.

Components involved in the Spark implementation:

1. Initialize the Spark session using a Scala program
2. Ingest data from the data lake through Hive queries
3. Apply business logic using Scala constructs or Hive queries
4. Load data into HDFS or Hive targets
5. Execute Spark programs through spark-submit
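A minimal Scala sketch of steps 1 through 4 (table names invented; step 5 would be a spark-submit invocation of the compiled jar):

import org.apache.spark.sql.SparkSession

object SampleFlow {
  def main(args: Array[String]): Unit = {
    // 1. Initialize the Spark session, with Hive support enabled.
    val spark = SparkSession.builder()
      .appName("SampleFlow")
      .enableHiveSupport()
      .getOrCreate()

    // 2. Ingest data from the data lake through a Hive query.
    val orders = spark.sql("SELECT * FROM lake.orders")

    // 3. Apply business logic using Scala constructs.
    val bigOrders = orders.filter(orders("total") > 100)

    // 4. Load the result into a Hive target.
    bigOrders.write.mode("overwrite").saveAsTable("mart.big_orders")

    spark.stop()
  }
}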

Read on for a sample flow.

DBCC CHECKDB on Large Databases

Aaron Bertrand shares some thoughts on CHECKDB:

We have a lot of data. Some of that data is stored in large databases (dozens of terabytes each). In some shops, this is an excuse to not run integrity checks. We are not one of those shops.

But we don’t run full CHECKDB operations in production; we have a set of servers dedicated to testing our restores and running checks. We follow a lot of the guidance in these articles:

CHECKDB From Every Angle: Consistency Checking Options for a VLDB

Minimizing the impact of DBCC CHECKDB: DOs and DON’Ts

Minimize performance impact of SQL Server DBCC CHECKDB

Read the whole thing, even if you aren’t dealing with 30+ TB databases.
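The restore-then-check pattern Aaron describes looks roughly like this (a sketch only; database, share, and path names are placeholders):

-- On a server dedicated to testing restores, not in production:
RESTORE DATABASE BigDb
FROM DISK = N'\\backupshare\BigDb_full.bak'
WITH MOVE N'BigDb'     TO N'D:\Data\BigDb.mdf',
     MOVE N'BigDb_log' TO N'L:\Log\BigDb.ldf',
     REPLACE;

-- The restore doubles as a backup test; the check runs where it cannot hurt production.
DBCC CHECKDB (BigDb) WITH NO_INFOMSGS, ALL_ERRORMSGS;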

Creating Users in Azure SQL Database

Kenneth Fisher takes us through a nuance in adding users to Azure SQL Database:

Awesome! I did say I preferred code, didn’t I? I am noticing a slight problem, though: I don’t actually have a login yet. So I look in Object Explorer and there is no instance-level security tab. On top of that, when I try to create a login with code, I get the following error:

Msg 5001, Level 16, State 2, Line 1
User must be in the master database.

Read on for the whole process.
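The short version of that process, assuming the standard approach (names here are invented): CREATE LOGIN only runs while connected to master, and the user then gets created in the user database itself.

-- Connected to the master database:
CREATE LOGIN TestLogin WITH PASSWORD = 'ObviouslyUseAStrongerPassword1!';

-- Connected to the user database:
CREATE USER TestUser FOR LOGIN TestLogin;

-- Or skip the login entirely with a contained database user:
CREATE USER ContainedUser WITH PASSWORD = 'ObviouslyUseAStrongerPassword2!';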

In-Memory OLTP and HammerDB Setup Error

Erik Darling tracks down an error:

This is a short post about a sort of quirky error message I got while trying to run the TPC-C load test against SQL Server’s in-memory tables.

Error in Virtual User1: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The integer value XXXXXXXXXXXX is out of range.
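For context (an illustration of the message, not Erik’s fix), the driver is reporting a value that no longer fits in a 32-bit integer, a boundary easy to see directly in T-SQL:

SELECT CAST(2147483647 AS int);    -- the int maximum: works
SELECT CAST(2147483648 AS int);    -- one past it: Msg 8115, arithmetic overflow
SELECT CAST(2147483648 AS bigint); -- bigint handles values up to 2^63 - 1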

Click through for the solution. Also, read down to the bottom of the post for a huge discount on Erik’s recorded training content. Erik is extremely knowledgeable and has a great way of explaining things, so take advantage of that knowledge.

GIS Capabilities in R

Lionel Hertzog shows off spatial capabilities in R:

All of these operations follow the same logic: st_operation(A, B) checks, for each combination of the geometries in A and B, whether A operation B is true or false. For instance, st_within(A, B) checks whether the geometries in A are within B; this is similar to st_contains(B, A), the difference between the two being the shape of the returned object. If A has n geometries and B has m, st_contains(B, A) returns a list of length m where each element contains the row IDs (numbers between 1 and n) of the geometries in A satisfying the operation. By using sparse=FALSE, the functions return matrices: st_within(A, B, sparse=FALSE) returns an n x m matrix, while st_within(B, A, sparse=FALSE) returns an m x n matrix. Note that running st_operation(A, A) checks the operation between all geometries of the object, returning an n x n matrix.

Click through for part 1 of the series.
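Lionel’s series uses R’s sf package, but if T-SQL is more familiar ground, SQL Server’s spatial types expose the same within/contains duality. A small sketch:

DECLARE @square geometry = geometry::STGeomFromText(
    N'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))', 0);
DECLARE @point geometry = geometry::STGeomFromText(N'POINT(5 5)', 0);

-- STWithin and STContains mirror each other, just as st_within(A, B)
-- mirrors st_contains(B, A) in sf. Both return 1 here.
SELECT @point.STWithin(@square)   AS point_within_square,
       @square.STContains(@point) AS square_contains_point;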

A Primer on Multi-Server SQL Agent Administration

Mikey Bronowski gives us a glimpse at the power of MSX/TSX:

The MSX (master server) can be used to define SQL Agent jobs that will then be pushed over to one or more TSX (target servers). It is possible to see whether a server is an MSX or a TSX when looking at SQL Agent in SQL Server Management Studio’s Object Explorer. The master server can be identified by (MSX) next to SQL Server Agent. It will also have two subfolders under the Jobs folder: Local Jobs (for regular jobs) and Multi-Server Jobs.

This is something woefully few database administrators take advantage of. As the number of servers you have to manage increases, ensuring consistency in SQL Agent jobs becomes more and more difficult. MSX/TSX admittedly has some rough edges, but the tool does a lot.
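To see the moving parts, here is a minimal sketch of defining a multi-server job in msdb (job, step, and server names invented; assumes TSX-SERVER-01 is already enlisted as a target server):

-- On the MSX: create the job and a step.
EXEC msdb.dbo.sp_add_job
    @job_name = N'NightlyIndexMaintenance';

EXEC msdb.dbo.sp_add_jobstep
    @job_name  = N'NightlyIndexMaintenance',
    @step_name = N'Rebuild fragmented indexes',
    @command   = N'EXEC dbo.IndexMaintenance;';

-- Target the job at an enlisted TSX rather than the local server;
-- the MSX then pushes the definition out to it.
EXEC msdb.dbo.sp_add_jobserver
    @job_name    = N'NightlyIndexMaintenance',
    @server_name = N'TSX-SERVER-01';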

Statistics Aren’t Guarantees

Brent Ozar lays out an important point:

You might say, “But SQL Server has statistics on those columns, and it knows what the top values are!” Well, that’s true, but…data can change without the statistics being updated. For example, say that one user logs in right now, and then we run the MAX query again:

Statistics tell the engine what they learned at the time they were run. If you need guarantees, that’s what constraints are for.
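To make that contrast concrete, a hypothetical sketch: the CHECK constraint below is enforced on every write, while statistics only reflect the data as of their last update, which sys.dm_db_stats_properties will show you.

-- The constraint is a guarantee, checked on every insert and update.
CREATE TABLE dbo.Users (
    Id             int IDENTITY PRIMARY KEY,
    LastAccessDate datetime2 NOT NULL,
    CONSTRAINT CK_Users_NoFutureLogins CHECK (LastAccessDate <= SYSUTCDATETIME())
);

-- Statistics are a snapshot: last_updated and modification_counter
-- show how stale they might be, not what the data looks like now.
SELECT s.name, sp.last_updated, sp.rows, sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Users');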
