Curated SQL – Page 1289 – A Fine Slice Of SQL Server

“Server Is Configured For Windows Authentication Only” Error

Published 2018-06-19 by Kevin Feasel

Kenneth Fisher diagnoses a misleading error:

In general, the errors SQL gives are highly useful. Of course every now and again you get one that’s just confounding. The other day I saw the following error in the log:

Login failed for user ”. Reason: An attempt to login using SQL authentication failed. Server is configured for Windows authentication only. [CLIENT: ]

This one confused me for a couple of reasons. First, the user ”. Why an empty user? That’s not really helpful. And second Server is configured for Windows authentication only.

But Kenneth shows that the server is configured for SQL authentication as well as Windows authentication. Click through to see what gives.

Comments closed

Non-SARGable Predicates And Computed Columns

Published 2018-06-19 by Kevin Feasel

Erik Darling shows that you can create a computed, indexed column to make a non-SARGable predicate perform a seek operation:

Before I show you what I mean, we should probably define what’s not SARGable in general.

Wrapping columns in functions: ISNULL, COALESCE, LEFT, RIGHT, YEAR, etc.

Evaluating predicates against things indexes don’t track: DATEDIFF(YEAR, a_col, b_col), a_col +b_col, etc.

Optional predicates: a_col = @a_variable or @a_variable IS NULL

Applying some expression to a column: a_col * 1000 < some_value

Applying predicates like this show that you don’t predi-care.

They will result in the “bad” kind of index scans that read the entire index, often poor cardinality estimates, and a bunch of other stuff — sometimes a filter operator if the predicate can’t be pushed down to the index access level of the plan.

Read on for an example.

Comments closed

Trying To Force A Plan For A Different Query With Query Store

Published 2018-06-19 by Kevin Feasel

Erin Stellato shows us that you cannot use a plan generated for one query as a forced plan for a different query in Query Store:

This is question I’ve gotten a few times in class…Can you force a plan for a different query with Query Store?

tl;dr

No.

Assume you have two similar queries, but they have different query_id values in Query Store. One of the queries has a plan that’s stable, and I want to force that plan for the other query. Query Store provides no ability to do this in the UI, but you can try it with the stored procedure. Let’s take a look…

I can see the potential benefit, but the downside risk is huge, so it makes sense not to allow this.

Comments closed

Metacat: Federated Metadata Discovery

Published 2018-06-18 by Kevin Feasel

Ajoy Majumdar and Zhen Li walk us through Metacat:

The core architecture of the big data platform at Netflix involves three key services. These are the execution service (Genie), the metadata service, and the event service. These ideas are not unique to Netflix, but rather a reflection of the architecture that we felt would be necessary to build a system not only for the present, but for the future scale of our data infrastructure.

Many years back, when we started building the platform, we adopted Pig as our ETL language and Hive as our ad-hoc querying language. Since Pig did not natively have a metadata system, it seemed ideal for us to build one that could interoperate between both.

Thus Metacat was born, a system that acts as a federated metadata access layer for all data stores we support. A centralized service that our various compute engines could use to access the different data sets. In general, Metacat serves three main objectives:

Federated views of metadata systems

Unified API for metadata about datasets

Arbitrary business and user metadata storage of datasets

It is worth noting that other companies that have large and distributed data sets also have similar challenges. Apache Atlas, Twitter’s Data Abstraction Layer and Linkedin’s WhereHows (Data Discovery at Linkedin), to name a few, are built to tackle similar problems, but in the context of the respective architectural choices of the companies.

If you’re interested, also check out their GitHub repo.

Comments closed

Understanding A Spark Streaming Workflow

Published 2018-06-18 by Kevin Feasel

Himanshu Gupta continues a series on structured streaming using Spark Streaming:

Here we can clearly see that if new data is pushed to the source, Spark will run the “incremental” query that combines the previous running counts with the new data to compute updated counts. The “Input Table” here is the lines DataFrame which acts as a streaming input for wordCounts DataFrame.

Now, the only unknown thing in the above diagram is “Complete Mode“. It is nothing but one of the 3 output modes available in Structured Streaming. Since they are an important part of Structured Streaming, so, let’s read about them in detail:

Complete Mode – This mode updates the entire Result Table which is eventually written to the sink.
Append Mode – In this mode, only the new rows are appended in the Result Table and eventually sent to the sink.
Update Mode – At last, this mode updates only the rows that are changed in the Result Table since the last trigger. Also, only the new rows are sent to the sink. There is one peculiar thing to note about this mode, i.e., it is different from the Complete Mode in the way that this mode only outputs the rows that have changed since the last trigger. If the query doesn’t contain any aggregations, it is equivalent to the Append mode.

Check it out.

Comments closed

Calculating TF-IDF Using Apache Spark

Published 2018-06-18 by Kevin Feasel

Arseniy Tashoyan shows us how to calculate Term Frequency-Inverse Document Frequency using Apache Spark:

TF-IDF is used in a large variety of applications. Typical use cases include:

Document search.

Document tagging.

Text preprocessing and feature vector engineering for Machine Learning algorithms.

There is a vast number of resources on the web explaining the concept itself and the calculation algorithm. This article does not repeat the information in these other Internet resources, it just illustrates TF-IDF calculation with help of Apache Spark. Emml Asimadi, in his excellent article Understanding TF-IDF, shares an approach based on the old Spark RDD and the Python language. This article, on the other hand, uses the modern Spark SQL API and Scala language.

Although Spark MLlib has an API to calculate TF-IDF, this API is not convenient to learn the concept. MLlib tools are intended to generate feature vectors for ML algorithms. There is no way to figure out the weight for a particular term in a particular document. Well, let’s make it from scratch, this will sharpen our skills.

Read on for the solution. It seems that there tend to be better options today than TF-IDF for natural language problems, but it’s an easy algorithm to understand, so it’s useful as a first go.

Comments closed

File Growth Rate Under 1MB

Published 2018-06-18 by Kevin Feasel

John Morehouse shows that the file growth rate GUI for Management Studio doesn’t report values under 1MB correctly:

While I was recently, doing a review of a client’s environment I discovered that the GUI can lie to you when it comes to the database file growth rates. By default, the data file is set to a 1MB growth rate and the log file is configured for a 10% growth rate. Both are horrible settings for most OLTP environments. However, starting with SQL Server 2016, the default growth rates are configured for 64MB, which in my opinion is better than the previous defaults.

Using the GUI to look at a 2017 Scratch database I have, we can see that the data file is configured for 1MB and the log file is set for 64MB growth.

I don’t think there’s a good reason for a file growth rate under 1 MB at this point. That could have made sense in the late ’90s, but the idea of growing 128KB at a time is funny.

Comments closed

Running ssisUnit In MSTest

Published 2018-06-18 by Kevin Feasel

Bartosz Ratajczyk shows how to convert ssisUnit tests to work with the NUnit or MSTest frameworks:

MSTest v2 is the open source test framework by Microsoft. I will not write a lot about it. If you want to learn more – read the excellent blog posts by Gérald Barré.

I took the idea and parts of the code from Ravi Palihena’s blog post about ssisUnit testing and his GitHub repository. Then I read the source code of the SsisUnitTestRunner, SsisUnitTestRunnerUI and posts by Gérald and changed the tests a bit.

I will use MSTest to execute ssisUnit tests from the file 20_DataFlow.ssisUnit. For that, I created a new Visual C# > Test > Unit Test Project (.NET Framework) – ssisUnitLearning.MSTest – within the solution. I also set the reference to the SsisUnit2017.dll and SsisUnitBase.dll libraries and loaded required namespaces

Bartosz gives us the initial walkthrough, and then builds a T4 template to automate the task. You can grab that template on his GitHub repo, and hopefully something makes its way into ssisUnit to make integration with NUnit / MSTest official.

Comments closed

Disambiguating “App” In Power BI

Published 2018-06-18 by Kevin Feasel

Melissa Coates gives us the different forms of what an “app” is in the Power BI world:

Let’s say you just heard someone mention a Power BI app. What exactly do they mean by that? Well, it depends. The term “app” is used kind of a lot in the Power BI world. So, here’s a quick reference to help you decode the conversation. I’m going to start with the most likely options, working down to other options. Which one someone is referring to really depends on their role and their level of familiarity with the Power BI ecosystem.

Power BI App

A Power BI App is a packaged up set of content in the web-based Power BI Service. Related reports, workbooks, dashboards, and datasets are published from an App Workspace into an App for users to consume.

Power BI App Workspace

An App Workspace in the Power BI Service is where reports, workbooks, dashboards, and datasets are saved, and where data refresh schedules and other settings are defined. An App Workspace is suited to development & collaboration with coworkers (whereas My Workspace is a private area). Smaller teams might do everything they need to do within an App Workspace, whereas larger teams use an App Workspace as the collaboration area for content before it gets published to a Power BI App for consumption. You can have quite a few App Workspaces, depending on how you organize content (for instance, by subject area, by project, by department, or by type of analysis).

Click through for several other potential answers for what that user means by “app.”

Comments closed

Why Your Transaction Log Is Full: LOG_BACKUP

Published 2018-06-18 by Kevin Feasel

Jen McCown explains why you might get the error message “The transaction log for database ‘<your database>’ is full due to ‘LOG_BACKUP'”:

Your transaction log is full. Both Microsoft, and about 100 articles and blogs have covered this topic, but let’s take a quick look anyway. Because, you know, it comes up all the time.

Summary:

This error message points to a lack of log backups.
Make sure using sys.databases.
Start backing up the log.
You can shrink the log if necessary.
A note on SIMPLE mode, and why it’s often a terrible idea.

This is a good summary of the problem and various solutions.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Curated SQL Posts

Power BI App

Power BI App Workspace