Press "Enter" to skip to content

Curated SQL Posts

Multiple Data Sets And SQL Server R Services

Robert Sheldon has a workaround for SQL Server R Services’s limitation of a single input data set:

Despite the ease with which you can run an R script, the sp_execute_external_script stored procedure has an important limitation. You can specify only one T-SQL query when calling the procedure. Of course, you can create a query that joins multiple tables, but this approach might not always work in your circumstances or might not be appropriate for the analytics you’re trying to perform. Fortunately, you can retrieve additional data directly within the R script.

In this article, we look at how to import data from a SQL Server table and from a .csv file. We also cover how to save data to a .csv file as well as insert that data into a SQL Server table. Being able to incorporate additional data sets or save data in different formats provides us with a great deal of flexibility when working with R Services and allows us to take even greater advantage of the many elements available to the R language for data analytics.

Another option is using the RODBC package to connect back to SQL Server from within the R script and retrieve more data.
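As a rough sketch of that pattern (the server name, connection string, table, file path, and merge key are all hypothetical), the R script you pass to sp_execute_external_script can fetch extra data sets itself, alongside the single InputDataSet the procedure supplies:

# Sketch of an R script body for sp_execute_external_script.
# InputDataSet comes from the one T-SQL query the procedure allows;
# everything else is fetched inside the script.
library(RODBC)

# Extra data set 1: a local .csv file (path is hypothetical)
regions <- read.csv("C:\\Data\\regions.csv", stringsAsFactors = FALSE)

# Extra data set 2: another SQL Server table, pulled back via RODBC
# (connection string values are placeholders)
ch <- odbcDriverConnect(
  "Driver=SQL Server;Server=MYSERVER;Database=Sales;Trusted_Connection=yes")
customers <- sqlQuery(ch, "SELECT CustomerID, Region FROM dbo.Customers")
odbcClose(ch)

# Combine with the data set the procedure passed in; the key is made up
OutputDataSet <- merge(InputDataSet, customers, by = "CustomerID")

The same channel works in the other direction, too: RODBC’s sqlSave() can write a data frame back to a table, and base write.csv() can save it to a file.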

Comments closed

Using Tokens In SQL Agent Jobs

Raul Gonzalez builds a SQL Agent job and uses the JOBID token to help him log step output effectively:

It’s usually a good idea to write the output of your SQL Agent jobs to a file, so you can investigate should any issue occur.

But when you define the output file, you need to choose between appending the output to the same file over and over or overwriting it, which defeats the purpose IMHO.

On the other hand, if you forget to roll over the files, they can grow quite large and then finding any error can become a nightmare.

So some time ago, I wrote a stored procedure that rolls the files for me and places them sorted, so it’s easy to find any particular date.

This is a clever solution, but read through to the bottom for a warning.

Comments closed

Finding Queries In Need Of Indexing

Jeff Schwartz continues his series on index tuning:

Table 1 shows examples of queries that potentially need tuning based upon the number of executions, total reads, total duration, total CPU time, and average reads per execution. This kind of report immediately focuses attention on the queries that might benefit the most from either index or query tuning. The five queries highlighted in Table 1 underscore these criteria. The ones highlighted in yellow were the worst offenders because their executions collectively performed the most reads with the worst one totaling 3.5 BILLION reads. The ones highlighted in light green and orange accounted for the most CPU time as well as the longest total duration. The one highlighted in slate ran the most times, and the ones highlighted in gray performed the most reads per execution. This information is vital when determining where query and index tuning should be focused.

Jeff walks through some of his data collection and analysis process in this post, making it worth a read.

Comments closed

Security Issue In Oracle Identity Manager

Oracle has a security advisory with a CVSS base score of 10.0 (which is pretty awful):

This Security Alert addresses CVE-2017-10151, a vulnerability affecting Oracle Identity Manager. This vulnerability has a CVSS v3 base score of 10.0, and can result in complete compromise of Oracle Identity Manager via an unauthenticated network attack. The Patch Availability Document referenced below provides a full workaround for this vulnerability, and will be updated when patches in addition to the workaround are available.

Due to the severity of this vulnerability, Oracle strongly recommends that customers apply the updates provided by this Security Alert without delay.

Catalin Cimpanu explains:

The affected product is Oracle Identity Manager (OIM), a user management solution that allows enterprises to control what parts of their network employees can access. OIM is part of Oracle’s highly popular Fusion Middleware offering and is one of its most used components.

Oracle describes the issue — tracked under the CVE-2017-10151 identifier — as a “default account” vulnerability, an umbrella term that’s usually used to describe accounts with no password or hardcoded credentials (a.k.a. backdoor accounts).

“This vulnerability is remotely exploitable without authentication, i.e., may be exploited over a network without requiring user credentials,” Oracle said in a security alert.

Oracle has patched this.  If you have it installed, please update ASAP.

Comments closed

Azure SQL DB Automatic Tuning FAQ

Arun Sirpal has a self-Q&A session regarding Azure SQL Database’s automatic tuning options:

What are the options?

  1. CREATE INDEX, which identifies indexes that may improve the performance of your workload, creates those indexes, and verifies that they improve query performance.

  2. DROP INDEX, which identifies redundant and duplicate indexes, as well as indexes that have not been used in a long period of time.

  3. PLAN REGRESSION CORRECTION, which identifies SQL queries using an execution plan that is slower than the previous good plan, and uses the last known good plan instead of the regressed plan.

Very useful information.

Comments closed

Hortonworks Data Platform 2.6.3 Released

Roni Fontaine, et al, announce Hortonworks Data Platform 2.6.3:

First, in this release, we made huge strides with regards to Data Science.  This includes:

  • Apache Spark 2.2 is GA. This enhances the stability of Spark for structured streaming.

  • Apache Zeppelin 0.7.3 is GA. This provides support for notebooks stored in HDFS and improves scalability and multi-user support.

  • Apache Livy 0.4.0 is GA. This version supports multiple programming languages in the same Spark context.

This is a point release, so most of it is around adding updated versions of different products.

Comments closed

Thinking About Slowly Degrading Page Performance

Ritesh Maheshwari talks about how LinkedIn deals with performance regressions:

Looking at the chart above, where the dotted red line is a reference point to show where we started the year, notice how site speed improvements tend to be significant and noticeable, as they are optimization-driven. Degradations, however, can generally be of any “amount,” as they happen for various reasons. LinkedIn’s page-serving pipeline has many moving parts. We deploy code multiple times per day, operate a micro-service architecture with hundreds of services, and infrastructure upgrades are frequent. A slowdown in any of these components can cause degradations.

While large degradations can be caught using A/B testing, canary analysis, or anomaly detection, small ones tend to leak to production. Thus, performance of a page has a tendency to always degrade over time.

This led to having the centralized Performance Team focus on identifying these leaks, called “site speed regressions,” and to craft tools and processes to fix them.

It’s an interesting principle, and I could see it working for tracking database performance degradation as well.

Comments closed

Promises And Closures In R

Damian Rodziewicz looks at the new promises package in R:

Citing Joe Cheng, our aim is to:

  1. Execute long-running code asynchronously on a separate thread.
  2. Be able to do something with the result (on success) or error (on failure) when the task completes, back on the main R thread.

A promise object represents the eventual result of an async task. A promise is an R6 object that knows:

  1. Whether the task is running, succeeded, or failed

  2. The result (if succeeded) or error (if failed)

This looks pretty exciting.  H/T R-Bloggers
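As a minimal sketch of the API (assuming the future package as the async backend; the slow computation is made up), kicking off a task and attaching handlers looks something like this:

library(promises)
library(future)
plan(multisession)   # evaluate futures in background R sessions

# Start the long-running work off the main R thread
p <- future({
  Sys.sleep(3)                 # stand-in for slow work
  mean(rnorm(1e6))
})

# Attach handlers; the console stays responsive in the meantime
then(p,
     onFulfilled = function(value) cat("result:", value, "\n"),
     onRejected  = function(err)   cat("error:", conditionMessage(err), "\n"))

The package also ships a promise-flavored pipe, %...>%, which does the same attach-a-handler dance in pipeline form.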

Also, Sebastian Warnholz has a post on promises and closures in case you’re not familiar with the concepts:

Every argument you pass to a function is a promise until the moment R evaluates it. Consider a function g with arguments x and y. Let’s leave out one argument in the function call:

g <- function(x, y) x
g(1)

## [1] 1

R will be forgiving (lazy) until the argument y is actually needed. Until then, y exists in the environment of the function call as a ‘name without a value’. Only when R needs to evaluate y is a value searched for. This means that we can pass non-existent objects as arguments to the function g, and R won’t care until the argument is needed in the function’s body.
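To see that laziness in action, here is a small extension of the snippet above; the stop() calls stand in for any argument that would fail if it were ever evaluated:

g(1, stop("boom"))     # y is never needed, so the error never fires

## [1] 1

g(1, never_defined)    # a non-existent object works too

## [1] 1

h <- function(x, y) x + y
h(1, stop("boom"))     # now y must be evaluated, and the call errors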

Read the whole thing.  Once again, H/T R-Bloggers

Comments closed

Team Data Science Process Updates

David Smith announces updates to the Team Data Science Process:

It’s been over a year since we first introduced the Team Data Science Process (TDSP). The data, technology and practices behind Data Science continue to evolve, and the TDSP has evolved in parallel. Over the past year, several new facets have been added, including:

  • The IDEAR (Interactive Data Exploration, Analysis and Reporting) framework, an open source extension to R and Python designed to standardize the process of data exploration and reporting;

  • Guidance for use of Spark 2.0, including an end-to-end Spark v2.0 walkthrough;

  • Guidance for use of in-database Python with SQL Server, including an end-to-end in-database Python tutorial;

Click through for more changes, as well as links to further resources.

Comments closed

Python For The DBA: Copying SQL Logins

David Fowler has an example of how DBAs can use Python to do something interesting:

There are plenty of times when you might want to copy your SQL logins (including the SID) from one server to another.  Perhaps you’re running an AG and need to make sure that all users exist on all your secondaries with the correct SID, perhaps you’re migrating servers and need all the logins on your new server.  Whatever the reason, there are a number of different ways in which you can do this, but they usually require scripting out on one server and then running the script on another server, or of course there’s PowerShell.

The below script will use Python to copy all or specified logins from one server to another, including the password and SID.

Click through for the script.  The main use case for a SQL Server DBA to learn Python as a scripting language would be if you run SQL Server on Linux; I don’t think PowerShell on Linux is developed enough yet to handle the full range of DBA tasks. Otherwise, I’d use PowerShell and dbatools.

Comments closed