Press "Enter" to skip to content

Month: June 2017

Conditional Job Retry

Chris Bell has a procedure which conditionally retries a failed SQL Agent job from a pre-determined step:

When the job fails and the alert message is compiled, this procedure gets called, and the job name, step name, and a delay value are passed to it. There is also a retry flag that comes back from this procedure.

The first thing this procedure does is go and find the last failed step for the particular job. It then counts and, based on the @retry value, verifies whether a retry job has already been created. This is in case some other process tries to do this same thing, and it should help prevent too many retries from firing off.
If a retry job does not exist, this process creates a new disposable job that will rerun the original either from the beginning or from the step that failed, based on checking for “Level 1” or “Level 2” in the job name. The job is prefixed with ‘Retry -’ so it can be found easily in your server’s job list.
If a delay is specified, 2 minutes in this example, then it calculates a new run time for the retry job and finally creates the job.

This helps make SQL Agent jobs a little more robust.
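If you want to tinker with the idea outside of Chris’s procedure, here is a minimal sketch in Python with pyodbc (the connection string, job name, and two-minute delay are stand-ins, not values from his post) that finds the last failed step and restarts the job from that step after a delay:

```python
import time
import pyodbc

# Minimal sketch only: connection string and job name are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=msdb;Trusted_Connection=yes;",
    autocommit=True
)
cursor = conn.cursor()

job_name = "Nightly ETL - Level 2"   # hypothetical job
delay_seconds = 120                  # the two-minute delay from the example

# Most recent failed step for this job (run_status = 0 is a failure;
# step_id 0 is the job outcome row, so skip it).
cursor.execute("""
    SELECT TOP (1) h.step_name
    FROM msdb.dbo.sysjobhistory AS h
    JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
    WHERE j.name = ? AND h.run_status = 0 AND h.step_id > 0
    ORDER BY h.instance_id DESC;
""", job_name)
row = cursor.fetchone()

if row is not None:
    # "Level 1" jobs restart from the beginning; "Level 2" jobs restart at the failed step.
    start_step = None if "Level 1" in job_name else row.step_name
    time.sleep(delay_seconds)  # crude stand-in for scheduling a one-time retry job
    if start_step is None:
        cursor.execute("EXEC msdb.dbo.sp_start_job @job_name = ?;", job_name)
    else:
        cursor.execute("EXEC msdb.dbo.sp_start_job @job_name = ?, @step_name = ?;",
                       job_name, start_step)
```

Chris’s version goes further and creates a disposable, one-time ‘Retry -’ job in SQL Agent instead of sleeping in-process, which is the better approach for anything but a trivially short delay.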


Bundling Measures Together

Philip Seamark shows how to bundle measures together in Power BI so they all appear at the top of the Fields section:

I’m going to share in this blog a technique I’ve found useful in Power BI for collecting measures together in one place AND placing them at the top of the field list.

The good news is, calculated measures do not have to exist on the table that stores the underlying data specific to that measure.  Measures can be placed on any table in the model and they will still work as expected.  This may not be immediately obvious but it’s handy to know.

So far, I’ve kept measures on their logical best-fit tables, but Philip’s hint looks quite useful once the set of measures grows, or if there are a number of cross-table measures.


Separating Data And Log Files

Brent Ozar looks at an old chestnut:

So it’s time for a quiz:

  1. If you put all of a SQL Server’s data files & logs on a single volume, how many failures will that server experience per year?
    • Bonus question: what kinds of data loss and downtime will each of those failure(s) have?
  2. If you split a SQL Server’s data files onto one volume, and log files onto another volume, how many failures will that server experience per year?
    • Bonus question: what kinds of data loss and downtime will each of those failures have?

Think carefully about the answers – or read the comments to see someone else’s homework, hahaha – before you move on.

With SANs, this advice is not even that good on the performance side—especially with modern SANs which don’t let you dedicate spindles.  It definitely doesn’t fly on the reliability side.


Columnstore Dictionaries

Niko Neugebauer explains some interesting facts about columnstore index dictionaries:

From a recent experience at a customer, I had an opportunity to dive into the details of Columnstore Index dictionaries. I have to admit that my understanding of them was pretty low before, and I would like to share with everyone what I have learned in recent days.

These are some of the key findings that I have discovered:
– The local dictionaries are not exclusively connected with just 1 Row Group, but with multiple ones;
– The dictionaries within Columnstore Indexes are compressed in a different way, depending on the type of the compression applied (Columnstore vs Columnstore Archival);

and let us dive into each one of them:

Read the whole thing.
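If you want to see that first point on one of your own tables, a query along these lines does it (my sketch, not Niko’s; the database and table names are placeholders), counting how many row group segments share each local, or secondary, dictionary:

```python
import pyodbc

# Sketch only: connection string and table name are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=YourDatabase;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute("""
    SELECT  s.column_id,
            s.secondary_dictionary_id,
            COUNT(*) AS row_groups_sharing_dictionary
    FROM sys.column_store_segments AS s
    JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
    WHERE p.object_id = OBJECT_ID(N'dbo.FactSales')  -- hypothetical columnstore table
      AND s.secondary_dictionary_id <> -1            -- -1 means no local dictionary
    GROUP BY s.column_id, s.secondary_dictionary_id
    ORDER BY row_groups_sharing_dictionary DESC;
""")
for row in cursor.fetchall():
    print(row.column_id, row.secondary_dictionary_id, row.row_groups_sharing_dictionary)
```

Any count above 1 is a local dictionary serving more than one row group.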


Stopping SQL Injection

Wayne Sheffield has a post explaining what SQL injection is and discussing how to stop it:

Me: Umm, boss… Does this report allow users to enter in search criteria?

Boss: But of course!

Me: Well, I really hate to tell you this, but we have a SQL Injection problem.

And after a bit of back and forth where the developers were insisting that no way was there a SQL Injection problem, I sat down with the dev team lead and the boss and proved it to them. We created a dummy table in the database, went to the report criteria form, and I dropped the table.

Wayne: +1000

Development Team: -1000

Injection attacks are still the most common form of attack out there.  Sadly.
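The fix has not changed: keep user input out of the SQL string entirely and send it as a parameter. A minimal sketch in Python with pyodbc (the connection string, table, and column are stand-ins):

```python
import pyodbc

# Sketch only: connection string, table, and column are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=YourDatabase;Trusted_Connection=yes;"
)
cursor = conn.cursor()

search_criteria = "Smith'; DROP TABLE dbo.DummyTable; --"  # the kind of input Wayne used

# Vulnerable: the user's text becomes part of the statement and gets executed.
# sql = ("SELECT CustomerName FROM dbo.Customers "
#        "WHERE CustomerName LIKE '%" + search_criteria + "%';")

# Safe: the user's text travels as a parameter and is never parsed as SQL.
cursor.execute(
    "SELECT CustomerName FROM dbo.Customers WHERE CustomerName LIKE '%' + ? + '%';",
    search_criteria
)
for row in cursor.fetchall():
    print(row.CustomerName)
```

On the T-SQL side, the same principle means sp_executesql with parameters rather than EXEC on a concatenated string.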


Kafka Offset Management With Spark Streaming

Guru Medasana and Jordan Hambleton explain how to perform Kafka offset management when using Spark Streaming:

Enabling Spark Streaming’s checkpoint is the simplest method for storing offsets, as it is readily available within Spark’s framework. Streaming checkpoints are purposely designed to save the state of the application, in our case to HDFS, so that it can be recovered upon failure.

Checkpointing the Kafka Stream will cause the offset ranges to be stored in the checkpoint. If there is a failure, the Spark Streaming application can begin reading the messages from the checkpoint offset ranges. However, Spark Streaming checkpoints are not recoverable across applications or Spark upgrades and hence not very reliable, especially if you are using this mechanism for a critical production application. We do not recommend managing offsets via Spark checkpoints.

The authors give several options, so check it out and pick the one that works best for you.
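To give a flavor of the manual route, here is a rough PySpark sketch against the direct stream (Kafka 0.8 integration; the broker, topic, and where you persist the offsets are placeholders rather than anything from the article):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# Sketch only: broker, topic, and the offset store are placeholders.
sc = SparkContext(appName="offset-management-sketch")
ssc = StreamingContext(sc, 10)  # 10-second batches

stream = KafkaUtils.createDirectStream(
    ssc, ["events"], {"metadata.broker.list": "broker1:9092"}
)

offset_ranges = []

def capture_offsets(rdd):
    # Offsets must be captured in the first operation on the direct stream's RDDs.
    global offset_ranges
    offset_ranges = rdd.offsetRanges()
    return rdd

def process_and_save(rdd):
    # ... do the batch's real work first, then persist the offsets to your own
    # store (ZooKeeper, HBase, an RDBMS, Kafka itself) only once it succeeds.
    for o in offset_ranges:
        print(o.topic, o.partition, o.fromOffset, o.untilOffset)

stream.transform(capture_offsets).foreachRDD(process_and_save)

ssc.start()
ssc.awaitTermination()
```

On the next start, you would read those saved offsets back and hand them to createDirectStream via its fromOffsets argument.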


Updates In Apache Kafka

Yeva Byzek announces that Apache Kafka 0.11.0.0 is shipping soon:

We are very excited for the GA of Kafka release 0.11.0.0, which is just days away. This release is bringing many new features as described in the previous Log Compaction blog post.

The most notable new feature is Exactly Once Semantics (EOS).  Kafka’s EOS capabilities provide more stringent idempotent producer semantics with exactly once, in-order delivery per partition, and stronger transactional guarantees with atomic writes across multiple partitions. Together, these strong semantics make writing applications easier and expand Kafka’s addressable use cases. You can learn more about EOS in the online talk on June 29, 2017.

“Exactly once,” if done right, would be a remarkable achievement; there’s a reason most brokers promise only “at least once” or “best effort” delivery.
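For a sense of what the feature looks like from the producer side, here is a sketch using the confluent-kafka Python client, which picked up these APIs in later releases (the 0.11 release itself delivers them in the broker and the Java client); the broker, topics, and transactional ID are placeholders:

```python
from confluent_kafka import Producer

# Sketch only: broker, topics, and transactional.id are placeholders, and the
# Python client gained these APIs after the 0.11 announcement discussed above.
producer = Producer({
    "bootstrap.servers": "broker1:9092",
    "enable.idempotence": True,           # exactly-once, in-order delivery per partition
    "transactional.id": "orders-loader",  # enables atomic writes across partitions/topics
})

producer.init_transactions()
producer.begin_transaction()
try:
    producer.produce("orders", key="42", value="order created")
    producer.produce("order-audit", key="42", value="audit row")
    producer.commit_transaction()  # both messages become visible together
except Exception:
    producer.abort_transaction()   # read_committed consumers never see either message
```

Consumers opt in by setting isolation.level to read_committed, so they only ever see messages from committed transactions.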


Spark And H2O

Avkash Chauhan shows how to use sparklyr and rsparkling to tie Spark together with the H2O library in R:

In order to work with Spark H2O using rsparkling and sparklyr in R, you must first ensure that you have both sparklyr and rsparkling installed.

Once you’ve done that, you can check out the working script, the code for testing the Spark context, and the code for launching H2O Flow. All of this information can be found below.

It’s a short post, but it does show how to kick off a job.
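The post sticks to R, but if you live on the Python side, the pysparkling analogue is a sketch along these lines (assuming an h2o_pysparkling package matching your Spark version is installed; this is not from Avkash’s post):

```python
from pyspark.sql import SparkSession
from pysparkling import H2OContext

# Sketch only: the Python analogue of the sparklyr + rsparkling setup.
spark = SparkSession.builder.appName("spark-h2o-sketch").getOrCreate()

# Starting the H2O context prints the cluster details, including the Flow UI URL.
hc = H2OContext.getOrCreate(spark)
print(hc)
```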


Securing S3 Credentials In Spark Jobs

Jason Pohl shows how to protect credentials for connecting to Amazon Web Services S3 buckets when building Spark jobs:

Since Apache Spark separates compute from storage, every Spark Job requires a set of credentials to connect to disparate data sources. Storing those credentials in the clear can be a security risk if not stringently administered. To mitigate that risk, Databricks makes it easy and secure to connect to S3 with either Access Keys via DBFS or by using IAM Roles. For all other data sources (Kafka, Cassandra, RDBMS, etc.), the sensitive credentials must be managed by some other means.

This blog post will describe how to leverage an IAM Role to map to any set of credentials. It will leverage AWS’s Key Management Service (KMS) to encrypt and decrypt the credentials so that your credentials are never in the clear at rest or in flight. When a Databricks Cluster is created using the IAM Role, it will have privileges to both read the encrypted credentials from an S3 bucket and decrypt the ciphertext with a KMS key.

That’s only one data source, but an important one.
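The encrypt/decrypt piece itself is small. Here is a boto3 sketch (the region, KMS key alias, and credential are placeholders; the post wires this into IAM roles, DBFS, and cluster configuration):

```python
import boto3

# Sketch only: region, key alias, and the credential being protected are placeholders.
kms = boto3.client("kms", region_name="us-east-1")

# Encrypt once, then store only the ciphertext (for example, in an S3 bucket
# the cluster's IAM role is allowed to read).
ciphertext = kms.encrypt(
    KeyId="alias/spark-credentials",    # hypothetical CMK alias
    Plaintext=b"cassandra_user:s3cr3t"  # hypothetical credential
)["CiphertextBlob"]

# At job run time, decrypt in memory so the plaintext never lands on disk.
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"].decode("utf-8")
user, password = plaintext.split(":", 1)
```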


Power BI Supports Interactive R Visuals

David Smith reports on a great update to Power BI:

The above chart was created with the plotly package, but you can also use htmlwidgets or any other R package that creates interactive graphics. The only restriction is that the output must be HTML, which can then be embedded into the Power BI dashboard or report. You can also publish reports including these interactive charts to the online Power BI service to share with others. (In this case though, you’re restricted to those R packages supported in Power BI online.)

Power BI now provides four custom interactive R charts, available as add-ins:

I’d avoided doing too much with R visuals in Power BI because the output was so discordant—Power BI dashboards are often lively things, but the R visual would just sit there, limp and lifeless.  I’m glad to see that this has changed.
