Press "Enter" to skip to content

Author: Kevin Feasel

MariaDB On Azure Now In Preview

Chris Seferlis fills us in on Microsoft’s latest Platform-as-a-Service database offering:

First, a little history on MariaDB. MariaDB is a community-developed fork of MySQL. Essentially, when Oracle acquired MySQL through its purchase of Sun, some of the MySQL developers were concerned that the acquisition would bring changes, or eventually lead to MySQL no longer being open source.

So, they went ahead and forked off to MariaDB with the intent to maintain high compatibility with MySQL. Also, contributors are required to share their copyright with the MariaDB Foundation, which in a nutshell means they want the project to always remain open source.

This joins SQL Server, MySQL, and Postgres as PaaS database offerings in Azure.


Scary Scenarios For Database Developers

Bert Wagner gets into the Halloween spirit:

You’ve spent all morning loading millions of records into your tables.  Some of that data was painstakingly loaded by hand from flat files, and some lookup values you even authored manually because “you’ll only have to do this once.”

You are jumping around between SSMS windows, running this query here and that query there, pressing F5 like there's no tomorrow. And then, right before you press F5 to finish all of your work…

…you realize that you ran the whole tab's worth of code instead of the one statement you meant to highlight. “Commands completed successfully” flashes in your Message window, and in horror you discover you accidentally ran a truncate table statement!

Ok so this is pretty bad.

Not that I’ve done that before…nope…
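
One cheap insurance policy against exactly this scenario is to wrap destructive statements in an explicit transaction; in SQL Server, even TRUNCATE TABLE rolls back cleanly. A minimal sketch (the table name is hypothetical):

```sql
-- Open an explicit transaction before anything destructive, so a stray F5
-- can still be undone.
BEGIN TRANSACTION;

TRUNCATE TABLE dbo.PainstakinglyLoadedData;  -- the "oops" moment

-- Survey the damage before making anything permanent.
SELECT COUNT(*) AS RemainingRows
FROM dbo.PainstakinglyLoadedData;

ROLLBACK TRANSACTION;  -- the rows come back; COMMIT only once you are sure
```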


Signing Certificates For Multi-Database Access

David Fowler shows you how to create a signed certificate which allows for cross-database access:

Recently, Manish Kumar asked an interesting question: what do you do if your proc accesses multiple, or even all, of the databases on the server?

So, instead of giving him a fuzzy answer in reply, I thought I’d write up exactly how you can deal with that sort of situation.

We've got two options, and we'll have a look at both of them. (I'm not going to go into detail about how signing procs works; please see the post mentioned earlier for a basic overview. Here, I'm going to look specifically at procs that access multiple databases.)

Click through to see both solutions.
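
As a primer before you click through, here is a rough sketch of the general certificate-signing pattern for cross-database access. All names, paths, and the password are hypothetical, and this is not necessarily either of David's two solutions verbatim:

```sql
USE SourceDB;

-- Create a certificate and sign the multi-database proc with it.
CREATE CERTIFICATE CrossDbCert
    ENCRYPTION BY PASSWORD = 'Str0ng_P@ssword!'
    WITH SUBJECT = 'Signs procs that reach into other databases';

ADD SIGNATURE TO dbo.MultiDbProc
    BY CERTIFICATE CrossDbCert WITH PASSWORD = 'Str0ng_P@ssword!';

-- Export the certificate's public key and re-create it in the target database.
BACKUP CERTIFICATE CrossDbCert TO FILE = 'C:\Temp\CrossDbCert.cer';

USE TargetDB;
CREATE CERTIFICATE CrossDbCert FROM FILE = 'C:\Temp\CrossDbCert.cer';

-- Map a user to the certificate and grant it only what the proc needs here.
CREATE USER CrossDbCertUser FROM CERTIFICATE CrossDbCert;
GRANT SELECT ON dbo.SomeTable TO CrossDbCertUser;
```

The payoff of this pattern is that the extra permissions travel with the signed procedure rather than with the caller, and without marking any database TRUSTWORTHY.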


Permission Set In A Post-SQL 2017 CLR World

Solomon Rutzky investigates what the PERMISSION_SET property does as of SQL Server 2017:

And farther down on that same page, there is a note stating:

Important: The PERMISSION_SET option is affected by the clr strict security option, described in the opening warning. When clr strict security is enabled, all assemblies are treated as UNSAFE.

The last sentence in each of those quoted statements certainly does seem to be saying that when “CLR strict security” is enabled, then UNSAFE is effectively the only PERMISSION_SET, or, stated another way, that PERMISSION_SET effectively no longer has any effect. However, as usual, things are perhaps not so simple, given that the wording does not explicitly state that. It would be rather easy to add a few words to both statements to indicate that the PERMISSION_SET property is now being ignored. Other documentation makes similar claims (for example: allow updates Server Configuration Option). So, did they fail to explicitly state this, or was it not stated because that is not what is actually happening?

Read on for Solomon’s answer.
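
If you want to poke at this yourself, a small sketch of the moving parts (the assembly name and path are hypothetical, and the CREATE ASSEMBLY is left commented out since it needs a real binary):

```sql
-- 'clr strict security' is an advanced option, enabled by default in SQL Server 2017+.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'clr strict security';   -- check the current setting

-- With strict security enabled, even a nominally SAFE assembly is treated as
-- UNSAFE: it must be signed with a certificate or asymmetric key whose login
-- has UNSAFE ASSEMBLY permission (or the database must be TRUSTWORTHY) before
-- a statement like this will succeed.
-- CREATE ASSEMBLY MyClrAssembly
--     FROM 'C:\CLR\MyClrAssembly.dll'
--     WITH PERMISSION_SET = SAFE;
```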


Using Hive Hooks

Pushkar Gujar shows us how to use Hive hooks, which behave a bit like triggers in relational databases:

To understand how data is consumed, we need to figure out answers to some basic questions like:

  • Which datasets (tables/views/DBs) are accessed frequently?
  • When are the queries run most frequently?
  • Which users or applications are heavily utilizing the resources?
  • What type of queries are running frequently?

The most frequently accessed objects can easily benefit from optimizations like compression, columnar file formats, or data decomposition. A separate queue can be assigned to resource-heavy apps or users to balance the load on a cluster. Cluster resources can be scaled up during the timeframes when most queries run, in order to meet SLAs, and scaled down during periods of low usage to save cost.

Hive Hooks are convenient ways to answer some of the above questions and more!

Read on to learn how.
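
For context: the hook class itself is Java (implementing org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext), but wiring one into a session is just configuration. A sketch with a hypothetical hook class and a stand-in table, assuming the hook's jar is already on Hive's classpath:

```sql
-- Register a (hypothetical) auditing hook; pre-, post-, and failure hooks
-- each have their own property.
SET hive.exec.pre.hooks=com.example.audit.QueryAuditHook;
SET hive.exec.post.hooks=com.example.audit.QueryAuditHook;

-- Every statement from here on fires the hook, which can capture the query
-- text, the user, and the tables touched: the raw material for the usage
-- questions above.
SELECT COUNT(*) FROM web_logs;
```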


Untangling Kafka APIs

Stephane Maarek helps us make sense of when to use which Kafka API:

I identify 5 types of workloads in Apache Kafka, and in my opinion each corresponds to a specific API:

  • Kafka Producer API: Applications directly producing data (ex: clickstream, logs, IoT).

  • Kafka Connect Source API: Applications bridging between a datastore we don’t control and Kafka (ex: CDC, Postgres, MongoDB, Twitter, REST API).

  • Kafka Streams API / KSQL: Applications wanting to consume from Kafka and produce back into Kafka, also called stream processing. Use KSQL if you think you can write your real-time job in a SQL-like way; use the Kafka Streams API if you think you're going to need to write complex logic for your job.

  • Kafka Consumer API: Read a stream and perform real-time actions on it (e.g. send email…)

  • Kafka Connect Sink API: Read a stream and store it into a target store (ex: Kafka to S3, Kafka to HDFS, Kafka to PostgreSQL, Kafka to MongoDB, etc.)

Stephane then goes into detail on each of these.
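
Of the five, KSQL is the one you can exercise without writing any application code. A minimal sketch in 2018-era KSQL syntax (the topic and schema are made up) that reads a stream from Kafka and writes an aggregate back to Kafka:

```sql
-- Expose an existing Kafka topic as a stream.
CREATE STREAM clickstream (user_id VARCHAR, page VARCHAR)
  WITH (KAFKA_TOPIC='clicks', VALUE_FORMAT='JSON');

-- Continuous stream processing: count clicks per page, materialized as a
-- table whose changelog is itself a Kafka topic.
CREATE TABLE clicks_per_page AS
  SELECT page, COUNT(*) AS click_count
  FROM clickstream
  GROUP BY page;
```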


Deciding Whether To Clean Up Temp Tables

Grant Fritchey looks at what difference explicitly dropping temporary tables in a procedure makes:

I then set up Extended Events to capture the query metrics and I executed each of the queries multiple times (also, just for the test, I discarded the results because I didn’t want that process mucking with my measurements). After executing both procedures 500 times, the results were quite simple. The average execution time with an explicit drop was 8,672 microseconds. Meanwhile, the average for not dropping the temporary table was 8,530 microseconds. That’s about a 1% difference across hundreds of executions. Reads were identical and so were writes.

In short, the behavior is the same.

What about the impact on the system? Could I see changes in memory or I/O as these different processes ran?

Grant didn't notice any difference, but check Allen White's and Jay Robinson's answers in the comments. Temp table reuse can happen (if you follow the rules) and can make a difference when a procedure is called frequently enough.
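
For reference, the comparison boils down to a pair of procedures like these (a minimal sketch with hypothetical names; Grant's test procedures are more substantial):

```sql
-- Version 1: let SQL Server clean up when the procedure's scope ends.
CREATE PROCEDURE dbo.TempTest_NoDrop
AS
BEGIN
    CREATE TABLE #work (id INT NOT NULL, val VARCHAR(50));
    INSERT INTO #work (id, val) VALUES (1, 'a'), (2, 'b');
    SELECT id, val FROM #work;
END;
GO

-- Version 2: identical, plus the explicit drop at the end.
CREATE PROCEDURE dbo.TempTest_WithDrop
AS
BEGIN
    CREATE TABLE #work (id INT NOT NULL, val VARCHAR(50));
    INSERT INTO #work (id, val) VALUES (1, 'a'), (2, 'b');
    SELECT id, val FROM #work;
    DROP TABLE #work;
END;
GO
```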


Using IO Cache To Speed Up Spark Jobs

Chris Seferlis looks at what the HDInsight team has done to speed up Apache Spark jobs:

The big news here is the recently released preview of HDInsight IO Cache, which is a new transparent data caching feature that provides customers with up to 9X performance improvement for Spark jobs, without an increase in costs.

There are many open-source caching products in the ecosystem: Alluxio, Ignite, and RubiX, to name a few big ones. IO Cache is based on RubiX, and what differentiates RubiX from other comparable caching products is its approach of using SSD and eliminating the need for explicit memory management, whereas other comparable products reserve operating memory for caching the data.

Read on for more details.


Power BI: Choosing Between Default Or Custom Date Dimensions

Reza Rad looks at the default date dimension available in Power BI and compares it to what you can build yourself:

Oh yes! Of course, like any other table structure in Power BI's in-memory model, every date table consumes memory. But it would do the same even if you created your own custom date dimension! And whenever you use the role-playing dimension scenario, you consume even more memory! The main difference is that the Power BI default date dimension gets created even if you don't want to do date-based analysis on a date field. For example, even if you don't use DueDate in your date-based analysis, Power BI still creates a date dimension for it. You cannot stop it for one field; you have to either stop the default creation of the date dimension for the entire model or use it for the entire model. You cannot customize it per field, but with a custom date dimension, you can.

Reza does a good job of giving us the trade-offs between these two and explaining when we should use each.
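
If you go the custom route, the dimension can come from anywhere Power BI can import from. One common option is to build it on the source side in T-SQL and import the result; a minimal sketch (the date range and columns are illustrative, and a production dimension would add fiscal, holiday, and similar columns):

```sql
-- Generate one row per day and persist it as a date dimension to import.
;WITH dates AS (
    SELECT CAST('2015-01-01' AS DATE) AS d
    UNION ALL
    SELECT DATEADD(DAY, 1, d) FROM dates WHERE d < '2025-12-31'
)
SELECT d                  AS [Date],
       YEAR(d)            AS [Year],
       MONTH(d)           AS [MonthNumber],
       DATENAME(MONTH, d) AS [Month]
INTO dbo.DimDate
FROM dates
OPTION (MAXRECURSION 0);  -- the range exceeds the default recursion limit of 100
```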


Building Custom R Visuals In Power BI

Brad Lewellyn shows us how to create custom R visuals within Power BI:

Over the last few posts, we’ve shown how to use custom R visuals built by others.  Today, we’re going to build our own using the Custom R Visual available in Power BI Desktop.  If you haven’t read the second post in this series, Getting Started with R Scripts, it is highly recommended you do so now, as it provides necessary context for how to link Power BI to your local R IDE.

In the previous post, we created a bunch of log-transformed measures to find good predictors for Revenue.  We’re going to use these same measures today to create a basic linear regression model to predict Revenue.  If you want to follow along, the dataset can be found here.  Here’s the custom DAX we used to create the necessary measures.

Click through for the example.
