Press "Enter" to skip to content

Day: May 29, 2019

Using Java in SQL Server 2019

Niels Berglund has an update on writing Java code in SQL Server 2019:

In CTP 2.5 and onwards when you write Java code for SQL Server you implement your code using the Microsoft Extensibility SDK for Java, (SDK). The SDK acts sort of like an interface as it exposes abstract classes that your code need to extend/target, (more about that later).

The SDK comes in the form of a .jar file, and you download the SDK from here.

Niels dives deep into the topic, so set aside a bit of time to read through this one.

Comments closed

Amazon Redshift ETL Tips

The Blendo team shares a few tips around ETL’ing data to Amazon Redshift:

2. The WLM Method
Use Amazon Redshift’s WLM (workload management) for defining a dedicated queue for the ETL process. Configuring the ETL queue with a small number of slots will help in avoiding excessive COMMITs. Also, avoid COMMITing separately for each transaction since commits are expensive.
Instead, surround multiple steps of the ETL process by a BEGIN…END statement. You can perform COMMIT only after all transformation logic is executed.

Click through for the set of tips.

Comments closed

Message Buses on the Market

Alex Woodie walks us through some of the options available for message buses:

Apache Kafka

Apache Kafka is a distributed open source messaging bus that was written in Java and Scala. The software implements a publish and subscribe messaging system that’s capable of moving large amounts of event data from sources to sinks, in a high-throughput manner with minimal latency and strong consistency guarantees. The software relies on Apache Zookeeper for management of the underlying cluster.

Kafka is based on the concept of producers and consumers. Event data originating from producers is stored timestamped partitions that are housed within Kafka topics. Meanwhile, consumer processes can read the data stored in Kafka partitions. Kafka automatically replicates partitions across multiple brokers (or nodes in the cluster), which allows Kafka to scale its message streaming service in a fault-tolerant manner.

Click through for descriptions of several good options. And if you want a big list, queues.io has one for you.

Comments closed

TDE + Copy-Only Backups

Jovan Popvic shows how you can take a copy-only backup of a TDE-protected database in Azure SQL Managed Instance:

We are recommending to rely on automatic backups only, with the build-in restore functionality to restore a database from a point-in-time, restore a database to another instance (for instance from production to dev)or Geo-restore functionalities to move your database. These automatic backups can be kept up to 35 days. These built-in automatic backups are secure and enables you to be fully compliant. In this scenario COPY_ONLY backups are only in some specific cases.

Strict TDE protection don’t allow you to take your own custom backups. If you need a backup of a TDE protected database, you would need to temporary disable TDE, take a backup, and then enable TDE again.

It’s not really a Managed Instance-specific thing, but rather TDE: if you want to take a non-encrypted backup of an encrypted database, you’ve got to kill encryption first.

Comments closed

Creating R Visuals in Power BI

Dave Mason takes us through showing an R-based visual in Power BI:

The R engine isn’t included with the installation of Power BI desktop. I won’t go into detail on this, so just know you’d need to install that separately. I had already installed the R component as part of Machine Learning Services for SQL Server 2017. I also had RStudio installed. Within Power BI desktop, take a moment to click File | Options and settings | Options to open the Options page. Then click R scripting in the list of Global Options. Here you’ll see options to set the R home directory and the desired R IDE.

Click through for the demo.

Comments closed

Plan Cache Sizes

Erin Stellato shares information on plan cache sizes:

If you have an ad hoc workload, you’ll often hit the max number of entries before you hit the space limit, particularly if you have the optimize for ad hoc workloads server option enabled, which stores the plan stub for an ad hoc query on initial execution, rather than the entire plan (the plan stub consumes less space, so this is an attempt to reduce bloat).

Click through for the numbers and a couple of options you have around plan cache size.

Comments closed

Methods for Rewriting SQL Queries

Bert Wagner puts together a list of 12 techniques for tuning SQL queries:

6. DISTINCT with few unique values
Using the DISTINCT operator is not always the fastest way to return the unique values in a dataset. In particular, Paul White uses recursive CTEs to return distinct values on large datasets with relatively few unique values. This is a great example of solving a problem using a very creative solution.

Click through for the full list as well as a video demonstration.

Comments closed

More on Index Fragmentation

Tibor Karaszi revises and extends some remarks on index fragmentation:

In my last blog post, I wanted to focus on the sequential vs random I/O aspect and how that part should be more or less irrelevant with modern hardware. So I did a test that did a full scan (following the linked list of an index) and see if I could notice any performance difference on my SSD. I couldn’t.

That isn’t the end of the story, it turns out. Another aspect is how the data is brought into memory. You might know that SQL server can do “read ahead”, meaning it does larger reads per I/O instead if single-page I/O. I.e., fewer but larger I/O operations. This sounds fine, but what happens when we have fragmentation?

Read on for a situation in which fragmentation does matter.

Comments closed