In CTP 2.5 and onwards when you write Java code for SQL Server you implement your code using the Microsoft Extensibility SDK for Java, (SDK). The SDK acts sort of like an interface as it exposes abstract classes that your code need to extend/target, (more about that later).
The SDK comes in the form of a
.jarfile, and you download the SDK from here.
Niels dives deep into the topic, so set aside a bit of time to read through this one.
2. The WLM Method
Use Amazon Redshift’s WLM (workload management) for defining a dedicated queue for the ETL process. Configuring the ETL queue with a small number of slots will help in avoiding excessive COMMITs. Also, avoid COMMITing separately for each transaction since commits are expensive.
Instead, surround multiple steps of the ETL process by a BEGIN…END statement. You can perform COMMIT only after all transformation logic is executed.
Click through for the set of tips.
Apache Kafka is a distributed open source messaging bus that was written in Java and Scala. The software implements a publish and subscribe messaging system that’s capable of moving large amounts of event data from sources to sinks, in a high-throughput manner with minimal latency and strong consistency guarantees. The software relies on Apache Zookeeper for management of the underlying cluster.
Kafka is based on the concept of producers and consumers. Event data originating from producers is stored timestamped partitions that are housed within Kafka topics. Meanwhile, consumer processes can read the data stored in Kafka partitions. Kafka automatically replicates partitions across multiple brokers (or nodes in the cluster), which allows Kafka to scale its message streaming service in a fault-tolerant manner.
Click through for descriptions of several good options. And if you want a big list, queues.io has one for you.
We are recommending to rely on automatic backups only, with the build-in restore functionality to restore a database from a point-in-time, restore a database to another instance (for instance from production to dev)or Geo-restore functionalities to move your database. These automatic backups can be kept up to 35 days. These built-in automatic backups are secure and enables you to be fully compliant. In this scenario COPY_ONLY backups are only in some specific cases.
Strict TDE protection don’t allow you to take your own custom backups. If you need a backup of a TDE protected database, you would need to temporary disable TDE, take a backup, and then enable TDE again.
It’s not really a Managed Instance-specific thing, but rather TDE: if you want to take a non-encrypted backup of an encrypted database, you’ve got to kill encryption first.
The R engine isn’t included with the installation of Power BI desktop. I won’t go into detail on this, so just know you’d need to install that separately. I had already installed the R component as part of Machine Learning Services for SQL Server 2017. I also had RStudio installed. Within Power BI desktop, take a moment to click File | Options and settings | Options to open the Options page. Then click R scripting in the list of Global Options. Here you’ll see options to set the R home directory and the desired R IDE.
Click through for the demo.
How can I DELETE using OPENQUERY? Normally for a SELECT from OPENQUERY , I’ll do something like :
SELECT col1 ,col2 FROM OPENQUERY (MY_LINKED_SERVER,'SELECT col1,col2 FROM MY_LINKED_TABLE')
Can a similar method be used but for DELETE?
Click through for the answer.
If you have an ad hoc workload, you’ll often hit the max number of entries before you hit the space limit, particularly if you have the optimize for ad hoc workloads server option enabled, which stores the plan stub for an ad hoc query on initial execution, rather than the entire plan (the plan stub consumes less space, so this is an attempt to reduce bloat).
Click through for the numbers and a couple of options you have around plan cache size.
6. DISTINCT with few unique values
Using the DISTINCT operator is not always the fastest way to return the unique values in a dataset. In particular, Paul White uses recursive CTEs to return distinct values on large datasets with relatively few unique values. This is a great example of solving a problem using a very creative solution.
Click through for the full list as well as a video demonstration.
In my last blog post, I wanted to focus on the sequential vs random I/O aspect and how that part should be more or less irrelevant with modern hardware. So I did a test that did a full scan (following the linked list of an index) and see if I could notice any performance difference on my SSD. I couldn’t.
That isn’t the end of the story, it turns out. Another aspect is how the data is brought into memory. You might know that SQL server can do “read ahead”, meaning it does larger reads per I/O instead if single-page I/O. I.e., fewer but larger I/O operations. This sounds fine, but what happens when we have fragmentation?
Read on for a situation in which fragmentation does matter.