As I mentioned, we have implemented KSQL on top of the Kafka Streams API. This means that every KSQL query is compiled into a Kafka Streams application. Therefore, KSQL queries follow the same execution model as Kafka Streams applications.
A query can be executed on multiple instances, and each instance will process a portion of the input data from the input topic(s) and generate a portion of the output data to the output topic(s). Based on this execution model, and depending on how we want to run our queries, KSQL currently provides two deployment options.
Read on for those options.
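For a sense of what gets compiled, here is a minimal sketch of a persistent KSQL query; the pageviews stream and its columns are assumptions borrowed from the standard Confluent quickstart, not from the post itself:

-- Hypothetical persistent query (the pageviews stream and its columns
-- are assumed). KSQL compiles this into a Kafka Streams application,
-- so the query scales out across KSQL server instances the same way
-- a Streams app scales across its instances.
CREATE STREAM pageviews_filtered AS
  SELECT userid, pageid
  FROM pageviews
  WHERE userid <> 'user_1';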
The Small World of Words project focuses on word associations. You can try it out for yourself to see how it works, but the general idea is that the participant is presented with a word (from “telephone” to “journalist” to “yoga”) and is then asked to give their immediate association with that word. The project has collected more than 15 million responses to date, and is still collecting data. You can check out some pre-built visualizations the researchers have put together to explore the dataset, or you can download the data for yourself.
It’s an interesting analysis of the data set, mixed in with some good R code.
Plotting univariate (sampled) normal data? Well, that’s obvious.
d %>% ggplot(aes(x = X1)) + geom_density()
It gets much less obvious from there. It was also interesting learning about ggplotly, a function which translates ggplot2 visuals to plotly visuals.
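As a quick illustration, here is a minimal sketch assuming the ggplot2 and plotly packages are installed and a data frame d shaped like the one in the post:

# Hypothetical setup: d holds draws from a standard normal, as in the post
library(ggplot2)
library(plotly)  # provides ggplotly() and re-exports the %>% pipe

d <- data.frame(X1 = rnorm(1000))

# Build the static density plot, then convert it to an interactive plotly widget
p <- d %>% ggplot(aes(x = X1)) + geom_density()
ggplotly(p)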
The SSIS foreach loop is configured to allow you to easily ingest multiple data files in a single data flow task. For this to work, all of the files would need to reside in the same directory structure (either locally or on the network), and they must all have the same structure and metadata.
In this design, the data flow is contained within the foreach container, which will execute the contents of that data flow task once for each file found in the specified directory.
This gives us a good pattern for loading a bunch of text files, such as monthly extracts from a different system.
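SSIS configures all of this visually rather than in code, but the underlying pattern is easy to sketch. Here it is in PowerShell, purely as an illustration of the foreach-over-files idea (the path and delimiter are placeholders, not anything from the post):

# The same foreach-over-files pattern, outside SSIS, for illustration only
foreach ($file in Get-ChildItem -Path 'C:\Extracts' -Filter '*.txt') {
    # Every file must share the same structure and metadata, as in the SSIS design
    $rows = Import-Csv -Path $file.FullName -Delimiter "`t"
    Write-Host "Read $($rows.Count) rows from $($file.Name)"
    # ...hand $rows to the destination load step here...
}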
The only operation SQL Server 2016 and 2017 support on encrypted database columns is equality comparison, providing you use deterministic encryption. For anything else, your apps need to download the data to perform the computations outside of the database. Similarly, if you need to encrypt your data for the first time or re-encrypt it later (e.g. to rotate your keys), you need to use special tools that move the data and perform crypto operations on a different machine than your SQL Server computer. These restrictions are not an issue if equality comparison is all your applications need and if the tables containing your sensitive data are small. However, many types of sensitive information, e.g. a person’s name or phone number, often require richer operations, including pattern matching and sorting, and it’s not uncommon for sensitive data to be too large to move outside of the database for processing.
To address the above challenges, Always Encrypted in SQL Server 2019 is enhanced with secure enclaves. A secure enclave is a protected region of memory that appears as a black box to the containing process and to other processes running on the machine, including the operating system. There is no way to view the data or code inside the enclave from the outside, which makes enclaves ideal for processing sensitive data. There are several enclave technologies that differ in how enclave isolation is accomplished. SQL Server 2019 preview uses a Windows Server technology called Virtualization Based Security (VBS), which relies on the hypervisor to protect and isolate enclaves.
They’re going further with Always Encrypted than I thought would be possible. The first release of Always Encrypted had me asking, “Why would I use this over running an encryption or decryption function in my app code?” I think secure enclaves start to answer that question.
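To make the contrast concrete, here is a hypothetical pair of queries; the table and column names are made up, and the values have to arrive as parameters from an Always Encrypted-enabled client driver:

-- SQL Server 2016/2017: only equality against a deterministically
-- encrypted column can be evaluated inside the database
SELECT PatientId FROM dbo.Patients WHERE SSN = @SSN;

-- SQL Server 2019 with secure enclaves (enclave-enabled column key):
-- richer operations such as pattern matching can run server-side
SELECT PatientId FROM dbo.Patients WHERE SSN LIKE @SSNFragment;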
Let’s say you’ve got a lot of RSJobs. Or maybe long-running RSJobs. Wouldn’t a nice, user-friendly view of the progress be handy? You bet it would!
This is both dense and verbose, but I wrote it really quickly and it works (and not just for this post; I’ve used it in production), so I’m not tinkering too much with it right now. Once the jobs are created (and a few have started), this script runs a loop watching the output of Get-RSJob and updates a progress bar to reflect where we are. It also takes care of receiving and removing the completed jobs.
This is pretty easy to work with—or at least Andy makes it look that way.
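Andy’s script is worth reading in full; as a rough sketch of the polling pattern (not his actual code, and assuming the PoshRSJob module with at least one job already started):

# Minimal sketch: poll Get-RSJob, update a progress bar, and receive/remove
# jobs as they finish
$total = (Get-RSJob).Count
$done  = 0
while ($done -lt $total) {
    foreach ($job in (Get-RSJob | Where-Object State -eq 'Completed')) {
        $job | Receive-RSJob    # collect the job's output
        $job | Remove-RSJob     # clean up the finished job
        $done++
    }
    Write-Progress -Activity 'Runspace jobs' -Status "$done of $total complete" -PercentComplete (100 * $done / $total)
    Start-Sleep -Milliseconds 500
}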
We are on the home stretch, and I have saved the last third of the series to cover some pretty cool/amazing commands. Not that the other 500+ commands aren’t cool/amazing, but these last few are used continuously in my environment and bring automation closer to your fingertips with PowerShell and SQL Server than ever before.
Have you ever been asked to take a database backup before a developer does a deployment? Have you ever been asked to back up a database so you can restore it to a development or other environment? How about: hey, can you take a quick transaction log backup? These can all be accomplished using Backup-DbaDatabase, changing only a couple of parameters each time.
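For example (the instance, database, and path names below are placeholders):

# Full backup ahead of a deployment
Backup-DbaDatabase -SqlInstance sql01 -Database AppDb -Path '\\backupshare\predeploy'

# Copy-only full backup to restore into another environment,
# without disturbing the regular backup chain
Backup-DbaDatabase -SqlInstance sql01 -Database AppDb -Path '\\backupshare\adhoc' -CopyOnly

# Quick transaction log backup
Backup-DbaDatabase -SqlInstance sql01 -Database AppDb -Path '\\backupshare\logs' -Type Log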
This post is strictly about taking backups, but dbatools is also great at testing backups.
Now that we’ve seen our data, it’s a relatively simple task to convert the R script to a Python script. There are a few major differences. First, Python is a general-purpose programming language, whereas R is a statistical programming language. This means that some of the functionality provided in base R requires additional libraries in Python. Pandas is a good library for data manipulation, but is already included by default in Power BI. Scikit-learn (also known as sklearn) is a good library for building predictive models. Finally, Seaborn and Matplotlib are good libraries for creating data visualizations.
In addition, there are some scenarios where Python is a bit more verbose than R, resulting in additional coding to achieve the same result. For instance, fitting a regression line to our data using the sklearn.linear_model.LinearRegression().fit() function required much more coding than the corresponding lm() function in R. Of course, there are plenty of situations where the opposite is true and R becomes the more verbose language.
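A minimal sketch of that comparison, with made-up data:

# Fitting a simple linear regression with scikit-learn; note the reshaping
# and fit ceremony compared to R's one-line lm(y ~ x)
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.arange(10, dtype=float).reshape(-1, 1)  # sklearn wants a 2-D feature matrix
y = 2.0 * x.ravel() + 1.0 + np.random.randn(10)

model = LinearRegression().fit(x, y)
print(model.intercept_, model.coef_)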
Click through for the full example.
My test environment is running SQL Server 2017 on Windows Server 2016, a pretty vanilla environment. After downloading the appropriate installer for the server where the service was to run, I installed it, made the necessary changes to the config file per the documentation provided after installation, granted the service account the necessary privileges, and attempted to start the service.
In the Windows System event log, I got three messages.
Read on for the solution.