Press "Enter" to skip to content

Author: Kevin Feasel

Generating Test Data In Kafka

Yeva Byzek takes us through the Kafka Connect Datagen connector:

Short of using real data from a real source, you do have a few options on how to generate more interesting test data for your topics. One option is to write your own client. Kafka has many programming language options—you choose: Java, Python, Go, .NET, Erlang, Rust—the list goes on. You can write your own Kafka client applications that produce any kind of records to a Kafka topic, and then you’re set.
But wouldn’t it be great if you could generate data locally to just fill topics with messages? Fortunately, you’re in luck! Because we have those data generators.

Click through for a demonstration.

Comments closed

Containerizing Python And MySQL

Allison Tharp walks us through containerizing a Python-based game she had created:

I’m really amazed at how easy creating the container was.  It took only 11 lines to spin up a Linux environment on my own machine.  The majority of the commands (7 of the 11) are simply adding the files and dependencies.  I’m also pretty shocked that I didn’t have to do anything to my Python script to get this to work.  I had assumed I would need to do something but, I didn’t.  Very cool!  Also, by using the following command while my Python script is running, I see that this is only taking up 1.3 GB!

Click through for scripts and important lessons learned along the way.

Comments closed

Unattended SQL Server Installations

Michal Sadowski walks us through unattended installations:

Imagine a situation when you are tasked to install a dozen SQL Server instances. Using an approach described in previous blog post  installation can take a significant amount of resources and time, also there is a probability of errors due to typos (e.g. in directory paths, etc.)
There are two options that can be used for the installation:
1. Using ConfigurationFile.ini
2. Using command line parameters to setup.exe file
Below there are step-by-step instructions on how to install SQL Server using both options.

If you only have a couple of SQL Server instances in your environment, unattended installation may seem a bit overwhelming. But as you have to deal with larger and larger numbers of instances, it’s the only way to keep your sanity. It’s also the only way to install on Windows Server Core.

Comments closed

Cloudera And Hortonworks Officially Merged

Arun Murthy gives the used-to-be-Hortonworks perspective on the now-official merger of Cloudera and Hortonworks:

Our merger did not arise out of the blue. Our respective missions were well aligned, and together the new Cloudera has the scale it needs to service the constantly changing needs of the world’s most demanding organizations and to grow even more dominant in the market.
New open-source standards such as Kubernetes, container technology and the growing adoption of cloud-native architectures are major parts of Cloudera’s strategy.  Our primary initiative out of the gate is to deliver a 100-percent open-source unified platform, which leverages the best features of Hortonworks Data Platform (HDP) 3.0 and Cloudera’s CDH 6.0. Cloud-native and built for any cloud – with a public cloud experience across all clouds – the unified platform embodies our shared “cloud everywhere” vision.

I’m more a fan of the Hortonworks tooling like Ambari than I am of Cloudera’s alternatives, so it will be interesting to see what happens going forward. The good news for recalcitrant types like me is that HDP will be around for a couple of years yet.

Comments closed

Installing Azure DevOps Server 2019

Kevin Chant takes a look at Azure DevOps Server 2019 RC1:

To clarify, Azure DevOps Server 2019 is the new name for TFS. Furthermore, it’s now rebranded and updated to look and feel like the Azure DevOps service. Which is the new name for Visual Studio team Services (VSTS) in Azure.
Therefore, I decided to do this because I am currently involved with Azure DevOps and using it with SQL Server deployments. Hence, I decided I wanted a local copy of my own to test things.

Click through for a couple of interesting findings, especially around required (and disallowed) versions of prerequisites like SQL Server versions.

Comments closed

Counting Virtual Log Files

Jamie Wick has a couple of scripts to help us monitor virtual log file counts for our transaction logs:

The LOGINFO command returns a row for each Virtual Log File in the Transaction Log. The FileSize column shows the size (KB) of each individual VLF. The amount of the transaction log that is currently in use can be determined by the number rows with a Status of 2 (0 = Inactive, 2 = Active & 4 = VLF exists on the Primary replica, but not on the Secondary replica).
One of the biggest problems with the LOGINFO command has been the difficulty of extrapolating the information to include all databases on the server, for use in reports, automated alerts and maintenance processes. Beginning with SQL Server 2016 SP2, there are 2 new Dynamic Management Views (DMVs) that can be used for monitoring the VLF information: sys.dm_db_log_info & sys.dm_db_log_stats.

A huge number of VLFs can be a hidden performance sink.

Comments closed

How Query Store Stores Date Data

Erin Stellato shows us a few nuances to the way that Query Store handles dates internally:

Query Store retains query performance data at the plan level.  This data is then broken out into intervals of time, determined by the INTERVAL_LENGTH_MINUTES setting.  The time intervals are found in the sys.query_store_runtime_stats_interval system view, and the start_time and end_time columns are of the DATETIMEOFFSET data type.  This means that the date is time-zone aware, and in Query Store the data is stored as UTC.  Now why does all of this matter?  Because handling dates in Query Store is important if you’re going to query the data directly.

Click through to see a query of what this looks like, as well as a few tips on parsing the data.

Comments closed

xp_cmdshell And Non-Sysadmin Accounts

Lucas Kartawidjaja shows us how you can grant a non-sysadmin user the right to run xp_cmdshell:

Once we run the above T-SQL query, any account that is part of the sysadmin role in the SQL Server instance has the ability to run the xp_cmdshell extended stored procedure. On the background, when the user with sysadmin privileges runs the xp_cmdshell, it will execute the Windows command shell using the SQL Server Service Account (So if you are executing xp_cmdshell to access certain resource on the network, for example, and you are having permission issue, you might want to make sure that the SQL Server Service Account has permission to that resource).
Now, what if you have a non-sysadmin account that needs to run xp_cmdshell? In order to do that, we would need to do some additional configuration.

Granting non-sysadmins rights to run xp_cmdshell definitely rates as well above-average in terms of risk. I don’t have any problem with xp_cmdshell being turned on—especially considering that by default, only sysadmin accounts get it and sysadmin accounts can turn it on if it’s disabled, meaning it’s effectively always on for sysadmin. But when you start granting non-sysadmin accounts the ability to shell out, you have to be even more careful of protecting that SQL Server instance.

Comments closed

Writing Vectorized Code In R

John Mount helps us understand writing R code like a native:

This sort of difference, scalar oriented C++ being so much faster than scalar oriented R, is often distorted into “R is slow.”
This is just not the case. If we adapt the algorithm to be vectorized we get an R algorithm with performance comparable to the C++implementation!
Not all algorithms can be vectorized, but this one can, and in an incredibly simple way. The original algorithm itself (xlin_fits_R()) is a bit complicated, but the vectorized version (xlin_fits_V()) is literally derived from the earlier one by crossing out the indices. That is: in this case we can move from working over very many scalars (slow in R) to working over a small number of vectors (fast in R).

This is akin to writing set-based SQL instead of cursor-based SQL: you’re thinking in terms which make it easier for the interpreter (or optimizer, in the case of a database engine) to operate quickly over your inputs. It’s also one of a few reasons why I think learning R makes a lot of sense when you have a SQL background.

Comments closed

gganimate Now On CRAN

Thomas Lin Pedersen announces that gganimate is now available on CRAN:

While this commit was done in the autumn 2017, nothing further happened until I decided to make gganimate the center of my useR 2018 keynote, at which point I was forced (by myself) to have some sort of package ready by the summer of 2018.
A fair amount of users have shown displeasure in the breaking changes this history has resulted in. Many blog posts have already been written focusing on the old API, as well as code on numerous computers that will no longer work. I understand this frustration, of course, but both me and David agreed that doing it this way was for the best in the end. I’m positive that the new API has already greatly exceeded the mind-share of the old API and given a year the old API will be all but a distant memory…

Read on for information on these breaking changes, and how the changes will make life easier in the long run. And stay for the fireworks. H/T R-Bloggers

Comments closed