The Basics Of Docker For R Users

Colin Fay explains some of the core principles behind Docker, containerizing some R code along the way:

Docker is designed to enclose environments inside an image / a container. What this allows, for example, is to have a Linux machine on a MacBook, or a machine with R 3.3 when your main computer has R 3.5. Also, this means that you can use older versions of a package for a specific task, while still keeping the package on your machine up-to-date.
This way, you can “solve” dependencies issues: if ever you are afraid dependencies will break your analysis when packages are updated, build a container that will always have the software versions you desire: be it Linux, R, or any package.
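
As a rough illustration of the version pinning described above, here is a minimal Python sketch that renders the text of a Dockerfile with a fixed R version and fixed package versions. It assumes the rocker/r-ver image and the remotes package's install_version() function, neither of which is named in the excerpt, and the helper name and the version numbers are hypothetical.

```python
def render_dockerfile(r_version, pinned_packages):
    """Return Dockerfile text that pins an R version and package versions.

    Assumption: the base image is rocker/r-ver, tagged by R version.
    Assumption: packages are pinned with remotes::install_version().
    """
    lines = [
        f"FROM rocker/r-ver:{r_version}",
        "RUN R -e \"install.packages('remotes')\"",
    ]
    # One RUN line per package keeps each pinned version visible in the file.
    for name, version in sorted(pinned_packages.items()):
        lines.append(
            f"RUN R -e \"remotes::install_version('{name}', version = '{version}')\""
        )
    return "\n".join(lines) + "\n"


# Illustrative versions only; substitute the ones your analysis depends on.
dockerfile_text = render_dockerfile("3.3.3", {"dplyr": "0.7.4"})
print(dockerfile_text)
```

Building an image from the generated file would then freeze those versions regardless of what is installed on the host.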

Click through for the details. H/T R-bloggers.

Generating Test Data In Kafka

Yeva Byzek takes us through the Kafka Connect Datagen connector:

Short of using real data from a real source, you do have a few options on how to generate more interesting test data for your topics. One option is to write your own client. Kafka has many programming language options—you choose: Java, Python, Go, .NET, Erlang, Rust—the list goes on. You can write your own Kafka client applications that produce any kind of records to a Kafka topic, and then you’re set.
But wouldn’t it be great if you could generate data locally to just fill topics with messages? Fortunately, you’re in luck! Because we have those data generators.
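
To make the "write your own client" option concrete, here is a small Python sketch that generates keyed JSON records of the kind you might send to a topic. It is not the Datagen connector and it does not talk to Kafka; the field names and the generate_records helper are hypothetical, and the actual producer call is left out.

```python
import json
import random


def generate_records(count, seed=0):
    """Yield (key, value) pairs shaped like messages for a test topic.

    The schema (order_id, user_id, amount) is made up for illustration.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    for order_id in range(count):
        user_id = f"user_{rng.randint(1, 5)}"
        value = {
            "order_id": order_id,
            "user_id": user_id,
            "amount": round(rng.uniform(1, 100), 2),
        }
        yield user_id, json.dumps(value)


records = list(generate_records(3))
for key, value in records:
    # A real client would hand key and value to a Kafka producer here.
    print(key, value)
```

Seeding the generator keeps test runs repeatable, which helps when you are comparing consumer output between runs.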

Click through for a demonstration.

Containerizing Python And MySQL

Allison Tharp walks us through containerizing a Python-based game she had created:

I’m really amazed at how easy creating the container was. It took only 11 lines to spin up a Linux environment on my own machine. The majority of the commands (7 of the 11) are simply adding the files and dependencies. I’m also pretty shocked that I didn’t have to do anything to my Python script to get this to work. I had assumed I would need to do something, but I didn’t. Very cool! Also, by using the following command while my Python script is running, I see that this is only taking up 1.3 GB!

Click through for scripts and important lessons learned along the way.

Unattended SQL Server Installations

Michal Sadowski walks us through unattended installations:

Imagine a situation where you are tasked with installing a dozen SQL Server instances. Using the approach described in the previous blog post, installation can take a significant amount of resources and time, and there is also a probability of errors due to typos (e.g. in directory paths).
There are two options that can be used for the installation:
1. Using ConfigurationFile.ini
2. Using command line parameters to setup.exe file
Below there are step-by-step instructions on how to install SQL Server using both options.
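
The two options above can be sketched as argument lists for setup.exe. This is a minimal Python illustration, not the author's script: the helper names, instance name, account, and file path are hypothetical, and the parameter set shown is a small subset of what a real installation would specify.

```python
def args_from_config_file(config_file_path):
    """Option 1: point setup.exe at a ConfigurationFile.ini."""
    return [
        "setup.exe",
        f"/ConfigurationFile={config_file_path}",
        "/IACCEPTSQLSERVERLICENSETERMS",
        "/Q",  # quiet mode, no user interface
    ]


def args_from_parameters(instance_name, sysadmin_account):
    """Option 2: pass every setting as a command line parameter."""
    return [
        "setup.exe",
        "/Q",
        "/ACTION=Install",
        "/FEATURES=SQLENGINE",
        f"/INSTANCENAME={instance_name}",
        f"/SQLSYSADMINACCOUNTS={sysadmin_account}",
        "/IACCEPTSQLSERVERLICENSETERMS",
    ]


# Hypothetical values for illustration only.
config_args = args_from_config_file(r"C:\Install\ConfigurationFile.ini")
param_args = args_from_parameters("SQL01", r"CONTOSO\dba_team")
print(" ".join(config_args))
print(" ".join(param_args))
```

Generating the arguments from one function for every instance is what removes the typo risk the excerpt mentions, since paths and names are entered once rather than retyped per install.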

If you only have a couple of SQL Server instances in your environment, unattended installation may seem a bit overwhelming. But as you have to deal with larger and larger numbers of instances, it’s the only way to keep your sanity. It’s also the only way to install on Windows Server Core.

Cloudera And Hortonworks Officially Merged

Arun Murthy gives the used-to-be-Hortonworks perspective on the now-official merger of Cloudera and Hortonworks:

Our merger did not arise out of the blue. Our respective missions were well aligned, and together the new Cloudera has the scale it needs to service the constantly changing needs of the world’s most demanding organizations and to grow even more dominant in the market.
New open-source standards such as Kubernetes, container technology and the growing adoption of cloud-native architectures are major parts of Cloudera’s strategy.  Our primary initiative out of the gate is to deliver a 100-percent open-source unified platform, which leverages the best features of Hortonworks Data Platform (HDP) 3.0 and Cloudera’s CDH 6.0. Cloud-native and built for any cloud – with a public cloud experience across all clouds – the unified platform embodies our shared “cloud everywhere” vision.

I’m more a fan of the Hortonworks tooling like Ambari than I am of Cloudera’s alternatives, so it will be interesting to see what happens going forward. The good news for recalcitrant types like me is that HDP will be around for a couple of years yet.

Installing Azure DevOps Server 2019

Kevin Chant takes a look at Azure DevOps Server 2019 RC1:

To clarify, Azure DevOps Server 2019 is the new name for TFS. Furthermore, it’s now rebranded and updated to look and feel like the Azure DevOps service, which is the new name for Visual Studio Team Services (VSTS) in Azure.
Therefore, I decided to do this because I am currently involved with Azure DevOps and using it with SQL Server deployments. Hence, I decided I wanted a local copy of my own to test things.

Click through for a couple of interesting findings, especially around required (and disallowed) versions of prerequisites like SQL Server versions.

Counting Virtual Log Files

Jamie Wick has a couple of scripts to help us monitor virtual log file counts for our transaction logs:

The LOGINFO command returns a row for each Virtual Log File in the Transaction Log. The FileSize column shows the size (KB) of each individual VLF. The amount of the transaction log that is currently in use can be determined by the number of rows with a Status of 2 (0 = Inactive, 2 = Active & 4 = VLF exists on the Primary replica, but not on the Secondary replica).
One of the biggest problems with the LOGINFO command has been the difficulty of extrapolating the information to include all databases on the server, for use in reports, automated alerts and maintenance processes. Beginning with SQL Server 2016 SP2, there are 2 new Dynamic Management Views (DMVs) that can be used for monitoring the VLF information: sys.dm_db_log_info & sys.dm_db_log_stats.
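
As a sketch of how that per-VLF information might feed a report or alert, the Python below counts VLFs per database from rows shaped like the output of sys.dm_db_log_info (one row per VLF). It assumes the rows have already been fetched and reduced to a database name and the vlf_active flag; the sample rows, the helper names, and the threshold are made up for illustration.

```python
from collections import Counter


def count_vlfs(rows):
    """Summarize (database_name, vlf_active) rows, one row per VLF.

    Returns {database_name: (total_vlfs, active_vlfs)}.
    """
    totals = Counter()
    active = Counter()
    for database_name, vlf_active in rows:
        totals[database_name] += 1
        if vlf_active:
            active[database_name] += 1
    return {name: (totals[name], active[name]) for name in totals}


def databases_over_threshold(summary, max_vlfs):
    """Return the databases whose total VLF count exceeds max_vlfs."""
    return sorted(name for name, (total, _) in summary.items() if total > max_vlfs)


# Made-up sample: Sales has 4 VLFs (1 active), Staging has 2 (1 active).
sample_rows = [
    ("Sales", 0), ("Sales", 0), ("Sales", 1), ("Sales", 0),
    ("Staging", 1), ("Staging", 0),
]
summary = count_vlfs(sample_rows)
flagged = databases_over_threshold(summary, max_vlfs=3)
print(summary)
print(flagged)
```

The threshold of 3 is only there to make the tiny sample trip the check; a real alert would use a much larger limit chosen for your environment.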

A huge number of VLFs can be a hidden performance sink.

How Query Store Stores Date Data

Erin Stellato shows us a few nuances to the way that Query Store handles dates internally:

Query Store retains query performance data at the plan level.  This data is then broken out into intervals of time, determined by the INTERVAL_LENGTH_MINUTES setting.  The time intervals are found in the sys.query_store_runtime_stats_interval system view, and the start_time and end_time columns are of the DATETIMEOFFSET data type.  This means that the date is time-zone aware, and in Query Store the data is stored as UTC.  Now why does all of this matter?  Because handling dates in Query Store is important if you’re going to query the data directly.
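
To show why the UTC storage matters when you query the data directly, here is a small Python sketch of the conversion in both directions. It uses a fixed offset of UTC-5 as an assumed local zone, which ignores daylight saving time; the helper name and the timestamps are hypothetical and not taken from Query Store.

```python
from datetime import datetime, timedelta, timezone


def to_offset(value, utc_offset_hours):
    """Express a time-zone-aware datetime at a fixed UTC offset."""
    return value.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# An interval start as stored: time-zone aware, in UTC.
start_time_utc = datetime(2019, 1, 7, 14, 0, tzinfo=timezone.utc)

# For display, shift it to the assumed local offset (UTC-5).
start_time_local = to_offset(start_time_utc, -5)

# For filtering, convert a local cutoff to UTC before comparing,
# so that "9 AM local" lines up with the stored 14:00 UTC interval.
local_cutoff = datetime(2019, 1, 7, 9, 0, tzinfo=timezone(timedelta(hours=-5)))
cutoff_utc = to_offset(local_cutoff, 0)

print(start_time_local.isoformat())
print(cutoff_utc.isoformat())
```

Both values describe the same instant; only the displayed offset changes, which is the behavior to keep in mind when writing predicates against start_time and end_time.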

Click through to see a query of what this looks like, as well as a few tips on parsing the data.

xp_cmdshell And Non-Sysadmin Accounts

Lucas Kartawidjaja shows us how you can grant a non-sysadmin user the right to run xp_cmdshell:

Once we run the above T-SQL query, any account that is part of the sysadmin role in the SQL Server instance has the ability to run the xp_cmdshell extended stored procedure. In the background, when a user with sysadmin privileges runs xp_cmdshell, it will execute the Windows command shell using the SQL Server Service Account (so if you are executing xp_cmdshell to access a certain resource on the network, for example, and you are having permission issues, you might want to make sure that the SQL Server Service Account has permission to that resource).
Now, what if you have a non-sysadmin account that needs to run xp_cmdshell? In order to do that, we would need to do some additional configuration.

Granting non-sysadmins rights to run xp_cmdshell definitely rates as well above average in terms of risk. I don’t have any problem with xp_cmdshell being turned on, especially considering that by default, only sysadmin accounts get it and sysadmin accounts can turn it on if it’s disabled, meaning it’s effectively always on for sysadmin. But when you start granting non-sysadmin accounts the ability to shell out, you have to be even more careful about protecting that SQL Server instance.

January 2019