But how fast were they? Can we get faster? Thankfully, R provides `system.time()` for timing code execution. To get faster, it makes sense to use all the processing power our machines have. The ‘parallel’ library has some great tools to help us run our jobs in parallel and take advantage of multicore processing. My favourite is `mclapply()`, because it makes it very easy to take an `lapply()` and make it multicore. Note that `mclapply()` doesn’t work on Windows, as it relies on forking. The following script runs the `read_clean_write()` function in a for loop (boo, hiss), in `lapply()`, and in `mclapply()`. I’ve run these as list elements to make life easier later on.
It’s interesting reading, particularly because I had expected lapply to do a little bit better. Also interesting is the relative overhead cost of mclapply in this scenario: going from 1 core to 4 cut the time to approximately 1/3, not 1/4.
Folks in the Linux world are used to moving SSH keys to and from systems enabling password-less authentication using SSH user keys. Let’s take a minute to look at what it takes to use PowerShell to distribute SSH user keys to remote systems.
In the OpenSSH package there’s a command, ssh-copy-id, which is a bash script that copies a user’s public key to a remote system. There’s a little intelligence in the script to set things up properly on the remote system for password-less, key-based authentication. If the appropriate directory and key file aren’t set up, ssh-copy-id will create the directory and key file with the correct permissions on the remote system. As far as I can tell, ssh-copy-id has not been implemented in the Win32-OpenSSH port. So that leaves us with implementing this functionality ourselves, in PowerShell.
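For reference, the remote-side work that ssh-copy-id performs boils down to a few shell commands. Here is a hedged sketch (the key material and paths are placeholders, and a local temporary directory stands in for the remote home so the steps are visible without an actual SSH session):

```shell
#!/bin/sh
# Simulate the remote user's home directory locally.
remote_home=$(mktemp -d)
pubkey="ssh-rsa AAAAB3...example user@host"   # placeholder public key

# This is essentially what ssh-copy-id runs on the remote end:
mkdir -p "$remote_home/.ssh"                           # ensure the directory exists
chmod 700 "$remote_home/.ssh"                          # sshd requires restrictive perms
printf '%s\n' "$pubkey" >> "$remote_home/.ssh/authorized_keys"  # append the key
chmod 600 "$remote_home/.ssh/authorized_keys"          # key file must not be world-readable

ls -l "$remote_home/.ssh/authorized_keys"
```

In a real run these commands execute over the SSH connection (authenticated with a password, one last time); the permission steps matter because sshd refuses key authentication when `.ssh` or `authorized_keys` is too permissive.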
Read on to see how he does it.
If you have experience with PowerShell, some properties of Bash variables will feel familiar. In Bash, variables are denoted with a $ just like in PowerShell, but unlike PowerShell the $ is only needed when they are being referenced. When you are assigning a value to a variable, the $ is left off:
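The example itself didn’t survive the excerpt, but a minimal sketch of the kind of assignment and reference being described (the variable name and message are illustrative) looks like this:

```shell
#!/bin/sh
# Assignment: no $ on the left-hand side, and no spaces around =.
my_var="Hello World"

# Reference: the $ is required; the braces are optional here but a good habit.
printf 'The value of my_var is: %s\n' "${my_var}"
```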
Above we assigned a value to my_var without using the $, but when we then referenced it in the printf statement, we had to use a $. We also enclosed the variable name in curly braces. This is not required in all cases, but it is a good idea to get in the habit of using them. The braces are required when you are using positional parameters above 9 (we’ll talk about this later) or when you are using a variable in the middle of a string, but there is no harm in adding them every time you use a variable in a string.
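A quick illustration of why the braces matter mid-string (variable names here are my own):

```shell
#!/bin/sh
file="report"

# Without braces, the shell looks for a variable named "file_2024",
# which is unset, so this prints an empty line:
printf '%s\n' "$file_2024"

# With braces, the variable-name boundary is explicit:
printf '%s\n' "${file}_2024"    # prints: report_2024
```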
The basic syntax is pretty familiar to most programming languages, and there’s nothing scary about outputs, even when Mark starts getting into streams.
Now, not every environment is the same. Instances get configured differently, or things change just due to the nature of the business. In a previous life I would routinely have to back up a database and restore it to another server. However, the server I was restoring to had a different drive configuration. It happens. Anyway, I wanted a script that would give me:
- A backup statement
- A restore statement with the appropriate MOVE switch
This would allow me to easily execute the backup statement and then on the target server execute the restore statement. Since the restore statement already had the appropriate MOVE switch provided, I didn’t have to manually compose the statement.
Click through for the script.
-- 1) Create the Certificate:
CREATE CERTIFICATE [Permission$ViewServerState]
    ENCRYPTION BY PASSWORD = 'UseBetterPassword!'
    WITH SUBJECT = 'VIEW SERVER STATE permission',
         EXPIRY_DATE = '2099-12-31';
/*
   The password is used to protect only the Private Key, not the Public Key
   or the Certificate in general. The default expiration date is 1 year from
   the date the Certificate is created. However, in many situations, such as
   with Module Signing, the expiration date is actually ignored. Still, I
   prefer to set the expiration date to 2099.
*/
The DBA handling this server in the year 2100 is going to be really upset. Certificate signing works wonders in conjunction with stored procedures: you can grant the procedures all of the rights your application needs, while users cannot run ad hoc queries expecting those same rights.
One of the common issues I would run into when refreshing development and test environments is orphaned users. These are accounts that typically exist in production but not in development. The logins and users differ between these environments, and often there isn’t a matching login on the development instance. This creates an orphaned user: one that exists in the database but has no instance-level mapping.
Cleaning up these users isn’t that hard: you typically remove the user, map it to a different login, or repair it by adding the server login back. These aren’t difficult tasks, but the logic to script a quick fix, especially in a DR situation, may not be handy.
Read on to see how easy dbatools makes this.
How it works: For each join example, there are two tables, the left and the right table, shown as two columns. For the sake of simplicity, these tables are called “a” and “b” respectively in the code.
You’ll notice that the sheet uses a kind of pseudo-code when it comes to table names and column names.
Click through for the PDF. I appreciate that he did not use Venn diagrams, as those should be saved for Union/Intersect/Except.
PCA rotates the axes toward the directions of maximum variance and then projects the data onto these new axes. The direction of maximum variance is represented by the first principal component (PC1). There are multiple principal components, depending on the number of dimensions (features) in the dataset, and they are orthogonal to each other. The maximum number of principal components is the same as the number of dimensions of the data. For example, in the above figure, for two-dimensional data there will be at most two principal components (PC1 and PC2). The first principal component captures most of the variance, followed by the second principal component, the third principal component, and so on. Dimensionality reduction comes from the fact that it is possible to discard the last few principal components, as they do not capture much of the variance in the data.
PCA is a useful technique for reducing dimensionality and removing covariance: the resulting components are uncorrelated with one another.
The kicker, as Wickham describes on pages 4-5, is that normalization is a critical part of tidying data. Specifically, Wickham argues that tidy data should achieve third normal form.
Now, in practice, Wickham argues, we tend to need to denormalize data because analytics tools prefer having everything connected together, but the way we denormalize still retains a fairly normal structure: we still treat observations and variables like we would in a normalized data structure, so we don’t try to pack multiple observations in the same row or multiple variables in the same column, reuse a column for multiple purposes, etc.
I had an inkling of this early on and figured I was onto something clever until I picked up Wickham’s vignette and read that yeah, that’s exactly the intent.