Press "Enter" to skip to content

Day: August 23, 2016

Not Catching Them All

Hanjo Odendaal explains clustering techniques using Pokemon:

To collect the data on all the first generation pokemon, I employ Hadley Wickam’s rvest package. I find it very intuitive and can handle all of my needs in collecting and extracting the data from a pokemon wiki. I will grab all the Pokemon up until to Gen II, which constitutes 251 individuals. I did find the website structure a bit of a pain as each pokemon had very different looking web pages. But, with some manual hacking, I eventually got the data in a nice format.

This probably means a lot more to you if you grew up in front of a Game Boy, but there’s some good technique in here regardless.

Comments closed

Migrating To Azure SQL Data Warehouse

Rangarajan Srirangam has a detailed article on steps you should take when migrating a database to Azure SQL Data Warehouse:

This article focuses on migrating data to Azure SQL Data Warehouse with tips and techniques to help you achieve an efficient migration. Once you understand the steps involved in migration, you can practice them by following a running example of migrating a sample database to Azure SQL Data Warehouse.

Migrating your data to Azure SQL Data Warehouse involves a series of steps. These steps are executed in three logical stages: Preparation, Metadata migration and Data migration.

It’s a lengthy read, but well worth it.

Comments closed

JupyterLab

Serdar Yegulalp reports that Jupyter is getting a major facelift:

JupyterLab uses a web-based UI that’s akin to the tab-and-panel interface used in IDEs like Visual Studio or Eclipse. Notebooks, command-line consoles, code editors, language references, and many more items can be arranged in various combinations, powered by the PhosphorJSframework.

“The entire JupyterLab [project] is built as a collection of plugins that talk to kernels for code execution and that can communicate with one another,” the developers wrote. “We hope the community will develop many more plugins for new use cases that go far beyond the basic system.”

It looks like they’re making major changes to keep up with Zeppelin on the back end.  The biggest advantage Jupyter had for me over Zeppelin was its installation simplicity, so I hope they keep it just as easy as installing Anaconda and then loading JupyterLab.

Comments closed

Copying A File Using SQL Server

Slava Murygin makes me want to add a “wacky ideas” category with this one:

At first, you have to read the file you want to copy into a SQL Server. You have to choose a database to perform that action. It can be Test database or you can create a new database to perform that action or it can be even TempDB. There is only two requirements for the database:
– It must not be a production Database;
– Database should have enough of space to accommodate the file you want to copy.

The idea is that if the database engine’s service account has rights to a file you want to access but don’t have permissions to access, you can bulk copy the contents as a binary blob and then grab the contents and write the results to your local system using bcp.  Sure, it becomes your company’s most expensive file copy tool, but I love the mad ingeniousness behind it.

Comments closed

Aster Plots

Devin Knight looks at the Aster Plot in his latest Power BI visualization video:

The Aster Plot allows a category that dives the chart and up to 2 measures.

  • The first measure controls the depth of each section

  • The second measure controls the width of each section

I have to admit that I’m not a fan of the Aster Plot.  It has all the disadvantages of pie and torus charts (specifically, that humans have a hard time discerning differences in angles) while making it more complex and comparing across a second dimension as well.

Comments closed

Cloud Security

Kenneth Fisher provides musings based on an Azure security document:

It gives you a map on how to manage your security as you move into the cloud. Note: one of the main points is that your on premise security is equally as important and has to be managed with and as a part of your cloud security.

Now if you are like me and want more than just dry reading they also provide a link to a Microsoft Virtual Academy training course called Security in a Cloud-Enabled World that follows this roadmap and provides more detail and guidance.

Read the whole thing.

Comments closed

Azure SQL Database Maintenance

Jeffrey Verheul mentions that Azure SQL Database databases need regular maintenance, too:

Before I’m going into detail, I want to give full kudos to Ola Hallengren (Website | @olahallengren). He has spend a lot of his time to build a SQL Server Maintenance Solution that is completely free for everyone to use. And he did such an excellent job a lot companies (also huge companies) use his solution to run maintenance tasks on their databases.

None of the scripts below are written by me, but only small changes are made in order to make things more clear when the solution is deployed to an environment. The original scripts can be downloaded via the download page on Ola’s website.

Most of the to-dos are the same between on-premises and Azure SQL DB, but some of the implementation steps are a bit different.  This is worth checking out if you have any Azure SQL Database instances.

Comments closed

Visualizations In Five Lines Of Code

David Smith highlights a Sharon Machlis article showing visualizations in up to five lines of R code:

I’ve reproduced Sharon’s code and charts below. I did make a couple of tweaks to the code, though. I added a call to checkpoint(“2016-08-22”) which, if you’ve saved the code to a file, will install all the necessary packages for you. (I also verified that the code runs with package versions as of today’s date, and if you’re trying out this code at a later time it will continue to do so, thanks to checkpoint.) I also modified the data download code to make it work more easily on Windows. Here are the charts and code

It’s really easy to get basic visualizations within R, and these are better than basic visualizations.

Comments closed

Multi-Database Schema Comparison

Kevin Hill shows how to perform schema comparison across multiple databases concurrently:

I recently had the need to compare a “Gold” or “Master” copy of our database to the 300 client copies of that database that exist in our Production environment.  I’m not alone…many fellow DBAs have had the same need.   Google searches for this confirm it.   This is for an upcoming upgrade to the application that will need post-upgrade comparison/verification.

There are 3rd party tools that do SQL Compares…my particular favorite is aptly named SQL Compare from Red Gate.  I’ve been using it off an on for 10 years.   I don’t know if it can be set up to hit more than one database at a time.  The other issue is that I don’t have a copy here.

Microsoft’s SQL Server Data Tools will also do this within Visual Studio.   Still one database at a time.  I forget where, but someone pointed me to the fact that SSDT uses SQLPackage.exe under the hood to do the work.  I figure if I can run it at a command line I can script out all of the databases.  I’m not much of a DOS scripting guy, so everything that follows is just my hack version…but it works, and not just on my machine!

This is very useful if you work in an environment with multiple copies of databases—or even if you have dev, test, and prod versions of the same database.

Comments closed