Kevin Feasel – Page 1606

Lubridate Updates

Published 2016-09-16 by Kevin Feasel

Hadley Wickham reports on a Lubridate update:

Date time rounding (with round_date(), floor_date() and ceiling_date()) now supports unit multipliers, like “3 days” or “2 months”:
```
ceiling_date(ymd_hms("2016-09-12 17:10:00"), unit = "5 minutes")
#> [1] "2016-09-12 17:10:00 UTC"
```

If you handle date and time data in R, Lubridate is a tremendous asset.
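
The quoted output can look odd at first, because the result equals the input. A minimal Python sketch of the same idea (an analogue for illustration, not Lubridate's implementation; `ceiling_to_minutes` is a hypothetical helper) shows why: 17:10:00 already sits on a five-minute boundary, so there is nothing to round up.

```python
from datetime import datetime, timedelta


def ceiling_to_minutes(ts: datetime, step: int) -> datetime:
    """Round ts up to the next multiple of `step` minutes within the hour.

    A timestamp that is already on a boundary is returned unchanged,
    which matches the output quoted above.
    """
    # Floor to the nearest lower multiple of `step` minutes.
    floor = ts.replace(minute=ts.minute - ts.minute % step, second=0, microsecond=0)
    if floor == ts:
        return ts
    return floor + timedelta(minutes=step)


print(ceiling_to_minutes(datetime(2016, 9, 12, 17, 10, 0), 5))   # 2016-09-12 17:10:00
print(ceiling_to_minutes(datetime(2016, 9, 12, 17, 11, 30), 5))  # 2016-09-12 17:15:00
```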

Notebook Practices

Published 2016-09-16 by Kevin Feasel

Jonathan Whitmore has good practices for managing Jupyter notebooks:

Here’s an example of how we use git and GitHub. One beautiful new feature of GitHub is that it now renders Jupyter Notebooks automatically in repositories.

When we do our analysis, we do internal reviews of our code and our data science output. We do this with a traditional pull-request approach. When reviewing pull requests, however, the differences between updated .ipynb files are not rendered in a helpful way. One solution people tend to recommend is to commit a conversion to .py instead. This is great for seeing the differences in the input code (while jettisoning the output). However, when reviewing data science work, it is also incredibly important to see the output itself.

So far, I’ve treated notebooks more as presentation media and used tools like RStudio for tinkering. This shifts my priors a bit.
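
To make the ".py conversion" idea concrete, here is a rough sketch of what such a conversion does: keep the source of the code cells and drop markdown and outputs. In practice `jupyter nbconvert --to script` handles this; the function and the two-cell sample notebook below are hypothetical stand-ins that rely only on the notebook file being JSON.

```python
import json


def notebook_to_script(nb_json: str) -> str:
    """Return only the source of code cells, separated by blank lines."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        src = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(src, list):
            src = "".join(src)
        chunks.append(src.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook: one markdown cell, one code cell with output.
sample = json.dumps({
    "nbformat": 4, "nbformat_minor": 0, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["x = 2\n", "x * 21"],
         "outputs": [{"output_type": "execute_result", "execution_count": 1,
                      "metadata": {}, "data": {"text/plain": ["42"]}}]},
    ],
})

print(notebook_to_script(sample))
```

The output `42` never reaches the script, which is exactly the trade-off the quote describes: clean diffs of the input, nothing to review on the output side.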

Restoring All Databases

Published 2016-09-16 by Kevin Feasel

Kevin Hill builds a script to reload all databases at once:

We are doing a major upgrade this weekend, so like any good DBA, I have planned for a full backup the night before and need the ability to restore quickly if the upgrade goes sideways and we have to roll back.

The catch is that there are close to 300 identical databases involved.

This is easy enough to do if you know where msdb stores information about the last backup.

Click through for the script.
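
As a rough illustration of the idea (not Kevin Hill's script), the sketch below pairs a query against the `msdb.dbo.backupset` and `msdb.dbo.backupmediafamily` history tables with a Python helper that keeps the most recent full backup per database and emits one `RESTORE` statement each. The helper name, the sample rows, and the assumption that every backup is a single file are mine.

```python
from datetime import datetime

# Backup history query; type = 'D' limits the result to full database backups.
LAST_FULL_BACKUPS_SQL = """
SELECT bs.database_name, bs.backup_finish_date, bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
  ON bmf.media_set_id = bs.media_set_id
WHERE bs.type = 'D';
"""


def build_restore_statements(rows):
    """rows: iterable of (database_name, backup_finish_date, physical_device_name).

    Assumes each backup was written to a single file (no striped backups).
    """
    latest = {}
    for db, finished, path in rows:
        # Keep only the most recently finished backup for each database.
        if db not in latest or finished > latest[db][0]:
            latest[db] = (finished, path)
    statements = []
    for db in sorted(latest):
        name = db.replace("]", "]]")              # escape for [bracketed] names
        path = latest[db][1].replace("'", "''")   # escape for string literals
        statements.append(
            f"RESTORE DATABASE [{name}] FROM DISK = N'{path}' WITH REPLACE, RECOVERY;"
        )
    return statements


# Hypothetical rows, shaped like the query result above.
sample_rows = [
    ("Sales", datetime(2016, 9, 14, 23, 0), r"D:\Backups\Sales_old.bak"),
    ("Sales", datetime(2016, 9, 15, 23, 0), r"D:\Backups\Sales.bak"),
    ("HR", datetime(2016, 9, 15, 23, 5), r"D:\Backups\HR.bak"),
]

for statement in build_restore_statements(sample_rows):
    print(statement)
```

Generating the statements for review, rather than executing them straight away, leaves room to sanity-check 300 restores before running anything.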

Getting On The Biml Horse

Published 2016-09-16 by Kevin Feasel

Bill Fellows gives three patterns for Biml adoption:

Metadata driven

This approach “puts it all together.” From bottom to top, we take the Biml files we developed in the Cyclical phase and make them into “patterns.” That doesn’t have to be a complex endeavor; it could be as simple as putting a variable in for package name.

This is a nice stab at an organizational Biml maturity model.

Azure SQL Database Size Quotas

Published 2016-09-16 by Kevin Feasel

Dimitri Furman discusses the MAXSIZE property on an Azure SQL Database:

Customers can use this ability to allow scaling down to a lower service objective, when otherwise scaling down wouldn’t be possible because the database is too large.

While this capability is useful for some customers, the fact that the actual size quota for the database may be different from the maximum size quota for the selected service objective can be unexpected, particularly for customers who are used to working with the traditional SQL Server, where there is no explicit size quota at the database level. Exceeding the unexpectedly low database size quota will prevent new space allocations within the database, which can be a serious problem for many types of applications.

One more thing to think about, I suppose.

Trained Python Models

Published 2016-09-16 by Kevin Feasel

Koos van Strien wants to bring a trained Python model into Azure ML:

The path of bringing a trained model from the local Python/Anaconda environment to cloud Azure ML is broadly as follows:

Export the trained model
Zip the exported files
Upload to the Azure ML environment
Embed in your Azure ML solution

Click through to see the details. Koos did a great job making it look easy.
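
The first two steps can be sketched with nothing but the standard library. This is a minimal illustration that assumes the model can be pickled; the stand-in "model" dictionary, the function names, and the `model.pkl` member name are hypothetical, and the upload and embed steps are left out because they depend on the Azure ML environment.

```python
import io
import pickle
import zipfile


def export_and_zip(model, member_name="model.pkl") -> bytes:
    """Steps 1 and 2: serialize the model and wrap it in a zip archive."""
    payload = pickle.dumps(model)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, payload)
    return buffer.getvalue()


def load_from_zip(zip_bytes: bytes, member_name="model.pkl"):
    """What the receiving side would do: unzip and deserialize."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return pickle.loads(archive.read(member_name))


# Stand-in for a trained model; any picklable object works the same way.
model = {"coef": [0.5, -1.25], "intercept": 3.0}

archive_bytes = export_and_zip(model)
restored = load_from_zip(archive_bytes)
print(restored == model)  # True
```

Only unpickle archives you produced yourself, since loading a pickle can execute arbitrary code.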

Bubble Charts

Published 2016-09-16 by Kevin Feasel

Devin Knight continues his custom visuals series with the bubble chart:

Each category is defined as a bubble on the visual.
The size of the bubble is defined by a measure.
The bubble chart allows cross filtering, meaning you can click on a bubble in the chart and it will filter other report items.

This is a nice eye-catcher image, like you might see in a news article.

Restoring An Azure SQL Database

Published 2016-09-16 by Kevin Feasel

Arun Sirpal discusses ways to restore a database within Azure SQL Database:

You can’t give the restored database the same name as the database you want to replace; if you try, the attempt fails (the original post includes a screenshot). To get around this, I think you would need to drop the old one once the new one has restored and then do a rename.

This is a big difference compared to the on-prem version, so be sure to practice this before you find yourself in a crisis.

Migrating To Azure SQL Database

Published 2016-09-16 by Kevin Feasel

Mike Fal discusses BACPACs, DACPACs, and migrating on-prem databases to Azure SQL Database:

SQL Server Data Tools (SSDT) has always had a process to extract your database. There are two types of extracts you can perform:

DACPAC – A binary file that contains the logical database schema and possibly the data. This file retains the platform version of the database (e.g. 2012, 2014, 2016).
BACPAC – A binary file that contains the logical database schema and the data as insert statements. This stores the platform version, but is not locked into it.

Mike also walks through SqlPackage.exe.
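
For a feel of how the two file types map onto SqlPackage.exe, here is a small sketch that builds the command line for each: `Extract` produces a DACPAC and `Export` produces a BACPAC. The helper and the server, database, and file names are hypothetical, and the sketch only assembles the arguments without running anything.

```python
def sqlpackage_args(kind: str, server: str, database: str, target_file: str) -> list:
    """Build a SqlPackage.exe argument list for a DACPAC or BACPAC."""
    actions = {"dacpac": "Extract", "bacpac": "Export"}
    if kind not in actions:
        raise ValueError(f"kind must be one of {sorted(actions)}, got {kind!r}")
    return [
        "SqlPackage.exe",
        f"/Action:{actions[kind]}",
        f"/SourceServerName:{server}",
        f"/SourceDatabaseName:{database}",
        f"/TargetFile:{target_file}",
    ]


# Hypothetical server, database, and output file names.
print(" ".join(sqlpackage_args("dacpac", "localhost", "Sales", "Sales.dacpac")))
print(" ".join(sqlpackage_args("bacpac", "localhost", "Sales", "Sales.bacpac")))
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without extra quoting.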

Change Data Capture With Apache NiFi

Published 2016-09-15 by Kevin Feasel

Satish Bomma uses Apache NiFi to perform change data capture on a MySQL database:

The main things to configure are the DBCPConnectionPool and Maximum-value Columns.

For Maximum-value Columns, please choose a date-time stamp column that can serve as a cumulative change-management column.

This is the only limitation with this processor, as it is not true CDC and relies on one column. If rows are reloaded with older values in that column, they will not be replicated into HDFS or any other destination.

This processor does not rely on transaction logs or redo logs like Attunity or Oracle GoldenGate. For a complete CDC solution, please use Attunity or Oracle GoldenGate.

That last paragraph in the snippet is key: it’s not a true replacement for CDC-friendly products. It is, however, a good example for showing how to use NiFi to connect to a relational database and pump data out of it.
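
The limitation is easy to see in a toy version of the technique. The sketch below polls a table by a maximum-value column and remembers the highest value seen so far. It uses an in-memory SQLite table instead of MySQL so that it runs anywhere; the `orders` table, its columns, and `fetch_new_rows` are hypothetical and have nothing to do with NiFi's own code.

```python
import sqlite3


def fetch_new_rows(conn, last_max):
    """Return rows whose updated_at exceeds last_max, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, payload, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_max,),
    ).fetchall()
    new_max = rows[-1][2] if rows else last_max
    return rows, new_max


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "a", "2016-09-15 10:00:00"),
    (2, "b", "2016-09-15 11:00:00"),
])

first, mark = fetch_new_rows(conn, "")

# Row 3 arrives late with an older timestamp; row 4 is genuinely newer.
conn.execute("INSERT INTO orders VALUES (3, 'c', '2016-09-15 09:00:00')")
conn.execute("INSERT INTO orders VALUES (4, 'd', '2016-09-15 12:00:00')")

second, mark = fetch_new_rows(conn, mark)

print([row[0] for row in first])   # [1, 2]
print([row[0] for row in second])  # [4]
```

Row 3 is never picked up because its timestamp is below the stored high-water mark. Updates that leave the column untouched, and deletes, are invisible for the same reason, which is why log-based tools are the recommendation for true CDC.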

Author: Kevin Feasel
