Press "Enter" to skip to content

Day: July 21, 2016

HBase Incremental Backup And Restore

Carter Shanklin and Vladimir Rodionov discuss incremental backup and restore coming to HBase & Phoenix:

If your tables are large it may not be possible to restore them under a different name due to space constraints. The really powerful thing about HBase backups is they are stored in WAL files that can be parsed using a simple interface that can be consumed either in Java or using the “hbase wal” utility.

Consider this scenario: A customer rep deleted some data because he thought it was unimportant. A week later the customer is upset because the data was important and you need to restore these few pieces of information. With HBase backups all you need to do is parse through the backups with a WAL reader and extract the historical values, which you can then add back in. With other databases you would have to bring another database instance online and load the backups into it. Having backups in open, well-understood formats unlocks many powerful opportunities and can bring recovery times down from days to minutes.

Read on if you manage a Hadoop cluster with HBase (or you’re likely to administer one soon).

Comments closed

Satellite Image Combination

David Yanofsky discusses how he pieced together satellite images for a report on mainland Chinese trash washing up on Hong Kong shores:

The USGS has a website called EarthExplorer that lets you search through decades of satellite data. I limited my search to Landsat 8 data with less than 10% cloud cover.

(you can do this search on the command line with landsat-util too but i prefer the web interface. In the future I will probably use this online toolfrom Development Seed)

Hat tip to Nathan Yau.

Comments closed

SparkR + Zeppelin

I take a look at using SparkR and Zeppelin:

My goal is to do some of the things that I did in my Touching on Advanced Topics post.  Originally, I wanted to replicate that analysis in its entirety using Zeppelin, but this proved to be pretty difficult, for reasons that I mention below.  As a result, I was only able to do some—but not all—of the anticipated work.  I think a more seasoned R / SparkR practitioner could do what I wanted, but that’s not me, at least not today.

With that in mind, let’s start messing around.

SparkR is a bit of a mindset change from traditional R.

Comments closed

Biml Transformers

Bill Fellows uses a Biml transformer to change a variable’s value:

Our transformer is simple. We’ve specified “LocalMerge” as we only want to fix the one thing. That one thing, is an SSIS Variable named “ServerName”.

What is going to happen is that we will redeclare the variable, this time we will specify a value of “SQLQA” for our Value. Additionally, I’m going to add a Description to my Variable and preserve the original value. The TargetNode object has a lot of power to it as we’ll see over this series of posts on Biml Transformers.

I’ve not used transformers before; this was interesting.

Comments closed

Listing Enterprise-Only Features

Patrick LeBlanc has an embedded report which shows Enterprise Edition-only features:

Someone recently asked me if there was a list of all the SQL Server “Enterprise Only” features available on the web.  I pointed them to the Features Supported by the Editions of SQL Server web page and thought I was done.  He stated that this site was good, but did not provide a simple list of enterprise only features.  I thought for a second, and my thoughts went straight to Power BI.  Why?  Simple, there are tables on the web page, and Power BI can easily extract that data into a data model.  I am not going to go into all those details in this blog post.  Maybe one day, but for now take a look at this interactive Power BI report and let me know what you think.

I think this layout is a bit easier to read and follow than the features website, although I’d love to be able to click on an item and get more information on the feature.

Comments closed

Deployment As Process

Ginger Grant argues in favor of deployment processes:

All sorts of things can happen when one person writes and deploys. I know someone who worked in the IT department for a large cell phone company. At the time, working there meant free phone service. One of the devs was a heavy user of the free phone service and so was his large extended family. His job was to maintain the billing code. After several questionable incidents at work, HR got involved and he was perp walked out of the building. Due to the circumstances surrounding his departure, his cell phone accounts were checked to ensure from this point on, he would get a bill. Although his account showed a number of active phones, his balance was always zero. The code in source control was checked and there was nothing in it which provided a reason why his bill was zero. Upon further investigation, my friend noticed the version number in production did not match the version number in source control. The code in source control was compiled and a huge balance appeared for the former employee. If someone else had deployed the code in source control, this chicanery would not have been possible.

This is an interesting read, so go check it out.

Comments closed

Slack T-SQL Reference App

Daniel Hutmacher has a new app to pull MSDN help into Slack:

The API has to do two things: perform an OAUTH authentication, and respond to API queries. Slack uses OAUTH to authenticate that a user really does want to add the app to the Slack team. In essence, this happens in three steps. Slack sends a web request to your API, the API server then requests an OAUTH token from the Slack service (using a secret key), and finally returns this token on the original Slack connection.

As for serving up the T-SQL reference documentation, it’s just a matter of opening static text files in the web server directory and passing them with some encoding to the client. That’s it.

All of these exchanges are JSON encoded, which is trivial to work with in node.js.

If you’re spending all day in Slack, this seems like a pretty good thing to do.

Comments closed

T-SQL Tuesday Roundup

Chris Yates provides a round-up for the latest T-SQL Tuesday:

The roundup is finally here, and cheers to all of you who participated. We had a great turnout this month with many returning participants along with some newcomers.

We had a wide range of topics with many great insights from everyone, but don’t take my word for it. Check out the links below and see what your colleagues from around the SQL Community had to say:

Click through for the links.

Comments closed

Memory-Optimized Objects In The Transaction Log

Frank Gill looks at the fn_dblog_xtp function:

Transactions against an in-memory table will return a single row in the result set from fn_dblog, while the result set from fn_dblog_xtp will contain a row for all activity. For example, if you insert 100 rows into an in-memory table, fn_dblog will contain 3 records: a BEGIN TRANSACTION, a single row for the INSERT, and a COMMIT TRANSACTION. fn_dblog_xtp will contain 102 records: a BEGIN TRANSACTION, and a row for each row inserted, and a COMMIT TRANSACTION.

One of the new columns in fn_dblog_xtp is xtp_object_id. I tried to join this to the sys.objects table to return the object name, but the values didn’t match. After banging my head against my monitor for a while, I posed the question to #sqlhelp on Twitter. Andreas Wolter (b|t) responded that a correlation can be made using DMV sys.memory_optimized_tables_internal_attributes.  Excited, I tried to join fn_dblog_xtp to the DMV, but was disappointed.

Read the whole thing.

Comments closed

Multi-Tenant Database Architectures

James Serra describes a few architectures for multi-tenant databases in the cloud:

Separate Servers\VMs

You create VMs for each tenant, essentially doing a “lift and shift” of the current on-premise solution.  This provides the best isolation possible and it’s regularly done on-premises, but it’s also the one that doesn’t enable cutting costs, since each tenant has it’s own server, sql, license and so on.  Sometimes this is the only allowable option if you have in your client contract that their data will be hardware-isolated from other clients.  Some cons: table updates must be replicated across all the servers (i.e. updating reference tables), there is no resource sharing, and you need multiple backup strategies across all the servers.

Read on for a few other strategies.  There aren’t any cloud-only details here; you could implement the same strategies on-premises.

Comments closed