Press "Enter" to skip to content

Author: Kevin Feasel

Storm 1.0: Enhanced Debugging

Taylor Goetz discusses improvements in Storm 1.0:

The log file viewer added in the Apache Storm 0.9.1 release made accessing Storm’s log files significantly easier, but in some cases still required examination individual log files one-by-one. In Storm 1.0 the UI now includes a powerful search feature that allows you to search a specific topology log file, or across all topology log files in the cluster, even archived files.

When performing a topology-wide search, the UI will search across all supervisor nodes for a match. The search results include a link to the matching log file, as well as host and port information that allow you quickly identify on which machine a specific log event occurred. This feature is particularly helpful when trying to track down when and where a particular error occurred.

The examples Taylor gives are all built around scaling.  When you have dozens or hundreds of nodes, one-by-one solutions just don’t work.

Comments closed

R: Using Images As Labels

Jonathan Carroll shows how to use images as labels in R:

There are probably very few cases for which this is technically a good idea (trying to be a featured author on JunkCharts might very well be one of those reasons). Nonetheless, there are at least a couple of requests for this floating around on stackoverflow; here and here for example. I struggled to find any satisfactory solutions that were in current working order (though perhaps my Google-fu has failed me).

Jonathan is rather against this idea, and it does seem like the answer is a hack.  I suppose the real answer is “sometimes an image isn’t worth a thousand words.”

Comments closed

Converting SSIS Solution Versions

Andy Leonard shows how to convert a SQL Server Integration Services solution from one version of SQL Server to another:

Note the “(SQL Server 2014)” beside the project name in Solution Explorer.

If I want to deploy this project to an SSIS Catalog on a SQL Server 2016 instance, I should update the project to SSIS 2016. How do I do this? In Solution Explorer, right-click the project name and click Properties

There are some nice screen shots to walk you through this.  I’m happy that SSIS is moving in a multi-version direction.  That makes it easier for me as a developer to upgrade my tools without needing three versions of Visual Studio (or SSDT).

Comments closed

Premium Storage On Azure SQL Data Warehouse

Kenneth Nielsen reports that Azure SQL Data Warehouse will now support premium storage:

Today Microsoft have announced that Azure SQL Datawarehouse will support Premium Storage, this will allow the customers to see greater performance and predictability on queries. As of today, all newly created SQL Datawarehouse will be created with Premium Storage, at least in regions where Premium Storage is available. In the remainder of the preview period, the billing will continue to be based on standard pricing.

If you have Azure SQL DW, check it out to see if Premium is a big net benefit to you, as it looks like the price is the same for the moment.

Comments closed

EMR Now Supports Phoenix

Jonathan Fritz notes that Amazon’s ElasticMapReduce now offers easy support for Apache Phoenix:

Once your HBase table is ready, it’s time to map a table in Phoenix to your data in HBase. You use a JDBC connection to access Phoenix, and there are two drivers included on your cluster under /usr/lib/phoenix/bin. First, the Phoenix client connects directly to HBase processes to execute queries, which requires several ports to be open in your Amazon EC2 Security Group (for ZooKeeper, HBase Master, and RegionServers on your cluster) if your client is off-cluster.

Second, the Phoenix thin client connects to the Phoenix Query Server, which runs on port 8765 on the master node of your EMR cluster. This allows you to use a local client without adjusting your Amazon EC2 Security Groups by creating a SSH tunnel to the master node and using port forwarding for port 8765. The Phoenix Query Server is still a new component, and not all SQL clients can support the Phoenix thin client.

I am not HBase’s biggest fan, but I do think that Phoenix fixes one of the biggest problems HBase has:  its being completely foreign to most data professionals.  It’s not an accident that as a data platform matures, its development language looks more and more like SQL.

Comments closed

How Do You Trace?

Erin Stellato wants to know how you use Profiler and server-side traces:

Back in April I wrote a post asking why people tend to avoid Extended Events.  Many of you provided feedback, which I greatly appreciated.  A common theme in the responses was time.  Many people found that creating an event session in Extended Events that was comparable to one they would create in Trace took longer.  Ok, I get that…so as a follow up I am interested in knowing how you typically use Profiler and Trace.  If you would be willing to share that information, I would be extremely grateful.

Head over there and let her know.  For science!

Comments closed

Tying Meetup To Power BI

Reza Rad shows how to integrate Meetup data into Power BI:

In the documentation of this service in Meetup mentioned that Time column is:

time = UTC start time of the event, in milliseconds since the epoch

That means it is timestamp formatted. Timestamp value is number of seconds from epoch which is 1970-01-01 00:00:00. I have previously written about how to change timestamp value to date time and it is fairly easy with adding seconds to it. However for this case our value is not seconds, it is milliseconds so I have to first divide it by 1000.

This is pretty cool.  We’re starting up a Power BI meetup in the Triangle area, so it’ll be fun hosting a Power BI Meetup where we use Power BI to read Meetup data.

Comments closed

Normalize

I walk through a scenario which underscores the importance of normalization:

This joins my records to a tally table, which gives one row for each character in RemovedValue (that is, the numbers without recordset separators).  I then retain only the values which start a sequence, and use SUBSTRING to snatch up four digits. What I’m left with is a column named SplitVersion, which has one row for each customer, campaign, and 4-digit value (which is equivalent to my normalized table’s structure).

If that wasn’t exciting enough, we now need to slam this back together into our denormalized format, and that’s what tallyjoin does. It uses the FOR XML PATH trick to concatenate my four-digit values into one string, separated by commas. You might be wondering why I use comma instead of CHAR(30), and the answer is that converting CHAR(30) to XML returns a nasty result, so instead of trying to handle that, I use a character which is copacetic and translate it back using the REPLACE function after casting my “XML” result to varchar.

The implicit story here is, you can find someone who knows how to use tally tables, how to concatenate strings of data (quickly!), who knows how to tie various pieces of the puzzle together, and so on…or design the database the right way and avoid this pain later.

Comments closed

Key Lookup Without Output Column

Daniel Hutmacher gives us a head-scratcher:

Performance tuning the other day, I was stumped by a query plan I was looking at. Even though I had constructed a covering index, I was still getting a Key Lookup operator in my query plan. What I usually do when that happens is to check the operator’s properties to see what its output columns are, so I can include those columns in my covering index.

Here’s the interesting thing: there weren’t any output columns. What happened?

The answer makes perfect sense, and shows that looking at the SELECT clause isn’t enough.

Comments closed

Getting Physical

Denny Cherry explains the value of his having physical hardware on hand:

Now you may be asking why don’t we just do all this in Azure? And we could, but the reason we didn’t is pretty straight forward. Cost. Building tons of VMs in Azure and leaving them running for a few weeks for customers can cost a decent amount pretty quickly, even with smaller VMs. Here our cost is fixed. As long as we don’t need another power circuit (we can probably triple the number of servers before that becomes an issue) the cost is fixed. And if we need more power that’s not all that much per month to add on.

All and all, this will make a really nice resource for our customers to take advantage of, and give us a place to play with whatever we want without spending anything.

Ah, the life-long struggle between cap-x and op-x…

Comments closed