As you probably know, if you prepend the word EXPLAIN to your SQL query and then run it, Hive prints out a text description of the query plan. This lets you explore the effects such variations as code changes, the use of analyze, turning on/off the cost-based optimizer (CBO), and so on. It’s an essential tool for optimizing Hive.
The output of EXPLAIN is far from pretty, but fortunately, a simple pipeline of Linux commands can give you a slick graphical rendition like the one below.
I’m going to have to keep this in mind.
Which show you that the FLOAT had to be converted to VARCHAR. You can see the same thing if you try it with various versions of INT or DATE datatypes as well. Like I said earlier. No big deal with INT or even DATE. Those come back in a fairly expected format. (INTs look exactly the same and DATEs come back as ‘YYYY-MM-DD’). FLOAT and REAL however are floating point so they don’t always convert the same way. If you do the conversion deliberately you get this:
Understand your data types; otherwise, it might come back to hurt you later.
signalMode adds a twist. The behaviour described for the traffic light corresponds to a signal mode of 0, also known as a manual reset event. Here the event stays signalled irrespective of how many consumers pass through it (=successfully wait on it).
A signal mode of 1, however, turns it into an auto-reset event, where the act of successfully waiting on the event resets it to unsignalled. This is now more akin to a turnstile that only lets one person through after being signalled, e.g. by a scan of a valid transport pass or a button press by a security guard.
Interestingly, a event object is also sometimes known as a latch – that’s something to chew on for SQL Server folks. Don’t get hung up about who or what signals it; that is a separate issue altogether. Just keep in mind that the signal mode is a permanent attribute of the event – you construct it as manual reset or auto-reset. Full disclosure: there seems to be at least one more SignalMode (2, used by the related SOS_WaitableAddress), but let’s ignore it today.
This is part of a great series, and I hope Ewald keeps it up. I’d probably drop a few bucks on a cleaned up and edited version of his discussion of internals in an 80-page or so e-book.
Jay Kreps, the co-creator of Apache Kafka and CEO of Confluent, was one of the first big data architects to espouse an alternative to the Lambda architecture, which he did with his 2014 O’Reilly story “Questioning the Lambda Architecture.” While Kreps appreciated some aspects of the Lambda architecture—in particular how it deals with reprocessing data—he stated that the downside was just too great.
“The Lambda architecture says I have to have Hadoop and I have to have Storm and I’m going to implement everything in both places and keep them in sync. “I think that’s extremely hard to do,” Kreps tells Datanami. “I think one of the biggest things hurting stream processing is the amount of complexity that you have to incur to build something. That makes it slow to build applications that way, hard to roll them out, and hard to make them reliable enough to be a key part of the business.
I wonder if we’re seeing the next generation of Kimball v Inmon here, or if one will absolutely dominate.
As more settlements in Texas and France are impacted by severe flooding, this is a good time to thank the hydrologists at the NOAA who forecast river level rises in advance and give residents in affected areas time to move to higher ground. Along with topgraphic, rainfall, and weather data, monitoring stations maintained by NOAA and the USGS along rivers provide critical real-time information about river levels. NOAA scientists access this data using the dataRetrieval package for R, which they then incorporate into flood prediction models and use to generate animations like this one of the flood of the Delaware in February this year
Looks like I’ve got a new blog to follow…
Step 1: configure SSMS to only show file names on the tabs. Click Tools, Options, Text Editor, Editor Tab and Status Bar, and set all of the tab texts to false except file name. After all, not like all this stuff fits on the tab.
I definitely agree with step 1. You can try out steps 2 and 3 and see if they fit your workflow.
So today I went to run a SSIS package on my new laptop and bam, error message.
Microsoft.SqlServer.Dts.Runtime.DtsRuntimeException: The package failed to load due to error 0xC0011008 “Error loading from XML. No further detailed error information can be specified for this problem because no Events object was passed where detailed error information can be stored.”. This occurs when CPackage::LoadFromXML fails.
This feels like one of those types of errors that you spend 3 hours trying to figure out. Gotta love machine rebuild errors…
The log file viewer added in the Apache Storm 0.9.1 release made accessing Storm’s log files significantly easier, but in some cases still required examination individual log files one-by-one. In Storm 1.0 the UI now includes a powerful search feature that allows you to search a specific topology log file, or across all topology log files in the cluster, even archived files.
When performing a topology-wide search, the UI will search across all supervisor nodes for a match. The search results include a link to the matching log file, as well as host and port information that allow you quickly identify on which machine a specific log event occurred. This feature is particularly helpful when trying to track down when and where a particular error occurred.
The examples Taylor gives are all built around scaling. When you have dozens or hundreds of nodes, one-by-one solutions just don’t work.
There are probably very few cases for which this is technically a good idea (trying to be a featured author on JunkCharts might very well be one of those reasons). Nonetheless, there are at least a couple of requests for this floating around on stackoverflow; here and here for example. I struggled to find any satisfactory solutions that were in current working order (though perhaps my Google-fu has failed me).
Jonathan is rather against this idea, and it does seem like the answer is a hack. I suppose the real answer is “sometimes an image isn’t worth a thousand words.”
Note the “(SQL Server 2014)” beside the project name in Solution Explorer.
If I want to deploy this project to an SSIS Catalog on a SQL Server 2016 instance, I should update the project to SSIS 2016. How do I do this? In Solution Explorer, right-click the project name and click Properties
There are some nice screen shots to walk you through this. I’m happy that SSIS is moving in a multi-version direction. That makes it easier for me as a developer to upgrade my tools without needing three versions of Visual Studio (or SSDT).