Press "Enter" to skip to content

Author: Kevin Feasel

Watch Those Parentheses

Kenneth Fisher shows how to see open and close parenthetical locations:

You’ll notice that when I go over the parentheses, the one I’ve selected and its pair turn yellow, unless there isn’t a pair, of course. You can also use Ctrl-] to flip between the open and close parenthesis in a pair. This can be particularly useful to make sure that you remembered a close parenthesis at the end of a subquery. In this case, that last close parenthesis doesn’t have a match. Now, finding out that you are missing an open parenthesis doesn’t mean you know where it’s supposed to go. But you can track the different pairs, making sure that each time you open a parenthesis you close it in the correct place. In this case it belonged right at the beginning.

FYI, yellow isn’t the default (it’s a light gray). I find the default hard to see (I’m getting old), so I changed it to yellow in the options under Fonts and Colors.

Read the whole thing.


SQL Server Event Handling

Dave Mason looks at different levels of event handling within SQL Server:

While event handling for .Net developers is implemented in a unified way, this is not the case for SQL Server. Event handling for SQL Server lacks the “one stop shopping” afforded to .Net developers. *If* we had access to the code base for SQL Server and wanted to handle a specific event, we could add our own code, recompile sqlservr.exe, and be on our way. But since we don’t have this ability, we use SQL Server’s run-time hooks. Consider the following:

  • DDL Triggers: handles Data Definition Language events (synchronously).

  • Event Notifications: handles a wide swath of SQL Server events via Service Broker (asynchronously).

  • SQL Alerts: handles the following events:

    1. Events with a specific error number or severity level that are written to the Windows Event Log.

    2. Events for a specific performance condition.

    3. WMI events.

  • sp_procoption: handles the startup event by specifying a stored procedure to run when the database engine service starts.

  • SQL Agent jobs: handles time-based events defined by user-specified job schedules (e.g., daily, hourly).

This sounds like the beginning of a new series.
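
To make the first hook on that list concrete, here’s a minimal sketch of a database-scoped DDL trigger, created from Python via pyodbc. The connection string, database name, and dbo.DdlLog audit table are assumptions for illustration; the trigger body is the standard EVENTDATA() logging pattern.

```python
import pyodbc

# Assumptions: a local SQL Server instance, a database named SandboxDb, and an
# audit table dbo.DdlLog with an XML column named EventData already in place.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=SandboxDb;Trusted_Connection=yes;",
    autocommit=True,
)

# A database-scoped DDL trigger fires synchronously for the listed DDL events
# and can inspect the event via EVENTDATA().
ddl_trigger = """
CREATE TRIGGER trg_ddl_audit ON DATABASE
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE
AS
    INSERT INTO dbo.DdlLog (EventData) VALUES (EVENTDATA());
"""
conn.cursor().execute(ddl_trigger)

# Any CREATE/ALTER/DROP TABLE in SandboxDb now logs a row to dbo.DdlLog.
conn.cursor().execute("CREATE TABLE dbo.TriggerTest (Id INT);")
```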


PDF Search With Page Numbers

Jon Morisi has a solution for getting page numbers back in your results when using Full-Text Search against PDFs:

In my last blog post, Setting up Full-Text Search for PDF files, I detailed how to get things set up. If you tried this, you may have noticed that although the searches worked, what you got back was a file name. This isn’t so helpful if your document is an all-encompassing 538 pages. So, how do we get a page number back? The best I’ve come up with so far is to split the 538 pages into 538 documents and load / search on those.

My first Google search on how to split a PDF into pages came back with http://www.splitpdf.com/, so I went ahead and used that. I’m sure there is a way to do this through Acrobat or even roll your own split functionality via the API.

It’s not a particularly pretty solution, but it does work, and that’s important.
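
If you’d rather script the split than lean on a website, a PDF library can handle it. Here’s a rough sketch using the pypdf package in Python; the file names are placeholders, and it simply writes each page of the source document out as its own single-page PDF, ready to load and index.

```python
from pypdf import PdfReader, PdfWriter

# Placeholder file name standing in for the 538-page document from the post.
reader = PdfReader("manual.pdf")

for page_number, page in enumerate(reader.pages, start=1):
    writer = PdfWriter()
    writer.add_page(page)

    # One file per page, so a full-text hit maps straight back to a page number.
    with open(f"manual_page_{page_number:03d}.pdf", "wb") as output_file:
        writer.write(output_file)
```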


Rprofile For Notifications

Steph Locke shows how to use .Rprofile to make your life easier:

First of all, you need a file called .Rprofile that’s stored in your working directory. Some useful resources about .Rprofiles can be found on .Rprofile CRAN docs and an .Rprofile intro.

Now, inside that file, you can add functions that are based on events like loading or closing R. I need a .First function to run on load, and whatever I produce has to be able to print to the console with cat().

With that in mind, instead of showing details, I chose to show the number of breaches I’m in. You can get HIBPwned from CRAN and use it to query the awesome website HaveIBeenPwned.com.

I’ve seen people do things like this in .bash_profile, but didn’t know about .Rprofile before.
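
The closest Python analogue I know of is the PYTHONSTARTUP hook, which runs a file at the start of every interactive session. Purely as a sketch of the same pattern (this is not Steph’s R code), something like the following prints a notice to the console on startup; the file name and messages are made up for illustration.

```python
# ~/.pythonstartup.py -- point the PYTHONSTARTUP environment variable at this
# file and it runs whenever an interactive Python session starts, roughly the
# way .First in .Rprofile runs when R loads.
import datetime


def _startup_notice():
    # Like cat() in the R version, anything useful here has to go to the console.
    print(f"Session started {datetime.datetime.now():%Y-%m-%d %H:%M}.")
    print("Reminder: re-run your HaveIBeenPwned check if it has been a while.")


_startup_notice()
del _startup_notice  # keep the interactive namespace tidy
```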


Spark Metrics

Swaroop Ramachandra looks at some key metrics for Spark administration:

Once you have identified and broken down the Spark and associated infrastructure and application components you want to monitor, you need to understand the metrics that you should really care about, the ones that affect the performance of your application as well as your infrastructure. Let’s dig deeper into some of the things you should care about monitoring.

  1. In Spark, it is well known that memory-related issues are typical if you haven’t paid attention to memory usage when building your application. Make sure you track garbage collection and memory across the cluster on each component, specifically the executors and the driver. Garbage collection stalls or abnormal patterns can increase back pressure.

There are a few metrics of note here.  Check it out.
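
As a starting point for that first item, here’s a sketch of turning on the garbage collection and memory signals from a PySpark application. The application name, memory size, and GC flags are illustrative (the flags shown are the classic Java 8 ones), and driver-side JVM flags generally need to be supplied at submit time.

```python
from pyspark.sql import SparkSession

# Verbose GC logging flags (Java 8 style); adjust for your JVM version.
gc_flags = "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

spark = (
    SparkSession.builder
    .appName("metrics-demo")
    .config("spark.executor.memory", "4g")
    .config("spark.executor.extraJavaOptions", gc_flags)  # GC logs per executor
    .config("spark.eventLog.enabled", "true")             # feeds the history server
    .getOrCreate()
)

# Driver GC flags belong on the submit command, since the driver JVM is already
# running by the time this code executes, e.g.:
#   spark-submit --driver-java-options "-verbose:gc -XX:+PrintGCDetails" app.py
```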


Preparing For Disaster Recovery

Kendra Little has a 30-minute video and explanation for how to prepare for a failover event:

The fact that you’re thinking about this is great!

You’re right, there are two major types of failovers that you have to think about:

  • Planned failover, when you can get to the original production system (at least for a short time)
  • Unplanned failover, when you cannot get to it

Even when you’re doing a planned failover, you don’t have time to go in and script out settings and jobs and logins and all that stuff.

Timing is of the essence, so you need minimal manual actions.

And you really should have documentation so that whoever is on call can perform the failover, even if they aren’t you.

The short answer is, test, test, test.  Test where it can’t hurt, and then test where it can.  But do read/watch the whole thing.


Monitoring Apache Spark

Swaroop Ramachandra has started a series on monitoring Apache Spark:

Spark provides metrics for each of the above components through different endpoints. For example, if you want to look at the Spark driver details, you need to know the exact URL, which keeps changing over time–Spark keeps you guessing on the URL. The typical problem is when you start your driver in cluster mode. How do you detect on which worker node the driver was started? Once there, how do you identify the port on which the Spark driver exposes its UI? This seems to be a common annoying issue for most developers and DevOps professionals who are managing Spark clusters. In fact, most end up running their driver in client mode as a workaround, so they have a fixed URL endpoint to look at. However, this is being done at the cost of losing failover protection for the driver. Your monitoring solution should be automatically able to figure out where the driver for your application is running, find out the port for the application and automatically configure itself to start collecting metrics.

For a dynamic infrastructure like Spark, your cluster can get resized on the fly. You must ensure your newly spawned components (workers, executors) are automatically configured for monitoring. There is no room for manual intervention here. You shouldn’t miss out on monitoring new processes that show up on the cluster. On the other hand, you shouldn’t be generating false alerts when executors get moved around. A general monitoring solution will typically start alerting you if an executor gets killed and starts up on a new worker; this is because generic monitoring solutions just monitor your port to check if it’s up or down. With a real-time streaming system like Spark, the core idea is that things can move around all the time.

Spark does add a bit of complexity to monitoring, but there are solutions in place.  Read the whole thing.
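
To give a flavor of what that discovery can look like, here’s a rough sketch that asks a standalone master which applications are active and then pulls per-executor memory and garbage collection figures from the monitoring REST API on the driver UI. The host names and ports are placeholders, and the JSON fields can vary by Spark version, so treat it as an outline rather than a drop-in monitor.

```python
import requests

MASTER_JSON = "http://spark-master:8080/json"  # standalone master status as JSON
DRIVER_UI = "http://driver-host:4040"          # driver UI; port 4040 by default

# Which applications does the master think are running right now?
cluster = requests.get(MASTER_JSON, timeout=5).json()
for app in cluster.get("activeapps", []):
    print("active app:", app.get("id"), app.get("name"))

# The monitoring REST API on the driver UI exposes per-executor metrics.
for app in requests.get(f"{DRIVER_UI}/api/v1/applications", timeout=5).json():
    executors = requests.get(
        f"{DRIVER_UI}/api/v1/applications/{app['id']}/executors", timeout=5
    ).json()
    for executor in executors:
        print(app["id"], executor["id"], executor["memoryUsed"], executor["totalGCTime"])
```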


Detecting Web Traffic Anomalies

Jan Kunigk combines a few Apache products to perform near-real-time analysis of web traffic data:

meinestadt.de web servers generate up to 20 million user sessions per day, which can easily result in up to several thousand HTTP GET requests per second during peak times (and expected to scale to much higher volumes in the future). Although there is a permanent fraction of bad requests, at times the number of bad requests jumps.

The meinestadt.de approach is to use a Spark Streaming application to feed an Impala table every n minutes with the current counts of HTTP status codes within the n minutes window. Analysts and engineers query the table via standard BI tools to detect bad requests.

What follows is a fairly detailed architectural walkthrough as well as configuration and implementation work.  It’s a fairly long read, but if you’re interested in delving into Hadoop, it’s a good place to start.
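
The article builds its pipeline on Spark Streaming feeding Impala; purely as a sketch of the same windowed-counting idea, here is what it can look like in PySpark Structured Streaming, writing five-minute counts of HTTP status codes to a Parquet path that a SQL engine such as Impala could query. The Kafka topic, log format, and paths are assumptions rather than the meinestadt.de setup, and the Kafka source needs the spark-sql-kafka package on the classpath.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("status-code-counts").getOrCreate()

# Raw log lines arrive on a Kafka topic (placeholder broker and topic names).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "access-logs")
    .load()
)

# Assume each message value looks like "<timestamp>\t<status_code>\t<url>".
parsed = raw.selectExpr("CAST(value AS STRING) AS line").select(
    F.split("line", "\t").getItem(0).cast("timestamp").alias("ts"),
    F.split("line", "\t").getItem(1).alias("status"),
)

# Count status codes per five-minute window, tolerating ten minutes of lateness.
counts = (
    parsed.withWatermark("ts", "10 minutes")
    .groupBy(F.window("ts", "5 minutes"), "status")
    .count()
)

# Append finalized window counts where analysts (via Impala or a BI tool) can read them.
query = (
    counts.writeStream.outputMode("append")
    .format("parquet")
    .option("path", "/data/status_code_counts")
    .option("checkpointLocation", "/data/status_code_counts_chk")
    .start()
)
query.awaitTermination()
```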


New SSIS And SSRS Projects

Ginger Grant shows how to create a new SSIS or SSRS project for SQL Server 2016:

In this version of SQL Server Data Tools, Microsoft has finally addressed the common problem of needing to maintain multiple versions of SSIS packages for the different server versions. No longer do you need three different applications to maintain code for SQL Server 2012, 2014 and now 2016. All of these versions are supported with SSDT for Visual Studio 2015. SQL Server will detect which version the code was last saved in so that you don’t have to worry about accidentally migrating code. You also have the ability to create an SSIS package in 2012, 2014 or 2016. To select the version you want, right-click on the project and select Properties. Under Configuration Properties->General, as shown in the picture, the TargetServerVersion setting, which defaults to SQL Server 2016, has a dropdown box, making it possible to create a new package in Visual Studio 2015 for whatever version you need to support. Supporting the ability to write for different versions is a great new feature, and one which I am really happy is included in SSDT for Visual Studio 2015.

I’m also glad that Microsoft has made this move.  It is no fun having two or three different versions of Visual Studio installed because some component requires an older version.


Azure Cortana Intelligence Suite Walkthrough

Rolf Tesmer gives us a high-level walkthrough of the Azure Cortana Intelligence Suite, using management of a wind turbine farm as an example problem:

Event Hub

What is it

https://azure.microsoft.com/en-us/services/event-hubs/

A fully managed service (PaaS) for ingesting events/messages at massive scale (think telemetry processing from websites, IoT, etc.).

What does it do in our wind farm

Provides a “front door” to our wind farm application to accept all of the streaming telemetry being generated by the turbines. Event Hubs won’t process any of this data per se; it’s just ensuring that it’s being accepted and queued (short term) while other components can come in to consume it.

Before you dig deeply into particular services, it’s nice to see how they fit together at a higher level.
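
For a sense of what that “front door” looks like from the turbine side, here’s a small sketch that pushes simulated telemetry into an Event Hub using the azure-eventhub Python package (v5). The connection string, hub name, and reading fields are placeholders.

```python
import json
import random
import time

from azure.eventhub import EventData, EventHubProducerClient

# Placeholders: supply your namespace connection string and Event Hub name.
CONNECTION_STR = "<event-hubs-namespace-connection-string>"
EVENT_HUB_NAME = "turbine-telemetry"

producer = EventHubProducerClient.from_connection_string(
    conn_str=CONNECTION_STR, eventhub_name=EVENT_HUB_NAME
)

with producer:
    batch = producer.create_batch()
    for turbine_id in range(1, 6):
        reading = {
            "turbineId": turbine_id,
            "timestamp": time.time(),
            "rpm": round(random.uniform(10, 20), 2),
            "powerKw": round(random.uniform(1500, 3000), 1),
        }
        batch.add(EventData(json.dumps(reading)))
    # Event Hubs accepts and queues the batch; Stream Analytics, Spark, or other
    # consumers pick it up downstream.
    producer.send_batch(batch)
```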
