Change Data Capture With Apache NiFi

Kevin Feasel

2016-09-15

ETL, Hadoop

Satish Bomma uses Apache NiFi to perform change data capture on a MySQL database:

The main things to configure are the DBCPConnectionPool and Maximum-value Columns

Please choose this to be the date-time stamp column that could be a cumulative change-management column

This is the only limitation with this processor as it is not a true CDC and relies on one column. If the data is reloaded into the column with older data the data will not be replicated into HDFS or any other destination.

This processor does not rely on transaction logs or redo logs like Attunity or Oracle GoldenGate. For a complete solution for CDC please use Attunity or Oracle GoldenGate solutions.

That last paragraph in the snippet is key:  it’s not a true replacement for CDC-friendly products.  It is, however, a good example for showing how to use NiFi to connect to a relational database and pump data out of it.
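The limitation in that snippet is easy to reproduce outside NiFi. The following is a minimal sketch of the "maximum-value column" idea, not NiFi's implementation: it uses an in-memory SQLite database, and the `orders` table, the `updated_at` column, and the `fetch_new_rows` helper are all hypothetical names chosen for illustration.

```python
import sqlite3


def fetch_new_rows(conn, last_max):
    """Return rows whose updated_at exceeds the stored maximum, plus the new maximum."""
    rows = conn.execute(
        "SELECT id, payload, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_max,),
    ).fetchall()
    # Remember the largest value seen so far; keep the old one if nothing came back.
    new_max = rows[-1][2] if rows else last_max
    return rows, new_max


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "a", "2016-09-01 10:00:00"), (2, "b", "2016-09-02 10:00:00")],
)

# First poll: no stored maximum yet, so every row is picked up.
first_batch, state = fetch_new_rows(conn, "")

# A row reloaded with an older timestamp than the stored maximum...
conn.execute("INSERT INTO orders VALUES (3, 'c', '2016-08-15 10:00:00')")
# ...and a row with a newer timestamp.
conn.execute("INSERT INTO orders VALUES (4, 'd', '2016-09-03 10:00:00')")

# Second poll: only the newer row qualifies; row 3 is silently skipped.
second_batch, state = fetch_new_rows(conn, state)
print([row[0] for row in second_batch])
```

Row 3 never reaches the destination because its timestamp is below the stored maximum, which is the behavior the quote warns about. Deletes are invisible to this approach for the same reason: there is no row left to compare.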

Firewall Configuration With PowerShell

Slava Murygin gives an introduction to firewall configuration using PowerShell:

The script has a list of adjustable filters:
$Direction – Direction of firewall rule: Inbound or Outbound;
$Action – Action rule performs: Allow or Block;
$Enabled – Status of a rule: Enabled – True or False;
$RuleGroup – Group the rule has been assigned to. By default the script uses the “$Null” variable, which filters all rules without an assigned group. However, you can specify a group name if necessary;
$DisplayName – Name of a rule. By default I use the expression “*SQL*” to search for rules which have the word “SQL” in their name. To retrieve all rules use “*”. To retrieve any particular rule use the rule name.

He looks at viewing rules as well as creating, modifying, and deleting them.
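To make the filter semantics concrete, here is a rough Python sketch of how those five filters combine. It is not Slava's script and does not touch the real firewall: the `RULES` list and the `filter_rules` function are hypothetical stand-ins, and the wildcard match is lowercased on the assumption that the original comparison is case-insensitive, as PowerShell's `-like` is by default.

```python
from fnmatch import fnmatchcase

# Hypothetical rule records standing in for real firewall rules.
RULES = [
    {"DisplayName": "SQL Server (TCP 1433)", "Direction": "Inbound",
     "Action": "Allow", "Enabled": True, "Group": None},
    {"DisplayName": "SQL Browser (UDP 1434)", "Direction": "Inbound",
     "Action": "Allow", "Enabled": False, "Group": None},
    {"DisplayName": "Remote Desktop", "Direction": "Inbound",
     "Action": "Allow", "Enabled": True, "Group": "Remote Desktop"},
    {"DisplayName": "Block legacy sql port", "Direction": "Outbound",
     "Action": "Block", "Enabled": True, "Group": None},
]


def filter_rules(rules, direction="Inbound", action="Allow", enabled=True,
                 rule_group=None, display_name="*SQL*"):
    """Keep rules matching every filter; rule_group=None means 'no group assigned'."""
    pattern = display_name.lower()
    return [
        rule for rule in rules
        if rule["Direction"] == direction
        and rule["Action"] == action
        and rule["Enabled"] == enabled
        and rule["Group"] == rule_group
        # Lowercase both sides so "*SQL*" also matches "sql".
        and fnmatchcase(rule["DisplayName"].lower(), pattern)
    ]


# Defaults: enabled inbound allow rules, no group, name containing "SQL".
matches = filter_rules(RULES)
# Override two filters to look at outbound block rules instead.
outbound_blocks = filter_rules(RULES, direction="Outbound", action="Block")
print([rule["DisplayName"] for rule in matches])
print([rule["DisplayName"] for rule in outbound_blocks])
```

Passing `display_name="*"` returns every rule that satisfies the other four filters, which mirrors the "retrieve all rules" case in the quote.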

SARGable Predicates

Gail Shaw discusses what makes a particular predicate SARGable:

Any function on a column will prevent an index seek from happening, even if the function would not change the column’s value or the way the operator is applied, as seen in the above case. Zero added to an integer doesn’t change the value of the column, but is still sufficient to prevent an index seek operation from happening.

While I haven’t yet found any production code where the predicate is of the form ‘Column + 0 = @Value’, I have seen many cases where there are less obvious cases of functions on columns that do nothing other than to prevent index seeks.

UPPER(Column) = UPPER(@Variable) in a case-insensitive database is one of them, RTRIM(COLUMN) = @Variable is another. SQL ignores trailing spaces when comparing strings.

This is a straightforward concept with significant performance implications.
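The same effect can be observed on a laptop without a SQL Server instance. The sketch below swaps in SQLite, so it illustrates the general principle rather than SQL Server's optimizer: the `orders` table, the `ix_orders_customer` index, and the `plan` helper are made up for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, note TEXT)"
)
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
conn.executemany(
    "INSERT INTO orders (customer_id, note) VALUES (?, ?)",
    [(i % 100, "x") for i in range(1000)],
)


def plan(sql, params):
    """Return the detail text of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row holds the human-readable description.
    return " | ".join(row[3] for row in rows)


# Bare column on the left: the index can be used for a search.
seek_plan = plan("SELECT note FROM orders WHERE customer_id = ?", (42,))
# Column wrapped in an expression: the index no longer applies.
scan_plan = plan("SELECT note FROM orders WHERE customer_id + 0 = ?", (42,))

print(seek_plan)
print(scan_plan)
```

The first plan should report a search using `ix_orders_customer`, while the second should fall back to scanning the table, even though adding zero cannot change which rows match.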

Create An Azure SQL Database Instance From PowerShell

Arun Sirpal walks through the steps of setting up an Azure SQL Database instance and database using PowerShell:

What I have done here is hard-code three parameters (database edition, start IP address and end IP address) which for my situation won’t change, but I have given the ability to pass in the environment name, SQL Server name and database name.

So a prompt will be presented to the user – here you should enter the relevant details and press Enter.

It’s not that difficult to do, and the scripts themselves are probably faster than fumbling around the UI.

Diagnosing And Solving A Performance Problem

Monica Rathbun had a major performance problem; this is how she solved it:

Symptoms:

  • Very high disk latency: as high as 300,000 milliseconds (ms) is not unusual
  • Average: 900 – 15,000ms
  • Memory Pressure
  • Slow User Experience

Problem:

  • Bad hardware
  • Over-provisioned VM hosts (what happens on one VM affects the others)
  • Old NetApp SAN
  • No infrastructure budget for new hardware

Challenge: Make the system viable with no hardware changes or tweaks

Those disk latencies are scary.  I like the systematic approach Monica takes, and the end result was very positive.

Observations On Azure SQL Data Warehouse

Jeffrey Verheul is running this month’s T-SQL Tuesday.  Here is his contribution:

A thing that can make migrations to the cloud a bit more difficult, is that Azure SQL databases are basically a contained datastore (you would call it a “contained database” when you run it on-premise). This means that you (by default) can’t connect from one database to the other. This could mean that you need to rewrite your applications or stored procedures, or maybe even redesign your entire database/application/domain model.

This also means that running a stored procedure from Ola Hallengren’s maintenance solution can only be done on the specific database, and not from the master database like the on-premise version does. These small challenges can be overcome, but it does mean code duplication in your databases because the maintenance procedures need to be deployed to every single database.

Read on for more observations regarding Azure SQL Data Warehouse.

Evaluating Monitoring Tools

Kevin Feasel

2016-09-15

Tools

Richard Douglas has a great post on things to think about when evaluating a monitoring tool:

A question that often comes up in meetings is, “What would success look like?” To me, it’s my favourite football team Spurs winning the English Premier League! This is never a popular answer to the person asking the question in the meeting, but generally raises a few smiles and lightens the mood. However, you’re more likely interested in monitoring software and what success means in that scenario. As I see it, success means finding an outcome that is beyond doubt. Now success could mean that the software you are evaluating is not as good as the current incumbent. That is a successful outcome. You have decided that you already own the best solution for you. Congratulations! It can also mean that a particular solution meets all of the criteria needed by your business in order for it to solve technical issues and to grow.

The advice is vendor-agnostic and is worth reading if you plan to evaluate monitoring tools anytime soon.

SSRS Mobile Reports And Data Types

Koen Verbeek runs into a problem with SSRS Mobile Reports:

The error message “The JSON SharedDataSet Table renderer cannot parse the supplied report” doesn’t exactly tell you what’s going on. Apparently it is having issues with the Location column, which is of the geography data type. If you remove it, the dataset will be imported in the mobile report editor. There’s no documentation of which data types are supported or not in the mobile reports. I included the column in the first place to find out if the Mobile Report Publisher could handle it and plot the data on a map. It seems not.

Example number 798 in the “Microsoft errors tend to tell you what caused the failure, but not what actually caused the failure” ongoing series.  Sure, the JSON SharedDataSet Table renderer blew up…but what does that have to do with me and how do I fix it?  I realize that good error messages can be difficult in complex software, but this one isn’t very helpful at all.

Error Handling Extended Event

Dave Mason shows how to use an Extended Event to capture error data:

Here’s an example for DBCC CHECKDB on a corrupt database. Remember from the last post that in this scenario, control never passes to the CATCH block. So we’ll need to check the Event Session data after END CATCH. You can also run this as a single batch in SSMS, but you’ll need a corrupt database to get similar results. As before, replace “2016” with your SPID.

There are a lot of working parts to this, so read the scripts carefully if you’re interested in implementing something similar yourself.
