Ingesting Multiple Data Sources With NiFi And MiniFi

Tim Spann shows how to collect data from multiple IoT devices using MiniFi and send it to a NiFi host:

So I designed my MiniFi flow in the Apache NiFi UI (pretty soon there will be a special designer for this). You then highlight everything there and hit ‘Create Template.’ You can then export it and convert it to config.yml. Again, this process will be automated and connected with the NiFi Registry very shortly to reduce the amount of clicking.

This is an example. When you design your flow in the Apache NiFi UI, you will connect to this port on the Remote Process Group. If you are manually editing one (okay, never do this, but sometimes I have to), you can copy the ID from this Port Details dialog and paste it into the file.

I like this as an overview of NiFi’s capabilities and a sneak peek at where they’re going.

debugr: Debugging In R

Kevin Feasel

2018-07-31

R

Joachim Zuckarelli announces a new R package, debugr:

debugr is a new package designed to support debugging in R. It mainly provides the dwatch() function, which prints debug output to the console or to a file. A debug output can consist of a static text message, the values of one or more objects (potentially transformed by applying some functions), or the value of one or more (more complex) R expressions.

Whether or not a debug message is displayed can be made dependent on the evaluation of a criterion phrased as an R expression. Generally, debug messages are only shown if the debug mode is activated. The debug mode is activated and deactivated with debugr_switchOn() and debugr_switchOff(), respectively, which change the logical debugr.active value in the global options. Since debug messages are only displayed in debug mode, the dwatch() function calls can even remain in the original code as they remain silent and won’t have any effect until the debug mode is switched on again.
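debugr itself is an R package, so the following is not its API. It is a minimal Python analog of the pattern the excerpt describes: a global switch, plus a watch function that stays silent unless debug mode is on and an optional criterion holds. The names `DEBUG_OPTIONS`, `switch_on`, `switch_off` and `dwatch` are hypothetical and only mirror debugr's naming.

```python
import sys

# Hypothetical stand-in for debugr's global "debugr.active" option.
DEBUG_OPTIONS = {"active": False}


def switch_on():
    """Analog of debugr_switchOn(): enable debug output globally."""
    DEBUG_OPTIONS["active"] = True


def switch_off():
    """Analog of debugr_switchOff(): silence every dwatch() call."""
    DEBUG_OPTIONS["active"] = False


def dwatch(msg="", objs=None, crit=True, stream=None):
    """Print one debug line if debug mode is on and `crit` holds.

    msg:    static text message
    objs:   dict of name -> value to report
    crit:   bool, or a zero-argument callable evaluated only in debug mode
    stream: file-like target; defaults to sys.stderr
    Returns the line printed, or None when the call stays silent.
    """
    if not DEBUG_OPTIONS["active"]:
        return None  # debug mode off: the call is a no-op
    if not (crit() if callable(crit) else crit):
        return None  # criterion not met
    parts = [msg] if msg else []
    parts += [f"{name} = {value!r}" for name, value in (objs or {}).items()]
    line = " | ".join(parts)
    print(line, file=stream if stream is not None else sys.stderr)
    return line


if __name__ == "__main__":
    x = 10
    dwatch("checking x", {"x": x}, crit=lambda: x > 5)  # silent: debug mode is off
    switch_on()
    dwatch("checking x", {"x": x}, crit=lambda: x > 5)  # prints "checking x | x = 10" to stderr
```

Because the switch is checked first, calls like these can stay in the code at almost no cost while debug mode is off. That is the same property the excerpt highlights for dwatch().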

Click through for links to additional resources.  It looks like an interesting way of tracing problems in more error-prone segments of code.  H/T R-Bloggers

What Is R?

Kevin Feasel

2018-07-31

R

Dave Mason has started a new blog and hits the heavy topic first:

For anyone who has no idea what R is, comparisons to scripting languages like PowerShell, JavaScript, VBScript, or even DOS batch/cmd files might be helpful. I feel there are enough commonalities, at least conceptually at a high level, for the comparison to be appropriate. We’ve already seen some differences, though. The <- assignment operator sure is weird. I recall Oracle’s PL/SQL used := as an assignment operator. Almost all other languages I remember coding with use the near-universal = (equals sign). Using <- will take some time to get used to.

The R variables used in this post are declared without a data type, but they do have underlying types, which I’ll cover in another post. If I remember correctly, JavaScript doesn’t have types; everything is an object (please leave a comment if this is wrong and I’ll correct the post later). VBScript used “var”s for everything, although you could coerce data types with functions like CInt, CBool, etc.

The way I like to describe R is as two things: first, that it is a domain-specific language dedicated to statistical analysis; and second, that it is a functional programming language (though not a pure functional language).

Exploratory Time Series Analysis

The authors at Knoyd have a post on exploratory data analysis of a time series data set:

From the plot above we can clearly see that the time series has strong seasonal and trend components. To estimate the trend component, we can use a function from the pandas library called rolling_mean and plot the results. If we want to make the plot fancier and reusable for other time series, it is a good idea to wrap it in a function. We can call this function plot_moving_average.
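The top-level `pd.rolling_mean` function named in the excerpt was deprecated and later removed from pandas. In current versions the equivalent is `Series.rolling(window).mean()`. Below is a minimal sketch of a `plot_moving_average` helper. Only the function name comes from the post; the body is an assumption. It expects a pandas Series with a DatetimeIndex, and matplotlib is imported only when plotting.

```python
import pandas as pd


def moving_average(series: pd.Series, window: int) -> pd.Series:
    """Trailing rolling mean; the first window - 1 values are NaN."""
    return series.rolling(window=window).mean()


def plot_moving_average(series: pd.Series, window: int, ax=None):
    """Overlay the raw series and its rolling mean (a simple trend estimate)."""
    import matplotlib.pyplot as plt  # lazy import so moving_average() works without matplotlib

    if ax is None:
        _, ax = plt.subplots(figsize=(10, 4))
    ax.plot(series.index, series.values, label="observed", alpha=0.5)
    ax.plot(series.index, moving_average(series, window).values,
            label=f"rolling mean (window={window})")
    ax.set_title(f"Moving average, window = {window}")
    ax.legend()
    return ax
```

For monthly data, a window equal to the seasonal period (12) averages over one full cycle, which is why it is a common first choice for a trend line. Call `plot_moving_average(monthly_series, 12)`, then show or save the figure with matplotlib as usual.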

The second part of the series promises to use Box-Jenkins to forecast future values.

Building An Azure VM With PowerShell

Garry Bargsley shows us how to provision and build a VM in Azure using nothing but PowerShell:

I spent the bulk of my day Wednesday going through the Prelab steps outlined in the lab. I was extremely impressed by this lab and how every step was correct and accurate down to the letter. Then, the more I thought about it, the clearer the reason became: the steps are built around using an Azure Virtual Machine. With this you get a common machine, framework, and set of steps to build around. You do not have to worry about the users’ local settings or scenario. You are starting from the exact same point of reference every time. It was fun to connect via SSH to a Linux machine and install SQL Server 2017 and Docker from the command line. While I know it was easy because someone was telling me what to type, it was still fun to see how the other side (Linux people) lives.

Today I was in an adventurous mood to try something new. I had been wanting to put together a PowerShell script that would deploy an Azure Virtual Machine. I started down the path a couple of times and got stuck, so I lost interest. I thought this was the perfect opportunity to get over the hurdle by doing this lab’s Prelab steps with PowerShell. So below you will find my first go at building an Azure Virtual Machine using PowerShell to replace the manual steps in the Prelab process. Not that there was anything wrong with those steps; I just wanted to use a tool that I have been working to learn and use on a day-to-day basis. Wish me luck.

Read on for a step-by-step guide.

Filtering Measures In Power BI

Marco Russo has a great post on filtering measures on Power BI dashboards:

Also consider the case of customer 19081. Even though it is only displayed in March, their Revenues YTD value is larger than Revenues. This is because the Revenues YTD measure considers the sum of previous months, even though Revenues may be lower than the threshold of 9,999.

Because the filter granularity is Year-Month-Customer, only the filtered combinations are also considered in the year total. This explains another unexpected result. The Revenues YTD computed in December is different from the one computed for the entire year: yet another unexpected behavior for a year-to-date calculation. At the month level, only customers with Revenues higher than 9,999 in December are considered, including all the months in their Revenues YTD calculation. However, at the year level, all customers with revenue higher than 9,999 in at least one month are considered; their revenues for the entire year are summed to compute Revenues YTD regardless of the monthly filter applied to the Revenues measure.
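To make the two paragraphs above concrete, here is a small Python model of the filter semantics they describe. It is not DAX and not Marco's data. The customers and revenues are hypothetical, and the model encodes only the stated rules. The visual filter keeps a (month, customer) cell when Revenues > 9,999. A visible cell's YTD still sums every month from January onward. At the year level, any customer that passes in at least one month contributes its full-year total. Customer B plays the role of customer 19081.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
THRESHOLD = 9_999

# Hypothetical revenue data; months not listed have zero revenue.
REVENUE = {
    "A": {"Jan": 12_000, "Dec": 15_000},
    "B": {"Jan": 3_000, "Mar": 11_000, "Dec": 2_000},
}


def revenues(customer, month):
    return REVENUE[customer].get(month, 0)


def revenues_ytd(customer, month):
    """January through `month`; the visual filter does not reach earlier months."""
    months_so_far = MONTHS[: MONTHS.index(month) + 1]
    return sum(revenues(customer, m) for m in months_so_far)


def visible(customer, month):
    """Visual-level filter at Month-Customer granularity: Revenues > 9,999."""
    return revenues(customer, month) > THRESHOLD


def month_total_ytd(month):
    """Month total row: only customers passing the filter in that month contribute."""
    return sum(revenues_ytd(c, month) for c in REVENUE if visible(c, month))


def year_total_ytd():
    """Year total: customers passing in any month contribute their full-year YTD."""
    kept = [c for c in REVENUE if any(visible(c, m) for m in MONTHS)]
    return sum(revenues_ytd(c, "Dec") for c in kept)


if __name__ == "__main__":
    print(revenues("B", "Mar"), revenues_ytd("B", "Mar"))  # 11000 14000
    print(month_total_ytd("Dec"), year_total_ytd())        # 27000 43000
```

Customer B is visible only in March. Its YTD there (14,000) exceeds its Revenues (11,000) because January counts too. The December total (27,000, customer A only) also differs from the year total (43,000), because B re-enters at the year level with all of its revenue, including the 2,000 earned in December.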

Marco goes into detail regarding the nuances of filtering and also provides some good answers to common problems.

Finding And Fixing The N+1 Problem With ORMs

Richie Rump explains the N+1 problem with object-relational mappers and shows you how to avoid it with Entity Framework:

The problem is that in our original query we’re not getting data from the LinkedPosts entity, just data from Posts and PostTags. Entity Framework knows that it doesn’t have the data for the LinkedPosts entity, so it very kindly gets the data from the database for each row in the query results.

Whoops!

Obviously, making multiple calls to the database instead of one call for the same data is slower. This is a perfect example of RBAR (row by agonizing row) processing.
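The post itself is about Entity Framework in C#. As a language-neutral sketch, the Python code below uses the standard-library `sqlite3` module to reproduce the shape of the problem. The table names echo the excerpt, but the schema and data are hypothetical. Lazy, per-row loading issues one query for the posts plus one query per post. A single LEFT JOIN returns the same data in one round trip. In Entity Framework, the usual equivalent is eager loading (for example with `Include`) or projecting the related data in the original query; Richie's post covers the EF specifics.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with hypothetical Posts/LinkedPosts tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Title TEXT);
        CREATE TABLE LinkedPosts (Id INTEGER PRIMARY KEY, PostId INTEGER, RelatedPostId INTEGER);
    """)
    conn.executemany("INSERT INTO Posts VALUES (?, ?)",
                     [(i, f"Post {i}") for i in range(1, 6)])
    conn.executemany("INSERT INTO LinkedPosts (PostId, RelatedPostId) VALUES (?, ?)",
                     [(1, 2), (1, 3), (2, 4), (5, 1)])
    return conn


class CountingConnection:
    """Wraps a connection and counts round trips, like watching a SQL trace."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def links_n_plus_one(db):
    """Lazy-loading style: 1 query for the posts, then 1 more query per post."""
    result = {}
    for post_id, _title in db.execute("SELECT Id, Title FROM Posts ORDER BY Id").fetchall():
        rows = db.execute(
            "SELECT RelatedPostId FROM LinkedPosts WHERE PostId = ? ORDER BY RelatedPostId",
            (post_id,),
        ).fetchall()
        result[post_id] = [r[0] for r in rows]
    return result


def links_single_query(db):
    """Eager-loading style: one LEFT JOIN, then group the rows in memory."""
    result = {}
    rows = db.execute("""
        SELECT p.Id, l.RelatedPostId
        FROM Posts AS p
        LEFT JOIN LinkedPosts AS l ON l.PostId = p.Id
        ORDER BY p.Id, l.RelatedPostId
    """).fetchall()
    for post_id, related in rows:
        links = result.setdefault(post_id, [])
        if related is not None:  # posts with no links come back as a single NULL row
            links.append(related)
    return result
```

With five posts, the first function makes six round trips and the second makes one. The gap grows linearly with the number of rows, which is the row-by-agonizing-row cost Richie describes.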

Read the comments for more answers on top of Richie’s.  My answer (only 70% tongue in cheek)?  Functional programming languages don’t require ORMs.

Watching Power BI Grow Up

Paul Turley argues that Power BI is getting to be a mature product:

In the opening keynote and again in his sessions, Christian demonstrated Power BI reports on the taxi driver activity database with over a trillion rows of raw data. The larger dataset was in a Spark cluster, accessed using DirectQuery. Aggregated tables were stored in the in-memory model using the new composite model feature. As the data was explored in report visuals, the Power BI engine would seamlessly switch from tabular in-memory aggregate tables to DirectQuery source data in order to return low-level details. Composite models will allow mashing up imported database and file-based data with DirectQuery sources.

There are limits and complexities with these new features. You cannot mash up imported tables in a Power BI model based on a direct connection to SSAS, but enterprise-scale features in Power BI arguably may not steer a solution architect to select SSAS over Power BI for serious data modeling. With incremental data refresh, large model support, row-level security and many other “big kid” features, Power BI might be a preferable choice. I’m not ready to rule out Analysis Services as the better option for most enterprise solutions, at least not in the near future, but Power BI is definitely heading in that direction.

Click through for several other features which help convince Paul that Power BI is threatening Analysis Services for enterprise data analysis solutions.

Changing Highlight Color In SQL Operations Studio

Kendra Little shows how you can change the code highlight color in SQL Operations Studio:

One of my major uses of SQL Ops studio will be demonstrating code in webcasts and videos, so it’s important to me to be able to set a high-contrast highlight for lines of code.

SQL Ops Studio is based on VSCode, which is very flexible, so I suspected there was a way to do this already. Also, there is a lot of documentation out there on VSCode already, so I searched on “VSCode change highlight color” to help find my way to the solution.

But I figured that lots of folks starting out with SQL Ops Studio may not know that, and that this post might be a good introduction to how to change things like this – as well as how to find things by searching for “vscode”!

Good info.

Auditing xp_cmdshell Usage

Patrick Keisler shows how to create a SQL Server audit file for xp_cmdshell usage:

This article assumes you already have a basic understanding of SQL Server Audit, but if not, use this link to catch up on all the details.

Are you required to have xp_cmdshell enabled on one of your servers? If so, then set up a SQL Audit now to track its use. Never mind the implications of enabling xp_cmdshell; as a DBA, you are responsible for what happens on your servers, and tracking the use of xp_cmdshell should be a priority.

Some smart people will tell you to disable xp_cmdshell altogether, but I don’t like that advice at all.  Auditing usage can give you more peace of mind while not limiting your ability to use a valuable tool.
