Press "Enter" to skip to content

Author: Kevin Feasel

Say It With Screenshots

Brent Ozar continues his series on interviewing tactics:

After writing about “For Technical Interviews, Don’t Ask Questions, Show Screenshots”, lots of folks asked what kinds of screenshots I’d show. Here’s this week’s example.

I show each screenshot on a projector (or shared desktop) to the candidate and say:

  1. What’s this screen from?

  2. What does the screen mean?

  3. If it was a server you inherited from someone else, would there be any actions you’d take?

  4. What questions might you want to ask before you take those actions?

  5. Would there be any drawbacks to your actions?

  6. What would be the benefits of your actions?

I have started to use this in interviews and I’m already loving it. I don’t want people to memorize minutiae (“Name all of the policies available in Policy-Based Management”), but if I show a picture of the different policies, that should jog your memory on when you’ve used PBM to solve interesting problems.


Creating M Functions From Parameterized Queries

Chris Webb shows how to take a parameterized query in Power BI and create an M function from it:

All of these examples involve writing M code manually. The big change in the latest version of Power BI Desktop is that you can do the same thing using just the UI.

Let’s take the classic example of combining data from multiple Excel workbooks and update it to show how things work now.

Say you have a folder with three Excel workbooks containing sales data for January, February, and March, and you want to load the data from all three into a single table in Power BI. The first thing to do is to create a new parameter in Power BI Desktop that returns the filename, including path, of one of the Excel files. Call it ExcelFilePath and configure it as shown here:

This is pretty cool.
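
Chris’s post does all of this through M and the Power BI UI, but the underlying pattern is general: the parameter becomes a function argument, and combining the workbooks becomes mapping that function over every file in the folder. Purely as a sketch of that shape, here is a rough Scala analogue (the folder path and the use of CSV instead of Excel are stand-ins, not from Chris’s example):

    import java.io.File

    // Stand-in for the parameterized query: load one file's rows.
    // Real Excel workbooks would need a library such as Apache POI.
    def loadSales(path: String): Seq[String] =
      scala.io.Source.fromFile(path).getLines().toSeq

    // Invoke the function once per file and union the results, just as
    // Power BI invokes the generated M function once per workbook.
    val folder = new File("C:/SalesData")   // hypothetical path
    val allSales = folder.listFiles()
      .filter(_.getName.endsWith(".csv"))
      .flatMap(f => loadSales(f.getPath))
      .toSeq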


Configuring Apache Flink

Awanish at Edureka shows how to install and configure Apache Flink:

Apache Flink is an open source platform for distributed stream and batch data processing. It can run on Windows, Mac OS, and Linux. In this blog post, let’s discuss how to set up a Flink cluster locally. It is similar to Spark in many ways – it has APIs for graph and machine learning processing, like Apache Spark – but Apache Flink and Apache Spark are not exactly the same.

To set up a Flink cluster, you must have Java 7.x or higher installed on your system. Since I have Hadoop 2.2.0 installed on CentOS (Linux), I downloaded the Flink package that is compatible with Hadoop 2.x. Run the command below to download the Flink package.

Flink is another streaming system.  Check out this SlideShare presentation to see the differences between Flink and Spark.
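
Awanish’s post focuses on installing and starting the cluster. For a taste of what you would run once it is up, here is a minimal word count using Flink’s Scala DataSet API; this is my own sketch, not from the post, and it assumes the flink-scala dependency is on the classpath:

    import org.apache.flink.api.scala._

    object WordCount {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        // A tiny in-memory dataset; a real job would read from a file or HDFS.
        val text = env.fromElements("to be or not to be")
        val counts = text
          .flatMap(_.toLowerCase.split("\\s+"))
          .map((_, 1))
          .groupBy(0)   // group on the word
          .sum(1)       // sum the per-word counts
        counts.print()
      }
    }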


Creating An ETL Process In Powershell

Max Trinidad is building a Powershell-based solution for ETL from scratch:

So after the drive gets mapped to the T: drive, we need to look at and collect the types of logs we want to pull. In my scenario, I’m looking for all logs labeled “*.Events.*.log.*”. One caveat discovered previously: these text log files don’t contain server name information. But no problem! This is another opportunity to be creative with PowerShell.

Here we use the cmdlet “Get-ChildItem” with “Sort-Object” to sort the results by the object property “LastWriteTime”. You will find this property very useful later as you progress in our data collection process. This result set will need to be stored in a PowerShell object.

I’m interested in seeing where this goes, especially because my first choice for ETL would be SSIS with Biml.
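
Max does the collection step in PowerShell with Get-ChildItem piped to Sort-Object. Just to make that step concrete outside PowerShell, here is a Scala equivalent (the drive and file pattern mirror his description; everything else is hypothetical):

    import java.io.File

    // Gather the event logs from the mapped drive and order them by last
    // write time, like Get-ChildItem | Sort-Object LastWriteTime.
    val logDir = new File("T:/")
    val eventLogs = logDir.listFiles()
      .filter(_.getName.matches(""".*\.Events\..*\.log\..*"""))
      .sortBy(_.lastModified())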


Power Query For The Rest Of Us

Reza Rad talks about using Power Query in a distinctly non-BI fashion:

As an introduction to this series, I want to take you down the path that led me to use Power Query here. You might be aware that I teach Power BI courses, and most of my courses are online and live. This means that the courses are not recorded videos; it is me on the other side of the line, with a fully interactive audio and video experience with students through the GoToMeeting application. Students connect to me from other places in the world, so I need an event date/time scheduler with which I can announce the date and time of an event in different time zones.

Fortunately, there is a very good website that helps to find a date/time in different time zones. On this website, I can set my input parameters: the local date/time of my event (in my city), the name of the event, and its duration.

In Part 2, Reza shows grouping and concatenation:

Now that I have values in multiple columns, I can concatenate them all into one string with the Table.ToList function, which converts a table to a list. This function can concatenate all columns of a table into one column (because a list is a single-column data structure).

The actual concatenation happens via a Combiner function, Combiner.CombineTextByDelimiter(“, “), which concatenates values with a delimiter that I set to be a comma. So here is the expression for my new custom column:

Part 3 is forthcoming and should wrap up this series.
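
Reza’s expressions are in M, but the group-then-concatenate shape translates to almost any language. As a sketch of the same idea in Scala with made-up event data, Table.ToList plus Combiner.CombineTextByDelimiter becomes a groupBy plus mkString:

    // Collapse each group's values into one comma-delimited string.
    val rows = Seq(
      ("Event A", "Auckland"),
      ("Event A", "Sydney"),
      ("Event B", "Wellington"))

    val concatenated = rows
      .groupBy { case (event, _) => event }
      .map { case (event, group) =>
        event -> group.map(_._2).mkString(", ")   // the delimiter step
      }
    // Map(Event A -> "Auckland, Sydney", Event B -> "Wellington")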


Integrating Custom Data Sources Into Spark

Nicolas A Perez builds a custom Spark streaming data source:

We first receive the order ID and the total amount of the order, and then we receive the line items of the order. The first value is the item ID, the second is the order ID (which matches the order ID value), and the third is the cost of the item. In this example, we have two orders. The first one has four items and the second one has only one item.

The idea is to hide all of this from our Spark application, so what it receives on the DStream is a complete order defined on a stream as follows:

Check out this practical application of Spark Streaming.
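
To make the plumbing concrete, here is a minimal custom receiver skeleton of the kind Nicolas builds on. This is a generic socket-reading sketch, not his code; his source assembles the order header and its line items into one complete record before handing it to Spark, while this one simply stores raw lines:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class OrderReceiver(host: String, port: Int)
        extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

      def onStart(): Unit = {
        // Read on a separate thread so onStart() returns immediately.
        new Thread("Order Receiver") {
          override def run(): Unit = receive()
        }.start()
      }

      def onStop(): Unit = {}   // the read loop exits once isStopped is true

      private def receive(): Unit = {
        val socket = new java.net.Socket(host, port)
        val reader = new java.io.BufferedReader(
          new java.io.InputStreamReader(socket.getInputStream))
        var line = reader.readLine()
        while (!isStopped && line != null) {
          store(line)   // hand the record to Spark
          line = reader.readLine()
        }
        socket.close()
        restart("Socket closed; trying to reconnect")
      }
    }

    // In the driver: val orders = ssc.receiverStream(new OrderReceiver("localhost", 9999))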


In-Memory OLTP Using Ignite

Babu Elumalai explains how to use Apache Ignite to build an in-memory OLTP system on top of Amazon’s DynamoDB:

Business users have been content to perform analytics on data collected in Amazon Redshift to spot trends. But recently, they have been asking AWS whether the latency can be reduced for real-time analysis. At the same time, they want to continue using the analytical tools they’re familiar with.

In this situation, we need a system that lets you capture the data stream in real time and use SQL to analyze it in real time.

In the earlier section, you learned how to build the pipeline to Amazon Redshift with Firehose and Lambda functions. The following illustration shows how to use Apache Spark Streaming on EMR to compute time window statistics from DynamoDB Streams. The computed data can be persisted to Amazon S3 and accessed with SparkSQL using Apache Zeppelin.

There are a lot of technologies at play here and it’s worth a perusal, even though I’m going to keep recommending that you use a relational database like SQL Server for OLTP work in all but the most extreme of circumstances.
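
The windowing piece is worth seeing in miniature. Here is a hedged sketch of a sliding-window count in Spark Streaming, with a socket source standing in for the DynamoDB Streams wiring (which needs its own adapter and is beyond a snippet):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("WindowStats").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Stand-in source; the real pipeline reads the DynamoDB stream.
    val events = ssc.socketTextStream("localhost", 9999)

    // Count events per key over a 60-second window, sliding every 10 seconds.
    val counts = events
      .map(key => (key, 1L))
      .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))
    counts.print()

    ssc.start()
    ssc.awaitTermination()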


Deep Learning

Pete Warden argues that deep learning is not just a fad:

This kind of attribution of an adjective to a subject is something an accurate parser can do automatically. Rather than laboriously going through just a hundred examples, it’s easy to set up Parsey McParseface and run through millions of sentences. The parser isn’t perfect, but at 94% accuracy on one metric, it’s pretty close to humans, who get 96%.

Even better, having the computer do the heavy lifting means that it’s possible to explore many other relationships in the data, to uncover all sorts of unknown statistical relationships in the language we use. There are bound to be other words that are skewed in similar or opposite ways to ‘bossy’, and I’d love to know what they are!

Looks like one more time sink for me…  Check this out if you’re at all interested in parsers.


Grant View Definition

Matt Smith ran into an issue when trying to compare two databases using SQL Server Data Tools:

This appeared to work but didn’t display any results. When you look at the bottom status bar, it reads

Comparison complete. No differences detected. Restricted comparison. See Error List for details.

I knew there were at least some differences. I then clicked on the Error List tab below it, which revealed

The reverse engineering operation cannot continue because you do not have View Definition permission on the ‘Warehouse’

The answer is pretty simple, so read on.
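
Without spoiling too much: the error message names the missing permission, and granting it is a one-liner in T-SQL. Here it is wrapped in a Scala/JDBC sketch; the server, database, and login are hypothetical, and it assumes Microsoft’s JDBC driver is on the classpath (Matt’s post has the real details):

    import java.sql.DriverManager

    val url = "jdbc:sqlserver://localhost;databaseName=Warehouse;integratedSecurity=true"
    val conn = DriverManager.getConnection(url)
    try {
      // Database-scoped VIEW DEFINITION lets the account read object
      // metadata, which is what the schema compare needs.
      conn.createStatement().execute("GRANT VIEW DEFINITION TO [compare_user];")
    } finally conn.close()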


Deploying To Azure SQL Database

Julie Smith shows how to deploy a database (AdventureWorksDW) out to Azure:

This is telling us that four of the tables in the sample do not have clustered indexes. Azure SQL Database insists on a clustered index for every table. So without warranty, here is a script that I used to refactor my on-prem AdventureWorksDW2014 database. After making these fixes, I was able to deploy to Azure SQL DB from SSMS with no errors.

Julie includes the script she used to fix up AdventureWorksDW.
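
Her script itself is worth grabbing from the post. Separately, as a hedged sketch of how one might list the heaps that would block such a deployment, here is the standard sys.indexes query (index type 0 means heap) wrapped in Scala/JDBC with hypothetical connection details:

    import java.sql.DriverManager

    val url = "jdbc:sqlserver://localhost;databaseName=AdventureWorksDW2014;integratedSecurity=true"
    val conn = DriverManager.getConnection(url)
    try {
      val rs = conn.createStatement().executeQuery(
        """SELECT s.name AS schema_name, t.name AS table_name
          |FROM sys.tables t
          |JOIN sys.schemas s ON s.schema_id = t.schema_id
          |JOIN sys.indexes i ON i.object_id = t.object_id
          |WHERE i.type = 0""".stripMargin)   // heaps have index type 0
      while (rs.next())
        println(rs.getString("schema_name") + "." + rs.getString("table_name"))
    } finally conn.close()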
