Month: February 2019

Power BI IntelliSense For Python and R

David Eldersveld makes me wonder about the value of Power BI’s IntelliSense for R and Python:

If I type the letter a into the R Script editor, my code completion options are acts, always, and, and as. Power BI’s editor is not offering any IntelliSense options from a Python or R dictionary. Instead, it’s pulling from the text already in the editor. Note the comment in Line 1 and the inclusion of words beginning with the letter a — always, and, acts, as.

By comparison, the DAX editor contains a detailed function list and helpful annotations for code completion. Can we get something similar for R and Python? Not exactly… But there’s a workaround that I’m almost embarrassed to suggest. If you are a user who codes directly into the script editor, the following hack could be helpful. If you use the option to Edit script in External IDE, keep doing that and ignore the following guidance.

As-is, this is worse than no IntelliSense: with no IntelliSense, at least, it will never steal a mouse click or keystroke. I wouldn’t expect RStudio-level quality out of the gate, but unless I’m missing something, this is pretty bad.
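
David doesn’t spell the workaround out in the excerpt, but since the completion list is built from words already in the file, one way to exploit that behavior yourself (whether or not it is exactly the hack he suggests) is to seed the script with a throwaway comment containing the names you expect to type. A minimal Python illustration, with the seeded names purely my own examples:

# pandas read_csv groupby agg merge   <- seed comment: the editor completes from words in this file
import pandas as pd

df = pd.read_csv("sales.csv")              # hypothetical file; "read_csv" now appears after typing "re"
summary = df.groupby("region").agg("sum")  # "groupby" and "agg" appear after typing "gr" / "ag"
print(summary)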

Copying Filestream Data Between Tables

Paul Randal takes us through some limitations on copying Filestream data between tables:

I was asked last week whether it’s possible to create a table with a FILESTREAM column and then populate that column by copying FILESTREAM files from another directory in the FILESTREAM data container.

The simple answer is no.

Paul explains why this isn’t possible and then gives you an alternative which does work.

Blaming the Right Cardinality Estimator

Arthur Daniels helps us figure out which of SQL Server’s cardinality estimators your query used:

SQL Server 2008 is reaching end of support this year, so upgrading your SQL Server might be on your mind. One of the big changes when you upgrade your SQL Servers is upgrading the compatibility level, which by default will upgrade the cardinality estimator (CE).

This can change query performance, for better or for worse. This post won’t focus on whether it’s good or bad, but instead I want to show you how you can check to see what CE was used by your queries.

It’s not a 100% guarantee, but generally I’ve found the new estimator to be superior.

Eye-Friendly Palettes

Shannon Holck has shared a Power BI theme using a color-safe, easy-to-view palette:

Edward Tufte recommended use of soft colors that do not tire the eyes.  I’ve actually never read his books (yet), but a former boss of mine was a devout disciple and produced some beautifully soft color palettes.

Stephen Few, in “Show Me the Numbers,” reiterated Tufte’s color theories and recommended three sets of hues:

Light – for large shapes, e.g. bars
Medium – for small shapes, e.g. points
Dark/Bright – for calling attention to data

Click through for more including where you can get this Power BI theme. I’m not exactly the world’s biggest fan of the default palette so I’ll have to check this one out.
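
Power BI report themes are just JSON documents, so it’s easy to experiment with Few’s three-tier idea yourself before (or after) grabbing Shannon’s file. A small Python sketch that writes a theme file; the hex values below are placeholder soft/medium/dark picks of mine, not the palette from the post:

import json

# Hypothetical palette grouped per Few's scheme: light hues for large shapes (bars),
# medium for small shapes (points), dark/bright reserved for calling attention to data.
theme = {
    "name": "Soft Palette (example)",
    "dataColors": [
        "#A8C6DF", "#C2D6A4", "#E8C3A0",   # light
        "#6E9BC5", "#8FAE5D", "#D49A6A",   # medium
        "#1F4E79", "#C0392B",              # dark/bright accents
    ],
}

with open("soft-theme.json", "w") as f:
    json.dump(theme, f, indent=2)

# Import the file in Power BI Desktop via View > Themes > Browse for themes.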

Conditional Formatting in Power BI

Reza Rad shows us a few ways to perform conditional formatting in Power BI:

I have given many presentations and talks about data visualization, and still, I am amazed by how many visualizations I see that are not following the basic rules. In this article, I want to focus on the table visual. A table is a visual that most of us use on many occasions; in fact, many users like to see the data in table format. However, a table can be visualized in a way that is not readable. In this article, I’m showing you the most common style of table which many report developers use, and then challenging it with a better style. The mystery is, of course, in conditional formatting. Like all my other articles, this one demonstrates the technique in Power BI. If you’d like to learn more about Power BI, read the Power BI book from Rookie to Rock Star.

Some of these formats are better than others, but you do have the power to do quite a bit with it in Power BI.

Creating Multi-Column Statistics From Missing Index DMVs

Max Vernon shows how you can use the missing index DMVs to find potential candidates for multi-column statistics:

SQL Server does have a fairly useful dynamic management view, or DMV, which provides insight that can be leveraged in this area. The DMVs I’m talking about are the set around missing indexes, consisting of sys.dm_db_missing_index_groups, sys.dm_db_missing_index_details, etc. I’m not saying the missing index DMVs are a panacea that will enable you to fix every performance situation you run into, but they can be useful if you know where to look. This post doesn’t go into a lot of depth about how to use those DMVs for the purpose of actually creating indexes; however, I will show you how you can create multi-column stats objects as an interim performance booster while evaluating the need for those indexes.

I’ve never had great luck with multi-column stats versus simply creating indexes but that could simply be a case of me doing it wrong.
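
Max’s post works in T-SQL; purely as a sketch of the same idea driven from Python (assuming pyodbc and an instance you can connect to), you could read the equality and inequality columns out of sys.dm_db_missing_index_details and print CREATE STATISTICS statements to review before running any of them:

import pyodbc

# Placeholder connection string; point it at your own server and database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=YourDb;Trusted_Connection=yes;"
)

rows = conn.execute("""
    SELECT d.statement, d.equality_columns, d.inequality_columns
    FROM sys.dm_db_missing_index_details AS d
    WHERE d.database_id = DB_ID();
""").fetchall()

for table_name, eq_cols, ineq_cols in rows:
    cols = ", ".join(c for c in (eq_cols, ineq_cols) if c)   # e.g. "[Col1], [Col2]"
    if not cols:
        continue
    stat_name = "st_" + "".join(ch for ch in cols if ch.isalnum())[:60]
    # Print for review; nothing is executed against the server here.
    print(f"CREATE STATISTICS {stat_name} ON {table_name} ({cols});")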

Where Hadoop Is Going

Erik Krogen summarizes a recent Hadoop developer gathering at LinkedIn:

The day started with LinkedIn’s very own Jonathan Hung (left) and Anthony Hsu (right) discussing TensorFlow on YARN, or TonY, our home-grown and recently open-sourced solution for distributed deep learning via TensorFlow on top of YARN. They discussed its architecture and implementation, as well as future goals, such as support for additional runtimes like PyTorch. You can view their slides here and a recording of their presentation here.

Looks like there were several interesting talks and a lot of content showing where Hadoop will go over the next year or so.

Getting Started With Apache Flume

Mark Litwintschik takes us through installation and configuration of Apache Flume:

The following was run on a fresh Ubuntu 16.04.2 LTS installation. The machine I’m using has an Intel Core i5 4670K clocked at 3.4 GHz, 8 GB of RAM and 1 TB of mechanical storage capacity.

First, I’ve set up a standalone Hadoop environment following the instructions from my Hadoop 3 installation guide. Below, I’ve installed Kafkacat for feeding and reading off of Kafka, libsnappy as I’ll be using Snappy compression on the Kafka topics, Python, Screen for running applications in the background, and Zookeeper, which is used by Kafka for coordination.

From there, Mark has the configuration scripts and processes to get the entire pipeline built.
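
If you’d rather push test records into the Kafka topic from Python instead of Kafkacat, a few lines with the kafka-python package will do it (with python-snappy installed, to match the Snappy-compressed topics Mark mentions). The broker address and topic name here are placeholders, not anything from Mark’s setup:

import json

from kafka import KafkaProducer   # pip install kafka-python python-snappy

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",     # placeholder broker
    compression_type="snappy",              # match the Snappy-compressed topics
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for i in range(1000):
    producer.send("flume_test", {"event_id": i, "payload": "hello"})  # placeholder topic name

producer.flush()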

Parsing HL7 Messages With Python

Cristian Satnic has HL7-formatted messages in SQL Server and wishes to parse them using Python:

Each line in the HL7 message is called a segment and then each segment is split into individual fields by | (pipe) characters (typically). HL7 fields have well-defined names and meanings … for example in the example above PID-3 (the 3rd field in the PID segment where the identifier ‘PID’ is not counted) is 12001 and that represents the patient identifier.

For this particular project I’m working on we have HL7 messages stored in a SQL Server 2016 database table where each row in the table contains the raw HL7 2.x message in a particular column. I need to be able to intelligently filter over this HL7 data by looking at values in particular HL7 fields (as shown above). Since this HL7 data is stored in a varchar(MAX) column I could certainly attempt to play games using LIKE comparisons in SQL but that would not get me very far. SQL simply does not understand the complex structure of HL7 and I have no native SQL Server functions at my disposal that I could quickly use to parse this data and filter it.

Cristian has a Jupyter Notebook which takes us through the solution. With SQL Server 2017, there’s the possibility of solving this in a stored procedure using Machine Learning Services.
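
The structure Cristian describes maps directly onto a couple of string splits, which is enough to pull a field like PID-3 out without any HL7 library at all. A minimal sketch with a made-up message (not one of his):

# Segments are carriage-return-separated lines; fields within a segment are pipe-delimited.
raw_hl7 = (
    "MSH|^~\\&|SENDING_APP|SENDING_FAC|RECEIVING_APP|RECEIVING_FAC|20190201||ADT^A01|MSG00001|P|2.3\r"
    "PID|1||12001||Doe^John||19700101|M\r"
)

segments = {}
for line in raw_hl7.strip().split("\r"):
    fields = line.split("|")
    segments[fields[0]] = fields        # keyed by segment id: MSH, PID, ...

# PID-3 is the third field after the 'PID' identifier, so index 3 once the id sits at index 0.
patient_id = segments["PID"][3]
print(patient_id)                       # -> 12001

From there it’s a short step to wrapping this in a function and applying it across the rows pulled back from that varchar(MAX) column.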

Testing Cosmos DB’s REST API

Hasan Savran shows how we can test Cosmos DB’s REST API using Postman:

You have many options to access Cosmos DB. The REST API is one of these options, and it is the low-level way to access Cosmos DB. You can customize all options of Cosmos DB by using the REST API. To customize the calls and pass the required authorization information, you need to use HTTP headers. There are many headers you can set depending on the operation you want to run in Cosmos DB. I am going to cover only the required headers here.

In the following example, I am going to try to create a database in the Cosmos DB emulator by using the REST API. First, let’s look at the required header fields for this request. These requirements apply to all other REST API calls too.

It’s a little more complicated than just posting to a URL and Hasan has you covered.
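
If you want to reproduce the same request outside Postman, the required headers Hasan walks through can also be built in a few lines of Python: the Authorization value is an HMAC-SHA256 signature over the verb, resource type, resource link, and the x-ms-date value, keyed by the account (or emulator) master key. This is only a sketch against the local emulator; the master key below is a placeholder:

import base64
import hashlib
import hmac
import urllib.parse
from email.utils import formatdate

import requests

ENDPOINT = "https://localhost:8081"                    # Cosmos DB emulator
MASTER_KEY = "<your-emulator-or-account-master-key>"   # placeholder

def auth_header(verb, resource_type, resource_link, date):
    # Signature payload per the Cosmos DB master-key auth scheme (lower-case verb, type, and date).
    payload = f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n{date.lower()}\n\n"
    key = base64.b64decode(MASTER_KEY)
    sig = base64.b64encode(hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()).decode()
    return urllib.parse.quote(f"type=master&ver=1.0&sig={sig}", safe="")

date = formatdate(usegmt=True)    # RFC 1123 date, also sent as the x-ms-date header
headers = {
    "Authorization": auth_header("POST", "dbs", "", date),
    "x-ms-date": date,
    "x-ms-version": "2018-12-31",
    "Content-Type": "application/json",
}

# Create a database named SampleDB; verify=False only because the emulator uses a self-signed certificate.
resp = requests.post(f"{ENDPOINT}/dbs", json={"id": "SampleDB"}, headers=headers, verify=False)
print(resp.status_code, resp.json())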
