Press "Enter" to skip to content

Day: February 26, 2019

Selecting a List of Columns from Spark

Unmesha SreeVeni shows us how we can create a list of column names in Scala to pass into a Spark DataFrame’s select function:

Now our example dataframe is ready.
Create a List[String] with column names.
scala> var selectExpr : List[String] = List("Type","Item","Price") selectExpr: List[String] = List(Type, Item, Price)

Now our list of column names is also created.
Lets select these columns from our dataframe.
Use .head and .tail to select the whole values mentioned in the List()

Click through for a demo.

Comments closed

Microservice Communication Patterns

John Hammink shares a few ways that you can have microservices communicate with one another and argues that Kafka is a great platform for microservice communication:

Simply put, microservices are a software development method where applications are structured as loosely coupled services. The services themselves are minimal atomic units which together, comprise the entire functionality of the entire app. Whereas in an SOA, a single component service may combine one or several functions, a microservice within an MSA does one thing — only one thing — and does it well.

Microservices can be thought of as minimal units of functionality, can be deployed independently, are reusable, and communicate with each other via various network protocols like HTTP (More on that in a moment).

Read the whole thing. I have a love-hate relationship with these but it’s a pattern worth understanding.

Comments closed

Power BI IntelliSense For Python and R

David Eldersveld makes me wonder about the value of Power BI’s IntelliSense for R and Python:

If I type the letter into the R Script editor, my code completion options are actsalwaysand, and as. Power BI’s editor is not offering any IntelliSense options from a Python or R dictionary. Instead, it’s pulling from the text already in the editor. Note the comment in Line 1 and the inclusion of words beginning with the letter a — always, and, acts, as.

By comparison, the DAX editor contains a detailed function list and helpful annotations for code completion. Can we get something similar for R and Python? Not exactly… But there’s a workaround that I’m almost embarrassed to suggest. If you are a user who codes directly into the script editor, the following hack could be helpful. If you use the option to Edit script in External IDE, keep doing that and ignore the following guidance.

As-is, this is worse than no IntelliSense because at least with no IntelliSense, it’ll never steal a mouse click or keystroke. I wouldn’t expect RStudio level quality out of the gate but unless I’m missing something, that’s pretty bad.

1 Comment

Blaming the Right Cardinality Estimator

Arthur Daniels helps us figure out which of SQL Server’s cardinality estimators your query used:

SQL Server 2008 is reaching end of support this year, so upgrading your SQL Server might be on your mind. One of the big changes when you upgrade your SQL Servers is upgrading the compatibility level, which by default will upgrade the cardinality estimator (CE).

This can change query performance, for better or for worse. This post won’t focus on whether it’s good or bad, but instead I want to show you how you can check to see what CE was used by your queries.

It’s not a 100% guarantee, but generally I’ve found the new estimator to be superior.

Comments closed

Eye-Friendly Palettes

Shannon Holck has shared a Power BI theme using a color-safe and easy to view palette:

Edward Tufte recommended use of soft colors that do not tire the eyes.  I’ve actually never read his books (yet), but a former boss of mine was a devout disciple and produced some beautifully soft color palettes.

Stephen Few, in “Show Me the Numbers,” reiterated Tufte’s color theories and recommended three sets of hues:

Light – for large shapes, e.g. bars
Medium – for small shapes, e.g. points
Dark/Bright – for calling attention to data

Click through for more including where you can get this Power BI theme. I’m not exactly the world’s biggest fan of the default palette so I’ll have to check this one out.

Comments closed

Conditional Formatting in Power BI

Reza Rad shows us a few ways to perform conditional formatting in Power BI:

I have given many presentations and talks about Data Visualization, and still, I am amazed by how many visualizations I see which is not following the basic rules. In this article, I want to focus on table visual. A table is a visual that most of us are using it on many occasions, in fact, many users, like to see the data in table format. However, a table can be visualized in a way that is not readable. In this article, I’m showing you the most common style of a table which many report developers use, and then challenge it with a better style. The mystery is of course in conditional formatting. Like all my other articles, this article is demonstrating this technique in Power BI. If you like to learn more about Power BI, read Power BI book from Rookie to Rock Star.

Some of these formats are better than others, but you do have the power to do quite a bit with it in Power BI.

Comments closed

Creating Multi-Column Statistics From Missing Index DMVs

Max Vernon shows how you can use the missing index DMVs to find potential candidates for multi-column statistics:

SQL Server does have a fairly useful dynamic management view, or DMV, which provides insight that can be leveraged in this area. The DMV I’m talking about is the set of DMVs around missing indexes, consisting of sys.dm_db_missing_index_groupssys.dm_db_missing_index_details, etc. I’m not saying the missing indexes DMVs are a panacea that will enable you to fix every performance situation you run into, but they can be useful if you know where to look. This post doesn’t go into a lot of depth about how to use those DMVs for the purpose of actually creating indexes, however I will show you how you can create multi-column stats objects as an interim performance booster while evaluating the need for those indexes.

I’ve never had great luck with multi-column stats versus simply creating indexes but that could simply be a case of me doing it wrong.

1 Comment