Cloudera Data Platform

Kevin Feasel

2019-01-14

Hadoop

Alex Woodie reports on the new Cloudera’s business plan:

“Once we’ve delivered that and got past it, we then want to get to a second subsequent version, which you can start to upgrade and migrate to, and that will be the go-forward platform,” he said. “Obviously the key part of CDP is delivering not just the workloads you have today but new and intuitive experiences around key workloads such as data warehousing, data flow, the edge or streaming, AI and machine learning.”
The company also announced that CDH 5.x and 6.x and HDP 3.x will be supported through January 2022, which is in-line with previous guidance the company has given. This company believes that three years is plenty of time for customers to plan their migration paths from older CDH and HDP versions to the unified CDP product. Support for HDP 2.x will end before that time.

Also of interest: the integration of Hortonworks Data Flow into CDH and Cloudera Data Science Workbench into HDP.

Converting Spark RDDs To DataFrames

Franco Cano shows us how we can convert a Resilient Distributed Dataset in Spark to a DataFrame:

Sometimes you need transform you RRDs to DataFrames because DataFrames have a lot optimization options.
Let’s see how this is done.

This works, but there is a performance cost to converting a large RDD to a DataFrame (or vice versa). With that in mind, sticking to one type when you can is typically better.

Iterative Solutions To The Closest Match Problem

Itzik Ben-Gan has a follow-up article looking at row-by-row solutions to the closest match problem:

Last month, I covered a puzzle involving matching each row from one table with the closest match from another table. I got this puzzle from Karen Ly, a Jr. Fixed Income Analyst at RBC. I covered two main relational solutions that combined the APPLY operator with TOP-based subqueries. Solution 1 always had quadratic scaling. Solution 2 did quite well when provided with good supporting indexes, but without those indexes also had quadric scaling. In this article I cover iterative solutions, which despite being generally frowned upon by SQL pros, do provide much better scaling in our case even without optimal indexing.

Itzik has three separate solutions here, including one using the CLR.

Azure Data Studio, January Release

Alan Yu announces the January release of Azure Data Studio:

In previous versions of Azure Data Studio, when a user ran large queries, no results would appear in the results grid until the query could show all of the results. This was not a great experience for our users, thus we did some investigating to improve this experience. In the latest build of Azure Data Studio, users can now see results streamed in the results grid. This makes it a better experience since users can see the results quicker and interact with their data instead of being in a waiting state.

There are several enhancements this month, including Azure Active Directory support.

Naming Conventions In SQL Server

Kevin Feasel

2019-01-14

Naming

Phil Factor explains naming requirements for SQL Server and gives suggestions for conventions to follow:

There are no generally accepted standards for naming SQL objects. Although ISO/IEC 11179 has been referred to as a standard for naming, it actually only sets a standard for defining naming conventions. There is a sample standard in the ‘Naming principles’ document (ISO/IEC 11179-5), but this is merely an example of how a standard should be defined. However, it is quite close to a general good-practice in programming.

When naming a table, it is a good idea to use a collective name or ‘object class term’ for the entity if one exists ( such as Employee, Cost, Tree, component, member, audience, staff or faculty) but use the singular rather than the plural form where possible. For the sake of maintenance, use a consistent naming convention that is informative but brief. It helps greatly to start with a dictionary of the correct nouns and verbs associated with the application domain and use that. If it proves inadequate, then the team can build on it. If a data model has been created as part of the design phase, this dictionary should be an end-product of this work.

As Phil notes at the end, consistency is the most important virtue here. It’s hard to work with a database where you have tables named Employees, employee_dates, and tblFiredEmployee.

When A Database In An AG Has Different Query Store Settings

Erin Stellato explains how we can have a discrepancy in Query Store settings between the primary and a secondary in an Availability Group:

Last week there was a question on #sqlhelp on Twitter about the status of Query Store for a database in an Availability Group. I’ve written about Query Store and Availability Groups before so if you’re not familiar with QS behavior in an AG, check out that post first. But this question was, I think, specific to the values that shows on a read-only replica and how there were different query store settings between a primary and secondary. Let’s set it up and take a look.

Click through to learn why this may be and why you shouldn’t panic.

The Forgotten Infrastructure Below Azure BI Architecture Diagrams

Meagan Longoria reminds us that there are several products which Azure BI projects need but which we tend to forget when building architectural diagrams:

Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.

Meagan has several of these, so check it out.

Power BI Embedding Techniques

Marc Lelijveld takes us through the different ways we can embed Power BI dashboards:

In the end of 2018 Microsoft announced a great new feature which allows you to secure embed Power BI content to client applications or sites and keep security still in place. Actually, it almost works the same as the publish to web feature, but then users have to log-in in the embedded frame before they see the content.
Not only to the security to access the report is in place now, also all other security features will still work. This means that Row Level Security can be still in place in case when you are using secure embedding. A lot of people online are really enthusiastic about the new features. Simply because it gives you the ability to embed your content on your website, web portal or wherever you want, without risking security issues.

Click through for discussion on the four techniques.

Categories

January 2019
MTWTFSS
« Dec Feb »
 123456
78910111213
14151617181920
21222324252627
28293031