Author: Kevin Feasel

Help Make dbatools Even Better

Published 2018-10-10 by Kevin Feasel

Patrick Flynn wants you to improve dbatools:

As part of dbatools’ participation in this event we are encouraging contributors to assist “the road towards 1.0” by improving the examples available in the comment-based help, which power the new docs site at docs.dbatools.io.

The activity is available to anyone who wants to help and does not require any expertise in PowerShell. Any of the following actions are desirable:

Fix typos in examples

Fix obvious errors in examples

Add examples to illustrate use of all possible parameters

Add examples to illustrate use of pipeline support

Add examples to illustrate combining multiple dbatools commands.

Add examples that illustrate use of dbatools commands in new or interesting ways.

We are looking for a max of 6-8 examples per command.

Patrick also shows you how easy it is to edit the documentation, so check that out and get contributing.


Image Clustering With Keras And R

Published 2018-10-09 by Kevin Feasel

Shirin Glander shows us how to use R to extract learned features from Keras and cluster those features:

For each of these images, I am running the predict() function of Keras with the VGG16 model. Because I excluded the last layers of the model, this function will not actually return any class predictions as it would normally do; instead we will get the output of the last layer: block5_pool (MaxPooling2D).

These, we can use as learned features (or abstractions) of the images. Running this part of the code takes several minutes, so I save the output to a RData file (because I sample randomly, the classes you see below might not be the same as in the sample_fruits list above).

Read the whole thing.
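
As a rough illustration of the clustering step, here is a minimal Python sketch that groups feature vectors with k-means. It is not Shirin's R code: the tiny two-dimensional vectors stand in for the much larger block5_pool outputs, the starting centroids are fixed by hand, and k-means is used here only as a familiar clustering method. The function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids.

    Returns (labels, centroids); labels[i] is the index of the centroid
    that points[i] was assigned to in the final assignment pass.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Toy "learned features": two obvious groups (assumed data).
features = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
labels, centers = kmeans(features, [features[0], features[2]])
```

With real image features, each vector would be the flattened output of the truncated network, and the number of clusters would be a choice to experiment with.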


Using plm To Analyze Panel Data

Published 2018-10-09 by Kevin Feasel

Michael Grogan shows us how to use the plm package to perform linear regression against panel data:

Types of data

Cross-Sectional: Data collected at one particular point in time

Time Series: Data collected across several time periods

Panel Data: A mixture of both cross-sectional and time series data, i.e. collected at a particular point in time and across several time periods

Fixed Effects: Effects that are independent of random disturbances, e.g. observations independent of time.

Random Effects: Effects that include random disturbances.

Let us see how we can use the plm library in R to account for fixed and random effects. There is a video tutorial link at the end of the post.

Read on for an example.
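
To make the fixed-effects idea a little more concrete, here is a minimal Python sketch of the "within" estimator: subtract each entity's own mean from its observations, then fit a single slope to the demeaned data. This is not the plm package and not Michael's code. The data and function names are made up for illustration, and only one regressor is handled.

```python
from collections import defaultdict


def pooled_slope(rows):
    """Ordinary least squares slope that ignores which entity a row belongs to."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def within_slope(rows):
    """Fixed-effects (within) slope: demean x and y per entity, then fit."""
    groups = defaultdict(list)
    for entity, x, y in rows:
        groups[entity].append((x, y))
    sxy = sxx = 0.0
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx


# Hypothetical panel: two entities observed over three periods each.
# Entity A follows y = 10 + 2x, entity B follows y = 2x.
panel = [
    ("A", 1, 12), ("A", 2, 14), ("A", 3, 16),
    ("B", 4, 8), ("B", 5, 10), ("B", 6, 12),
]

fe = within_slope(panel)      # recovers the common slope of 2
pooled = pooled_slope(panel)  # negative, because entity levels are ignored
```

In this toy panel the pooled regression gets the sign wrong because it mixes up differences between entities with the effect of x, which is the kind of problem a fixed-effects model is meant to address.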


SSMS On A Diet

Published 2018-10-09 by Kevin Feasel

Brent Ozar is happy that SQL Server Management Studio has dropped a few pounds:

SSMS 17.9 on the left has Database Diagrams at the top.

SSMS 18.0 does not. Database Diagrams are simply gone. Hallelujah! For over a decade, people have repeatedly cursed SSMS as they’ve accidentally clicked on the very top item and tried to expand it. One of the least-used SSMS features had one of the top billings, and generated more swear words than database diagrams.

The good news continues when you right-click on a server, click Properties, and click Processors.

The comments show that not everyone is happy about this, but I do think it’s for the best—the database diagram tool hadn’t been updated in a long time and is missing many features that an ER tool needs. I’d rather use Visio (or a better tool).


Getting A Specific Rank In DAX

Published 2018-10-09 by Kevin Feasel

Marco Russo shows us how to get the Nth element in a list using DAX:

The complexity of the calculation is in the Nth-Product Name Single and Nth-Product Sales Amount Single measures. These two measures are identical. The only difference is the RETURN statement in the last line, which chooses the return value between the NthProduct and NthAmount variables.

Unfortunately, DAX does not offer a universal way to share the code generating tables between different measures. Analysis Services Tabular provides access to DETAILROWS as a workaround, but this feature cannot be defined in a Power BI or Power Pivot data model as of now.

Indeed, the code of the two measures is nearly identical.

Read on for code and explanation.
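
The point about duplicated measure code can be seen in miniature outside DAX. The Python sketch below, which assumes a hypothetical dictionary of product sales, builds the ranked list once in a helper and lets two thin wrappers return either the name or the amount. It is not Marco's DAX, the names are invented, and ties are not handled.

```python
def nth_product(sales, n):
    """Return (name, amount) for the product ranked n-th by sales, or None.

    `sales` maps product name to sales amount; rank 1 is the top seller.
    """
    ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)
    if 1 <= n <= len(ranked):
        return ranked[n - 1]
    return None


def nth_product_name(sales, n):
    """Name of the n-th ranked product, or None if n is out of range."""
    row = nth_product(sales, n)
    return row[0] if row else None


def nth_product_amount(sales, n):
    """Sales amount of the n-th ranked product, or None if n is out of range."""
    row = nth_product(sales, n)
    return row[1] if row else None


# Hypothetical sales figures.
sales = {"Bikes": 100, "Helmets": 300, "Gloves": 200}
second_name = nth_product_name(sales, 2)
second_amount = nth_product_amount(sales, 2)
```

In DAX, by contrast, the table-building logic has to be repeated inside each measure, which is why the two measures in the quote differ only in their final RETURN.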


When Table Variables Have Realistic Estimates, Unrealistic Results May Occur

Published 2018-10-09 by Kevin Feasel

Milos Radivojevic wraps up a series on deferred compilation for table variables by looking at a hack which used to work but no longer does:

With this change, the query is executed very fast, with the appropriate execution plan:

SQL Server Execution Times: CPU time = 31 ms, elapsed time = 197 ms.

However, the LOOP hint does not affect estimations and the optimizer decisions related to them; it just replaces the join operators chosen by the optimizer with the Nested Loop Joins specified in the hint. SQL Server still expects billions of rows, and therefore the query got a memory grant of more than 2 GB for sorting data, although only 3,222 rows need to be sorted. The hint helped the optimizer produce a good execution plan (which is great; otherwise this query would take very long and probably would not finish at all), but the high memory grant issue is not solved.

As you might guess, now it’s time for table variables.

This is an interesting article with workarounds and counter-workarounds to solve a nasty estimation problem.


Reading Excel Files In An Office-less World

Published 2018-10-09 by Kevin Feasel

Bill Fellows shows us how to read from an Excel file on a machine without Microsoft Office installed:

A common problem working with Excel data is Excel itself. Working with it programmatically requires an installation of Office, and the resulting license cost, and once everything is set, you’re still working with COM objects, which present their own set of challenges. If only there was a better way.

Enter, the better way – EPPlus. This is an open source library that wraps the OpenXml library which allows you to simply reference a DLL. No more installation hassles, no more licensing (LGPL) expense, just a simple reference you can package with your solutions.

Let’s look at an example.

Read on for the example. A couple of alternatives I like are readxl and XLConnect in R.


Hortonworks And Cloudera To Merge

Published 2018-10-08 by Kevin Feasel

Ashley Stirrup analyzes the merger of the two largest Hadoop vendors:

Overall, this is great news for customers, the Hadoop ecosystem and the future of the market. Both companies’ customers can now sleep at night knowing that the pace of innovation from Cloudera 2.0 will continue and accelerate. Combining the Cloudera and Hortonworks technologies means that instead of having to pick one stack or the other, now customers can have the best of both worlds. The statement from their press release “From the Edge to AI” really sums up how some of the investments that Hortonworks made in IoT complement Cloudera’s investments in machine learning. From an ecosystem and innovation perspective, we’ll see fewer competing Apache projects with much stronger investments. This can only mean better experiences for any user of big data open source technologies.

At the same time, it’s no secret how much our world is changing with innovation coming in so many shapes and sizes. This is the world that Cloudera 2.0 must navigate. Today, winning in the cloud is quite simply a matter of survival. That is just as true for the new Cloudera as it is for every single company in every industry in the world. The difference is that Cloudera will be competing with a wide range of cloud-native companies both big and small that are experiencing explosive growth. Carving out their place in this emerging world will be critical.

The company has so many of the right pieces including connectivity, computing, and machine learning. Their challenge will be making all of it simple to adopt in the cloud while continuing to generate business outcomes. Today we are seeing strong growth from cloud data warehouses like Amazon Redshift, Snowflake, Azure SQL Data Warehouse and Google BigQuery. Apache Spark and service players like Databricks and Qubole are also seeing strong growth. Cloudera now has decisions to make on how they approach this ecosystem, who they choose to compete with, and who they choose to complement.

Rob Bearden on the Hortonworks side:

Cloudera has a like-minded approach to next generation data management and analytics solutions for hybrid deployments. Like Hortonworks, Cloudera believes data can drive high velocity business model transformations, and has innovated in ways that benefit the market and create new revenue opportunities. We are confident that our combined company will be ideally positioned to redefine the future of data as we extend our leadership and expand our offerings.

This transformational event will create benefits and growth opportunities for our stakeholders. Together with Cloudera, we will accelerate market development, fuel innovation and produce substantial benefits for our customers, partners, employees and the community.

By merging Cloudera’s investments in data warehousing and machine learning with Hortonworks’ investments in end-to-end data management, we are generating a winning combination, which will establish the standard for hybrid cloud data management.

Mike Olson on the Cloudera side:

We’re announcing the combination today, but we don’t expect the deal to close for several months. We’ll undergo the normal regulatory review that any merger of scale involving public companies gets, and the shareholders from both companies will have to meet and approve the deal.

Between now and the close date, we remain independent companies. Our customers are running our respective products. Our sales teams are working separately from each other with current and new customers to win more business and to make those customers successful. We’ll both continue to do that.

Customers who are running CDH, HDP and HDF are getting a new promise. Those product lines will each be supported and maintained for at least three years from the date our merger closes. Any customer who chooses either can be sure of a long-term future for the platform selected.

Guy Shilo isn’t quite as pleased:

On the business side, the new company will be a de facto monopoly, as those two are the largest Hadoop vendors in terms of market share. Less competition often leads to a lack of incentive to innovate and to rising prices. Let’s hope the joint company will not go this way and will leverage its funds and power to improve its products and services.

On the technological side, it will be interesting to see the way CDH and HDP will go. Will they keep both products alive? Will they continue only one? Which of them will it be? Will they take the Hortonworks approach that embraces the Hadoop open source community and its fast-changing versions, or the more conservative Cloudera approach?

I am cautiously pessimistic about this. Cloudera and Hortonworks combined for a huge amount of the Hadoop market (approximately 80% as of a couple of years ago). There are several competitors in the broader market, but I thought that Cloudera and Hortonworks gave us two separate visions for different types of companies.


Analyzing Update Dates For R Packages

Published 2018-10-08 by Kevin Feasel

Tomaz Kastrun takes a look at CRAN package update dates:

So more updates are coming in autumn. But the result of the correlation:
cor(dd_ym2010)[2,3]
is still just 0.155, making it hard to draw any concrete conclusions. Adding year 2018 will skew the picture and add several outliers, since 2018 is still a running year (as of writing this blog post).

Read on for a descriptive analysis of this data set.
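
For readers less familiar with the R expression in the quote, cor() on a data frame returns a correlation matrix, and [2,3] picks the entry for the second and third columns. A minimal Python sketch of the same idea follows. The column names and numbers are invented and are not Tomaz's data, and Python indexes from 0, so R's [2,3] becomes [1][2].

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a list of columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


# Hypothetical columns: year, month, and number of package updates.
year = [2016, 2016, 2017, 2017]
month = [3, 10, 3, 10]
updates = [40, 55, 42, 60]

matrix = correlation_matrix([year, month, updates])
month_vs_updates = matrix[1][2]  # same cell as R's cor(df)[2,3]
```

A value close to 0, like the 0.155 in the quote, indicates only a weak linear relationship between the two columns.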


Missing Backup Directory When Trying To Upgrade SQL Server

Published 2018-10-08 by Kevin Feasel

Lori Brown walks us through the solution to an error she experienced:

I was recently performing an in-place upgrade of SQL 2008 R2 to SQL 2014 on one of my client’s servers. I have done a ton of successful and uneventful in-place upgrades and was surprised when the upgrade failed with the error message: “Failed to create a new folder ‘X:\SQLBackups’. The specified path is invalid (for example, it is on an unmapped drive).” This client had over the years changed from using a local drive for all backups to having backups sent to a network share. So, the X drive really was no longer in existence.

Read on for the solution.

