Month: April 2020

Saving Graphics in R Across Multiple OSes

Colin Gillespie takes us through exporting graphics in R and some of the cross-platform foibles you’ll find:

One of R’s outstanding features is that it is cross platform. You write R code and it magically works under Linux, Windows and Mac. Indeed, the above code “runs” under all three operating systems. But does it produce the same graphic under each platform? Spoiler! None of the above functions produce identical output across OSes. So for “same”, I am going to take a lax view: I just want figures that look the same.

Read on to understand the differences and hopefully limit confusion around them.

Migrating to Azure with SQL Server Management Studio

Magi Naumova walks us through some options for migrating on-prem instances to Azure, all of which are available in SQL Server Management Studio:

Cases of migrating databases to Azure become more common every day. Azure SQL Database is the flagship SaaS service Microsoft provides for hosting a relational database. But even though it is the same engine, many features are still unsupported or have limited functionality in Azure SQL DB compared to on-premises SQL Server versions. For example, cross-database references are possible in on-premises SQL Server databases but are not supported in Azure SQL Database.

If we could check in advance and plan our migration based on those checks, it would save time and effort. This is what the new Migrate to Azure features in SSMS are built for.

Click through for the options, some of which are simply informational and some of which actually do the work.

Power BI & Disabling Export to Excel

Marc Lelijveld explains why you might not want to let users export to Excel:

Export to Excel is a feature that has been available in Power BI for a very long time. It allows report users to export the data from a specific visual in the report to an editable Excel file. After exporting, they can do whatever they want: for example, sending the data to others via mail, transforming or manipulating the data, building new reports based on the Excel file, and many other things. The export option can be used by clicking the ellipsis at the top right of a visual (if the visual header is enabled).

If you have all export functionalities enabled, users can export both underlying data and summarized data. The difference is mainly between the raw data and only the data visible in the chart where you clicked the export button.

Read on to understand why this might not be an unalloyed good.

Hyperthreading and VMs

David Klee shares some thoughts on hyperthreading in virtual environments:

I recommend leaving the hyper-threaded logical cores enabled in the host BIOS, but not depending on them for performance gains. Hyperthreaded CPU cores, or logical cores, should not be factored into CPU overcommitment ratios as if they are full processor cores.

Every task that is triggered inside a virtual machine must be scheduled to run on a physical compute resource. These scheduled tasks must be placed into a scheduling queue inside the hypervisor layer before they get their time on the physical compute resource. If the hypervisor is overloaded, or if the vCPU scheduling queues are imbalanced from an incorrect vCPU configuration, these queues can grow, and vCPU performance can suffer.

Click through for an explanation of hyperthreading and David’s guidance on the topic.
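
To make the overcommitment point concrete, here is a minimal Python sketch of the arithmetic. The host layout (2 sockets, 16 cores per socket, 96 assigned vCPUs) and the helper name `overcommit_ratio` are made-up assumptions for illustration, not figures from David's post. It only shows how counting logical cores can make a host look less overcommitted than it is.

```python
def overcommit_ratio(total_vcpus: int, cores: int) -> float:
    """Return the vCPU-to-core ratio for a host (hypothetical helper)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_vcpus / cores


# Hypothetical host: 2 sockets x 16 cores, hyper-threading enabled.
sockets, cores_per_socket, threads_per_core = 2, 16, 2
physical_cores = sockets * cores_per_socket        # 32 physical cores
logical_cores = physical_cores * threads_per_core  # 64 logical cores

vcpus_assigned = 96  # total vCPUs across all VMs on the host (assumed)

# Counting only physical cores, as the quoted guidance suggests.
ratio_physical = overcommit_ratio(vcpus_assigned, physical_cores)

# Counting logical cores halves the apparent ratio and hides the contention.
ratio_logical = overcommit_ratio(vcpus_assigned, logical_cores)

print(f"vCPU:pCPU using physical cores: {ratio_physical:.1f}:1")
print(f"vCPU:pCPU using logical cores:  {ratio_logical:.1f}:1")
```

With these assumed numbers the same host reads as 3:1 against physical cores but only 1.5:1 against logical cores, which is why the choice of denominator matters when planning capacity.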

Power BI Warning Regarding “Store datasets in enhanced metadata format”

Imke Feldmann does not recommend turning on the “Store datasets in enhanced metadata format” setting in Power BI all willy-nilly:

Background

With the March release came the function “Store datasets in enhanced metadata format”. With this feature turned on, Power BI data models will be stored in the same format as Analysis Services Tabular models. This means that they inherit the same amazing options that this open-platform connectivity enables.

Limitations and their consequences

But with the current setup, you could end up with a non-working file, many parts of which you would have to rebuild from scratch. So make sure to fully read the documentation. Now!

Read on to see what has Imke concerned.

PowerShell 7 Pipeline Chain Operators

Patrick Gruenauer shows off a pair of new operators in PowerShell 7:

With PowerShell 7, new operators were introduced. We call them chain operators. Chain operators enable you to do something after doing something. They use the $? and $LASTEXITCODE variables to determine whether a command on the left-hand side of the pipe failed or succeeded.

Let’s cover this topic by demonstrating some examples to fully understand the new pipeline technology.

This is definitely Bash-inspired and I’m happy they made this move.
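
For readers who have not used Bash-style chaining, the short-circuit behavior can be sketched in Python. This is an emulation under stated assumptions, not PowerShell code and not Patrick's example: the helpers `command`, `and_chain`, and `or_chain` and the command names are hypothetical, and each "command" is just a child Python process that exits with a chosen code.

```python
import subprocess
import sys

ran = []  # records which hypothetical commands actually executed


def command(name: str, exit_code: int):
    """Build a callable that runs a child process exiting with exit_code."""
    def _run() -> bool:
        ran.append(name)
        proc = subprocess.run(
            [sys.executable, "-c", f"raise SystemExit({exit_code})"]
        )
        return proc.returncode == 0  # success means exit code 0
    return _run


def and_chain(left, right) -> bool:
    """Emulate `left && right`: run right only if left succeeded."""
    return left() and right()


def or_chain(left, right) -> bool:
    """Emulate `left || right`: run right only if left failed."""
    return left() or right()


# "build" fails, so "deploy" is skipped.
and_result = and_chain(command("build", 1), command("deploy", 0))

# "primary" fails, so "fallback" runs.
or_result = or_chain(command("primary", 1), command("fallback", 0))

print(ran, and_result, or_result)
```

Python's own `and` and `or` short-circuit the same way, which is what keeps the sketch small: the right-hand side is never invoked unless the left-hand result calls for it.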

Is Kafka a Database?

Kai Wähner asks a question I hadn’t thought about:

Can and should Apache Kafka replace a database? How long can and should I store data in Kafka? How can I query and process data in Kafka? These are common questions that come up more and more. Short answers like “Yes” or “It depends” are not good enough for you? Then this read is for you! This blog post explains the idea behind databases and different features like storage, queries, and transactions to evaluate when Kafka is a good fit and when it is not.

This is an interesting review of the Kafka ecosystem and shows that Apache Kafka really does blur the lines regarding what is a database.

Database Administration in Cloudera Data Platform

Gokul Kamaraj and Liliana Kadar walk through tools for the DBA in Cloudera Data Platform:

You can use Cloudera Manager to automate the process of upgrading the operational database in your Cloudera Data Platform-Data Center (CDP-DC). Upgrades are provided through releases or maintenance patches. Cloudera Manager installs the releases and/or patches and manages the configuration as well as the restart process.

If you are using CDP on a public cloud such as AWS, you have to create a new Data Hub cluster to upgrade to the new versions of various components. For more information about creating a new operational database Data Hub cluster, see Getting Started with Operational Database on CDP.

Cloudera’s offering is cluster-based; upgrades and patches all span multiple nodes (servers), and installation, configuration, and reboots are all automated, including rolling reboots where applicable.

Click through for a walkthrough of other tools for Hadoop DBAs.

Understanding Area Graphs

Mike Cisneros takes us through the proper usage of area graphs:

Area graphs can be effective for:

– Showing the rise and fall of various data series over time
– Conveying total amounts over time as well as some sub-categorical breakdowns (but only to a point)
– Emphasizing a part-to-whole relationship over time when one part is very large, or changes from being very large to very small
– Showing change over time in individual panels of a small multiple chart

Area graphs are not the ideal choice for:

– Data sets on scales that do not have a meaningful relationship to zero
– Showing several volatile data sets over time
– Showing fine differences in values

In this post, we’ll talk about how an area graph works, and some of the challenges to keep in mind when you are considering creating one.

Click through for a detailed analysis. I will rarely use area graphs, but in the right use case, they can add a strong visual dynamic to a report.
