Chart Confusion with Labels

Mike Cisneros shows us an example where unexpected label values can throw off your readers:

The internet immediately latched onto the seemingly absurd collection of months portrayed in this chart. The bill, dating from June of 2019, included 13 prior months of usage from as early as August of 2016, as recently as March of 2019, and in a random order.

Soon, our non-U.S.-based friends pointed out that the dates made even less sense to them, as (of course) their convention is not to show dates in MM/YY format, but in YY/MM format.

And with this, the truth of the matter became obvious: the dates were in neither MM/YY format nor YY/MM format; they were in MM/DD format, and excluded labeling the year entirely. 

Even small things can make a difference in your ability to get the message across to users.

SnowflakeDB: A Review

Achilleus gives us an overview of SnowflakeDB:

There is no dark magic involved in improving the efficiency of your queries. Depending on whom you ask, this can be considered a standout feature or a major hindrance, but I am not a fan of tuning queries according to my workload, as I feel data evolves so quickly in organizations. It becomes trickier to play catch-up and turn all the necessary knobs to make a query faster.

Snowflake claims they tune all the queries “automagically” via a dynamic query optimization engine. No need for indexes, statistics updates, partition keys, or pre-sharding data for even distribution when you scale up. All of this will be done by their patent-pending dynamic optimization.

But I still feel Snowflake can work on providing the necessary knobs for people who would like to tune their queries.

Read the whole thing. In short, it’s a technology worth looking at, but it’s not going to work perfectly in all cases.

Determining Your OS in PowerShell

Patrick Gruenauer shows how you can determine your operating system in PowerShell Core 6 and PowerShell 7:

Recently, I discovered three PowerShell Core variables that could be very helpful when it comes to determining the operating system in PowerShell. PowerShell has become a cross-platform tool and can be installed on Linux or macOS, too. So it could happen that you have to determine the operating system first before starting any other actions.

Click through to see what they are and how to use them.

Hooking SQL Server to Kafka

Niels Berglund has an interesting scenario for us:

We see how the procedure in Code Snippet 2 takes relevant gameplay details and inserts them into the dbo.tb_GamePlay table.

In our scenario, we want to stream the individual gameplay events, but we cannot alter the services which generate the gameplay. We instead decide to generate the event from the database using, as we mentioned above, the SQL Server Extensibility Framework.

Click through for the scenario in depth and how to use Java to tie together SQL Server and Kafka.

Oracle Data Guard on Azure

Kellyn Pot’vin-Gorman’s worlds continue to collide:

So, as most people know, I’m not a big fan of Oracle RAC (Real Application Clusters). My opinion was that it was often sold for use cases that it doesn’t serve (such as HA), and the resource demands between the nodes, as well as what happens to the remaining nodes when one is evicted, are not in the best interest of most use cases. On the other hand, I LOVE Oracle Data Guard, active or standard, don’t matter, the product is great and it’s an awesome option for those migrating their Oracle databases to Azure VMs.

Read on to see what Oracle Data Guard is and where you might use it.

Refreshing Views After DDL Changes

Eduardo Pivaral shows how you can refresh the metadata for a view in SQL Server after one of its underlying tables or functions changes:

So we proceed to execute an alter view over the first view:

ALTER VIEW dbo.[vi_invoices_received_by]
AS
SELECT
    ConfirmedReceivedBy AS [Received by],
    COUNT(InvoiceID) AS [# of Invoices],
    CustomerID
FROM Sales.Invoices
GROUP BY ConfirmedReceivedBy, CustomerID;
GO

So we should now be able to see the updated column names on the second view, right? 

But if you execute the view again, you will obtain the same results as before:

Read on to see what’s gone wrong and how you can fix it.
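The excerpt doesn’t show the fix itself, so here is a minimal sketch of one common approach rather than necessarily Eduardo’s: SQL Server’s built-in sp_refreshview procedure re-reads the metadata for a non-schema-bound view. The name dbo.vi_dependent_view is a hypothetical stand-in for the second view, which the excerpt doesn’t name.

```sql
-- Refresh the stored metadata of a single dependent view.
-- dbo.vi_dependent_view is a hypothetical name; substitute the view
-- that selects from dbo.vi_invoices_received_by.
EXEC sys.sp_refreshview @viewname = N'dbo.vi_dependent_view';
GO

-- Generate one refresh statement for every view that references the
-- altered view, so nothing gets missed. Review the generated statements
-- before running them.
SELECT DISTINCT
    N'EXEC sys.sp_refreshview N'''
    + QUOTENAME(SCHEMA_NAME(so.schema_id)) + N'.' + QUOTENAME(so.name)
    + N''';' AS RefreshStatement
FROM sys.objects AS so
    INNER JOIN sys.sql_expression_dependencies AS sed
        ON so.object_id = sed.referencing_id
WHERE so.type = 'V'
    AND sed.referenced_id = OBJECT_ID(N'dbo.vi_invoices_received_by');
GO
```

The second query only produces the statements as text. That is a deliberate choice here, as it lets you check which views are affected before touching any of them.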

AMD and Server CPUs

Glenn Berry has an interesting post on why he’s seriously considering recommending AMD CPUs to people:

AMD claims a 15% Instructions Per Clock (IPC) increase between the desktop Zen+ and Zen 2 generations, and we are likely to see a similar increase between the previous AMD EPYC 7001 “Naples” and the AMD EPYC 7002 series processors.

So far, we don’t know the official base and turbo clock speeds, but there was a recent leak of partial specifications and pricing by a European retailer that listed max boost clock speeds of up to 3.4 GHz. We won’t know the actual single-threaded performance of these processors until they have been released and benchmarked by neutral third-party testers. I am optimistic that they will have higher single-threaded CPU performance than Intel Cascade Lake-SP processors.

I’ve always had a soft spot in my heart for AMD, so I’d love to see them come through with a serious competitor to Intel in the server space, for nostalgic reasons but also to make price more competitive and to make Intel get back on its game.

Troubleshooting DSC in Push Refresh Mode

Jess Pomfret continues a series on PowerShell Desired State Configuration:

One of the biggest obstacles people face when using DSC is the troubleshooting and reporting pieces. There are options here to integrate with third party tools to create a more polished enterprise solution, but if you’re going with just straight DSC you might feel it is lacking some in this area.

We do however have several tools available to troubleshoot issues with configurations or to monitor our nodes to determine whether they are still in the desired state. I’m specifically going to look at the options available if you’re using DSC in the Push refresh mode.

Click through for discussion of the PSDesiredStateConfiguration module as well as DSC-related event log messages.

Notebooks in Azure Databricks

Brad Llewellyn takes us through Azure Databricks notebooks:

Azure Databricks Notebooks support four programming languages: Python, Scala, SQL and R. However, selecting a language in this drop-down doesn’t limit us to only using that language. Instead, it sets the default language of the notebook. Every code block in the notebook is run independently and we can manually specify the language for each code block.

Before we get to the actual coding, we need to attach our new notebook to an existing cluster. As we said, Notebooks are nothing more than an interface for interactive code. The processing is all done on the underlying cluster.

Read on to learn how Databricks uses the notebook metaphor heavily in how you interact with it.

Reading and Writing CSV Files with spark-dotnet

Ed Elliott continues a series on Spark for .NET:

How do you read and write CSV files using the dotnet driver for Apache Spark?

I have a runnable example here:
https://github.com/GoEddie/dotnet-spark-examples

Specifically:
https://github.com/GoEddie/dotnet-spark-examples/tree/master/examples/split-csv

The quoted links will take you straight to the code, but click through to see Ed’s commentary.
