Month: June 2021

Building a Payoff Diagram in R

Holger von Jouanne-Diedrich builds out payoff diagrams:

Not many people understand the financial alchemy of modern financial investment vehicles, like hedge funds, that often use sophisticated trading strategies. But everybody understands the meaning of rising and falling markets. Why not simply translate one into the other?

If you want to get your hands on a simple R script that creates an easy-to-understand plot (a profit & loss profile or payoff diagram) out of any price series, read on!

Click through for several examples of code and financial instruments.
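
For a sense of the core idea in code, here is a minimal sketch. It is not the author's R script, and it makes its own assumptions. You have two aligned price series: a benchmark market and the investment you want to explain. Each series is turned into simple period returns, and each market return is paired with the investment's return over the same period. Plotting those pairs, with market return on the x-axis and investment return on the y-axis, gives a payoff-style profile. The sketch only returns the sorted points. The names `simple_returns` and `payoff_points` and the sample prices are hypothetical.

```python
def simple_returns(prices):
    """Period-over-period simple returns: p[t] / p[t-1] - 1."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


def payoff_points(market_prices, instrument_prices):
    """Pair each market return with the instrument's return over the same period.

    Both series are assumed to be aligned (same dates, same length).
    Returns (market_return, instrument_return) tuples sorted by market return,
    i.e. the order they would be traced along the x-axis of a payoff diagram.
    """
    if len(market_prices) != len(instrument_prices):
        raise ValueError("price series must have the same length")
    pairs = zip(simple_returns(market_prices), simple_returns(instrument_prices))
    return sorted(pairs)


# Hypothetical prices: the instrument keeps half of the market's gains
# and none of its losses.
market = [100, 110, 99, 99]
instrument = [100, 105, 105, 105]
for m, i in payoff_points(market, instrument):
    print(f"market {m:+.1%} -> instrument {i:+.1%}")
```

In the sample, the flat-then-rising shape is the kind of option-like profile that a payoff diagram makes obvious at a glance.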

Embedding Power BI into Jupyter Notebooks

Dennes Torres takes a look at a new Power BI feature:

Microsoft recently announced the ability to include Power BI reports inside Jupyter notebooks. After overcoming the dazzle of this exciting feature, what comes to my mind is: “Why do we need this?”

I’m far from being a Jupyter notebook expert, but as far as I know, they are used for interactive analysis. Why, in the middle of an interactive analysis, would I need to get a Power BI Report?

Even if the Power BI Report is not exactly what I need, I could continue the analysis in Power BI. Why should I move it to Jupyter and make this kind of integration with an existing report?

Read on to see what you can do with it. As for how you might use it, that remains an open question.

Understanding Query Execution Time Statistics

Esat Erkec takes us through SET STATISTICS TIME ON:

The SET STATISTICS TIME ON statement returns a text report that shows how long a query took to compile and how long it took to execute. To enable this option for a query, we need to run the SET STATISTICS TIME ON command before executing the query. The timing report then appears in the messages section of the query result panel until we turn the option off. All values in the report are shown in milliseconds, and the syntax is as follows: SET STATISTICS TIME { ON | OFF }

Read on to see how you can use it, as well as things to keep in mind as you do.
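
If you capture that message output, for example by copying it out of the results pane, you can total the numbers with a few lines of code. The sketch below assumes the report uses the usual "SQL Server parse and compile time" and "SQL Server Execution Times" headers, each followed by a `CPU time = N ms, elapsed time = N ms.` line. The sample text is illustrative rather than captured from a real server, and the function name `total_statistics_time` is hypothetical.

```python
import re

# Matches the timing line under each section header, e.g.
# "CPU time = 16 ms,  elapsed time = 21 ms." (spacing varies).
TIMING = re.compile(r"CPU time = (\d+) ms,\s+elapsed time = (\d+) ms")


def total_statistics_time(messages: str) -> dict:
    """Sum CPU and elapsed milliseconds per section of a STATISTICS TIME report.

    A line containing "parse and compile time" starts a compile section and a
    line containing "Execution Times" starts an execution section; each timing
    line is credited to whichever section header was seen most recently.
    """
    totals = {"compile": [0, 0], "execution": [0, 0]}
    section = None
    for line in messages.splitlines():
        if "parse and compile time" in line:
            section = "compile"
        elif "Execution Times" in line:
            section = "execution"
        match = TIMING.search(line)
        if match and section is not None:
            totals[section][0] += int(match.group(1))
            totals[section][1] += int(match.group(2))
    return {name: {"cpu_ms": cpu, "elapsed_ms": elapsed}
            for name, (cpu, elapsed) in totals.items()}


# Illustrative message output for a single statement.
sample = """SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 2 ms.

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 21 ms."""
print(total_statistics_time(sample))
```

Summing per section means a batch with several statements still yields one compile total and one execution total.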

Searching for Key Lookups

Grant Fritchey answers a question:

While teaching about Extended Events and Execution Plans last week, Jason, one of the people in the class, asked: Is there a way in Extended Events to find queries using a Key Lookup operation? Sadly, the answer is no. However, you can query the Execution Plans in cache or in the Query Store to find this. Thanks for the question, Jason. Here’s your answer.

Read on to see how.
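
Grant's approach works in T-SQL against the plan cache and Query Store. As an outside-the-database illustration of the same check, the sketch below scans a showplan XML document for key lookups. It assumes the standard showplan namespace, and it assumes that a key lookup appears as an `IndexScan` element with its `Lookup` attribute set to `1` or `true`. The plan fragment is hand-written and heavily trimmed for illustration; it is not a real plan. In practice the XML would come from something like the `query_plan` column of `sys.dm_exec_query_plan`. The function name `find_key_lookups` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by SQL Server showplan XML documents.
NS = {"sp": "http://schemas.microsoft.com/sqlserver/2004/07/showplan"}


def find_key_lookups(plan_xml: str) -> list:
    """Return "table.index" for every key lookup found in a showplan XML string.

    Assumption: a key lookup is an IndexScan element whose Lookup attribute
    is "1" or "true"; its Object child names the table and index used.
    """
    root = ET.fromstring(plan_xml)
    hits = []
    for scan in root.iterfind(".//sp:IndexScan", NS):
        if scan.get("Lookup") in ("1", "true"):
            obj = scan.find("sp:Object", NS)
            if obj is not None:
                hits.append(f'{obj.get("Table")}.{obj.get("Index")}')
    return hits


# Trimmed, hand-written plan: an index seek followed by a key lookup.
plan = """<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <RelOp PhysicalOp="Index Seek">
    <IndexScan>
      <Object Table="[Person]" Index="[IX_Person_LastName]" />
    </IndexScan>
  </RelOp>
  <RelOp PhysicalOp="Clustered Index Seek">
    <IndexScan Lookup="1">
      <Object Table="[Person]" Index="[PK_Person]" />
    </IndexScan>
  </RelOp>
</ShowPlanXML>"""
print(find_key_lookups(plan))
```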

SQL Server 2016 Leaving Mainstream Support July 2021

Glenn Berry reminds us that time flies:

SQL Server 2016 falls out of Mainstream Support on July 13, 2021. What this means is that there won’t be any new Service Packs or Cumulative Updates released for SQL Server 2016 after that date. It is still in Extended Support until July 14th, 2026. While in Extended Support, there will still be security and critical functional updates, if any are needed. This post is about SQL Server 2016 falling out of Mainstream Support.

Read on for more information about what this means, as it’s not a reason to panic and immediately change everything.

Using Spark in CDP’s Operational Database Experience

Gokul Kamaraj et al. take us through using Apache Spark in Cloudera Data Platform’s Operational Database Experience:

Apache Spark is a very popular analytics engine used for large-scale data processing. It is widely used for many big data applications and use cases. CDP Operational Database Experience (COD) is a CDP Public Cloud service that lets you create and manage operational database instances, and it is powered by Apache HBase and Apache Phoenix.

To learn more about Apache Spark in CDP and CDP Operational Database Experience, see Apache Spark Overview and CDP Operational Database Experience Overview.

Apache Spark enables you to connect directly to databases that support JDBC. When integrating Apache Spark with Apache Phoenix in COD, you can leverage capabilities provided by Apache Phoenix to save and query data across multiple worker nodes, and use column selection and predicate pushdown for filtering.

In this blog post, let us look at how you can read data from and write data to COD using Apache Spark. We are going to use a COD instance and Apache Spark in the Cloudera Data Engineering experience.

Read on for the process.
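
Column selection and predicate pushdown are easy to blur together, so here is a toy, plain-Python model of what they buy you. It is not the Spark or Phoenix API. In Spark you would express the same intent with `select()` and `filter()` on a DataFrame, and the connector decides what it can push down. The in-memory `TABLE` and the `scan` function are hypothetical stand-ins for a Phoenix table and its data source.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]

# Hypothetical in-memory "table" standing in for a Phoenix/HBase table.
TABLE: List[Row] = [
    {"id": 1, "name": "alpha", "region": "east", "amount": 10},
    {"id": 2, "name": "beta", "region": "west", "amount": 20},
    {"id": 3, "name": "gamma", "region": "east", "amount": 30},
]


def scan(columns: Optional[List[str]] = None,
         predicate: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Toy data source that filters and projects before "shipping" rows.

    Called with no arguments it behaves like a source without pushdown:
    every column of every row is returned and the caller must do the work.
    """
    rows = [r for r in TABLE if predicate is None or predicate(r)]
    if columns is not None:
        rows = [{c: r[c] for c in columns} for r in rows]
    return rows


# Without pushdown: everything crosses the wire, then the client filters.
shipped_all = scan()
client_side = [{"id": r["id"], "amount": r["amount"]}
               for r in shipped_all if r["region"] == "east"]

# With pushdown: only matching rows and the requested columns come back.
pushed = scan(columns=["id", "amount"], predicate=lambda r: r["region"] == "east")

assert client_side == pushed  # same answer either way
print(len(shipped_all), "full rows shipped vs", len(pushed), "trimmed rows")
```

The answer is identical either way. The difference is how much data has to move between the database's worker nodes and Spark before filtering happens.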

Change Data Capture in Delta Lake

Surya Sai Turaga and John O’Dwyer take us through change data capture in Delta Lake:

Change data capture (CDC) is a use case that we see many customers implement in Databricks; you can check out our previous deep dive on the topic here. Typically we see CDC used in an ingestion-to-analytics architecture called the medallion architecture. The medallion architecture takes raw data landed from source systems and refines it through bronze, silver and gold tables. CDC and the medallion architecture provide multiple benefits to users, since only changed or added data needs to be processed. In addition, the different tables in the architecture allow different personas, such as Data Scientists and BI Analysts, to use the correct, up-to-date data for their needs. We are happy to announce the exciting new Change Data Feed (CDF) feature in Delta Lake, which is made possible by Delta Lake’s MERGE operation and log versioning and makes this architecture simpler to implement!

Read on to gain an understanding of how it works.
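
To make the idea concrete without a cluster, here is a plain-Python model of what a downstream (silver) table does with a change feed. It assumes rows carry Delta's `_change_type` column, whose values are `insert`, `update_preimage`, `update_postimage` and `delete`. It also assumes a hypothetical `id` key and data columns whose names do not start with an underscore. It sketches the MERGE-style upsert logic and is not Databricks' implementation. In Delta itself, you would read such rows with `spark.read.format("delta").option("readChangeFeed", "true")` after enabling the `delta.enableChangeDataFeed` table property.

```python
from typing import Dict, Iterable


def apply_change_feed(target: Dict[int, dict], changes: Iterable[dict]) -> Dict[int, dict]:
    """Apply change-feed rows to a copy of a target table keyed by 'id'.

    Rows are assumed to arrive in commit order. 'update_preimage' rows hold
    the old values and are skipped; the matching 'update_postimage' row
    carries the new values.
    """
    result = dict(target)
    for change in changes:
        kind = change["_change_type"]
        # Drop metadata columns (_change_type, _commit_version, ...) before storing.
        row = {k: v for k, v in change.items() if not k.startswith("_")}
        if kind in ("insert", "update_postimage"):
            result[row["id"]] = row
        elif kind == "delete":
            result.pop(row["id"], None)
        # 'update_preimage': nothing to do for an upsert-style target.
    return result


silver = {1: {"id": 1, "status": "new"}, 2: {"id": 2, "status": "new"}}
feed = [
    {"id": 1, "status": "new", "_change_type": "update_preimage", "_commit_version": 3},
    {"id": 1, "status": "shipped", "_change_type": "update_postimage", "_commit_version": 3},
    {"id": 2, "status": "new", "_change_type": "delete", "_commit_version": 4},
    {"id": 3, "status": "new", "_change_type": "insert", "_commit_version": 4},
]
print(apply_change_feed(silver, feed))
```

Only the four changed rows are processed, which is the point the excerpt makes about CDC and the medallion architecture.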

Changing Power BI Evaluation Container Numbers

Chris Webb shows how we can optimize the number of evaluation containers in Power BI:

Last week I showed how the new MaxEvaluationWorkingSetInMB registry setting could increase the performance of memory-hungry Power Query queries in Power BI Desktop. In this post I’ll show how the other new registry setting, ForegroundEvaluationContainerCount, can also help performance. Before I carry on I recommend you read the documentation on these new registry settings if you haven’t done so already.

To illustrate the effect of this setting I created ten identical Power Query queries feeding an Import mode dataset in a new .pbix file, each of which reads data from the same 150MB CSV file, applies a filter and then counts the number of rows returned.

I don’t think I like having to modify a registry setting each time; that’s leading me to believe I should rarely (or never) mess with this.
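
To see why the container count can matter for a test like this, here is a deliberately naive back-of-the-envelope model. It is an assumption, not a description of how the Power Query engine actually schedules evaluations. It assumes that identical queries run in waves of N at a time and that parallel containers never compete for CPU or disk. It also assumes that every busy container uses its full working set, treating MaxEvaluationWorkingSetInMB as a per-container memory cap. The function name `estimate_refresh`, the per-query time and the memory figure are hypothetical placeholders, not measurements from the post.

```python
import math


def estimate_refresh(num_queries: int, containers: int,
                     seconds_per_query: float, working_set_mb: int) -> tuple:
    """Idealised model: queries run in waves of `containers` at a time.

    Returns (estimated wall-clock seconds, peak memory in MB). Assumes every
    query takes the same time, containers never contend for CPU or disk,
    and each busy container uses its full working set.
    """
    if containers < 1:
        raise ValueError("need at least one container")
    waves = math.ceil(num_queries / containers)
    busy = min(containers, num_queries)
    return waves * seconds_per_query, busy * working_set_mb


# Ten identical queries, as in the experiment; timings and memory are placeholders.
for count in (1, 2, 5, 10):
    secs, mem = estimate_refresh(10, count, seconds_per_query=6.0, working_set_mb=1024)
    print(f"{count:>2} containers: ~{secs:.0f}s, peak ~{mem} MB")
```

Even this toy shows the trade-off: more containers means fewer waves but more memory in use at once. Real contention on a shared CSV file will blunt the gains, which is one more reason to measure rather than simply raise the number.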

Persisted Computed Columns and Columnstore Indexes

Erik Darling found a way to do something interesting:

If you read the documentation for column store indexes, it says that column store indexes can’t be created on persisted computed columns.

And that’s true. If we step through this script, creating the column store index will fail.

But it turns out that if there’s a will, there’s a way. Even if this is something you shouldn’t wish to do because who knows what it will mess up.
