
Day: June 4, 2020

Portfolio Optimization with SAS and Python

Sophia Rowland shows off the sasoptpy package:

I started by declaring my parameters and sets, including my risk threshold, my stock portfolio, the expected return of my stock portfolio, and the covariance matrix estimated using the shrinkage estimator of Ledoit and Wolf (2003). I will use these pieces of information in my objective function and constraints. Now I will need SWAT, sasoptpy, and my optimization model object.

Read on for a demo.
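
As a rough illustration of the quantities named in that quote, the plain-Python sketch below computes a portfolio's expected return and variance from a set of weights and checks them against a risk threshold. It does not use SWAT or sasoptpy, the model in the post may be formulated differently, and the tickers, returns, and covariance values are made up for illustration.

```python
# Hypothetical inputs: three made-up assets, not data from the post.
expected_return = {"AAA": 0.08, "BBB": 0.05, "CCC": 0.11}

# Upper triangle of a made-up covariance matrix (it is symmetric).
covariance = {
    ("AAA", "AAA"): 0.040, ("AAA", "BBB"): 0.006, ("AAA", "CCC"): 0.012,
    ("BBB", "BBB"): 0.010, ("BBB", "CCC"): 0.004,
    ("CCC", "CCC"): 0.090,
}


def cov(a, b):
    """Look up a covariance entry, using symmetry for the lower triangle."""
    return covariance.get((a, b), covariance.get((b, a)))


def portfolio_return(weights):
    """Weighted sum of expected returns."""
    return sum(w * expected_return[a] for a, w in weights.items())


def portfolio_variance(weights):
    """Quadratic form w' * Sigma * w."""
    return sum(
        wa * wb * cov(a, b)
        for a, wa in weights.items()
        for b, wb in weights.items()
    )


def is_feasible(weights, risk_threshold):
    """Fully invested, long-only, and within the variance limit."""
    fully_invested = abs(sum(weights.values()) - 1.0) < 1e-9
    long_only = all(w >= 0 for w in weights.values())
    return fully_invested and long_only and portfolio_variance(weights) <= risk_threshold


weights = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
print(portfolio_return(weights), portfolio_variance(weights))
print(is_feasible(weights, risk_threshold=0.02))
```

An optimizer's job is to search over the weights for the best return that still passes a check like `is_feasible`; this sketch only evaluates one candidate.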

Comments closed

Understanding Scatterplots

Alex Velez describes the nature of the scatterplot:

A scatterplot is a niche chart, but it’s one of my favorites! If you are a statistician or work in a technical field, a scatterplot might be your go-to graph type. However, if you don’t perform a lot of statistical analysis, then these charts may be unfamiliar. Regardless of your current comfort level, scatterplots are extremely useful to focus on the relationship between two series—a scenario that is common in both technical and non-technical fields. Let’s explore some of the basics of scatterplots via an example; I’ll also cover tips for designing more effective ones and discuss common variations (bubble charts, connected scatterplots, etc.), too!

Read on for a good explanation of what scatterplots are, variants on the theme, and when they make sense to use.


Cassandra Monitoring and Data Modeling

Instaclustr has put up a couple of interesting posts on Cassandra. First, Anup Shirolkar explains how we can monitor Cassandra installations:

Cassandra is developed in Java and is a JVM based system. Each Cassandra node runs a single Cassandra process. JVM based systems are enabled with JMX (Java Management Extensions) for monitoring and management. Cassandra exposes various metrics using MBeans which can be accessed through JMX. Cassandra monitoring tools are configured to scrape the metrics through JMX and then filter, aggregate, and render the metrics in the desired format. There are a few performance limitations in the JMX monitoring method, which are referred to later. 

The metrics management in Cassandra is performed using the Dropwizard library. The metrics are collected per node in Cassandra. However, those can be aggregated by the monitoring system.

On the development side, the Instaclustr team walks us through data modeling guidelines:

The ultimate goal of Cassandra data modeling and analysis is to develop a complete, well organized, and high performance Cassandra cluster. Following the five Cassandra data modeling best practices outlined will hopefully help you meet that goal:

1. Cassandra is not a relational database; don’t try to model it like one
2. Design your model to meet 3 fundamental goals for data distribution
3. Understand the importance of the Primary Key in the overall data structure
4. Model around your queries but don’t forget about your data
5. Follow a six-step structured approach to building your model

Because Cassandra uses a variant of SQL, it’s easy to forget that data is stored completely differently and that design decisions are quite different from what we see in the relational world.
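
To make the point about storage concrete, here is a toy Python model of how a partition key decides where a row lives and how a clustering key orders rows within a partition. It is a simplified sketch, not Cassandra's actual implementation: the three-node cluster, the SHA-256-based hash, and the sensor readings are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key: str) -> str:
    """Pick a node from a hash of the partition key (toy stand-in for a real partitioner)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


# node -> partition key -> list of (clustering key, value)
storage = defaultdict(lambda: defaultdict(list))


def insert(sensor_id: str, reading_time: str, value: float) -> None:
    """Store a row in the partition for sensor_id, ordered by reading_time."""
    partition = storage[node_for(sensor_id)][sensor_id]
    partition.append((reading_time, value))
    partition.sort()  # rows stay ordered by clustering key within the partition


def readings_for(sensor_id: str):
    """A query by partition key touches exactly one node and one partition."""
    return storage[node_for(sensor_id)][sensor_id]


insert("sensor-1", "2020-06-04T10:05", 21.7)
insert("sensor-2", "2020-06-04T10:00", 19.2)
insert("sensor-1", "2020-06-04T10:00", 21.5)

print(readings_for("sensor-1"))
```

A query that filters on the partition key is cheap in this model because it goes straight to one partition, which is why the guidelines above stress modeling around your queries.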


Service Broker Conversations and Messages

Chris Johnson is working on a series on Service Broker fundamentals. This post covers conversations and messages:

First, each conversation has its own unique dialog_handle (not conversation handle, despite everything calling it a conversation from now on, score one for consistency Microsoft). We need to capture this handle in a UNIQUEIDENTIFIER variable, as we will need it later on to send messages across the conversation. In fact, the statement will error if you don’t supply a variable to capture the handle.

Second, we need to supply both FROM and TO services. These tell the conversation which service is the source and which is the target. Remember, each service is attached to a queue, and can have one or more contracts attached to it. The source service is a database object, but the target service is an NVARCHAR. This allows the target service to live outside the database, which is something that I will cover at some point in the Service Broker 201 series.

This is a nice explanation of the process, so if you aren’t particularly familiar with Service Broker, check out Chris’s series, starting with the first and second posts.


Setting Drive Allocation Unit Size using PowerShell

Eric Cobb has a tiny script for us:

There seems to be some ongoing debate around whether or not formatting your data and log drives with 64KB allocation unit size even matters anymore. I would encourage you to do your own research to determine if you want to do this or not. My personal take on it is: if it doesn’t hurt, and it may help, and it only takes 2 seconds to click the “go” button on my PowerShell script, then I would rather go ahead and do it and not need it than not do it and wish I had later down the road.

I don’t have a strong opinion in that debate, myself, so I’ll just say that if you want to see how to do this in a couple of lines of PowerShell code, check out Eric’s post.


Generating SQL Server Data Tools Solutions from Templates

Sander Stad walks us through creating a template for building SSDT solutions:

Yes, templates. But how are we going to create a template for an SSDT solution in such a way that it can be reused?

That’s where the PowerShell module called “PSModuleDevelopment” comes in. PSModuleDevelopment is part of the PSFramework PowerShell module.

The PSModuleDevelopment module enables you to create templates for files but also entire directories. Using placeholders you can replace values in the template, making it possible to have the name and other variables set.

This is where the SSDT template comes in. I have created a template for SSDT that contains two projects. One project is meant for the data model and the other project is meant for the unit tests.

Read the whole thing and check out Sander’s GitHub repo.
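
The placeholder idea is not specific to PowerShell. As a language-neutral sketch, here it is with Python's standard `string.Template`. This is not how PSModuleDevelopment is invoked or how it marks placeholders; the project name, file names, and file contents below are hypothetical.

```python
from string import Template

# Hypothetical template: relative path -> file content, both with placeholders.
template_files = {
    "${name}/${name}.sqlproj": "<Project><Name>${name}</Name></Project>",
    "${name}.Tests/${name}.Tests.sqlproj": "<Project><Name>${name}.Tests</Name></Project>",
    "README.md": "# ${name}\n\n${description}\n",
}


def render(files, **values):
    """Substitute placeholders in both the paths and the file contents."""
    return {
        Template(path).substitute(values): Template(body).substitute(values)
        for path, body in files.items()
    }


solution = render(
    template_files,
    name="Inventory",
    description="Data model and unit tests.",
)

for path in sorted(solution):
    print(path)
```

Substituting in the paths as well as the contents is what lets one template produce a whole directory tree named after the new project.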


Proper Ways to Store Currency Data in SQL Server

Randolph West thinks about ways to store money values in SQL Server:

I completely agree with this statement. Never store values used in financial calculations as floating point values, because a floating point is an approximate representation of a decimal value, stored as binary. In most cases it is inaccurate as soon as you store it. You can read more in this excellent — if a little dry — technical paper.

With that out of the way, we get into an interesting discussion about the correct data type to store currency values.

Randolph lays out an argument for why DECIMAL(19,4) is fine. And it’s great for most cases, though the one “real” financial system I’ve worked with had money stored as integer types (with SQL Server, that’d be BIGINT) because of precision, especially when working with exchange rates. But for most cases, especially when you’re not building the system of record for financial transactions or accounts, I agree with Randolph that DECIMAL is fine. Dave Wentzel has a great comment explaining even further the reasoning behind integer values for certain monetary columns.
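
A quick way to see the difference is outside SQL Server entirely. The Python sketch below contrasts binary floating point with a fixed-scale decimal and with integer minor units. `Decimal` quantized to four places only approximates the behavior of DECIMAL(19,4), and the amounts and the choice of half-up rounding are assumptions for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent 0.10 or 0.20 exactly.
float_total = 0.10 + 0.20
print(float_total == 0.30)  # False

# Decimal arithmetic on string literals keeps the cents exact.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))  # True

FOUR_PLACES = Decimal("0.0001")


def to_money(value: Decimal) -> Decimal:
    """Round to four decimal places, similar in spirit to DECIMAL(19,4)."""
    return value.quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)


# Integer approach: store the smallest unit (here, cents) and scale only for display.
price_cents = 1999
quantity = 3
total_cents = price_cents * quantity


def format_cents(cents: int) -> str:
    """Render an integer number of cents as a decimal string."""
    sign = "-" if cents < 0 else ""
    whole, frac = divmod(abs(cents), 100)
    return f"{sign}{whole}.{frac:02d}"


print(format_cents(total_cents))  # 59.97
```

The integer version never rounds during arithmetic, which is the appeal for a system of record; the trade-off is that every consumer has to know the implied scale.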


Alerting when Power BI Tenant Settings Change

Melissa Coates walks us through how to track Power BI tenant settings changes:

This post discusses two methods of receiving an alert when a tenant setting has changed in the Power BI Service: one using Cloud App Security, and the other using the M365 Security & Compliance Center.

Tenant settings in the Power BI Service are among the most important things to get right. Once they are set the way you want, the objective is to ensure all changes are well-controlled. Tip: Check section 10 in the Planning a Power BI Enterprise Deployment whitepaper—I included some tenant setting recommendations in the latest version.

Read on to see how.
