Back in August of 2017, we wrote two posts, #9: Compacting your Shared Libraries and #10: Compacting your Shared Libraries, After The Build, about “stripping” shared libraries. This involves removing auxiliary information (such as debug symbols and more) from the shared libraries, which can greatly reduce the installed size (on suitable platforms – it mostly matters where I work, i.e. on Linux).
There’s a pretty good space savings in the tidyverse package. H/T R-Bloggers.
1. Cluster Size (N): Number of nodes/brokers in the Kafka cluster. We should have 2x+1 nodes, i.e. an odd number of at least 3.
2. Partitions: We write/publish data/events into a topic, which is divided into partitions (1 by default). We should have M times N partitions, where M is any integer with M >= 1, to achieve more parallelism and to spread data across the cluster.
3. Replication Factor: Determines the number of copies (including the original/leader) of each partition in the cluster. All replicas of a partition exist on separate nodes/brokers, so we should never have R.F. > N, but it should be at least 3.
We recommend a replication factor of 3 with a 3- or 5-node cluster. This helps provide both availability and consistency.
Click through for several more tradeoff points.
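The sizing rules of thumb above can be sketched as a small check. This is only an illustration of the quoted guidance, not anything from Kafka itself: `TopicPlan` and `check_plan` are hypothetical names, and the rules are encoded exactly as stated (odd cluster size of at least 3, partitions as a multiple of the broker count, replication factor between 3 and N).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TopicPlan:
    """Hypothetical container for the three numbers discussed above."""
    brokers: int             # N, the cluster size
    partitions: int          # should be M * N with M >= 1
    replication_factor: int  # copies of each partition, leader included


def check_plan(plan: TopicPlan) -> List[str]:
    """Return the rules of thumb that the plan violates (empty list if none)."""
    problems = []
    if plan.brokers < 3 or plan.brokers % 2 == 0:
        problems.append("cluster size should be an odd number of at least 3")
    if (plan.brokers <= 0
            or plan.partitions < plan.brokers
            or plan.partitions % plan.brokers != 0):
        problems.append("partitions should be M * N with M >= 1")
    if plan.replication_factor > plan.brokers:
        problems.append("replication factor must not exceed the number of brokers")
    if plan.replication_factor < 3:
        problems.append("replication factor should be at least 3")
    return problems
```

For example, `check_plan(TopicPlan(brokers=3, partitions=6, replication_factor=3))` returns an empty list, while a 4-broker plan with 6 partitions and a replication factor of 2 trips three of the four checks.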
JiraPS is a wonderful module that is great at a lot of things. There are several great people that have put a lot of time into it for it to have such good feature coverage. It was designed to be approachable and do a lot of validation for you. JiraPS does not have a good user story around bulk operations with its issue commands, and that’s what I need from it the most.
It also has a large user base with hundreds of thousands of downloads and is used in a lot of organizations. Every change made to that module at this point needs to pay very close attention to backwards compatibility.
Now there are two modules depending on your use case.
There is a Jira ticket for the Apache Spark project, SPARK-27006. The gist of this ticket is to bring .NET support to Spark, specifically by supporting DataFrames in C# (and hopefully F#). No support for Datasets or RDDs is included in here, but giving .NET developers DataFrame access would make it easy for us to write code which interacts with Spark SQL and a good chunk of the SparkSession object.
You can click through and read everything I have to say, but do go to the Spark ticket and vote for .NET support.
There is really no point in adding another column ID for an individual row in this table, even if a lot of ORMs and non-ORM-defined schemas will do this, simply for “consistency” reasons (and in a few cases, because they cannot handle compound keys).
Now, the presence or absence of such a surrogate key is usually not too relevant in everyday work with this table. If you’re using an ORM, it will likely make no difference to client code. If you’re using SQL, it definitely doesn’t. You just never use that additional column.
But in terms of performance, it might make a huge difference!
Lukas makes a good argument here.
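As a rough illustration of the compound-key design discussed above, here is a minimal sketch using Python's built-in sqlite3 module. The `student_course` table and the `enroll` helper are hypothetical stand-ins, since the table from the original post is not shown here. The sketch only shows that a compound primary key identifies each row without an extra ID column; it does not measure performance.

```python
import sqlite3


def build_relationship_table() -> sqlite3.Connection:
    """Create an in-memory many-to-many table keyed by both foreign keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE student_course (
            student_id INTEGER NOT NULL,
            course_id  INTEGER NOT NULL,
            -- The pair itself identifies the row; no surrogate ID column.
            PRIMARY KEY (student_id, course_id)
        )
        """
    )
    return conn


def enroll(conn: sqlite3.Connection, student_id: int, course_id: int) -> bool:
    """Insert one relationship row; return False if the pair already exists."""
    try:
        conn.execute(
            "INSERT INTO student_course (student_id, course_id) VALUES (?, ?)",
            (student_id, course_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The compound primary key rejects a duplicate pair.
        return False
```

Calling `enroll(conn, 1, 10)` twice on the same connection succeeds the first time and returns `False` the second time, so uniqueness is enforced by the key the table already needs.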
If you have a large *.pbix file, you can investigate which columns and tables are causing the highest storage consumption, using Power BI Helper. You can download Power BI Helper for free from here. I opened the file above in Power BI Helper, and in the Modeling Advise tab, this is what I see:
As you can see in the above output, the Date field in the Date table is the biggest column in this dataset, taking 150MB of runtime memory! This is considering that we have only three distinct values in the column! Seems a bit strange, doesn’t it? Let’s dig into the reason in more depth.
Read on for Reza’s explanation and what you can do to fix it.
Heads up for SQL Server on Linux folks using availability groups and Pacemaker. Pacemaker 1.1.18 has been out for a while now, but it’s worth mentioning that there was a behaviour change in how it fails over a cluster. While the new behaviour is considered “correct”, it may affect you if you’ve configured availability groups on a previous version (specifically 1.1.16).
Click through for more details and what you can do about this.
I was reading his latest blog post Using docker named volumes to persist databases in SQL Server and decided to give it a try.
His instructions worked perfectly and I thought I would try them using a docker-compose file as I like the ease of spinning up containers with them.
Read on for Rob’s travails, followed by great success. And never go into something named “the spooky basement;” that’s just good life advice.