R Internals: Data Sizes With Nullable Columns

Niels Berglund digs into the Binary Exchange Langage (BXL) and notices something weird about data sizes:

When looking at the data sent, the size of the packages and “drilling” into the TCP packets we could deduct that: :

  • Each column has an over-head of 32 bytes (at least for non nullable data)

  • The size of the column in one row is the size of the data type for numeric types.

  • For decimal and numeric an extra byte is added to each column, where this byte indicates the precision.

  • Columns of alpha numeric type all had 2 bytes pre-pended to the bytes, except max types.

  • For char and nchar the storage size was 2 bytes plus the size the column was defined as.

  • For varchar and nvarchar the storage size was 2 bytes plus the size of the data stored.

  • For the varmax data types the number of bytes that were pre-pended varied dependent on the data size.

Read the whole thing.

Related Posts

Calculating Lifetime Value With R

Sergey Bryl shows how to calculate the lifetime value of a subscription service: Predicting LTV is a common issue for a new, recently launched product/service/application when we don’t have a lot of historical data but want to calculate LTV as soon as possible. Even though we may have a lot of historical data on customer […]

Read More

Interpreting The Area Under The Receiver Operating Characteristic Curve

Roos Colman explains what a Receiver Operating Characteristic (ROC) curve is and how we interpret the Area Under the Curve (AUC): The AUC can be defined as “The probability that a randomly selected case will have a higher test result than a randomly selected control”. Let’s use this definition to calculate and visualize the estimated […]

Read More

Categories

November 2017
MTWTFSS
« Oct Dec »
 12345
6789101112
13141516171819
20212223242526
27282930