I looked up the default block sizes of Hadoop and a couple of data warehouse services, so here is a quick memo.
Hadoop
128 MB
Data Blocks
HDFS is designed to support very large files. Applications that are compatible with HDFS are those that deal with large data sets. These applications write their data only once but they read it one or more times and require these reads to be satisfied at streaming speeds. HDFS supports write-once-read-many semantics on files. A typical block size used by HDFS is 128 MB. Thus, an HDFS file is chopped up into 128 MB chunks, and if possible, each chunk will reside on a different DataNode.
Apache Hadoop 3.0.0 – HDFS Architecture (accessed Oct. 20, 2024)
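As a rough illustration of the "chopped up into 128 MB chunks" behavior, the sketch below computes how a file would be divided into HDFS blocks. The 128 MB constant matches the dfs.blocksize default quoted above; the 1 GiB input file is an invented example.

```python
# Rough sketch of how a file maps onto fixed-size HDFS blocks.
# 128 MB matches the default dfs.blocksize; the 1 GiB file is a made-up example.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB

def split_into_blocks(file_size_bytes: int) -> list[int]:
    """Return the sizes of the blocks a file of this size would occupy."""
    sizes = [BLOCK_SIZE] * (file_size_bytes // BLOCK_SIZE)
    remainder = file_size_bytes % BLOCK_SIZE
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes

blocks = split_into_blocks(1 * 1024**3)  # hypothetical 1 GiB file
print(len(blocks), [b // 1024**2 for b in blocks])
# -> 8 [128, 128, 128, 128, 128, 128, 128, 128]
# Each of these blocks is placed on a different DataNode when possible.
```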
Amazon Redshift
1 MB
Typical database block sizes range from 2 KB to 32 KB. Amazon Redshift uses a block size of 1 MB, which is more efficient and further reduces the number of I/O requests needed to perform any database loading or other operations that are part of query run.
Columnar storage - Amazon Redshift (accessed Oct. 20, 2024)
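To make the "reduces the number of I/O requests" point concrete, here is a back-of-the-envelope comparison of block reads needed for a full column scan with a conventional 8 KB block versus a 1 MB block. The 10 GiB column size is an assumption for illustration, not a figure from the documentation.

```python
# Block reads needed to scan one column, for two block sizes.
# The 10 GiB column is an invented figure used only for the arithmetic.
column_bytes = 10 * 1024**3

for label, block_bytes in [("8 KB blocks", 8 * 1024), ("1 MB blocks", 1024**2)]:
    reads = -(-column_bytes // block_bytes)  # ceiling division
    print(f"{label}: {reads:,} reads")
# 8 KB blocks: 1,310,720 reads
# 1 MB blocks: 10,240 reads
```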
Snowflake
50 MB–500 MB (uncompressed)
All data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage. Each micro-partition contains between 50 MB and 500 MB of uncompressed data (note that the actual size in Snowflake is smaller because data is always stored compressed). Groups of rows in tables are mapped into individual micro-partitions, organized in a columnar fashion. This size and structure allows for extremely granular pruning of very large tables, which can be comprised of millions, or even hundreds of millions, of micro-partitions.
Micro-partitions & Data Clustering | Snowflake Documentation (accessed Oct. 20, 2024)
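The "extremely granular pruning" the documentation mentions relies on metadata (such as per-column value ranges) kept for each micro-partition. The sketch below is only a toy model of that idea; the partition names, the date column, and the three partitions are all invented, and this is not Snowflake's actual metadata layout.

```python
# Toy model of micro-partition pruning: keep min/max metadata per partition
# and skip partitions whose range cannot satisfy the filter.
from dataclasses import dataclass

@dataclass
class MicroPartition:          # invented structure for illustration only
    name: str
    min_date: str              # per-partition column metadata
    max_date: str

partitions = [
    MicroPartition("mp_001", "2024-01-01", "2024-03-31"),
    MicroPartition("mp_002", "2024-04-01", "2024-06-30"),
    MicroPartition("mp_003", "2024-07-01", "2024-09-30"),
]

def prune(parts, lo, hi):
    """Keep only partitions whose [min, max] range overlaps [lo, hi]."""
    return [p for p in parts if not (p.max_date < lo or p.min_date > hi)]

# WHERE order_date BETWEEN '2024-05-01' AND '2024-05-31'
print([p.name for p in prune(partitions, "2024-05-01", "2024-05-31")])
# -> ['mp_002']; the other micro-partitions are never scanned
```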