ClickHouse v21.7 Release Notes

Release Date: 2021-07-09
    Backward Incompatible Change

    • Improved performance of queries with explicitly defined large sets. Added compatibility setting legacy_column_name_of_tuple_literal. Set it to true while performing a rolling update of a cluster from a version lower than 21.7 to any higher version; otherwise, distributed queries with explicitly defined sets in the IN clause may fail during the update. #25371 (Anton Popov).
    • Forward/backward incompatible change of maximum buffer size in clickhouse-keeper (an experimental alternative to ZooKeeper). Better to do it now (before production) than later. #25421 (alesapin).

    New Feature

    • Support configuration in YAML format as an alternative to XML. This closes #3607. #21858 (BoloniniD).
    • Provide a way to restore a replicated table when the data is (possibly) present, but the ZooKeeper metadata is lost. Resolves #13458. #13652 (Mike Kot).
    • Support structs and maps in Arrow/Parquet/ORC and dictionaries in Arrow input/output formats. Added new setting output_format_arrow_low_cardinality_as_dictionary. #24341 (Kruglov Pavel).
    • Added support for Array type in dictionaries. #25119 (Maksim Kita).
    • Added function bitPositionsToArray. Closes #23792. Author [Kevin Wan] (@MaxWk). #25394 (Maksim Kita).
    • Added function dateName to return names like 'Friday' or 'April'. Author [Daniil Kondratyev] (@dankondr). #25372 (Maksim Kita).
    • Add toJSONString function to serialize columns to their JSON representations. #25164 (Amos Bird).
    • Now query_log has two new columns, initial_query_start_time and initial_query_start_time_microsecond, that record the starting time of a distributed query, if any. #25022 (Amos Bird).
    • Add aggregate function segmentLengthSum. #24250 (flynn).
    • Add a new boolean setting prefer_global_in_and_join which defaults all IN/JOIN as GLOBAL IN/JOIN. #23434 (Amos Bird).
    • Support ALTER DELETE queries for Join table engine. #23260 (foolchi).
    • Add quantileBFloat16 aggregate function as well as the corresponding quantilesBFloat16 and medianBFloat16. It is a very simple and fast quantile estimator with a relative error of no more than 0.390625%. This closes #16641. #23204 (Ivan Novitskiy).
    • Implement sequenceNextNode() function useful for flow analysis. #19766 (achimbab).
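
The bitPositionsToArray entry above is easiest to picture with a small model. The Python sketch below is not ClickHouse code: it assumes the function returns the zero-based positions of the set bits in ascending order, the helper name bit_positions_to_array is hypothetical, and only non-negative integers are handled.

```python
def bit_positions_to_array(value: int) -> list:
    """Return the zero-based positions of the 1 bits in `value`, lowest bit first."""
    if value < 0:
        raise ValueError("this sketch handles only non-negative integers")
    positions = []
    position = 0
    while value:
        if value & 1:            # lowest remaining bit is set
            positions.append(position)
        value >>= 1              # drop the bit that was just inspected
        position += 1
    return positions


print(bit_positions_to_array(5))  # 5 is 0b101, so bits 0 and 2 are set
```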

    Experimental Feature

    🐎 Performance Improvement

    • Added optimization that transforms some functions to reading of subcolumns to reduce amount of read data. E.g., statement col IS NULL is transformed to reading of subcolumn col.null. Optimization can be enabled by setting optimize_functions_to_subcolumns which is currently off by default. #24406 (Anton Popov).
    • Rewrite more columns to possible alias expressions. This may enable better optimization, such as projections. #24405 (Amos Bird).
    • Index of type bloom_filter can be used for expressions with hasAny function with constant arrays. This closes #24291. #24900 (Vasily Nemkov).
    • Add exponential backoff to reschedule read attempt in case RabbitMQ queues are empty. (ClickHouse has support for importing data from RabbitMQ). Closes #24340. #24415 (Kseniia Sumarokova).
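
The exponential backoff mentioned for empty RabbitMQ queues can be sketched as follows. This is an illustration of the general technique only: the class name BackoffSchedule and the delay values (initial delay, growth factor, cap) are assumptions, not ClickHouse settings.

```python
class BackoffSchedule:
    """Track the delay before the next read attempt (hypothetical helper)."""

    def __init__(self, initial_ms: int = 100, factor: int = 2, max_ms: int = 1000):
        self.initial_ms = initial_ms
        self.factor = factor
        self.max_ms = max_ms
        self.current_ms = initial_ms

    def next_delay(self, queue_was_empty: bool) -> int:
        """Return the delay in milliseconds before the next read attempt."""
        if not queue_was_empty:
            # Data arrived: go back to the shortest delay.
            self.current_ms = self.initial_ms
            return self.initial_ms
        delay = self.current_ms
        # Grow the delay geometrically, but never beyond the cap.
        self.current_ms = min(self.current_ms * self.factor, self.max_ms)
        return delay


schedule = BackoffSchedule()
polls = [True, True, True, True, True, False, True]  # True means the queue was empty
delays = [schedule.next_delay(empty) for empty in polls]
print(delays)
```

Resetting the delay as soon as a read returns data keeps latency low under load, while the cap bounds how long an idle consumer waits between attempts.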

    Improvement

    • Allow to limit bandwidth for replication. Add two Replicated*MergeTree settings: max_replicated_fetches_network_bandwidth and max_replicated_sends_network_bandwidth, which limit the maximum speed of replicated fetches/sends for a table. Add two server-wide settings (in default user profile): max_replicated_fetches_network_bandwidth_for_server and max_replicated_sends_network_bandwidth_for_server, which limit the maximum speed of replication for all tables. The limits are enforced only approximately. Turned off by default. Fixes #1821. #24573 (alesapin).
    • Resource constraints and isolation for ODBC and Library bridges. Use separate clickhouse-bridge group and user for bridge processes. Set oom_score_adj so the bridges will be first subjects for OOM killer. Set maximum RSS to 1 GiB. Closes #23861. #25280 (Kseniia Sumarokova).
    • Add standalone clickhouse-keeper symlink to the main clickhouse binary. Now it's possible to run coordination without the main clickhouse server. #24059 (alesapin).
    • Use global settings for query to VIEW. Fixed the behavior where queries to a VIEW used local settings, which led to errors if the settings on CREATE VIEW and SELECT were different. For now, VIEW won't use these modified settings, but you can still pass additional settings in the SETTINGS section of the CREATE VIEW query. Closes #20551. #24095 (Vladimir).
    • On server start, parts with an incorrect partition ID are never removed, but always detached. #25070. #25166 (Nikolai Kochetov).
    • Increase size of background schedule pool to 128 (background_schedule_pool_size setting). This helps avoid a hung replication queue on a slow ZooKeeper connection. #25072 (alesapin).
    • Add merge tree setting max_parts_to_merge_at_once which limits the number of parts that can be merged in the background at once. Doesn't affect OPTIMIZE FINAL query. Fixes #1820. #24496 (alesapin).
    • Allow NOT IN operator to be used in partition pruning. #24894 (Amos Bird).
    • Recognize IPv4 addresses like 127.0.1.1 as local. This is controversial and closes #23504. Michael Filimonov will test this feature. #24316 (alexey-milovidov).
    • ClickHouse database created with MaterializeMySQL (it is an experimental feature) now contains all column comments from the materialized MySQL database. #25199 (Storozhuk Kostiantyn).
    • Add settings (connection_auto_close/connection_max_tries/connection_pool_size) for MySQL storage engine. #24146 (Azat Khuzhin).
    • Improve startup time of Distributed engine. #25663 (Azat Khuzhin).
    • Improvement for Distributed tables. Drop replicas from dirname for internal_replication=true (allows INSERT into Distributed with a cluster of any number of replicas; before, only 15 replicas were supported, and anything more failed with ENAMETOOLONG while creating the directory for async blocks). #25513 (Azat Khuzhin).
    • Added support for the Interval type in LowCardinality. It is needed for intermediate values of some expressions. Closes #21730. #25410 (Vladimir).
    • Add == operator on time conditions for sequenceMatch and sequenceCount functions. For example: sequenceMatch('(?1)(?t==1)(?2)')(time, data = 1, data = 2). #25299 (Christophe Kalenzaga).
    • Add settings http_max_fields, http_max_field_name_size, http_max_field_value_size. #25296 (Ivan).
    • Add support for function if with Decimal and Int types on its branches. This closes #20549. This closes #10142. #25283 (alexey-milovidov).
    • Update prompt in clickhouse-client and display a message when reconnecting. This closes #10577. #25281 (alexey-milovidov).
    • Correct memory tracking in aggregate function topK. This closes #25259. #25260 (alexey-milovidov).
    • Fix topLevelDomain for IDN hosts (e.g. example.рф); previously it returned an empty string for such hosts. #25103 (Azat Khuzhin).
    • Detect Linux kernel version at runtime (for working nested epoll, which is required for async_socket_for_remote/use_hedged_requests; otherwise remote queries may get stuck). #25067 (Azat Khuzhin).
    • For distributed query, when optimize_skip_unused_shards=1, allow to skip shard with condition like (sharding key) IN (one-element-tuple). (Tuples with many elements were supported. Tuple with single element did not work because it is parsed as literal). #24930 (Amos Bird).
    • Improved log messages of S3 errors, no more double whitespaces in case of empty keys and buckets. #24897 (Vladimir Chebotarev).
    • Some queries require multi-pass semantic analysis. Try reusing built sets for IN in this case. #24874 (Amos Bird).
    • Respect max_distributed_connections for insert_distributed_sync (otherwise for huge clusters and sync insert it may run out of max_thread_pool_size). #24754 (Azat Khuzhin).
    • Avoid hiding errors like Limit for rows or bytes to read exceeded for scalar subqueries. #24545 (nvartolomei).
    • Make String-to-Int parser stricter so that toInt64('+') will throw. #24475 (Amos Bird).
    • If SSD_CACHE is created with DDL query, it can be created only inside user_files directory. #24466 (Maksim Kita).
    • PostgreSQL support for specifying non-default schema for insert queries. Closes #24149. #24413 (Kseniia Sumarokova).
    • Fix IPv6 addresses resolving (i.e. fixes select * from remote('[::1]', system.one)). #24319 (Azat Khuzhin).
    • Fix trailing whitespaces in FROM clause with subqueries in multiline mode, and also change the output of the queries slightly in a more human-friendly way. #24151 (Azat Khuzhin).
    • Improvement for Distributed tables. Add ability to split distributed batch on failures (i.e. due to memory limits, corruptions), under distributed_directory_monitor_split_batch_on_failure (OFF by default). #23864 (Azat Khuzhin).
    • Handle column name clashes for Join table engine. Closes #20309. #23769 (Vladimir).
    • Display progress for File table engine in clickhouse-local and on INSERT query in clickhouse-client when data is passed to stdin. Closes #18209. #23656 (Kseniia Sumarokova).
    • Bugfixes and improvements of clickhouse-copier. Allow to copy tables with different (but compatible) schemas. Closes #9159. Added test to copy ReplacingMergeTree. Closes #22711. Support TTL on columns and Data Skipping Indices. It simply removes them to create internal Distributed table (underlying table will have TTL and skipping indices). Closes #19384. Allow to copy MATERIALIZED and ALIAS columns. There are some cases in which it could be helpful (e.g. if this column is in PRIMARY KEY). Now it could be allowed by setting allow_to_copy_alias_and_materialized_columns property to true in task configuration. Closes #9177. Closes #11007. Closes #9514. Added a property allow_to_drop_target_partitions in task configuration to drop partition in original table before moving helping tables. Closes #20957. Get rid of OPTIMIZE DEDUPLICATE query. This hack was needed, because ALTER TABLE MOVE PARTITION was retried many times and plain MergeTree tables don't have deduplication. Closes #17966. Write progress to ZooKeeper node on path task_path + /status in JSON format. Closes #20955. Support for ReplicatedTables without arguments. Closes #24834. #23518 (Nikita Mikhaylov).
    • Added sleep with backoff between read retries from S3. #23461 (Vladimir Chebotarev).
    • Respect insert_allow_materialized_columns (allows materialized columns) for INSERT into Distributed table. #23349 (Azat Khuzhin).
    • Add ability to push down LIMIT for distributed queries. #23027 (Azat Khuzhin).
    • Fix zero-copy replication with several S3 volumes (Fixes #22679). #22864 (ianton-ru).
    • Resolve the actual port number bound when a user requests any available port from the operating system to show it in the log message. #25569 (bnaecker).
    • Fixed a case where conversion of PostgreSQL arrays sometimes resulted in String data type, not n-dimensional array, because attndims works incorrectly in some cases. Closes #24804. #25538 (Kseniia Sumarokova).
    • Fix conversion of DateTime with timezone for MySQL, PostgreSQL, ODBC. Closes #5057. #25528 (Kseniia Sumarokova).
    • Distinguish KILL MUTATION for different tables (fixes unexpected Cancelled mutating parts error). #25025 (Azat Khuzhin).
    • Allow to declare S3 disk at root of bucket (S3 virtual filesystem is an experimental feature under development). #24898 (Vladimir Chebotarev).
    • Enable reading of subcolumns (e.g. components of Tuples) for distributed tables. #24472 (Anton Popov).
    • A feature for MySQL compatibility protocol: make user function to return correct output. Closes #25697. #25697 (sundyli).
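
The idea behind distributed_directory_monitor_split_batch_on_failure (splitting a distributed batch when sending it fails) can be sketched as below. The notes do not describe how ClickHouse splits the batch, so this sketch assumes the simplest strategy: try the whole batch once, then retry each file on its own. The names send_batch_with_split and fake_send, and the file names, are hypothetical.

```python
def send_batch_with_split(files, send):
    """Send all files as one batch; if that fails, retry them one by one.

    `send` is a hypothetical callable that takes a list of files and raises
    an exception on failure. Returns a (sent, failed) pair of lists.
    """
    try:
        send(files)
        return list(files), []
    except Exception:
        pass  # fall through and isolate the failing files
    sent, failed = [], []
    for name in files:
        try:
            send([name])
            sent.append(name)
        except Exception:
            failed.append(name)
    return sent, failed


def fake_send(batch):
    """Stand-in for a network send: rejects any batch with a corrupted file."""
    if "corrupt.bin" in batch:
        raise IOError("corrupted file in batch")


result = send_batch_with_split(["1.bin", "corrupt.bin", "2.bin"], fake_send)
print(result)
```

Without the split, one bad file keeps the whole batch from being delivered; with it, only the bad file is left behind.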

    Bug Fix

    • Improvement for backward compatibility. Use old modulo function version when used in partition key. Closes #23508. #24157 (Kseniia Sumarokova).
    • Fix extremely rare bug on low-memory servers which can lead to the inability to perform merges without restart. Possibly fixes #24603. #24872 (alesapin).
    • Fix extremely rare error Tagging already tagged part in replication queue during concurrent alter move/replace partition. Possibly fixes #22142. #24961 (alesapin).
    • Fix potential crash when calculating aggregate function states by aggregation of aggregate function states of other aggregate functions (not a practical use case). See #24523. #25015 (alexey-milovidov).
    • Fixed the behavior when query SYSTEM RESTART REPLICA or SYSTEM SYNC REPLICA does not finish. This was detected on a server with an extremely low amount of RAM. #24457 (Nikita Mikhaylov).
    • Fix bug which can lead to a ZooKeeper client hang inside clickhouse-server. #24721 (alesapin).
    • If ZooKeeper connection was lost and replica was cloned after restoring the connection, its replication queue might contain outdated entries. Fixed failed assertion when replication queue contains intersecting virtual parts. It may rarely happen if some data part was lost. Print error in log instead of terminating. #24777 (tavplubix).
    • Fix lost WHERE condition in expression-push-down optimization of query plan (setting query_plan_filter_push_down = 1 by default). Fixes #25368. #25370 (Nikolai Kochetov).
    • Fix bug which can lead to intersecting parts after merges with TTL: Part all_40_40_0 is covered by all_40_40_1 but should be merged into all_40_41_1. This shouldn't happen often. #25549 (alesapin).
    • On ZooKeeper connection loss ReplicatedMergeTree table might wait for background operations to complete before trying to reconnect. It's fixed, now background operations are stopped forcefully. #25306 (tavplubix).
    • Fix error Key expression contains comparison between inconvertible types for queries with ARRAY JOIN in case if array is used in primary key. Fixes #8247. #25546 (Anton Popov).
    • Fix wrong totals for query WITH TOTALS and WITH FILL. Fixes #20872. #25539 (Anton Popov).
    • Fix data race when querying system.clusters while reloading the cluster configuration at the same time. #25737 (Amos Bird).
    • Fixed No such file or directory error on moving Distributed table between databases. Fixes #24971. #25667 (tavplubix).
    • REPLACE PARTITION might be ignored in rare cases if the source partition was empty. It's fixed. Fixes #24869. #25665 (tavplubix).
    • Fixed a bug in Replicated database engine that might rarely cause some replica to skip enqueued DDL query. #24805 (tavplubix).
    • Fix null pointer dereference in EXPLAIN AST without query. #25631 (Nikolai Kochetov).
    • Fix waiting of automatic dropping of empty parts. It could fill the background pool completely and cause replication to get stuck. #23315 (Anton Popov).
    • Fix restore of a table stored in S3 virtual filesystem (it is an experimental feature not ready for production). #25601 (ianton-ru).
    • Fix nullptr dereference in Arrow format when using Decimal256. Add Decimal256 support for Arrow format. #25531 (Kruglov Pavel).
    • Fix excessive underscore before the names of the preprocessed configuration files. #25431 (Vitaly Baranov).
    • A fix for clickhouse-copier tool: Fix segfault when sharding_key is absent in task config for copier. #25419 (Nikita Mikhaylov).
    • Fix REPLACE column transformer when used in DDL by correctly quoting the formatted query. This fixes #23925. #25391 (Amos Bird).
    • Fix the possibility of non-deterministic behaviour of the quantileDeterministic function and similar. This closes #20480. #25313 (alexey-milovidov).
    • Support SimpleAggregateFunction(LowCardinality) for SummingMergeTree. Fixes #25134. #25300 (Nikolai Kochetov).
    • Fix logical error with exception message "Cannot sum Array/Tuple in min/maxMap". #25298 (Kruglov Pavel).
    • Fix error Bad cast from type DB::ColumnLowCardinality to DB::ColumnVector<char8_t> for queries where LowCardinality argument was used for IN (this bug appeared in 21.6). Fixes #25187. #25290 (Nikolai Kochetov).
    • Fix incorrect behaviour of joinGetOrNull with not-nullable columns. This fixes #24261. #25288 (Amos Bird).
    • Fix incorrect behaviour and UBSan report in big integers. In previous versions CAST(1e19 AS UInt128) returned zero. #25279 (alexey-milovidov).
    • Fixed an error which occurred while inserting a subset of columns using CSVWithNames format. Fixes #25129. #25169 (Nikita Mikhaylov).
    • Do not use table's projection for SELECT with FINAL. It is not supported yet. #25163 (Amos Bird).
    • Fix possible parts loss after updating up to 21.5 in case the table used UUID in the partition key. (It is not recommended to use UUID in partition key). Fixes #25070. #25127 (Nikolai Kochetov).
    • Fix crash in query with cross join and joined_subquery_requires_alias = 0. Fixes #24011. #25082 (Nikolai Kochetov).
    • Fix bug with constant maps in mapContains function that led to the error empty column was returned by function mapContains. Closes #25077. #25080 (Kruglov Pavel).
    • Remove possibility to create tables with columns referencing themselves like a UInt32 ALIAS a + 1 or b UInt32 MATERIALIZED b. Fixes #24910, #24292. #25059 (alesapin).
    • Fix wrong result when using aggregate projection with not empty GROUP BY key to execute query with GROUP BY by empty key. #25055 (Amos Bird).
    • Fix serialization of split nested messages in Protobuf format. This PR fixes #24647. #25000 (Vitaly Baranov).
    • Fix limit/offset settings for distributed queries (ignore on the remote nodes). #24940 (Azat Khuzhin).
    • Fix possible heap-buffer-overflow in Arrow format. #24922 (Kruglov Pavel).
    • Fixed possible error 'Cannot read from istream at offset 0' when reading a file from DiskS3 (S3 virtual filesystem is an experimental feature under development that should not be used in production). #24885 (Pavel Kovalenko).
    • Fix "Missing columns" exception when joining Distributed Materialized View. #24870 (Azat Khuzhin).
    • Allow NULL values in postgresql compatibility protocol. Closes #22622. #24857 (Kseniia Sumarokova).
    • Fix bug when exception Mutation was killed can be thrown to the client on mutation wait when mutation not loaded into memory yet. #24809 (alesapin).
    • Fixed bug in deserialization of random generator state which might cause some data types such as AggregateFunction(groupArraySample(N), T) to behave in a non-deterministic way. #24538 (tavplubix).
    • Disallow building uniqXXXXStates of other aggregation states. #24523 (Raúl Marín). Then allow it back by actually eliminating the root cause of the related issue. (alexey-milovidov).
    • Fix usage of tuples in CREATE .. AS SELECT queries. #24464 (Anton Popov).
    • Fix computation of total bytes in Buffer table. In current ClickHouse version total_writes.bytes counter decreases too much during the buffer flush. It leads to counter overflow and totalBytes returns something around 17.44 EB some time after the flush. #24450 (DimasKovas).
    • Fix incorrect information about the monotonicity of toWeek function. This fixes #24422. This bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/5212, and was exposed later by smarter partition pruner. #24446 (Amos Bird).
    • When user authentication is managed by LDAP: fixed a potential deadlock that can happen during LDAP role (re)mapping when an LDAP group is mapped to a nonexistent local role. #24431 (Denis Glazachev).
    • In "multipart/form-data" message consider the CRLF preceding a boundary as part of it. Fixes #23905. #24399 (Ivan).
    • Fix drop partition with intersect fake parts. In rare cases there might be parts with mutation version greater than current block number. #24321 (Amos Bird).
    • Fixed a bug in moving Materialized View from Ordinary to Atomic database (RENAME TABLE query). Now inner table is moved to new database together with Materialized View. Fixes #23926. #24309 (tavplubix).
    • Allow empty HTTP headers. Fixes #23901. #24285 (Ivan).
    • Correct processing of mutations (ALTER UPDATE/DELETE) in Memory tables. Closes #24274. #24275 (flynn).
    • Make column LowCardinality property in JOIN output the same as in the input. Closes #23351, closes #20315. #24061 (Vladimir).
    • A fix for Kafka tables. Fix the bug in failover behavior when Engine = Kafka was not able to start consumption if the same consumer had an empty assignment previously. Closes #21118. #21267 (filimonov).
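
The Buffer table entry above describes a counter that "decreases too much" and then reports an enormous totalBytes value. A minimal sketch of that effect follows, assuming the counter is an unsigned 64-bit integer (the notes do not state its type); the helper name sub_uint64 and the byte counts are made up for illustration.

```python
UINT64_MODULUS = 2 ** 64


def sub_uint64(counter: int, delta: int) -> int:
    """Subtract like an unsigned 64-bit integer would: the result wraps modulo 2**64."""
    return (counter - delta) % UINT64_MODULUS


total_bytes = 1_000                            # bytes currently accounted for
total_bytes = sub_uint64(total_bytes, 1_500)   # flush subtracts more than was added
print(total_bytes)                             # a value just below 2**64 instead of a small number
```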

    Build/Testing/Packaging Improvement

    • Add darwin-aarch64 (Mac M1 / Apple Silicon) builds in CI #25560 (Ivan) and put the links to the docs and website (alexey-milovidov).
    • Adds cross-platform embedding of binary resources into executables. It works on Illumos. #25146 (bnaecker).
    • Add join related options to stress tests to improve fuzzing. #25200 (Vladimir).
    • Enable build with s3 module in osx #25217. #25218 (kevin wan).
    • Add integration test cases to cover JDBC bridge. #25047 (Zhichun Wu).
    • Integration tests configuration has special treatment for dictionaries. Removed remaining dictionaries manual setup. #24728 (Ilya Yatsishin).
    • Add libfuzzer tests for YAMLParser class. #24480 (BoloniniD).
    • Ubuntu 20.04 is now used to run integration tests, docker-compose version used to run integration tests is updated to 1.28.2. Environment variables now take effect on docker-compose. Rework test_dictionaries_all_layouts_separate_sources to allow parallel run. #20393 (Ilya Yatsishin).
    • Fix TOCTOU error in installation script. #25277 (alexey-milovidov).