ClickHouse v22.3-lts Release Notes

Release Date: 2022-03-17
  • Backward Incompatible Change

    • Make the arrayCompact function behave like other higher-order functions: it now performs compaction on the original array rather than on the lambda function results. If you use nontrivial lambda functions in arrayCompact, you can restore the old behaviour by wrapping the arrayCompact arguments into arrayMap. Closes #34010 #18535 #14778. #34795 (Alexandre Snarskii).
    • Change implementation-specific behavior on overflow of the function toDateTime. The result is now saturated to the nearest min/max supported instant of DateTime instead of wrapping around. This change is highlighted as "backward incompatible" because someone may unintentionally rely on the old behavior. #32898 (HaiBo Li).
    • Make the functions cast(value, 'IPv4') and cast(value, 'IPv6') behave the same as the toIPv4 and toIPv6 functions. Changed the behavior of toIPv4 and toIPv6 for incorrect IP addresses: an invalid address now raises an exception, whereas previously these functions returned a default value. Added functions IPv4StringToNumOrDefault, IPv4StringToNumOrNull, IPv6StringToNumOrDefault, IPv6StringToNumOrNull, toIPv4OrDefault, toIPv4OrNull, toIPv6OrDefault, toIPv6OrNull. Use IPv4StringToNumOrDefault, toIPv4OrDefault and toIPv6OrDefault if previous logic relied on IPv4StringToNum, toIPv4 or toIPv6 returning a default value for an invalid address. Added the setting cast_ipv4_ipv6_default_on_conversion_error; if it is enabled, the IP address conversion functions behave as before. Closes #22825. Closes #5799. Closes #35156. #35240 (Maksim Kita).
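
The arrayCompact and IP conversion changes can be checked with a few queries. This is a minimal sketch rather than an exhaustive test: the array literal and the invalid address string are made-up inputs, and the comments paraphrase the notes instead of quoting captured server output.

```sql
-- arrayCompact in 22.3: per the note, compaction applies to the original array.
SELECT arrayCompact(x -> x % 2, [1, 3, 2, 4]);

-- Restore the old behaviour by compacting the lambda results explicitly.
SELECT arrayCompact(arrayMap(x -> x % 2, [1, 3, 2, 4]));

-- toIPv4 now throws on an invalid address; the OrDefault / OrNull variants do not.
SELECT toIPv4OrDefault('not-an-ip'), toIPv4OrNull('not-an-ip');

-- Opt back into the old behaviour of the conversion functions.
SET cast_ipv4_ipv6_default_on_conversion_error = 1;
SELECT toIPv4('not-an-ip');
```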

    New Feature

    • Support for caching data locally for remote filesystems. It can be enabled for s3 disks. Closes #28961. #33717 (Kseniia Sumarokova). In the meantime, we enabled the test suite on the s3 filesystem and no more known issues exist, so it is starting to become production ready.
    • Add new table function hive. It can be used as follows: hive('<hive metastore url>', '<hive database>', '<hive table name>', '<columns definition>', '<partition columns>'), for example SELECT * FROM hive('thrift://hivetest:9083', 'test', 'demo', 'id Nullable(String), score Nullable(Int32), day Nullable(String)', 'day'). #34946 (lgbo).
    • Support authentication of users connected via SSL by their X.509 certificate. #31484 (eungenue).
    • Support schema inference for inserting into table functions file/hdfs/s3/url. #34732 (Kruglov Pavel).
    • Now you can read the system.zookeeper table without restrictions on the path or using a LIKE expression. These reads can generate quite a heavy load on ZooKeeper, so to use this ability you have to enable the setting allow_unrestricted_reads_from_keeper. #34609 (Sergei Trifonov).
    • Display CPU and memory metrics in clickhouse-local. Closes #34545. #34605 (李扬).
    • Implement startsWith and endsWith functions for arrays. Closes #33982. #34368 (usurai).
    • Add three functions for the Map data type: 1. mapReplace(map1, map2) replaces values for keys in map1 with the values of the corresponding keys in map2 and adds keys from map2 that don't exist in map1. 2. mapFilter. 3. mapMap. mapFilter and mapMap are higher-order functions that accept two arguments: the first is a lambda function with a (k, v) pair as arguments, the second is a column of type Map. #33698 (hexiaoting).
    • Allow getting the default user and password for clickhouse-client from the CLICKHOUSE_USER and CLICKHOUSE_PASSWORD environment variables. Closes #34538. #34947 (DR).
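
A few of the new functions and settings from this list, as a rough sketch. The literals and the '/clickhouse/%' path prefix are illustrative assumptions; mapFilter is shown because its argument order is spelled out in the note.

```sql
-- startsWith / endsWith applied to arrays (hypothetical literals).
SELECT startsWith([1, 2, 3], [1, 2]) AS has_prefix,
       endsWith([1, 2, 3], [2, 3])   AS has_suffix;

-- mapFilter takes the lambda over (k, v) first, then the Map.
SELECT mapFilter((k, v) -> v > 1, map('a', 1, 'b', 2, 'c', 3)) AS filtered;

-- Unrestricted reads from system.zookeeper must be enabled explicitly.
SET allow_unrestricted_reads_from_keeper = 1;
SELECT path, name
FROM system.zookeeper
WHERE path LIKE '/clickhouse/%'   -- assumed prefix, adjust to your Keeper layout
LIMIT 10;
```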

    Experimental Feature

    • New data type Object(<schema_format>), which supports storing semi-structured data (for now JSON only). Data is written to such types as a string. Then all paths are extracted according to the format of the semi-structured data and written as separate columns in the most optimal types that can store all their values. Those columns can be queried by names that match paths in the source data, e.g. data.key1.key2, or with the cast operator: data.key1.key2::Int64.
    • Add the database_replicated_allow_only_replicated_engine setting. When enabled, only Replicated tables or tables with stateless engines can be created in Replicated databases. #35214 (Nikolai Kochetov). Note that the Replicated database is still an experimental feature.
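
A minimal sketch of the Object type in use. The table name and JSON document are hypothetical, and the allow_experimental_object_type setting is an assumption not stated in these notes (experimental types are normally gated behind such a setting).

```sql
-- Assumption: the experimental type has to be enabled first.
SET allow_experimental_object_type = 1;

-- Hypothetical table with a single semi-structured column.
CREATE TABLE object_demo (data Object('json'))
ENGINE = MergeTree ORDER BY tuple();

-- Data is written as a string; paths become subcolumns.
INSERT INTO object_demo VALUES ('{"key1": {"key2": 42}}');

-- Query by path, optionally with the cast operator.
SELECT data.key1.key2, data.key1.key2::Int64 FROM object_demo;
```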

    Performance Improvement

    • Improve performance of insertion into MergeTree tables by optimizing sorting. Up to 2x improvement is observed on realistic benchmarks. #34750 (Maksim Kita).
    • Column pruning when reading Parquet, ORC and Arrow files from URL and S3. Closes #34163. #34849 (Kseniia Sumarokova).
    • Column pruning when reading Parquet, ORC and Arrow files from Hive. #34954 (lgbo).
    • A bunch of performance optimizations from a performance superhero. Improve performance of processing queries with a large IN section. Improve performance of the direct dictionary if its source is ClickHouse. Improve performance of the detectCharset and detectLanguageUnknown functions. #34888 (Maksim Kita).
    • Improve performance of any aggregate function by using more batching. #34760 (Raúl Marín).
    • Multiple improvements for performance of clickhouse-keeper: less locking #35010 (zhanglistar), lower memory usage by streaming reading and writing of snapshots instead of a full copy #34584 (zhanglistar), optimized compaction of the log store in the RAFT implementation #34534 (zhanglistar), versioning of the internal data structure #34486 (zhanglistar).

    Improvement

    • Allow asynchronous inserts to table functions. Fixes #34864. #34866 (Anton Popov).
    • Implicit type casting of the key argument for functions dictGetHierarchy, dictIsIn, dictGetChildren, dictGetDescendants. Closes #34970. #35027 (Maksim Kita).
    • The EXPLAIN AST query can output the AST in the form of a graph in Graphviz format: EXPLAIN AST graph = 1 SELECT * FROM system.parts. #35173 (李扬).
    • When large files were written with s3 table function or table engine, the content type on the files was mistakenly set to application/xml due to a bug in the AWS SDK. This closes #33964. #34433 (Alexey Milovidov).
    • Change restrictive row policies a bit to make them an easier alternative to permissive policies in easy cases. If only restrictive policies exist for a particular table (without permissive policies), users will be able to see some rows. Also, SHOW CREATE ROW POLICY will always show AS permissive or AS restrictive in the row policy's definition. #34596 (Vitaly Baranov).
    • Improve schema inference with globs in File/S3/HDFS/URL engines. Try to use the next path for schema inference in case of error. #34465 (Kruglov Pavel).
    • Play UI now correctly detects the preferred light/dark theme from the OS. #35068 (peledni).
    • Added date_time_input_format = 'best_effort_us'. Closes #34799. #34982 (WenYao).
    • New settings allow_plaintext_password and allow_no_password are added to the server configuration. They turn on/off authentication types that can be potentially insecure in some environments. They are allowed by default. #34738 (Heena Bansal).
    • Support for the DateTime64 data type in Arrow format. Closes #8280 and closes #28574. #34561 (李扬).
    • Reload remote_url_allow_hosts (filtering of outgoing connections) on config update. #35294 (Nikolai Kochetov).
    • Support the --testmode parameter for clickhouse-local. This parameter enables interpretation of test hints that we use in functional tests. #35264 (Kseniia Sumarokova).
    • Add distributed_depth to the query log. It is like a more detailed variant of is_initial_query. #35207 (李扬).
    • Respect remote_url_allow_hosts for MySQL and PostgreSQL table functions. #35191 (Heena Bansal).
    • Added disk_name field to system.part_log. #35178 (Artyom Yurkov).
    • Do not retry non-retriable errors when querying remote URLs. Closes #35161. #35172 (Kseniia Sumarokova).
    • Support distributed INSERT SELECT queries (the setting parallel_distributed_insert_select) for the table function view(). #35132 (Azat Khuzhin).
    • More precise memory tracking during INSERT into Buffer with AggregateFunction. #35072 (Azat Khuzhin).
    • Avoid division by zero in Query Profiler if the Linux kernel has a bug. Closes #34787. #35032 (Alexey Milovidov).
    • Add more sanity checks for keeper configuration: mixing of localhost and non-local servers is now not allowed; also add checks for the internal raft port and the keeper client port having the same value. #35004 (alesapin).
    • Previously, if the user changed the settings of the system tables, there were tons of logs and ClickHouse renamed the tables every minute. This fixes #34929. #34949 (Nikita Mikhaylov).
    • Use connection pool for Hive metastore client. #34940 (lgbo).
    • Ignore per-column TTL in CREATE TABLE AS if the new table engine does not support it (i.e. if the engine is not of the MergeTree family). #34938 (Azat Khuzhin).
    • Allow LowCardinality strings for ngrambf_v1/tokenbf_v1 indexes. Closes #21865. #34911 (Lars Hiller Eidnes).
    • Allow opening empty sqlite db if the file doesn't exist. Closes #33367. #34907 (Kseniia Sumarokova).
    • Implement memory statistics for FreeBSD - this is required for max_server_memory_usage to work correctly. #34902 (Alexandre Snarskii).
    • In previous versions the progress bar in clickhouse-client could jump forward near 50% for no reason. This closes #34324. #34801 (Alexey Milovidov).
    • Now ALTER TABLE DROP COLUMN columnX queries for MergeTree table engines will work instantly when columnX is an ALIAS column. Fixes #34660. #34786 (alesapin).
    • Show hints when the user mistypes the name of a data skipping index. Closes #29698. #34764 (flynn).
    • Support remote()/cluster() table functions for parallel_distributed_insert_select. #34728 (Azat Khuzhin).
    • Do not reset logging that is configured via the --log-file/--errorlog-file command line options in case of empty configuration in the config file. #34718 (Amos Bird).
    • Extract schema only once on table creation and prevent reading from local files/external sources to extract schema on each server startup. #34684 (Kruglov Pavel).
    • Allow specifying argument names for executable UDFs. This is necessary for formats where the argument name is part of serialization, like Native, JSONEachRow. Closes #34604. #34653 (Maksim Kita).
    • MaterializedMySQL (experimental feature) now supports materialized_mysql_tables_list (a comma-separated list of MySQL database tables that will be replicated by the MaterializedMySQL database engine; the default value is an empty list, which means all tables will be replicated), mentioned at #32977. #34487 (zzsmdfj).
    • Improve OpenTelemetry span logs for INSERT operations on distributed tables. #34480 (Frank Chen).
    • Make the znode ctime and mtime consistent between servers in ClickHouse Keeper. #33441 (小路).
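
The two new system-table fields mentioned in this list can be inspected directly. A small sketch, assuming query_log and part_log are enabled on the server; the filters and limits are arbitrary.

```sql
-- distributed_depth next to is_initial_query in the query log.
SELECT query_id, is_initial_query, distributed_depth
FROM system.query_log
WHERE event_date = today()
ORDER BY event_time DESC
LIMIT 10;

-- disk_name in the part log.
SELECT event_type, table, part_name, disk_name
FROM system.part_log
ORDER BY event_time DESC
LIMIT 10;
```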

    Build/Testing/Packaging Improvement

    • Package repository is migrated to JFrog Artifactory (Mikhail f. Shiryaev).
    • Randomize some settings in functional tests, so more possible combinations of settings will be tested. This is yet another fuzzing method to ensure better test coverage. This closes #32268. #34092 (Kruglov Pavel).
    • Drop PVS-Studio from our CI. #34680 (Mikhail f. Shiryaev).
    • Add an ability to build stripped binaries with CMake. In previous versions it was performed by dh-tools. #35196 (alesapin).
    • Smaller "fat-free" clickhouse-keeper build. #35031 (alesapin).
    • Use @robot-clickhouse as an author and committer for PRs like https://github.com/ClickHouse/ClickHouse/pull/34685. #34793 (Mikhail f. Shiryaev).
    • Limit DWARF version for debug info to 4 at most, because our internal stack symbolizer cannot parse DWARF version 5. This makes sense if you compile ClickHouse with clang-15. #34777 (Alexey Milovidov).
    • Remove the clickhouse-test debian package as an unneeded complication. CI uses tests from the repository, and standalone testing via the deb package is no longer supported. #34606 (Ilya Yatsishin).

    Bug Fix (user-visible misbehaviour in official stable or prestable release)

    • A fix for HDFS integration: when the inner buffer size is too small, NEED_MORE_INPUT in HadoopSnappyDecoder will run multiple times (>=3) for one compressed block. This makes the input data be copied into the wrong place in HadoopSnappyDecoder::buffer. #35116 (lgbo).
    • Ignore obsolete grants in ATTACH GRANT statements. This PR fixes #34815. #34855 (Vitaly Baranov).
    • Fix segfault in Postgres database when getting create table query if database was created using named collections. Closes #35312. #35313 (Kseniia Sumarokova).
    • Fix partial merge join duplicate rows bug. Closes #31009. #35311 (Vladimir C).
    • Fix possible Assertion 'position() != working_buffer.end()' failed while using bzip2 compression with small max_read_buffer_size setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35300 (Kruglov Pavel). While using lz4 compression with a small max_read_buffer_size setting value. #35296 (Kruglov Pavel). While using lzma compression with small max_read_buffer_size setting value. #35295 (Kruglov Pavel). While using brotli compression with a small max_read_buffer_size setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35281 (Kruglov Pavel).
    • Fix possible segfault in JSONEachRow schema inference. #35291 (Kruglov Pavel).
    • Fix CHECK TABLE query in case when sparse columns are enabled in table. #35274 (Anton Popov).
    • Avoid std::terminate in case of exception in reading from remote VFS. #35257 (Azat Khuzhin).
    • Fix reading port from config. Closes #34776. #35193 (Vladimir C).
    • Fix error in query with WITH TOTALS in case HAVING returned an empty result. This fixes #33711. #35186 (Amos Bird).
    • Fix a corner case of replaceRegexpAll. Closes #35117. #35182 (Vladimir C).
    • Schema inference didn't work properly in case of INSERT INTO FUNCTION s3(...) FROM ...: it tried to read the schema from the s3 file instead of from the select query. #35176 (Kruglov Pavel).
    • Fix MaterializedPostgreSQL (experimental feature) table overrides for partition by, etc. Closes #35048. #35162 (Kseniia Sumarokova).
    • Fix MaterializedPostgreSQL (experimental feature) adding new table to replication (ATTACH TABLE) after manually removing (DETACH TABLE). Closes #33800. Closes #34922. Closes #34315. #35158 (Kseniia Sumarokova).
    • Fix partition pruning error when non-monotonic function is used with IN operator. This fixes #35136. #35146 (Amos Bird).
    • Fixed slightly incorrect translation of YAML configs to XML. #35135 (Miel Donkers).
    • Fix optimize_skip_unused_shards_rewrite_in for signed columns and negative values. #35134 (Azat Khuzhin).
    • The update_lag external dictionary configuration option was unusable, showing the error message Unexpected key `update_lag` in dictionary source configuration. #35089 (Jason Chu).
    • Avoid possible deadlock on server shutdown. #35081 (Azat Khuzhin).
    • Fix missing alias after function is optimized to a subcolumn when setting optimize_functions_to_subcolumns is enabled. Closes #33798. #35079 (qieqieplus).
    • Fix reading from system.asynchronous_inserts table if there exists asynchronous insert into table function. #35050 (Anton Popov).
    • Fix possible exception Reading for MergeTree family tables must be done with last position boundary (relevant to operation on remote VFS). Closes #34979. #35001 (Kseniia Sumarokova).
    • Fix unexpected result when using a -State type aggregate function in a window frame. #34999 (metahys).
    • Fix possible segfault in FileLog (experimental feature). Closes #30749. #34996 (Kseniia Sumarokova).
    • Fix possible rare error Cannot push block to port which already has data. #34993 (Nikolai Kochetov).
    • Fix wrong schema inference for unquoted dates in CSV. Closes #34768. #34961 (Kruglov Pavel).
    • Integration with Hive: fix unexpected result when using IN in the WHERE clause of a Hive query. #34945 (lgbo).
    • Avoid busy polling in ClickHouse Keeper while searching for changelog files to delete. #34931 (Azat Khuzhin).
    • Fix DateTime64 conversion from PostgreSQL. Closes #33364. #34910 (Kseniia Sumarokova).
    • Fix possible "Part directory doesn't exist" during INSERT into MergeTree table backed by VFS over s3. #34876 (Azat Khuzhin).
    • Support DDLs like CREATE USER to be executed on cross replicated cluster. #34860 (Jianmei Zhang).
    • Fix bugs for multiple columns group by in WindowView (experimental feature). #34859 (vxider).
    • Fix possible failures in S2 functions when queries contain const columns. #34745 (Bharat Nallan).
    • Fix bug for H3 funcs containing const columns which cause queries to fail. #34743 (Bharat Nallan).
    • Fix No such file or directory with enabled fsync_part_directory and vertical merge. #34739 (Azat Khuzhin).
    • Fix serialization/printing for system queries RELOAD MODEL, RELOAD FUNCTION, RESTART DISK when used ON CLUSTER. Closes #34514. #34696 (Maksim Kita).
    • Fix allow_experimental_projection_optimization with enable_global_with_statement (before, it could lead to a Stack size too large error in case of multiple expressions in the WITH clause, and it also executed scalar subqueries again and again, so now it will be more optimal). #34650 (Azat Khuzhin).
    • Stop selecting parts to mutate when another replica has already updated the transaction log for the ReplicatedMergeTree engine. #34633 (Jianmei Zhang).
    • Fix incorrect result of trivial count query when part movement feature is used #34089. #34385 (nvartolomei).
    • Fix inconsistency of max_query_size limitation in distributed subqueries. #34078 (Chao Ma).

Previous changes from v22.2

  • Upgrade Notes

    • Applying data skipping indexes for queries with FINAL may produce incorrect result. In this release we disabled data skipping indexes by default for queries with FINAL (a new setting use_skip_indexes_if_final is introduced and disabled by default). #34243 (Azat Khuzhin).
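
To illustrate the new default, here is a sketch with a hypothetical ReplacingMergeTree table that has a data skipping index. The table, column and index names are made up; only use_skip_indexes_if_final comes from the note.

```sql
-- Hypothetical table with a data skipping index on status.
CREATE TABLE visits_demo
(
    id UInt64,
    status String,
    updated DateTime,
    INDEX status_idx status TYPE set(100) GRANULARITY 4
)
ENGINE = ReplacingMergeTree(updated)
ORDER BY id;

-- Default in this release: skip indexes are not applied to FINAL queries.
SELECT count() FROM visits_demo FINAL WHERE status = 'done';

-- Opt back in per query, accepting the risk of incorrect results described in the note.
SELECT count() FROM visits_demo FINAL WHERE status = 'done'
SETTINGS use_skip_indexes_if_final = 1;
```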

    New Feature

    • Projections are production ready. Set allow_experimental_projection_optimization by default and deprecate this setting. #34456 (Nikolai Kochetov).
    • An option to create new files on insert for File/S3/HDFS engines. Allow to overwrite a file in HDFS. Throw an exception on an attempt to overwrite a file in S3 by default. Throw an exception on an attempt to append data to a file in formats that have a suffix (and thus don't support appends, like Parquet, ORC). Closes #31640 Closes #31622 Closes #23862 Closes #15022 Closes #16674. #33302 (Kruglov Pavel).
    • Add a setting that allows a user to provide their own deduplication semantic in MergeTree/ReplicatedMergeTree. If provided, it's used instead of the data digest to generate the block ID. So, for example, by providing a unique value for the setting in each INSERT statement, the user can avoid the same inserted data being deduplicated. This closes: #7461. #32304 (Igor Nikonov).
    • Add support of DEFAULT keyword for INSERT statements. Closes #6331. #33141 (Andrii Buriachevskyi).
    • EPHEMERAL column specifier is added to CREATE TABLE query. Closes #9436. #34424 (yakov-olkhovskiy).
    • Support the IF EXISTS clause for the TTL expr TO [DISK|VOLUME] [IF EXISTS] 'xxx' feature. Parts will be moved to a disk or volume only if it exists on the replica, so MOVE TTL rules will be able to behave differently on replicas according to the existing storage policies. Resolves #34455. #34504 (Anton Popov).
    • Allow setting a default table engine and creating tables without specifying ENGINE. #34187 (Ilya Yatsishin).
    • Add table function format(format_name, data). #34125 (Kruglov Pavel).
    • Detect format in clickhouse-local by file name even in the case when it is passed to stdin. #33829 (Kruglov Pavel).
    • Add schema inference for values table function. Closes #33811. #34017 (Kruglov Pavel).
    • Dynamic reload of server TLS certificates on config reload. Closes #15764. #15765 (johnskopis). #31257 (Filatenkov Artur).
    • Now ReplicatedMergeTree can recover data when some of its disks are broken. #13544 (Amos Bird).
    • Fault-tolerant connections in clickhouse-client: clickhouse-client ... --host host1 --host host2 --port port2 --host host3 --port port --host host4. #34490 (Kruglov Pavel). #33824 (Filippov Denis).
    • Add DEGREES and RADIANS functions for MySQL compatibility. #33769 (Bharat Nallan).
    • Add h3ToCenterChild function. #33313 (Bharat Nallan). Add new h3 miscellaneous functions: edgeLengthKm, exactEdgeLengthKm, exactEdgeLengthM, exactEdgeLengthRads, numHexagons. #33621 (Bharat Nallan).
    • Add function bitSlice to extract bit subsequences from String/FixedString. #33360 (RogerYK).
    • Implemented meanZTest aggregate function. #33354 (achimbab).
    • Add confidence intervals to T-tests aggregate functions. #33260 (achimbab).
    • Add function addressToLineWithInlines. Closes #26211. #33467 (SuperDJY).
    • Added #! and # as a recognised start of a single line comment. Closes #34138. #34230 (Aaron Katz).
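
Several of these features combine naturally in one short session. This is a sketch under assumptions: the setting name default_table_engine is not spelled out in the note, and the table name, column names and JSON rows are hypothetical.

```sql
-- Assumed setting name for the default table engine feature.
SET default_table_engine = 'Memory';

-- No ENGINE clause needed once a default engine is set.
CREATE TABLE default_demo (id UInt64, name String DEFAULT 'n/a');

-- DEFAULT keyword inside INSERT ... VALUES.
INSERT INTO default_demo VALUES (1, DEFAULT), (2, 'explicit');
SELECT * FROM default_demo ORDER BY id;

-- format(format_name, data): parse inline data, with the structure inferred.
SELECT * FROM format(JSONEachRow,
$$
{"a": "Hello", "b": 111}
{"a": "World", "b": 123}
$$);

-- MySQL-compatible helpers.
SELECT DEGREES(pi()) AS deg, RADIANS(180) AS rad;
```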

    Experimental Feature

    • Functions for text classification: language and charset detection. See #23271. #33314 (Nikolay Degterinsky).
    • Add memory overcommit to MemoryTracker. Added guaranteed settings for memory limits which represent soft memory limits. When the hard memory limit is reached, MemoryTracker tries to cancel the most overcommitted query. The new setting memory_usage_overcommit_max_wait_microseconds specifies how long queries may wait for another query to stop. Closes #28375. #31182 (Dmitry Novik).
    • Enable stream to table join in WindowView. #33729 (vxider).
    • Support SET, YEAR, TIME and GEOMETRY data types in MaterializedMySQL (experimental feature). Fixes #18091, #21536, #26361. #33429 (zzsmdfj).
    • Fix various issues when projection is enabled by default. Each issue is described in a separate commit. This is for #33678. This fixes #34273. #34305 (Amos Bird).

    Performance Improvement

    • Support optimize_read_in_order if a prefix of the sorting key is already sorted. E.g. if we have the sorting key ORDER BY (a, b) in a table and a query with WHERE a = const ORDER BY b clauses, reading in order of the sorting key will now be applied instead of a full sort. #32748 (Anton Popov).
    • Improve performance of partitioned insert into table functions URL, S3, File, HDFS. Closes #34348. #34510 (Maksim Kita).
    • Multiple performance improvements of clickhouse-keeper. #34484 #34587 (zhanglistar).
    • FlatDictionary: improve performance of dictionary data load. #33871 (Maksim Kita).
    • Improve performance of mapPopulateSeries function. Closes #33944. #34318 (Maksim Kita).
    • _file and _path virtual columns (in file-like table engines) are made LowCardinality - it will make queries for multiple files faster. Closes #34300. #34317 (flynn).
    • Speed up loading of data parts. It was not parallelized before: the setting part_loading_threads did not have effect. See #4699. #34310 (alexey-milovidov).
    • Improve performance of LineAsString format. This closes #34303. #34306 (alexey-milovidov).
    • Optimize quantilesExact{Low,High} to use nth_element instead of sort. #34287 (Danila Kutenin).
    • Slightly improve performance of Regexp format. #34202 (alexey-milovidov).
    • Minor improvement for analysis of scalar subqueries. #34128 (Federico Rodriguez).
    • Make ORDER BY tuple almost as fast as ORDER BY columns. We have special optimizations for multiple column ORDER BY: https://github.com/ClickHouse/ClickHouse/pull/10831. It's beneficial to also apply them to tuple columns. #34060 (Amos Bird).
    • Rework and reintroduce the scalar subqueries cache to Materialized Views execution. #33958 (Raúl Marín).
    • Slightly improve performance of ORDER BY by adding x86-64 AVX-512 support for memcmpSmall functions to accelerate memory comparison. It works only if you compile ClickHouse by yourself. #33706 (hanqf-git).
    • Improve range_hashed dictionary performance if there are a lot of intervals for a key. Fixes #23821. #33516 (Maksim Kita).
    • For inserts and merges into S3, write files in parallel whenever possible (TODO: check if it's merged). #33291 (Nikolai Kochetov).
    • Improve clickhouse-keeper performance and fix several memory leaks in NuRaft library. #33329 (alesapin).
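
The optimize_read_in_order case described in this list (sorting key (a, b), filter on a, order by b) can be reproduced as follows. The table and column names are hypothetical; EXPLAIN PIPELINE is one way to check whether a full sort step is still present.

```sql
-- Hypothetical table whose sorting key is (a, b).
CREATE TABLE read_in_order_demo
(
    a UInt32,
    b UInt32,
    payload String
)
ENGINE = MergeTree
ORDER BY (a, b);

-- With a fixed by the WHERE clause, ORDER BY b follows the sorting key order.
SELECT b, payload
FROM read_in_order_demo
WHERE a = 1
ORDER BY b
LIMIT 10
SETTINGS optimize_read_in_order = 1;

-- Inspect the pipeline to see how the sort is executed.
EXPLAIN PIPELINE
SELECT b, payload
FROM read_in_order_demo
WHERE a = 1
ORDER BY b
LIMIT 10;
```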

    Improvement

    • Support asynchronous inserts in clickhouse-client for queries with inlined data. #34267 (Anton Popov).
    • Functions dictGet, dictHas implicitly cast key argument to dictionary key structure, if they are different. #33672 (Maksim Kita).
    • Improvements for range_hashed dictionaries. Improve performance of load time if there are multiple attributes. Allow to create a dictionary without attributes. Added the option convert_null_range_bound_to_open to specify the strategy when interval start and end have Nullable type; it is true by default. Closes #29791. Allow to specify Float, Decimal, DateTime64, Int128, Int256, UInt128, UInt256 as range types. RangeHashedDictionary added support for range values that extend Int64 type. Closes #28322. Added the option range_lookup_strategy to specify the range lookup type, min or max; it is min by default. Closes #21647. Fixed allocated bytes calculations. Fixed type name in system.dictionaries in case of ComplexKeyHashedDictionary. #33927 (Maksim Kita).
    • flat, hashed, hashed_array dictionaries now support creating with empty attributes, with support of reading the keys and using dictHas. Fixes #33820. #33918 (Maksim Kita).
    • Added support for DateTime64 data type in dictionaries. #33914 (Maksim Kita).
    • Allow to write s3(url, access_key_id, secret_access_key) (autodetect of data format and table structure, but with explicit credentials). #34503 (Kruglov Pavel).
    • Added sending of the output format back to the client like it's done in the HTTP protocol, as suggested in #34362. Closes #34362. #34499 (Vitaly Baranov).
    • Send ProfileEvents statistics in case of INSERT SELECT query (to display query metrics in clickhouse-client for this type of queries). #34498 (Dmitry Novik).
    • Recognize .jsonl extension for JSONEachRow format. #34496 (Kruglov Pavel).
    • Improve schema inference in clickhouse-local. Allow to write just clickhouse-local -q "select * from table" < data.format. #34495 (Kruglov Pavel).
    • Privileges CREATE/ALTER/DROP ROW POLICY now can be granted on a table or on database.* as well as globally *.*. #34489 (Vitaly Baranov).
    • Allow to export arbitrarily large files to s3. Add two new settings: s3_upload_part_size_multiply_factor and s3_upload_part_size_multiply_parts_count_threshold. Now each time s3_upload_part_size_multiply_parts_count_threshold parts are uploaded to S3 from a single query, s3_min_upload_part_size is multiplied by s3_upload_part_size_multiply_factor. Fixes #34244. #34422 (alesapin).
    • Allow to skip not found (404) URLs for globs when using URL storage / table function. Also closes #34359. #34392 (Kseniia Sumarokova).
    • Default input and output formats for clickhouse-local that can be overridden by --input-format and --output-format. Closes #30631. #34352 (李扬).
    • Add options max_query_size and max_parser_depth for clickhouse-format. Closes #30528. #34349 (李扬).
    • Better handling of pre-inputs before client start. This is for #34308. #34336 (Amos Bird).
    • REGEXP_MATCHES and REGEXP_REPLACE function aliases for compatibility with PostgreSQL. Closes #30885. #34334 (李扬).
    • Some servers expect a User-Agent header in their HTTP requests. A User-Agent header entry has been added to HTTP requests of the form: User-Agent: ClickHouse/VERSION_STRING. #34330 (Saad Ur Rahman).
    • Cancel merges before acquiring table lock for TRUNCATE query to avoid DEADLOCK_AVOIDED error in some cases. Fixes #34302. #34304 (tavplubix).
    • Change severity of the "Cancelled merging parts" message in logs, because it's not an error. This closes #34148. #34232 (alexey-milovidov).
    • Add ability to compose PostgreSQL-style cast operator :: with expressions using [] and . operators (array and tuple indexing). #34229 (Nikolay Degterinsky).
    • Recognize YYYYMMDD-hhmmss format in parseDateTimeBestEffort function. This closes #34206. #34208 (alexey-milovidov).
    • Allow carriage return in the middle of the line while parsing by Regexp format. This closes #34200. #34205 (alexey-milovidov).
    • Allow to parse dictionary's PRIMARY KEY as PRIMARY KEY (id, value); previously supported only PRIMARY KEY id, value. Closes #34135. #34141 (Maksim Kita).
    • An optional argument for splitByChar to limit the number of resulting elements. Closes #34081. #34140 (李扬).
    • Improving the experience of multiple line editing for clickhouse-client. This is a follow-up of #31123. #34114 (Amos Bird).
    • Add UUID support in MsgPack input/output format. #34065 (Kruglov Pavel).
    • Tracing context (for OpenTelemetry) is now propagated from GRPC client metadata (this change is relevant for GRPC client-server protocol). #34064 (andremarianiello).
    • Supports all types of SYSTEM queries with ON CLUSTER clause. #34005 (小路).
    • Improve memory accounting for queries that are using less than max_untracked_memory. #34001 (Azat Khuzhin).
    • Fixed UTF-8 string case-insensitive search when lowercase and uppercase characters are represented by different number of bytes. Example is ẞ and ß. This closes #7334. #33992 (Harry Lee).
    • Detect format and schema from stdin in clickhouse-local. #33960 (Kruglov Pavel).
    • Correctly handle the case of misconfiguration when multiple disks are using the same path on the filesystem. #29072. #33905 (zhongyuankai).
    • Try every resolved IP address while getting S3 proxy. S3 proxies are rarely used, mostly in Yandex Cloud. #33862 (Nikolai Kochetov).
    • Support EXPLAIN AST CREATE FUNCTION query: EXPLAIN AST CREATE FUNCTION mycast AS (n) -> cast(n as String) will return EXPLAIN AST CREATE FUNCTION mycast AS n -> CAST(n, 'String'). #33819 (李扬).
    • Added support for cast from Map(Key, Value) to Array(Tuple(Key, Value)). #33794 (Maksim Kita).
    • Add some improvements and fixes for Bool data type. Fixes #33244. #33737 (Kruglov Pavel).
    • Parse and store OpenTelemetry trace-id in big-endian order. #33723 (Frank Chen).
    • Improvement for fromUnixTimestamp64 family functions. They now accept any integer value that can be converted to Int64. This closes: #14648. #33505 (Andrey Zvonov).
    • Reimplement _shard_num from constants (see #7624) with shardNum() function (see #27020), to avoid possible issues (like those that had been found in #16947). #33392 (Azat Khuzhin).
    • Enable binary arithmetic (plus, minus, multiply, division, least, greatest) between Decimal and Float. #33355 (flynn).
    • Respect cgroups limits in max_threads autodetection. #33342 (JaySon).
    • Add new clickhouse-keeper setting min_session_timeout_ms. Now clickhouse-keeper will determine client session timeout according to min_session_timeout_ms and session_timeout_ms settings. #33288 (JackyWoo).
    • Added UUID data type support for functions hex and bin. #32170 (Frank Chen).
    • Fix reading of subcolumns with dots in their names. In particular fixed reading of Nested columns, if their element names contain dots (e.g Nested(`keys.name` String, `keys.id` UInt64, values UInt64)). #34228 (Anton Popov).
    • Fixes parallel_view_processing = 0 not working when inserting into a table using VALUES. Fixes view_duration_ms in the query_views_log not being set correctly for materialized views. #34067 (Raúl Marín).
    • Fix parsing tables structure from ZooKeeper: now metadata from ZooKeeper is compared with local metadata in canonical form. It helps when canonical function names can change between ClickHouse versions. #33933 (sunny).
    • Properly escape some characters for interaction with LDAP. #33401 (IlyaTsoi).
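
    The note on respecting cgroups limits in max_threads autodetection (#33342) does not say how the limit is derived. The following is a minimal Python sketch of one plausible rule, assuming the cgroup v1 CFS values cpu.cfs_quota_us and cpu.cfs_period_us are already read and passed in. The function name effective_cpu_count and the round-up behaviour are assumptions made for illustration, not ClickHouse's actual code.

```python
import math


def effective_cpu_count(host_cpus: int, cfs_quota_us: int, cfs_period_us: int) -> int:
    """Hypothetical helper: number of CPUs a process may use under a CFS quota.

    Assumption: a non-positive quota (cgroup v1 uses -1) means "no limit",
    and a fractional CPU allowance is rounded up to a whole thread.
    """
    if cfs_quota_us <= 0 or cfs_period_us <= 0:
        return host_cpus  # no cgroup limit configured
    limit = math.ceil(cfs_quota_us / cfs_period_us)
    # Never report more CPUs than the host has, and never less than one.
    return max(1, min(host_cpus, limit))


if __name__ == "__main__":
    # 400 ms of CPU time per 100 ms period on a 64-core host.
    print(effective_cpu_count(64, 400_000, 100_000))
```

    Under this rule, a container allowed 400 ms of CPU time per 100 ms period on a 64-core host would autodetect 4 threads instead of 64.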

    Build/Testing/Packaging Improvement

    • Remove unbundled build support. #33690 (Azat Khuzhin).
    • Ensure that tests don't depend on the result of non-stable sorting of equal elements. Added equal items ranges randomization in debug after sort to prevent issues when we rely on equal items sort order. #34393 (Maksim Kita).
    • Add verbosity to a style check. #34289 (Mikhail f. Shiryaev).
    • Remove clickhouse-test debian package because it's obsolete. #33948 (Ilya Yatsishin).
    • Multiple improvements for build system to remove the possibility of occasionally using packages from the OS and to enforce hermetic builds. #33695 (Amos Bird).
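
    The idea behind randomizing ranges of equal items after a sort (#34393) can be sketched in Python as follows. This is only an illustration of the technique, not the ClickHouse implementation; the function name sort_with_shuffled_ties is hypothetical.

```python
import random
from itertools import groupby


def sort_with_shuffled_ties(items, key, rng=None):
    """Sort items by key, then shuffle every run of items with equal keys.

    The result is still a valid sort by `key`, but the relative order of
    equal elements is deliberately unpredictable, so code or tests that
    silently rely on that order tend to fail early.
    """
    if rng is None:
        rng = random.Random()
    ordered = sorted(items, key=key)
    result = []
    for _, group in groupby(ordered, key=key):
        run = list(group)
        rng.shuffle(run)  # randomize only within the run of equal keys
        result.extend(run)
    return result


if __name__ == "__main__":
    rows = [(1, "a"), (1, "b"), (2, "c"), (1, "d")]
    print(sort_with_shuffled_ties(rows, key=lambda r: r[0]))
```

    A test that compares the output of such a sort against a fixed expected order passes only if it also orders by a tie-breaking column.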

    Bug Fix (user-visible misbehaviour in official stable or prestable release)

    • Fixed the assertion in case of using allow_experimental_parallel_reading_from_replicas with max_parallel_replicas equal to 1. This fixes #34525. #34613 (Nikita Mikhaylov).
    • Fix rare bug while reading empty arrays, which could lead to a Data compressed with different methods error. It can reproduce (though not always) if you have mostly empty arrays and reading is performed in backward direction with ORDER BY ... DESC. This error is extremely unlikely to happen. #34327 (Anton Popov).
    • Fix wrong result of round/roundBankers if integer values of small types are rounded. Closes #33267. #34562 (李扬).
    • Fix query cancellation, which sometimes did not work immediately when reading multiple files from s3 or HDFS. Fixes #34301. Relates to #34397. #34539 (Dmitry Novik).
    • Fix exception Chunk should have AggregatedChunkInfo in MergingAggregatedTransform (in case of optimize_aggregation_in_order = 1 and distributed_aggregation_memory_efficient = 0). Fixes #34526. #34532 (Anton Popov).
    • Fix comparison between integers and floats in index analysis. Previously it could lead to skipping some granules for reading by mistake. Fixes #34493. #34528 (Anton Popov).
    • Fix compression support in URL engine. #34524 (Frank Chen).
    • Fix possible error 'file_size: Operation not supported' in files' schema autodetection. #34479 (Kruglov Pavel).
    • Fixes possible race with table deletion. #34416 (Kseniia Sumarokova).
    • Fix possible error Cannot convert column Function to mask in short circuit function evaluation. Closes #34171. #34415 (Kruglov Pavel).
    • Fix potential crash when doing schema inference from url source. Closes #34147. #34405 (Kruglov Pavel).
    • For UDFs access permissions were checked for database level instead of global level as it should be. Closes #34281. #34404 (Maksim Kita).
    • Fix wrong engine syntax in result of SHOW CREATE DATABASE query for databases with engine Memory. This closes #34335. #34345 (alexey-milovidov).
    • Fixed a couple of extremely rare race conditions that might lead to broken state of replication queue and "intersecting parts" error. #34297 (tavplubix).
    • Fix progress bar width. It was incorrectly rounded to integer number of characters. #34275 (alexey-milovidov).
    • Fix current_user/current_address client information fields for inter-server communication (before this patch, current_user/current_address were preserved from the previous query). #34263 (Azat Khuzhin).
    • Fix memory leak in case of some Exception during query processing with optimize_aggregation_in_order=1. #34234 (Azat Khuzhin).
    • Fix metric Query, which shows the number of executing queries. In the last several releases it was always 0. #34224 (Anton Popov).
    • Fix schema inference for table function s3. #34186 (Kruglov Pavel).
    • Fix rare and benign race condition in HDFS, S3 and URL storage engines which can lead to additional connections. #34172 (alesapin).
    • Fix bug which can rarely lead to error "Cannot read all data" while reading LowCardinality columns of MergeTree table engines family which stores data on remote file system like S3 (virtual filesystem over s3 is an experimental feature that is not ready for production). #34139 (alesapin).
    • Fix inserts to distributed tables in case of a change of native protocol. The last change was in the version 22.1, so there may be some failures of inserts to distributed tables after upgrade to that version. #34132 (Anton Popov).
    • Fix possible data race in File table engine that was introduced in #33960. Closes #34111. #34113 (Kruglov Pavel).
    • Fixed minor race condition that might cause "intersecting parts" error in extremely rare cases after ZooKeeper connection loss. #34096 (tavplubix).
    • Fix asynchronous inserts with Native format. #34068 (Anton Popov).
    • Fix bug which led to inability of the server to start when both replicated access storage and keeper (embedded in clickhouse-server) are used. Introduced two settings for keeper socket timeout instead of settings from default user: keeper_server.socket_receive_timeout_sec and keeper_server.socket_send_timeout_sec. Fixes #33973. #33988 (alesapin).
    • Fix segfault while parsing ORC file with corrupted footer. Closes #33797. #33984 (Kruglov Pavel).
    • Fix parsing IPv6 from query parameter (prepared statements) and fix IPv6 to string conversion. Closes #33928. #33971 (Kruglov Pavel).
    • Fix crash while reading nested tuples. Fixes #33838. #33956 (Anton Popov).
    • Fix usage of functions array and tuple with literal arguments in distributed queries. Previously it could lead to Not found columns exception. #33938 (Anton Popov).
    • Aggregate function combinator -If did not correctly process Nullable filter argument. This closes #27073. #33920 (alexey-milovidov).
    • Fix potential race condition when doing remote disk read (virtual filesystem over s3 is an experimental feature that is not ready for production). #33912 (Amos Bird).
    • Fix crash if SQL UDF is created with lambda with non-identifier arguments. Closes #33866. #33868 (Maksim Kita).
    • Fix usage of sparse columns (which can be enabled by experimental setting ratio_of_defaults_for_sparse_serialization). #33849 (Anton Popov).
    • Fixed replica is not readonly logical error on SYSTEM RESTORE REPLICA query when replica is actually readonly. Fixes #33806. #33847 (tavplubix).
    • Fix memory leak in clickhouse-keeper when compression is used (the default). #33840 (Azat Khuzhin).
    • Fix index analysis with no common types available. #33833 (Amos Bird).
    • Fix schema inference for JSONEachRow and JSONCompactEachRow. #33830 (Kruglov Pavel).
    • Fix usage of external dictionaries with redis source and large number of keys. #33804 (Anton Popov).
    • Fix bug in client that led to 'Connection reset by peer' in server. Closes #33309. #33790 (Kruglov Pavel).
    • Fix parsing query INSERT INTO ... VALUES SETTINGS ... (...), ... #33776 (Kruglov Pavel).
    • Fix bug of check table when creating data part with wide format and projection. #33774 (李扬).
    • Fix tiny race between count() and INSERT/merges/... in MergeTree (it is possible to return incorrect number of rows for SELECT with optimize_trivial_count_query). #33753 (Azat Khuzhin).
    • Throw exception when directory listing request has failed in storage HDFS. #33724 (LiuNeng).
    • Fix mutation when table contains projections. This fixes #33010. This fixes #33275. #33679 (Amos Bird).
    • Correctly determine current database if CREATE TEMPORARY TABLE AS SELECT is queried inside a named HTTP session. This is a very rare use case. This closes #8340. #33676 (alexey-milovidov).
    • Allow some queries with sorting, LIMIT BY, ARRAY JOIN and lambda functions. This closes #7462. #33675 (alexey-milovidov).
    • Fix bug in "zero copy replication" (a feature that is under development and should not be used in production) which led to data duplication in case of TTL move. Fixes #33643. #33642 (alesapin).
    • Fix Chunk should have AggregatedChunkInfo in GroupingAggregatedTransform (in case of optimize_aggregation_in_order = 1). #33637 (Azat Khuzhin).
    • Fix error Bad cast from type ... to DB::DataTypeArray which may happen when table has Nested column with dots in name, and default value is generated for it (e.g. during insert, when column is not listed). Continuation of #28762. #33588 (Alexey Pavlenko).
    • Export into lz4 files has been fixed. Closes #31421. #31862 (Kruglov Pavel).
    • Fix potential crash if group_by_overflow_mode was set to any (approximate GROUP BY) and aggregation was performed by single column of type LowCardinality. #34506 (DR).
    • Fix inserting to temporary tables via gRPC client-server protocol. Fixes #34347, issue #2. #34364 (Vitaly Baranov).
    • Fix issue #19429. #34225 (Vitaly Baranov).
    • Fix issue #18206. #33977 (Vitaly Baranov).
    • This PR allows using multiple LDAP storages in the same list of user directories. It worked earlier but was broken because LDAP tests are disabled (they are part of the testflows tests). #33574 (Vitaly Baranov).
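
    The entry on integer/float comparison in index analysis (#34528) does not describe the root cause. As a general illustration of why such comparisons are delicate, the Python sketch below shows one well-known hazard: converting a large integer to a 64-bit float before comparing can round it, so a range check may wrongly decide that a granule contains no matching rows. The function names and the granule-skipping rule are hypothetical and are not taken from ClickHouse.

```python
def can_skip_naive(granule_max: int, threshold: float) -> bool:
    """Hypothetical check for `x > threshold`: skip the granule if max <= threshold.

    The integer is converted to float first, which may round it.
    """
    return float(granule_max) <= threshold


def can_skip_exact(granule_max: int, threshold: float) -> bool:
    """Same check, but using an exact comparison.

    Python compares int and float by value without rounding the int.
    """
    return granule_max <= threshold


if __name__ == "__main__":
    big = 2**53 + 1          # not representable exactly as a 64-bit float
    threshold = float(2**53)
    print(can_skip_naive(big, threshold))  # granule wrongly skipped
    print(can_skip_exact(big, threshold))  # granule correctly kept
```

    In this sketch the naive check would skip a granule whose maximum value does satisfy x > threshold, which is the kind of mistake the entry describes as skipping granules by mistake.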