ClickHouse is a column-oriented database system with SQL-like interface. Its main selling point is...
ClickHouse is a column-oriented database system with SQL-like interface. Its main selling point is high performance when handling enormous datasets in near real time. The sacrifice you'll need to make is very limited support for updates, deletes, or transactions. If your dataset is enormous, and you mostly read, aggregate, and insert, it could work for you.
Which arguably makes it more suitable as an secondary database, not as primary one.
You can use ClickHouse with ClickHouse Cloud, and that's what most documentation refers to, but for some practice, you can run it locally as well.
The easiest way is to use Docker. To run a local server execute this command:
$ docker run -d --name clickhouse-server clickhouse/clickhouse-server
Then we can connect to it:
$ docker exec -it clickhouse-server clickhouse-client
If you need multiple servers, just change clickhouse-server
accordingly.
d7ba07d71412 :) select 2 + 2
SELECT 2 + 2
Query id: 1565da74-dc33-4d1f-84ae-d5241b052b5c
┌─plus(2, 2)─┐
│ 4 │
└────────────┘
1 row in set. Elapsed: 0.001 sec.
d7ba07d71412 :) select concat('Hello', ', ', 'world!')
SELECT concat('Hello', ', ', 'world!')
Query id: 090687b1-3a8f-4300-9368-579a87c348c0
┌─concat('Hello', ', ', 'world!')─┐
│ Hello, world! │
└─────────────────────────────────┘
1 row in set. Elapsed: 0.001 sec.
We don't have any data yet, but we can select from system.numbers
- that's a basically infinite virtual table of all numbers starting from 0.
d7ba07d71412 :) select if(number % 15 = 0, 'FizzBuzz', if(number % 5 = 0, 'Buzz', if(number % 3 = 0, 'Fizz', toString(number)))) as FizzBuzz from system.numbers where number >= 1 limit 100
SELECT if((number % 15) = 0, 'FizzBuzz', if((number % 5) = 0, 'Buzz', if((number % 3) = 0, 'Fizz', toString(number)))) AS FizzBuzz
FROM system.numbers
WHERE number >= 1
LIMIT 100
Query id: 8badf288-95b5-4121-b4ad-18730fbfafc1
┌─FizzBuzz─┐
│ 1 │
│ 2 │
│ Fizz │
│ 4 │
│ Buzz │
│ Fizz │
│ 7 │
│ 8 │
│ Fizz │
│ Buzz │
│ 11 │
│ Fizz │
│ 13 │
│ 14 │
│ FizzBuzz │
│ 16 │
│ 17 │
│ Fizz │
│ 19 │
│ Buzz │
│ Fizz │
│ 22 │
│ 23 │
│ Fizz │
│ Buzz │
│ 26 │
│ Fizz │
│ 28 │
│ 29 │
│ FizzBuzz │
│ 31 │
│ 32 │
│ Fizz │
│ 34 │
│ Buzz │
│ Fizz │
│ 37 │
│ 38 │
│ Fizz │
│ Buzz │
│ 41 │
│ Fizz │
│ 43 │
│ 44 │
│ FizzBuzz │
│ 46 │
│ 47 │
│ Fizz │
│ 49 │
│ Buzz │
│ Fizz │
│ 52 │
│ 53 │
│ Fizz │
│ Buzz │
│ 56 │
│ Fizz │
│ 58 │
│ 59 │
│ FizzBuzz │
│ 61 │
│ 62 │
│ Fizz │
│ 64 │
│ Buzz │
│ Fizz │
│ 67 │
│ 68 │
│ Fizz │
│ Buzz │
│ 71 │
│ Fizz │
│ 73 │
│ 74 │
│ FizzBuzz │
│ 76 │
│ 77 │
│ Fizz │
│ 79 │
│ Buzz │
│ Fizz │
│ 82 │
│ 83 │
│ Fizz │
│ Buzz │
│ 86 │
│ Fizz │
│ 88 │
│ 89 │
│ FizzBuzz │
│ 91 │
│ 92 │
│ Fizz │
│ 94 │
│ Buzz │
│ Fizz │
│ 97 │
│ 98 │
│ Fizz │
│ Buzz │
└──────────┘
100 rows in set. Elapsed: 0.002 sec. Processed 65.41 thousand rows, 523.27 KB (31.21 million rows/s., 249.67 MB/s.)
d7ba07d71412 :)
Same notes:
system.numbers
is implicitly ordered, so system.numbers where number >= 1 limit 100
will return numbers from 1 to 100if(number % 3 = 0, 'Fizz', toString(number))
, SQL databases tend to do a lot of automatic type conversions, ClickHouse less soWe won't be coding from REPL, so let's write some code. There's no official Ruby driver, but there's a Python one. And some unofficial Ruby drivers.
$ pip3 install clickhouse-connect
We also need to expose our Docker container's Clickhouse port. First let's remove the old one (we can adjust it on running container, but it's more complex and we don't have any data there anyway):
$ docker stop clickhouse-server
$ docker rm clickhouse-server
And start it with port 8123 mapped:
$ docker run -p 8123:8123 -d --name clickhouse-server clickhouse/clickhouse-server
Now we can write some Python code to talk to our database:
#!/usr/bin/env python3
import clickhouse_connect
client = clickhouse_connect.get_client()
query = """
select
if(number % 15 = 0, 'FizzBuzz',
if(number % 5 = 0, 'Buzz',
if(number % 3 = 0, 'Fizz',
toString(number))))
as FizzBuzz
from system.numbers
where number >= 1
limit 100
"""
print(client.command(query))
The result of client.command(query)
is a multiline string, so it's mostly useful for debugging.
There are other methods that return more obvious data structures like list of tuples, but also Pandas dataframe and so on.
So we got ClickHouse running. In a future post, we'll make it work with some real data.