
ClickHouse Made Easy: Getting Started With a Few Clicks

By Shamim Bhuiyan · Apr. 07, 2020 · Tutorial

Yandex ClickHouse is a fast, column-oriented DBMS for data analysis. This open-source database management system is fully fault-tolerant and linearly scalable. Unlike many other NoSQL DBMSs, ClickHouse provides SQL for real-time data analysis.

If you do not know what a column-oriented database is, don't worry: the ClickHouse team provides a very nice overview of column-oriented DBMSs. Please spend a few minutes reading the overview section of the ClickHouse documentation.

In this blog post, I will walk you through installing a fresh copy of the ClickHouse database, loading a few sample datasets into it, and querying the loaded data. I hope this short post will save developers several hours and provide a straightforward guide for getting started with a new OLAP database. The post is based on the ClickHouse documentation; if anything is unclear, refer to the original documentation for further information.

To keep the entire process simple and easy, I am going to run ClickHouse inside a Docker container, using the ClickHouse Docker image.

Step 1 

You first need to make sure Docker is installed and properly configured on your host operating system (including a proxy, if you are working behind a corporate firewall).
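
A quick way to verify that the Docker daemon is up and reachable (a minimal check of my own, not from the original post):

Shell
# Prints client/server details and daemon status; fails if Docker isn't running
docker info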

Step 2 

We are going to use a host OS file system volume for ClickHouse data storage. All the datasets, along with their metadata, will be stored in this directory. So, create a directory somewhere in your host OS; in my case, it's called clickhouse_db_vol.

Shell
mkdir $HOME/clickhouse_db_vol


Step 3

Download and start an instance of the ClickHouse DB. Use the following shell command in your favorite terminal. 

Shell
docker run -d -p 8123:8123 --name some-clickhouse-server --ulimit nofile=262144:262144 --volume=$HOME/clickhouse_db_vol:/var/lib/clickhouse yandex/clickhouse-server


The above command downloads a Docker image from Docker Hub and starts an instance of the ClickHouse server. We also expose port 8123 so that we can use the ClickHouse HTTP (REST) interface to interact with it. The first 8123 is the host OS port, and the second 8123 is the container port. Once the container is up and running, you should see terminal output similar to the following.

Output of running Docker container
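
At this point, you can sanity-check the HTTP interface. ClickHouse's HTTP interface exposes a /ping endpoint that answers with "Ok." (a quick check of my own, not part of the original post):

Shell
# Should print "Ok." if the server is reachable on the exposed port
curl http://localhost:8123/ping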

Step 4 

Now that the ClickHouse database is up and running, we can create tables, import data, and do some data analysis ;-). To work with the database, ClickHouse provides a few interfaces and tools. For simplicity, we are going to use the HTTP interface and the ClickHouse native client.

Step 5

Start a native client instance on Docker. Execute the following shell command.

Shell
docker run -it --rm --link some-clickhouse-server:clickhouse-server yandex/clickhouse-client --host clickhouse-server


A new client should start and connect to the server through port 9000.

New client running on port 9000
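
To confirm that the client is actually talking to the server, you can run a trivial query (my own quick check, not from the original post):

SQL
SELECT version();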

So far, so good. We are now ready to execute DDL and DML queries.

Step 6

Now, create a database. A database is a logical group of tables in ClickHouse. Execute the following SQL query in the ClickHouse native client terminal.

SQL
CREATE DATABASE IF NOT EXISTS tutorial;


Next, run the following query.

SQL
SHOW DATABASES;


The above query will list all the databases that exist on the server.

Plain Text
SHOW DATABASES

┌─name─────┐
│ default  │
│ system   │
│ tutorial │
└──────────┘

3 rows in set. Elapsed: 0.006 sec.
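
Subsequent statements in this post qualify table names with the database name (for example, tutorial.test). As an optional convenience (my addition, not in the original post), you can instead switch the session's current database once:

SQL
USE tutorial;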


Step 7 

As we mentioned earlier, we are trying to keep things simple. So, create a table with three columns using the following query.

SQL
CREATE TABLE tutorial.test (a String, b UInt8, c FixedString(1)) ENGINE = Log;


Then, insert some data. Use the following DML statements to insert rows into the tutorial.test table.

SQL
INSERT INTO tutorial.test (a,b,c) VALUES ('user_1',1,'1');
INSERT INTO tutorial.test (a,b,c) VALUES ('user_2',2,'5');
INSERT INTO tutorial.test (a,b,c) VALUES ('user_3',3,'5');
INSERT INTO tutorial.test (a,b,c) VALUES ('user_1',1,'5');
INSERT INTO tutorial.test (a,b,c) VALUES ('user_4',4,'5');
INSERT INTO tutorial.test (a,b,c) VALUES ('user_5',5,'5');
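
As a side note, ClickHouse is optimized for batched inserts, so in real workloads you would usually combine rows into a single statement rather than inserting one row at a time. An equivalent single-statement form of the inserts above (an alternative, not meant to be run in addition):

SQL
INSERT INTO tutorial.test (a,b,c) VALUES
    ('user_1',1,'1'), ('user_2',2,'5'), ('user_3',3,'5'),
    ('user_1',1,'5'), ('user_4',4,'5'), ('user_5',5,'5');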


Step 8

Now, query the tutorial.test table. Execute the following query:

SQL
SELECT count(*) FROM tutorial.test;


The above query should return the following output.

Plain Text
SELECT count(*)
FROM tutorial.test

┌─count()─┐
│       6 │
└─────────┘

1 rows in set. Elapsed: 0.017 sec.


You may wonder why I didn't execute a COMMIT statement above. That's because ClickHouse is not a transactional database and does not support transactions.

Step 9

You can also use any REST tool, such as Postman, to interact with the ClickHouse DB. Use your favorite REST API testing tool and send the following HTTP request.

Shell
http://localhost:8123/?query=select * from tutorial.test where a='user_1'


Here, we send an HTTP GET request with a SQL query, which should return output as shown below.

Output from GET request
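
Note that when the HTTP interface is used with GET, ClickHouse runs the query in read-only mode; queries that modify data must be sent with POST. Sending the query as a POST body also sidesteps URL-encoding issues (a sketch using curl, not from the original post):

Shell
# POST the query in the request body instead of the URL
curl 'http://localhost:8123/' --data-binary "SELECT * FROM tutorial.test WHERE a = 'user_1'"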

So far, everything has been very simple. Let's do something more complex: load a large dataset into the database and run some analytical queries against it.

Step 10

Download the sample dataset from the resource. The resource contains prepared partitions for loading directly into ClickHouse. The dataset contains airline on-time performance history for US airports; it is about 15 GB in size and contains 183 million rows.

Step 11

Unarchive the file into the ClickHouse data directory that we created earlier.

Shell
tar xvf ontime.tar -C $HOME/clickhouse_db_vol


Step 12 

The unarchiving process takes a few minutes to complete. Once it finishes, you should find a directory structure similar to the following in your ClickHouse data directory.

Current file structure

Step 13

Here, "datasets" is the name of the database created into the ClickHouse. And "ontime" is the name of the table. Now, restart the Docker container and wait for a few minutes for ClickHouse to create the database and tables and load the data into the tables. Usually, it takes a couple of minutes.

Step 14

Execute the SQL statement SHOW DATABASES to confirm that the database named "datasets" has been created.

Plain Text
SHOW DATABASES

┌─name─────┐
│ datasets │
│ default  │
│ system   │
│ tutorial │
└──────────┘

4 rows in set. Elapsed: 0.005 sec.
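
You can also verify that the data itself was loaded by counting the rows in the new table; given the dataset description above, the result should be roughly 183 million (a quick sanity check of my own, not from the original post):

SQL
SELECT count(*) FROM datasets.ontime;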


Step 15

Now, we are ready to query the database. Execute the following query.

SQL
SELECT DestCityName, uniqExact(OriginCityName) AS u
FROM datasets.ontime
WHERE Year >= 2000 AND Year <= 2010
GROUP BY DestCityName
ORDER BY u DESC LIMIT 10;


The query returns the most popular destinations, ranked by the number of directly connected origin cities, for the years 2000 through 2010.

Plain Text
┌─DestCityName──────────┬───u─┐
│ Atlanta, GA           │ 193 │
│ Chicago, IL           │ 167 │
│ Dallas/Fort Worth, TX │ 161 │
│ Minneapolis, MN       │ 138 │
│ Cincinnati, OH        │ 138 │
│ Detroit, MI           │ 130 │
│ Houston, TX           │ 129 │
│ Denver, CO            │ 127 │
│ Salt Lake City, UT    │ 119 │
│ New York, NY          │ 115 │
└───────────────────────┴─────┘

10 rows in set. Elapsed: 7.373 sec. Processed 72.81 million rows, 3.37 GB (9.88 million rows/s., 457.24 MB/s.)


The SQL query takes 7.3 seconds to execute. Let's run a few more queries as shown below.

SQL
SELECT Year, avg(DepDelay>10)*100
FROM datasets.ontime
GROUP BY Year
ORDER BY Year;


It returns the percentage of flights delayed by more than 10 minutes, per year.
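
As one more exploration example (my own addition, not from the original post), the following counts flights per year, which quickly shows the time span the dataset covers:

SQL
SELECT Year, count(*) AS flights
FROM datasets.ontime
GROUP BY Year
ORDER BY Year;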

Conclusion

In this short post, we described how to run the ClickHouse database in Docker and how to load a large dataset into it for analysis. From here, you can load more of the sample datasets and keep exploring ClickHouse's features.


Opinions expressed by DZone contributors are their own.
