The performance level of your queries in Vertica depend on the design of your projections. In the projection definitions you specify how your data will be segmented and how your data will be sorted. If you are going to make the right choices your queries will perform really well. I covered projection segmentation in my previous post “Building projections with optimal segmentation in HP Vertica”. In this post we will examine sort order optimization.
What tools I can use to design sort order
Standard Vertica install has a tool called “Database Designer”. When Vertica is installed Database Designer is also installed and available on every node in the cluster. Database Designer can be run in many ways: a) from administration tools; b) from management console; c) using Database Designer functions. The most common way to start Database Designer is by using administration tools. I will use this method throughout this post.
What do I need to have if I want to get sort order recommendations from Database Designer?
You need:
- a running Vertica cluster;
- data loaded into tables you will be working on. I am going to use the same dataset I used for the post “Optimizing storage footprint in HP Vertica with Database Designer”. If you need guidance on loading your data you can refer to that post. I have a section on loading data to Vertica;
- SQL queries you will be executing as a normal workload. Database Designer will be using your SQL queries to come up with most optimal sort order.
How do I prepare a file with SQL queries for Database Designer
The most important thing in this exercise is to have a good set of SQL queries. Create this set by taking SQL queries you are planning to execute in your Vertica cluster. You need to have a good number of queries. An optimal number of queries in the file is 100. Database Designer does not have a set limit on the number of queries, but it is better not to exceed 100 significantly. The query set you are creating should represent each class of queries you will be running. You can also put several variations of queries in each query class.
Put your queries in a text file and make sure to delimit each query from the next one by semicolon (;). To make the file easier to work with I will usually add an empty line after each query. Once the file is ready, upload it to the node you will be running adminTools on. Once the file is on the node make sure that Vertica process will be able to read it. Vertica generally runs under dbadmin user.
Running Database Designer
I will work in the /tmp/sortorder directory. Currently it only has the file with my SQL queries
To run the Database Designer you need to start Vertica’s admintools.
- ssh to one of the nodes of your Vertica cluster
- run sudo su – dbadmin
- run admintools
You will see Administration Tools Main Menu
Once you are back in adminTools navigate out of “Configuration Menu” by hitting “Main Menu” and close adminTools by hitting the “Exit” menu item.
Analyzing Database Designer output
After running Database Designer we got a lot of additional files in our working directory.
If we will examine the contents of the working directory we will find the following categories of files
a) Backup of schema and objects before Database Designer execution [ catalog_dump.sql and schema_dump.xml ];
b) Files logging inputs provided by you in Database Designer screens [ design_config_name.txt and sortorder_params.txt ];
c) Detailed log of Database Designer execution [ designer.log ];
d) Schema design recommended by Database Designer based on your inputs [ sortorder_design.sql and sortorder_deploy.sql ].
The files we are most interested in are sortorder_design.sql and sortorder_deploy.sql. Both files have the prefix sortorder, as Database Designer uses the string we specified for design name as the prefix for these file names.
sortorder_design.sql: file contains SQL script to create the recommended objects from scratch. You can use it if all you need is to add newly recommended objects.
sortorder_deploy.sql: file contains SQL script to modify existing schema and its objects. You can use it if you accept the newly recommended structure and properties and want to take the existing schema to this new recommended design. If you would check “Deploy design” on “Design Options” screen Database Designer will run this script for you at the end of its run. In most cases the script will create new projections, move data and drop existing projections.
Examining Database Designer Recommended Sort Order
In the beginning of this exercise I created a new table and Vertica auto created a superprojection when I loaded the data. Those superprojections had no sort order at all.
When I open sortorder_design.sql file I see following SQL:
CREATE PROJECTION sortorder.ORDER_FACTS_DBD_1_rep_sortorder /*+createtype(D)*/ ( ... ) AS SELECT CLIENT_ID, ... FROM sortorder.ORDER_FACTS ORDER BY SOURCESYSTEM_CD, CLIENT_ID, MANUFACTURER_ID, QUANTITY, SHIPPED_DATE, PRODUCT_CD, ORDER_DATE, ORDER_TOTAL ...
Based on the queries I provided to Database Designer it recommends the sort order listed above to make my queries perform better. You can see ORDER BY clause containing table columns. When Vertica will be loading data into this projection it will sort it before writing to the disk. At the time you will be running SQL queries optimizer will use the Vertica catalog information about existing projections and sort orders to come up with the most efficient execution plan.
Perfect piece of work you have done, this website is really cool with great info.