Creating a Parquet copy of an existing Impala table

If you have an Impala table stored as comma-separated text and want your analytical queries on it to run faster, one of the easiest improvements within reach is converting the table to Parquet, a columnar storage format supported by both Impala and Hive. Doing this is really simple:

impala-shell> create table facts_parquet like facts stored as parquet;
impala-shell> insert into facts_parquet select * from facts;

The “stored as parquet” clause is all you have to add.
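
If you prefer a single statement, a “create table as select” should give a similar result, assuming your Impala version supports it. Treat this as a sketch rather than a drop-in replacement: unlike the “like” form, it derives the columns from the query, and it does not carry over partitioning unless you declare it yourself. The table name facts_parquet_ctas is only illustrative.

impala-shell> -- one-step variant: create the Parquet table and fill it from the text table
impala-shell> create table facts_parquet_ctas stored as parquet as select * from facts;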

Parquet is compressed, so more of the data fits in RAM and less has to be read from disk. Parquet is also columnar, so a query loads only the columns it uses and never touches the others. That’s where the big improvement comes from. Now substitute facts_parquet for facts in your queries and enjoy your data accelerating like electrons in a collider!
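
Before switching your queries over, it may be worth checking that the copy is complete and really is stored as Parquet. A minimal sketch: the two counts should match, and “show table stats” reports the file format and on-disk size of the new table. The last query illustrates the kind of statement that benefits most, since it touches only two columns. The column names region and amount are hypothetical, so replace them with columns from your own table.

impala-shell> -- the two row counts should be identical
impala-shell> select count(*) from facts;
impala-shell> select count(*) from facts_parquet;
impala-shell> -- shows file count, size and format for the new table
impala-shell> show table stats facts_parquet;
impala-shell> -- reads only the two referenced columns (hypothetical names)
impala-shell> select region, sum(amount) from facts_parquet group by region;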