Hive creates folders like HIVE_UNION_SUBDIR_1

When using Tez, Inserting data using SELECT / UNION ALL will create non standard folders such as HIVE_UNION_SUBDIR_1.. n under the Table location folder.

According to Apache,

Each leg of the union places data into a subdir of the table/partition. Subdirs are named 1/, 2/, etc

  • The way this currently works is that CTAS creates an Acid table but the insert statement writes the data in non-acid layout. Then on read, it’s treated like an non-acid to acid conversion.
  • Longer term CTAS should create acid layout from the get-go. Each leg of the union places data into a subdir of the table/partition.
  • Subdirs are named HIVE_UNION_SUBDIR_1/, HIVE_UNION_SUBDIR_2/, etc

There are two possible solutions

  1. If you have the flexibility to use Map Reduce, change your engine to MapReduce, and that should solve this issue.

2. If MR is not an option then perform two inserts and skip UNION.

Example: Instead of using UNION ALL

INSERT OVERWRITE TABLE  AS 
SELECT * from  table 1
UNION ALL
SELECT * from table 2

Use multiple INSERT statements

INSERT OVERWRITE TABLE  AS SELECT * from  table 1
INSERT INTO TABLE  AS SELECT * from table 2

One thought on “Hive creates folders like HIVE_UNION_SUBDIR_1

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: