Top 100 Hive Interview Questions And Answers

It helps to know what types of questions are commonly asked during a Hive interview. Looking forward to becoming a Hadoop Developer? Here are some Hive interview questions. We hope these top 100 Hive interview questions and answers prepare you for your next Hadoop interview. Below is the list of Apache Hive interview questions, framed for Apache Hive brainstormers.

Top Hive Interview Questions and Answers

1. What kind of data warehouse application is suitable for Hive? What are the types of tables in Hive? 

Hive is not considered a full database. The design constraints of Hadoop and HDFS restrict what Hive can do. However, Hive is best suited to data warehouse applications because in those applications:

  • The data analyzed is relatively static.
  • Fast response times are not required.
  • The data does not change rapidly.

Although Hive doesn’t provide the fundamental features required for Online Transaction Processing (OLTP), it is well suited to data warehouse frameworks over large datasets.

There are two types of tables in Hive:

  • Managed tables
  • External tables

2. Explain the SMB Join in Hive? 

In an SMB join in Hive, every mapper reads a bucket from the first table and the corresponding bucket from the second table, and then a merge-sort join is performed. Sort Merge Bucket (SMB) joins are widely used in Hive because they place no restrictions on file, partition, or table size.

SMB join is best utilized when the stored table data is huge. In an SMB join, the tables are bucketed and sorted on the join columns, and both tables must have the same number of buckets.
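
A minimal sketch of the settings commonly used to enable an SMB join, followed by a join over two tables assumed to be bucketed and sorted on the join key (all table and column names are illustrative):

set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- both tables bucketed and sorted on customer_id into the same number of buckets
select o.order_id, o.amount, c.name
from orders_bucketed o
join customers_bucketed c on o.customer_id = c.customer_id;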

3. How is HIVE different from RDBMS? 

  • RDBMS supports schema on write, whereas Hive provides schema on read.
  • Hive follows a write-once, read-many model, whereas in an RDBMS we can write as many times as we want.
  • Hive can handle big datasets, whereas an RDBMS struggles to scale beyond roughly 10 TB.
  • Hive is highly scalable, whereas scaling an RDBMS is costly.
  • Hive has the Bucketing feature, which is not available in an RDBMS.

4. What types of databases does Hive support?

For single-user metadata storage, Hive uses an embedded Derby database; for the multi-user or shared-metadata case, Hive uses MySQL.

5. In Apache Hive, how can you enable buckets? 

In Hive, you can enable buckets by using the following command: set hive.enforce.bucketing=true;
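
As a quick, hypothetical illustration (the table names are made up), bucketing is declared at table-creation time, and rows are routed into buckets during an insert-select:

set hive.enforce.bucketing=true;

create table orders_bucketed (order_id int, amount double)
clustered by (order_id) into 4 buckets;

-- inserting through a query lets Hive hash rows into the 4 buckets
insert overwrite table orders_bucketed
select order_id, amount from orders_staging;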

6. Is Apache Hive suitable to be used for OLTP systems? Why? 

No, it is not suitable for OLTP systems since it does not offer row-level inserts and updates.

7. What is the Object Inspector functionality in Apache Hive? 

In Hive, the analysis of the internal structure of rows, columns, and complex objects is done using the ObjectInspector functionality. ObjectInspector also provides access to the internal fields present inside the objects.

8. What are the limitations of Apache Hive? 

  • We cannot perform real-time queries with Hive.
  • It does not offer row-level updates.
  • Hive does not offer acceptable latency for interactive data browsing.
  • Hive is not the right choice for online transaction processing.

9. What are the different modes in Apache Hive?

Sometimes in Hive interviews, interviewers like to ask these basic questions to see how confident you are in your Hive knowledge. Answer by saying that Hive can operate in two modes, MapReduce mode and local mode, and explain that the choice depends on the size of the data and the number of DataNodes in Hadoop.

10. What is Hive Bucketing? 

When performing queries on large datasets in Apache Hive, bucketing can offer better structure to Hive tables. For example, bucketing gives programmers more flexibility when it comes to record-keeping and can make it easier to debug large datasets when needed.

11. What is the difference between partition and bucketing? 

The main aim of both partitioning and bucketing is to execute queries more efficiently. Partitioning divides a table into slices based on the values of a partition column, and that partition scheme is fixed when the table is created; bucketing further divides the data into a fixed number of buckets by hashing a column.

12. Where does the data of a table get stored in Hive?

By default, the table data is stored in the HDFS directory /user/hive/warehouse. Moreover, one can change it by specifying the desired directory in the hive.metastore.warehouse.dir configuration parameter present in hive-site.xml.

13. How data transfer happens from HDFS to Hive? 

If the data is already present in HDFS, the user need not use LOAD DATA, which moves the files to /user/hive/warehouse/. Instead, the user simply defines the table using the EXTERNAL keyword, which creates the table definition in the Hive metastore while leaving the data in place, as in the sketch below.
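
A minimal sketch, assuming the data files already sit under a hypothetical HDFS path:

-- only the table definition goes into the metastore; the files are left in place
create external table web_logs (ip string, url string, ts string)
row format delimited fields terminated by '\t'
location '/data/web_logs';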

14. What does the Hive query processor do? 

The Hive query processor converts SQL into a graph of MapReduce jobs and, together with the execution-time framework, runs those jobs in the order of their dependencies.

15. Explain about SORT BY, ORDER BY, DISTRIBUTE BY and CLUSTER BY in Hive. 

SORT BY – Data is ordered at each of ‘N’ reducers where the reducers can have an overlapping range of data. 

ORDER BY- This is similar to the ORDER BY in SQL queries where the total ordering of data takes place by passing it to a single reducer.

DISTRIBUTE BY – It is used to distribute the rows among the reducers. Rows that have the same distribution by columns will go to the same reducer.

CLUSTER BY- It is a combination of DISTRIBUTE BY and SORT BY where each of the N reducers gets a non-overlapping range of data which is then sorted by those ranges at the respective reducers.
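
The contrast is easiest to see side by side; the sales table and its columns here are hypothetical:

select * from sales sort by amount;        -- sorted within each reducer; ranges may overlap
select * from sales order by amount;       -- total ordering through a single reducer
select * from sales distribute by region;  -- same region goes to the same reducer; no sorting
select * from sales cluster by region;     -- distribute by region plus sort by region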

16. What is the difference between local and remote metastore? 

Local Metastore: It is the embedded metastore configuration service that runs in the same JVM in which the Hive service is running and connects to a database running in a separate JVM. 

Remote Metastore: In this configuration, the metastore service runs on its own separate JVM and not in the Hive service JVM.

17. Which classes are used in Hive to Read and Write HDFS Files? 

The following classes are used by Hive to read and write HDFS files:

  • TextInputFormat/HiveIgnoreKeyTextOutputFormat: These 2 classes read/write data in plain text file format.
  • SequenceFileInputFormat/SequenceFileOutputFormat: These 2 classes read/write data in hadoop SequenceFile format.

18. Explain the functionality of ObjectInspector. 

ObjectInspector helps analyze the internal structure of a row object and the individual structure of columns in Hive. It also provides a uniform way to access complex objects that can be stored in multiple formats in memory, including:

  • An instance of a Java class.
  • A standard Java object.
  • A lazily initialized object.

ObjectInspector tells the structure of the object and also the ways to access the internal fields inside the object.

19. What is ObjectInspector functionality in Hive? 

Hive uses ObjectInspector to analyze the internal structure of rows and the data types of columns. It also gives us ways to access the internal fields inside the object. It processes not only common data types like int, bigint, and string but also complex data types like arrays, maps, structs, and unions.

20. How does bucketing help in the faster execution of queries? 

These types of Hive interview questions and answers are meant to familiarize the candidate with essential Hive features, the Hive query language, and Hive indexes.

If you have to join two large tables, you can go for a reduce-side join. But if both tables have the same number of buckets (or multiples thereof) and are also sorted on the same column, there is a possibility of a Sort-Merge-Bucket Map Join (SMBMJ), in which all the joins take place in the map phase itself by matching the corresponding buckets.

There are different properties which you need to set for bucket map joins and they are as follows:

set hive.enforce.sortmergebucketmapjoin = false;
set hive.auto.convert.sortmerge.join = false;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

21. Why will mapreduce not run if you run select * from a table in a hive? 

When you perform a “select * from <table>”, Hive fetches the whole data from the file as a FetchTask rather than a MapReduce task, simply dumping the data as-is without doing anything to it. This is similar to “hadoop dfs -text <file>”. However, while using “select <column> from <table>”, Hive requires a MapReduce job, since it needs to extract the column from each row by parsing it from the file it loads.

22. What is Hive MetaStore? 

Hive MetaStore is the central repository of Hive that stores metadata in an external database. By default, Hive stores metadata in an embedded Derby database instance, but you can store it in MySQL or Oracle, depending on the project.

23. What are the three different modes in which Hive can be run?

  • Local mode
  • Distributed mode
  • Pseudo Distributed mode

24. How can you prevent a large job from running for a long time? 

This can be achieved by setting MapReduce jobs to execute in strict mode: set hive.mapred.mode=strict;. The strict mode ensures that queries on partitioned tables cannot execute without restricting the partitions with a WHERE clause.

25. When do we use explode in Hive? 

Hadoop developers sometimes take an array as input and convert it into separate table rows. To convert complex data types into desired table formats, Hive uses explode, as in the example below.
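
For example, with a hypothetical orders table whose items column is an array, each element becomes its own row:

select explode(items) as item
from orders;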

26. What are the different components of a Hive architecture? 

Hive architecture consists of:

  • User Interface – The UI component of the Hive architecture calls the execute interface to the driver.
  • Driver – The driver creates a session handle for the query and sends the query to the compiler to generate an execution plan for it.
  • Metastore – Sends the metadata to the compiler for the execution of the query on receiving the sendMetaData request.
  • Compiler – The compiler generates the execution plan, which is a DAG of stages where each stage is either a metadata operation, a map or reduce job, or an operation on HDFS.
  • Execution Engine – The execution engine is responsible for submitting each of these stages to the relevant components, managing the dependencies between the various stages in the execution plan generated by the compiler.

27. How can you connect an application, if you run Hive as a server? 

When running Hive as a server, an application can connect in one of the following 3 ways:

  • ODBC Driver-This supports the ODBC protocol
  • JDBC Driver- This supports the JDBC protocol
  • Thrift Client- This client can be used to make calls to all hive commands using different programming languages like PHP, Python, Java, C++ and Ruby.

28. Can we LOAD data into a view? 

No. A view can not be the target of an INSERT or LOAD statement.

29. Is it possible to add 100 nodes when we already have 100 nodes in Hive? If yes, how? 

Yes, we can add the nodes by following the below steps:

  • Step 1: Take a new system; create a new username and password.
  • Step 2: Install SSH and set up SSH connections with the master node.
  • Step 3: Add the ssh public_rsa id key to the authorized_keys file.
  • Step 4: Add the new DataNode hostname, IP address, and other details in the /etc/hosts slaves file, e.g.: 192.168.1.102 slave3.in slave3
  • Step 5: Start the DataNode on the new node.
  • Step 6: Log in to the new node, e.g. su hadoop, or: ssh -X hadoop@<new-node-ip>
  • Step 7: Start HDFS on the newly added slave node by using the following command: ./bin/hadoop-daemon.sh start datanode
  • Step 8: Check the output of the jps command on the new node.

30. Can Hive process any type of data formats? 

This is one of the common hive interview questions. 

Yes, Hive uses the SerDe interface for I/O operations, and different SerDe interfaces can read and write any type of data. Plain text data is processed directly, whereas for other data formats stored in Hadoop, Hive uses the appropriate SerDe interface to process the data.

31. How can you stop a partition from being queried? 

You can stop a partition from being queried by using the ENABLE OFFLINE clause with ALTER TABLE statement.

32. What is a Hive variable? What do we use it for? 

Hive variables are created in the Hive environment and are referenced by Hive scripting languages. They allow passing values to a Hive query when the query starts executing. Such scripts are typically executed with the source command.
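
A small sketch, assuming a hypothetical sales table partitioned by a ds column:

hive> set hivevar:run_date='2021-01-01';
hive> select count(*) from sales where ds = ${hivevar:run_date};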

33. What is SerDe in Apache Hive? 

SerDe is a short name for Serializer/Deserializer. Hive uses a SerDe to read and write data from Hive tables, including external tables. An important concept behind Hive is that it does NOT own the Hadoop File System format in which the data is stored. Users are able to write files to HDFS with whatever tools or mechanisms take their fancy (“CREATE EXTERNAL TABLE” or “LOAD DATA INPATH”) and use Hive to correctly “parse” that file format in a way that can be used by Hive.

34. Whenever we run a Hive query, a new metastore_db is created. Why? 

A local metastore is created when we run Hive in an embedded mode. Before creating, it checks whether the metastore exists or not, and this metastore property is defined in the configuration file, hive-site.xml. 

The property is: javax.jdo.option.ConnectionURL with the default value: jdbc:derby:;databaseName=metastore_db;create=true. 

Therefore, we have to change the location to an absolute path in the configuration so that the metastore can be used from that location.

35. Can we change the data type of a column in a Hive table?

Yes, by using the REPLACE COLUMNS option: ALTER TABLE table_name REPLACE COLUMNS (column_name new_data_type);

36. Why does Hive not store metadata information in HDFS? 

Hive stores metadata information in the metastore using RDBMS instead of HDFS. The main reason for choosing RDBMS is to achieve low latency because HDFS read/write operations are time-consuming processes.

37. How does Hive deserialize and serialize the data? 

Usually, while reading/writing data, the user first communicates with the InputFormat. Then it connects with the RecordReader to read/write records. The SerDe then deserializes each record into a row object, using a custom ObjectInspector to access the data in the row's fields; serialization follows the same path in reverse.

38. What is RegexSerDe? 

Regex stands for regular expression. Whenever you want to do pattern matching and store fields based on the matched pattern, you use RegexSerDe, which is present in org.apache.hadoop.hive.contrib.serde2.RegexSerDe.

In the SerDe properties, you have to define your input pattern and output fields. For example, from a line like xyz/pq@def, you can extract xyz, pq, and def separately.
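
A sketch of such a table definition using RegexSerDe; the regular expression and field names below are illustrative only:

create table parsed_lines (part1 string, part2 string, part3 string)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties (
  "input.regex" = "([^/]*)/([^@]*)@(.*)",
  "output.format.string" = "%1$s %2$s %3$s"
);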

39. While loading data into a hive table using the LOAD DATA clause, how do you specify it is a hdfs file and not a local file ? 

By omitting the LOCAL clause in the LOAD DATA statement.

40. Explain about the different types of partitioning in Hive? 

Partitioning in Hive helps prune the data when executing the queries to speed up processing. In static partitions, the name of the partition is hardcoded into the insert statement whereas, in a dynamic partition, Hive automatically identifies the partition based on the value of the partition field.

41. What is the significance of the ‘IF EXISTS’ clause while dropping a table?

When we issue the command DROP TABLE IF EXISTS table_name, Hive does not throw an error even if the table being dropped does not exist in the first place.

42. How can Hive avoid mapreduce? 

If we set the property hive.exec.mode.local.auto to true, then Hive will avoid MapReduce to fetch query results.

43. What is the relationship between MapReduce and Hive? or How Mapreduce jobs submit to the cluster? 

Hive provides no additional capabilities to MapReduce. The programs are executed as MapReduce jobs via the interpreter, which runs on a client machine and turns HiveQL queries into MapReduce jobs. The framework then submits those jobs to the cluster.

44. What is ObjectInspector functionality? 

Hive uses ObjectInspector to analyze the internal structure of the row object and also the structure of the individual columns. ObjectInspector provides a uniform way to access complex objects that can be stored in multiple formats in the memory, including:

  • An instance of a Java class (Thrift or native Java).
  • A standard Java object (we use java.util.List to represent Struct and Array, and java.util.Map to represent Map).
  • A lazily-initialized object (for example, a Struct of string fields stored in a single Java string object with a starting offset for each field).

A complex object can be represented by a pair of ObjectInspector and Java Object. The ObjectInspector not only tells us the structure of the Object but also gives us ways to access the internal fields inside the Object.

45. Suppose that I want to monitor all the open and aborted transactions in the system along with the transaction id and the transaction state. Can this be achieved using Apache Hive? 

Hive 0.13.0 and above support the SHOW TRANSACTIONS command, which helps administrators monitor various Hive transactions.

46. Can a partition be archived? What are the advantages and disadvantages? 

Yes. A partition can be archived. The advantage is it decreases the number of files stored in namenode and the archived file can be queried using hive. The disadvantage is it will cause less efficient queries and does not offer any space savings.

47. Does the archiving of Hive tables save space in HDFS? 

No. It only reduces the number of files which becomes easier for the name node to manage.

48. Does Hive support record level Insert, delete or update? 

By default, Hive does not provide record-level update, insert, or delete; hence, Hive does not provide transactions either. However, users can use CASE statements and the built-in functions of Hive to satisfy the above DML operations. Thus, a complex update query in an RDBMS may need many lines of code in Hive.

49. What are the default record and field delimiter used for hive text files? 

This is one of the rarer Hive interview questions. The default record delimiter is \n, and the default field delimiters are \001, \002, and \003.

50. What is the difference between static and dynamic partition of a table? 

This type of Hive interview question suits freshers.

To prune data during a query, partitioning can minimize the query time. A partition is created when the data is inserted into the table. With static partitioning, the partition is named explicitly while inserting individual rows, whereas dynamic partitioning processes the entire table and derives the partitions from the values of a particular column. In strict mode, at least one static partition is required before creating any dynamic partitions. If you are partitioning a large dataset as part of an ETL flow, dynamic partitioning is recommended.

51. Why do we perform partitioning in Hive? 

In a Hive table, partitioning provides granularity. Hence, it reduces query latency by scanning only the relevant partitioned data instead of the whole dataset.

52. How does partitioning help in the faster execution of queries? 

With the help of partitioning, a subdirectory is created with the name of the partition column, and when you perform a query using the WHERE clause, only that particular subdirectory is scanned instead of the whole table. This gives you faster execution of queries.
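
For instance, with a hypothetical sales table partitioned by date, only one subdirectory is read:

create table sales (id int, amount double)
partitioned by (ds string);

-- scans only the ds=2021-01-01 subdirectory, not the whole table
select sum(amount) from sales where ds = '2021-01-01';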

53. Can you list a few commonly used Hive services? 

  • Command Line Interface (cli)
  • Hive Web Interface (hwi)
  • HiveServer (hiveserver)
  • Printing the contents of an RC file using the tool rcfilecat.
  • Jar
  • Metastore

54. What is the default maximum dynamic partition that can be created by a mapper/reducer? How can you change it? 

By default, the number of maximum partitions that can be created by a mapper or reducer is set to 100. One can change it by issuing the following command:

SET hive.exec.max.dynamic.partitions.pernode = <value>.

55. Why do we need buckets? 

Basically, there are two main reasons for bucketing a partition:

  • A map-side join requires the data belonging to a unique join key to be present in the same partition.
  • It allows us to decrease the query time and also makes the sampling process more efficient.

56. Can we name views the same as the name of a Hive table? 

No. The name of a view must be unique among all the tables and views present in the same database.

57. What Options are Available When It Comes to Attaching Applications to the Hive Server? 

Explain the three different ways (Thrift Client, JDBC Driver, and ODBC Driver) you can connect applications to the Hive Server. You’ll also want to explain the purpose for each option: for example, using JDBC will support the JDBC protocol.

58. When should we use SORT BY instead of ORDER BY? 

We should use SORT BY instead of ORDER BY, especially when we have to sort huge datasets. The reason is that the SORT BY clause sorts the data using multiple reducers, whereas ORDER BY sorts all of the data together using a single reducer. Hence, ORDER BY takes a long time to execute over a large number of inputs.

59. What are the uses of Hive Explode? 

Hadoop developers consider an array as their input and convert it into separate table rows. To convert complicated data types into desired table formats, Hive uses explode.

60. Can we run UNIX shell commands from Hive? Can Hive queries be executed from script files? If yes, how? 

These types of Hive interview questions and answers are of moderate importance. Yes, we can run UNIX shell commands from Hive using an ‘!’ mark before the command. For example, !pwd at the Hive prompt will display the current directory. We can execute Hive queries from script files using the source command.

61. How is ORC file format optimised for data storage and analysis? 

ORC stores collections of rows in one file, and within each collection the row data is stored in a columnar format. The columnar format is very easy to compress, thus greatly reducing storage costs. When querying, ORC reads only the particular columns needed instead of the whole row, since the records are stored column-wise. ORC also keeps an index on every block with statistics (min, max, sum, count) on the columns, so when you query, it can skip blocks based on the indexing.
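
A minimal sketch of creating an ORC-backed table (the table name and compression codec chosen here are just examples):

create table sales_orc (id int, amount double)
stored as orc
tblproperties ("orc.compress" = "ZLIB");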

62. What is the difference between Internal and External Tables? 

  • Internal (managed) table: Hive manages both the metadata and the actual data, which is stored under Hive's warehouse directory. If the table is dropped, both the actual data and the metadata are lost.
  • External table: Only the schema is stored in the metastore database, while the actual data stays at its external location. If an external table is dropped, only the metadata is lost, not the actual data.

63. Explain the different types of joins in Hive. 

HiveQL has 4 different types of joins –

  • JOIN – Similar to an inner join in SQL.
  • FULL OUTER JOIN – Combines the records of both the left and right outer tables that fulfil the join condition.
  • LEFT OUTER JOIN- All the rows from the left table are returned even if there are no matches in the right table.
  • RIGHT OUTER JOIN-All the rows from the right table are returned even if there are no matches in the left table.

64. What is a metastore in Hive? 

It is a relational database storing the metadata of hive tables, partitions, Hive databases etc.

65. What is the functionality of the Query Processor in Apache Hive? 

This component implements the processing framework for converting SQL to the graph of map/reduce jobs and the execution time framework to run those jobs in the order of dependencies.

66. What is the utilization of HCatalog?

HCatalog can be utilized to share data structures with external systems. HCatalog gives clients of other tools on Hadoop access to the Hive metastore so that they can read and write data to the Hive data warehouse.

67. How will you optimize Hive performance? 

There are various ways to run Hive queries faster –

  • Using Apache Tez execution engine
  • Using vectorization
  • Using ORCFILE
  • Using cost-based query optimization (CBO)

68. In the case of embedded Hive, can the same metastore be used by multiple users? 

We cannot use the embedded metastore in sharing mode. It is suggested to use standalone real databases like PostgreSQL and MySQL.

69. When to use MapReduce mode?

MapReduce mode is used when:

  • We are working on large data sets and the query will execute in a parallel way.
  • Hadoop has multiple DataNodes and the data is distributed across the different nodes.
  • Better performance must be achieved when processing large data sets.

70. What is the importance of Thrift server & client, JDBC and ODBC driver in Hive? 

Thrift is a cross-language RPC framework that generates code and combines a software stack to finally execute the Thrift code in a remote server. The thrift compiler acts as an interpreter between server and client. Thrift server allows a remote client to submit a request to Hive, using different programming languages like Python, Ruby, and Scala.

JDBC driver: A JDBC driver is a software component enabling a Java application to interact with a database.

ODBC driver: ODBC accomplishes DBMS independence by using an ODBC driver as a translation layer between the application and the DBMS.

71.  What is the default database of Hive to store its metadata? 

Derby is the default database; it is the embedded metastore of Hive. We can configure Hive to use another database for the metastore.

72.  What is the default execution engine in Hive? 

MapReduce is the default execution engine.

73.  How to change the execution engine in Hive? 

  • SET hive.execution.engine=tez;

74. Which component of Hive connects to the Hadoop cluster for query execution? 

Execution engine 

75. Which component of Hive converts SQL query to jar file for execution? 

Execution engine 

76. How do you see the execution plan for the query? 

Use EXPLAIN:

hive> EXPLAIN select * from stud;
OK
Plan not optimized by CBO.

Stage-0
 Fetch Operator
 limit:-1
 Select Operator [SEL_1]
 outputColumnNames:["_col0","_col1","_col2"]
 TableScan [TS_0]
 alias:stud

77. How do you see the database you are currently working in?

Set hive.cli.print.current.db = true 

hive (custdb)> set hive.cli.print.current.db=false;
hive> set hive.cli.print.current.db=true;
hive (custdb)>

78. What is the default storage location of Hive tables? 

  • /user/hive/warehouse 

79. What do you do to have a table which should exist only for that particular Hive session? 

  • Create temporary table 
hive> create temporary table t3 (col1 int, col2 string);
OK
Time taken: 0.047 seconds
  • Hive 0.14 onwards supports temporary tables. You can use them as a normal table within a user session. The location will be /tmp/hive/<user>/*.

80. What are the limitations of Hive temporary tables? 

  • It is available only to the session in which it is created.
  • It is automatically removed when the session terminates.
  • The partition column option is not supported on temporary tables.
  • Creating indexes on temporary tables is not allowed.
  • A user can't access a permanent table with the same name as a temporary table during that session without renaming the temporary table.

81. What are the different types of tables in Hive? 

  • Temporary Table
  • Managed Table
  • External Table 

82. Scenario: We need to store one of the feed data into the Hive table and expose it for consumption. What are the steps you will take? 

  • Talk to the source team and get schema information of the feed data like data type, data size, volume of data etc. 
  • Talk to the business team who is going to consume the data and find the search criteria they will be using. 
  • Based on the above two create tables with partitioning/bucketing/indexing etc. 

83. What are the default terminators in Hive? 

  • Field terminator – Ctrl+A (\001)
  • Line terminator – \n

84. How do you get the schema information of tables in Hive?

DESCRIBE FORMATTED <Table Name>

85. Can we use DESCRIBE for Database to see the details?

No, DESCRIBE FORMATTED is only for tables; to see database details, Hive provides DESCRIBE DATABASE <db_name>.

86. How do you load data from Linux location to Hive Table location using Hive script?

  • Load data local inpath ‘/home/hduser/hive/data/txns’ into table txnrecords;
  • Load data local inpath ‘/home/hduser/hive/data/txns’ overwrite into table txnrecords;

It copies the content from Linux to the Hive table location. 

87. How do you load data from HDFS location to Hive table location using Hive script?

  • Load data inpath ‘/user/hduser/txns1’ into table txnrecords;
  • Load data inpath ‘/user/hduser/txns1’ overwrite into table txnrecords;

It moves data from the HDFS location to the Hive table location. 

88. How do you convert a row of records into a single object in Hive?

  • Use collect_set() or collect_list(), as in the sketch below.
  • collect_set() removes duplicates, while collect_list() keeps them.
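
A small sketch with a hypothetical orders table:

-- one row per customer, with the set of distinct products bought
select customer_id, collect_set(product) as products
from orders
group by customer_id;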

89. How do you convert an array of data into individual rows? 

  • explode – can't be used together with other columns in the select list.
  • Lateral view explode – can be used with other columns.
  • posexplode – explode with the element's index (see the sketch below).
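
A sketch contrasting the three, again with hypothetical names:

-- explode alone: no other columns allowed in the select list
select explode(items) as item from orders;

-- lateral view explode: other columns can be selected too
select o.order_id, t.item
from orders o
lateral view explode(o.items) t as item;

-- posexplode: also returns each element's index
select o.order_id, t.pos, t.item
from orders o
lateral view posexplode(o.items) t as pos, item;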

90. What are the different types of indexes available in Hive? 

  • Compact index – used on high-cardinality columns.
  • Bitmap index – used on low-cardinality columns.

91. What are different types of partitions available in Hive? 

  • Static: based on some external factor, with the partition value given explicitly:
  • Load data local inpath ‘/home/hduser/hive/student_details2’ into table student partition (course = “hadoop”);
  • Dynamic: based on the data itself:
  • Set hive.exec.dynamic.partition=true;
  • Set hive.exec.dynamic.partition.mode=nonstrict;
  • Load data local inpath ‘/home/hduser/hive/student_details’ into table stud_demo;
  • Insert into student_part partition(course) select id, name, age, institute, course from stud_demo;

92. Where will NULL and empty values be stored in the case of partitions?

  • In the __HIVE_DEFAULT_PARTITION__ directory.

93. Are partition values case sensitive?

Yes

94. Can we do dynamic partition using LOAD DATA command?

No

95. Will the metadata be updated if we load partition data using the LOAD DATA command?

Yes

  • Load data local inpath ‘/home/hduser/hive/student_details2’ into table student partition(course= “hadoop”);

96. Can we update and delete in the Hive table? 

  • By default, no.
  • Yes, if transactions are enabled and the table is in ORC format and bucketed.
  • TBLPROPERTIES (‘transactional’=’true’)
  • Set hive.enforce.bucketing=true;

Limitations: 

  • External tables are not supported.
  • Only the ORC format is supported.
  • LOAD is not supported on ACID transactional tables; hence, use INSERT INTO.
  • On transactional sessions, all operations are auto-committed, as BEGIN, COMMIT, and ROLLBACK are not yet supported (see the sketch below).
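
A rough sketch of an updatable table under these constraints; the session settings shown are the ones commonly required, and the table is hypothetical:

set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.enforce.bucketing=true;

create table accounts (id int, balance double)
clustered by (id) into 4 buckets
stored as orc
tblproperties ('transactional'='true');

update accounts set balance = balance + 100 where id = 1;
delete from accounts where id = 2;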

97. How does partition help in managing data?

  • We can remove a portion of the data from the table easily by just deleting its partition.
  • We can update a portion of the data with updated values by just overwriting that partition folder's data.

98. How do you load only a few fields from a file?

It is a two-step process, as in the sketch below:

  • Load all the data into a managed (staging) table using LOAD DATA LOCAL INPATH.
  • Do an insert-select with the required fields from the managed table into the external table.
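
A sketch of the two steps with hypothetical table and column names:

-- step 1: load the full file into a managed staging table
load data local inpath '/home/hduser/data/txns' into table txn_staging;

-- step 2: project only the required fields into the target table
insert into table txn_target
select txn_id, amount from txn_staging;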

99. Scenario: I manually created a partition folder and kept data inside it. But when I query from the hive table for that partition value, I am not getting data. What should I do? 

  • Run msck repair or manually add the partition, as shown below.
  • MSCK REPAIR is Hive's metastore consistency check.
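
Either of the following, assuming a hypothetical sales table partitioned by ds:

-- let Hive discover partition directories created by hand
msck repair table sales;

-- or register the partition explicitly
alter table sales add partition (ds = '2021-01-01');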

100. Scenario: I deleted some of the partition folder manually and then executed msck repair, will it remove the partition from metadata? 

No, we have to manually DROP the partition. 

These top Hive interview questions have been designed specifically to acquaint you with the nature of the questions you may encounter during your interview on the subject of Hive.