fbpx

Top 100 Entry-Level Data Analyst Interview Questions and Answers

Top 100 Entry Level Data Analyst Interview Questions and Answers

Contents show

1. What is a Data Analyst’s role in a company?

Answer:
A Data Analyst is responsible for collecting, processing, and analyzing data to support decision-making. They extract insights from data, create reports, and assist in making data-driven decisions.


2. How do you import a CSV file into Python for analysis?

Answer:

import pandas as pd
data = pd.read_csv('file_path.csv')

Explanation:
This code uses the pandas library to import a CSV file. The read_csv function reads the file and stores it as a DataFrame for analysis.


3. Explain the difference between ‘group by’ and ‘having’ clause in SQL.

Answer:
The GROUP BY clause is used to group rows that have the same values into summary rows. The HAVING clause filters groups based on a condition after the GROUP BY operation.


4. How do you handle missing values in a dataset using Python?

Answer:

# Remove rows with missing values
data.dropna()

# Fill missing values with mean
data.fillna(data.mean(), inplace=True)

Explanation:
The code snippet demonstrates two common methods: removing rows with missing values or filling them with the mean.


5. Explain what a pivot table is and how it can be useful.

Answer:
A pivot table is a data summarization tool used in spreadsheet programs. It allows users to aggregate and analyze large datasets, making it easier to derive meaningful insights and identify trends.


6. How do you perform data visualization in Python?

Answer:

import matplotlib.pyplot as plt
data.plot(kind='bar')
plt.show()

Explanation:
This code snippet uses the matplotlib library to create a bar chart from the dataset.


7. What is the importance of data cleaning in analysis?

Answer:
Data cleaning is crucial as it ensures the accuracy and reliability of the analysis. It involves removing errors, handling missing values, and standardizing data formats.


8. How can you identify outliers in a dataset?

Answer:

import seaborn as sns
sns.boxplot(x=data['column_name'])

Explanation:
This code snippet uses a boxplot to visualize and identify potential outliers in a specific column.


9. Explain the concept of correlation in statistics.

Answer:
Correlation measures the statistical relationship between two or more variables. It ranges from -1 to 1, where 1 indicates a strong positive correlation, -1 indicates a strong negative correlation, and 0 indicates no correlation.


10. How do you handle large datasets that do not fit into memory?

Answer:
You can use techniques like data chunking, parallel processing, or utilizing distributed computing frameworks like Apache Spark to handle large datasets efficiently.


11. What is the difference between a bar chart and a histogram?

Answer:

  • A bar chart displays categorical data with rectangular bars, each representing a category.
  • A histogram displays the distribution of a continuous numerical variable.

12. How do you perform data normalization?

Answer:

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data[['column_name']])

Explanation:
The code snippet uses MinMaxScaler to scale the values between 0 and 1, ensuring uniformity in the data range.


13. Explain the concept of one-hot encoding.

Answer:
One-hot encoding is a technique used to convert categorical data into a binary format. It creates binary columns for each category, marking the presence with ‘1’ and absence with ‘0’.


14. How do you identify multicollinearity in a regression analysis?

Answer:

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = data[['feature1', 'feature2', 'feature3']]
vif_data = pd.DataFrame()
vif_data["Feature"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

Explanation:
This code snippet uses the Variance Inflation Factor (VIF) to detect multicollinearity between independent variables.


15. What is a time series analysis, and how is it useful?

Answer:
Time series analysis involves analyzing data points collected or recorded at specific time intervals. It’s useful for forecasting future trends based on historical patterns.


16. How do you select the right algorithm for a machine learning model?

Answer:

  • It depends on the type of problem (classification, regression, etc.).
  • Consider the size and nature of the dataset.
  • Evaluate the assumptions of the algorithm.
  • Perform cross-validation to assess model performance.

17. Explain the concept of A/B testing.

Answer:
A/B testing is a method of comparing two versions of a webpage or app against each other to determine which performs better in terms of user engagement or conversion rates.


18. How do you handle imbalanced datasets in machine learning?

Answer:

  • Use techniques like resampling (oversampling minority class or undersampling majority class).
  • Utilize algorithms designed for imbalanced data, like Random Forest or XGBoost.
  • Apply techniques like Synthetic Minority Over-sampling Technique (SMOTE).

19. What is the purpose of a SQL join?

Answer:
A SQL join combines rows from two or more tables based on a related column between them, enabling the extraction of relevant information from multiple tables.


20. How do you optimize a SQL query for better performance?

Answer:

  • Use indexes on columns frequently used in WHERE clauses.
  • Avoid using SELECT *; only retrieve necessary columns.
  • Optimize subqueries or use JOINs instead.

21. Scenario: Calculating Total Sales

Question:
Suppose you have a table named orders with columns order_id, customer_id, and amount. Write an SQL query to calculate the total sales for each customer.

Answer:

SELECT customer_id, SUM(amount) as total_sales
FROM orders
GROUP BY customer_id;

22. Scenario: Finding Duplicate Records

Question:
You suspect there are duplicate records in a table named users based on the email column. Write an SQL query to identify and list the duplicate email addresses.

Answer:

SELECT email, COUNT(*) as duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

23. Scenario: Calculating Moving Average

Question:
Given a table daily_sales with columns date and amount, write an SQL query to calculate the 7-day moving average of sales.

Answer:

SELECT date, 
       amount,
       AVG(amount) OVER (ORDER BY date RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg
FROM daily_sales;

24. Scenario: Finding Missing Values

Question:
You have a table named products with columns product_id, product_name, and price. Write an SQL query to find products with missing prices.

Answer:

SELECT product_id, product_name
FROM products
WHERE price IS NULL;

25. Scenario: Calculating Employee Tenure

Question:
Given a table employees with columns employee_id, hire_date, and termination_date, write an SQL query to calculate the tenure (in years) of each employee.

Answer:

SELECT employee_id, 
       hire_date, 
       termination_date,
       DATEDIFF(termination_date, hire_date) as tenure_in_days
FROM employees;

26. Scenario: Identifying Outliers

Question:
Suppose you have a table sales with columns transaction_id and amount. Write an SQL query to identify transactions that are outliers (considering a threshold, for example, three times the standard deviation).

Answer:

SELECT transaction_id, amount
FROM sales
WHERE ABS(amount - (SELECT AVG(amount) FROM sales)) > 3 * (SELECT STDEV(amount) FROM sales);

27. Scenario: Finding Nth Highest Salary

Question:
Write an SQL query to find the 3rd highest salary from a table employees with columns employee_id and salary.

Answer:

SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 2, 1;

28. Scenario: Deleting Duplicate Records

Question:
You have a table orders with columns order_id, customer_id, and order_date. Write an SQL query to delete duplicate orders based on the combination of customer_id and order_date, keeping only the latest order.

Answer:

DELETE o1
FROM orders o1, orders o2
WHERE o1.customer_id = o2.customer_id
      AND o1.order_date = o2.order_date
      AND o1.order_id < o2.order_id;

29. Scenario: Finding the Mode

Question:
Given a table survey_responses with columns user_id and response, write an SQL query to find the mode (most frequent response) for each user.

Answer:

SELECT user_id,
       response,
       COUNT(response) as frequency
FROM survey_responses
GROUP BY user_id, response
HAVING COUNT(response) = (SELECT MAX(frequency) FROM (SELECT user_id, response, COUNT(response) as frequency FROM survey_responses GROUP BY user_id, response) as temp WHERE temp.user_id = survey_responses.user_id);

30. Scenario: Finding Overlapping Date Ranges

Question:
You have a table bookings with columns booking_id, start_date, and end_date. Write an SQL query to find overlapping booking periods.

Answer:

SELECT b1.booking_id, b2.booking_id as overlapping_booking_id
FROM bookings b1, bookings b2
WHERE b1.booking_id <> b2.booking_id
      AND b1.start_date <= b2.end_date
      AND b1.end_date >= b2.start_date;

31. Scenario: Calculating Percentiles

Question:
Suppose you have a table exam_scores with columns student_id and score. Write an SQL query to calculate the 75th percentile score.

Answer:

SELECT PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY score) as percentile_75
FROM exam_scores;

32. Scenario: Finding Longest Consecutive Sequence

Question:
Given a table numbers with a column num, write an SQL query to find the longest consecutive sequence of numbers.

Answer:

SELECT MIN(num) as start_num, MAX(num) as end_num
FROM (
    SELECT num, ROW_NUMBER() OVER (ORDER BY num) - ROW_NUMBER() OVER (PARTITION BY num ORDER BY num) as grp
    FROM numbers
) as temp
GROUP BY grp
ORDER BY COUNT(*) DESC
LIMIT 1;

33. Scenario: Calculating Running Total

Question:
Suppose you have a table sales with columns transaction_date and amount. Write an SQL query to calculate the running total of sales.

Answer:

SELECT transaction_date, 
       amount,
       SUM(amount) OVER (ORDER BY transaction_date) as running_total
FROM sales;

34. Scenario: Finding Islands of 1s in Binary Matrix

Question:
Given a binary matrix represented by a table binary_matrix with columns row_id, col_id, and value (0 or 1), write an SQL query to find the number of islands of 1s.

Answer:

SELECT COUNT(*) as num_islands
FROM (
    SELECT DISTINCT a.row_id, a.col_id
    FROM binary_matrix a
    WHERE a.value = 1
      AND NOT EXISTS (
          SELECT 1
          FROM binary_matrix b
          WHERE b.value = 1
            AND ABS(a.row_id - b.row_id) <= 1
            AND ABS(a.col_id - b.col_id) <= 1
            AND (a.row_id != b.row_id OR a.col_id != b.col_id)
      )
) as islands;

35. Scenario: Calculating Moving Average

Question:
Given a table sales with columns transaction_date and amount, write an SQL query to calculate the 3-day moving average of sales.

Answer:

SELECT transaction_date, 
       amount,
       AVG(amount) OVER (ORDER BY transaction_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_avg
FROM sales;

36. Scenario: Identifying Duplicates

Question:
You have a table employees with columns employee_id and name. Write an SQL query to find all duplicate names.

Answer:

SELECT name
FROM employees
GROUP BY name
HAVING COUNT(*) > 1;

37. Scenario: Finding Missing Numbers

Question:
Suppose you have a table numbers with a column num. Write an SQL query to find missing numbers in a range from 1 to 100.

Answer:

SELECT num+1 as missing_number
FROM numbers a
WHERE NOT EXISTS (
    SELECT 1
    FROM numbers b
    WHERE b.num = a.num+1
)
AND num < 100;

38. Scenario: Calculating Age

Question:
Given a table users with columns user_id and birth_date, write an SQL query to calculate the age of each user.

Answer:

SELECT user_id,
       birth_date,
       YEAR(CURRENT_DATE) - YEAR(birth_date) - 
       (CASE WHEN MONTH(CURRENT_DATE) < MONTH(birth_date)
              OR (MONTH(CURRENT_DATE) = MONTH(birth_date) AND DAY(CURRENT_DATE) < DAY(birth_date)) THEN 1 ELSE 0 END) as age
FROM users;

39. Scenario: Finding Nth Highest Salary

Question:
You have a table employees with columns employee_id and salary. Write an SQL query to find the second highest salary.

Answer:

SELECT MAX(salary) as second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

40. Scenario: Generating Fibonacci Series

Question:
Write an SQL query to generate the first 10 numbers of the Fibonacci series.

Answer:

WITH RECURSIVE fibonacci(n, a, b) AS (
    SELECT 1, 0, 1
    UNION ALL
    SELECT n+1, b, a+b FROM fibonacci WHERE n < 10
)
SELECT a as fibonacci_number FROM fibonacci;

41. Scenario: Finding Duplicate Records

Question:
In a table orders, there is a column order_id. Write an SQL query to find all orders that have duplicate order_id entries.

Answer:

SELECT order_id
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;

42. Scenario: Updating Records

Question:
Suppose you have a table products with columns product_id and price. Write an SQL query to increase the price of all products by 10%.

Answer:

UPDATE products
SET price = price * 1.1;

43. Scenario: Finding Active Users

Question:
Given a table users with columns user_id and last_login, write an SQL query to find users who have logged in within the last 30 days.

Answer:

SELECT user_id, last_login
FROM users
WHERE last_login >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY);

44. Scenario: Calculating Total Sales

Question:
You have a table sales with columns transaction_id and amount. Write an SQL query to calculate the total sales amount.

Answer:

SELECT SUM(amount) as total_sales
FROM sales;

45. Scenario: Finding the Longest String

Question:
Suppose you have a table strings with a column text. Write an SQL query to find the longest string.

Answer:

SELECT text
FROM strings
ORDER BY LENGTH(text) DESC
LIMIT 1;

46. Scenario: Calculating Median

Question:
Given a table scores with a column score, write an SQL query to calculate the median score.

Answer:

SELECT AVG(score) as median_score
FROM (
    SELECT score
    FROM scores
    ORDER BY score
    LIMIT 2 - (SELECT COUNT(*) FROM scores) % 2
    OFFSET (SELECT (COUNT(*) - 1) / 2 FROM scores)
) subquery;

47. Scenario: Finding Anagrams

Question:
You have a table words with a column word. Write an SQL query to find all pairs of anagrams.

Answer:

SELECT a.word as word1, b.word as word2
FROM words a, words b
WHERE a.word != b.word
  AND LENGTH(a.word) = LENGTH(b.word)
  AND SORTED_STRING(a.word) = SORTED_STRING(b.word);

(Note: SORTED_STRING is a user-defined function that returns the input string with its characters sorted alphabetically.)


48. Scenario: Calculating Running Total

Question:
Given a table orders with columns order_id and amount, write an SQL query to calculate the running total of the amount column, ordered by order_id.

Answer:

SELECT order_id, amount,
       SUM(amount) OVER (ORDER BY order_id) as running_total
FROM orders;

49. Scenario: Finding Nth Highest Salary

Question:
In a table employees, there is a column salary. Write an SQL query to find the Nth highest salary.

Answer:

SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT N-1, 1;

50. Scenario: Pivot Table

Question:
You have a table sales with columns month, product, and revenue. Write an SQL query to create a pivot table where rows represent months, columns represent products, and the values are the total revenue.

Answer:

SELECT *
FROM (
    SELECT month, product, revenue
    FROM sales
)
PIVOT (
    SUM(revenue)
    FOR product IN ('Product1' as Product1, 'Product2' as Product2, 'Product3' as Product3)
) as PivotTable;

51. Scenario: Finding Missing Numbers

Question:
Given a table numbers with a column num, which contains integers from 1 to N (where some numbers are missing), write an SQL query to find the missing numbers.

Answer:

WITH RECURSIVE NumbersCTE AS (
    SELECT 1 as num
    UNION ALL
    SELECT num + 1
    FROM NumbersCTE
    WHERE num < (SELECT MAX(num) FROM numbers)
)
SELECT num
FROM NumbersCTE
WHERE num NOT IN (SELECT num FROM numbers);

52. Scenario: Calculating Age

Question:
You have a table users with a column birth_date. Write an SQL query to calculate the age of each user.

Answer:

SELECT user_id, 
       DATEDIFF(CURRENT_DATE, birth_date)/365 as age
FROM users;

53. Scenario: Recursive Hierarchy

Question:
Suppose you have a table employees with columns employee_id and manager_id representing a hierarchical structure. Write an SQL query to retrieve the entire hierarchy for a given manager.

Answer:

WITH RECURSIVE Hierarchy AS (
    SELECT employee_id, manager_id
    FROM employees
    WHERE manager_id = <given_manager_id>
    UNION ALL
    SELECT e.employee_id, e.manager_id
    FROM employees e
    JOIN Hierarchy h ON e.manager_id = h.employee_id
)
SELECT * FROM Hierarchy;

54. Scenario: Finding Duplicate Rows

Question:
In a table students, some students have identical names and ages. Write an SQL query to find all the duplicate rows based on the name and age columns.

Answer:

SELECT name, age, COUNT(*) as duplicate_count
FROM students
GROUP BY name, age
HAVING COUNT(*) > 1;

55. Scenario: Concatenating Strings

Question:
You have a table employees with columns first_name and last_name. Write an SQL query to create a new column full_name that concatenates the first and last names with a space in between.

Answer:

SELECT first_name, last_name, CONCAT(first_name, ' ', last_name) as full_name
FROM employees;

56. Scenario: Finding Average Value

Question:
Given a table scores with a column score, write an SQL query to find the average score.

Answer:

SELECT AVG(score) as average_score
FROM scores;

57. Scenario: Calculating Percentile

Question:
You have a table test_scores with a column score. Write an SQL query to calculate the 90th percentile score.

Answer:

SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY score) as percentile_90
FROM test_scores;

58. Scenario: Finding Top N Records

Question:
In a table products, there is a column price. Write an SQL query to retrieve the top 10 most expensive products.

Answer:

SELECT product_name, price
FROM products
ORDER BY price DESC
LIMIT 10;

59. Scenario: Grouping by Time Intervals

Question:
You have a table sales with a column sale_date and amount. Write an SQL query to group sales by month and calculate the total sales for each month.

Answer:

SELECT DATE_FORMAT(sale_date, '%Y-%m') as month, SUM(amount) as total_sales
FROM sales
GROUP BY month;

60. Scenario: Finding Overlapping Dates

Question:
Given a table appointments with columns start_date and end_date, write an SQL query to find all appointments that overlap with a specific date range.

Answer:

SELECT *
FROM appointments
WHERE start_date <= <end_date> AND end_date >= <start_date>;

61. Scenario: Calculating Daily Growth

Question:
In a table user_activity, there is a column users representing the number of daily active users. Write an SQL query to calculate the daily growth in the number of users.

Answer:

SELECT date, users,
       users - LAG(users) OVER (ORDER BY date) as daily_growth
FROM user_activity;

62. Scenario: Finding Closest Points

Question:
Given a table locations with columns latitude and longitude, write an SQL query to find the closest locations to a specific point with given latitude and longitude.

Answer:

SELECT location_name, latitude, longitude,
       SQRT(POW(latitude - <given_latitude>, 2) + POW(longitude - <given_longitude>, 2)) as distance
FROM locations
ORDER BY distance
LIMIT 10;

63. Scenario: Calculating Running Total

Question:
In a table orders, there is a column order_date and order_amount. Write an SQL query to calculate the running total of order amounts.

Answer:

SELECT order_date, order_amount,
       SUM(order_amount) OVER (ORDER BY order_date) as running_total
FROM orders;

64. Scenario: Handling NULL Values

Question:
You have a table employees with columns employee_id, first_name, and last_name. Write an SQL query to display the full names of employees. If the last name is NULL, it should display “Unknown” instead.

Answer:

SELECT employee_id, 
       CONCAT(first_name, ' ', COALESCE(last_name, 'Unknown')) as full_name
FROM employees;

65. Scenario: Finding Nth Highest Value

Question:
Given a table scores with a column score, write an SQL query to find the 3rd highest score.

Answer:

SELECT DISTINCT score
FROM scores
ORDER BY score DESC
LIMIT 2, 1;

66. Scenario: Calculating Age

Question:
In a table users, there is a column birth_date. Write an SQL query to calculate the age of each user.

Answer:

SELECT user_name, birth_date,
       YEAR(CURRENT_DATE) - YEAR(birth_date) - 
       (DATE_FORMAT(CURRENT_DATE, '%m%d') < DATE_FORMAT(birth_date, '%m%d')) as age
FROM users;

67. Scenario: Calculating Median Value

Question:
You have a table test_scores with a column score. Write an SQL query to calculate the median score.

Answer:

SELECT 
  AVG(score) as median_score
FROM 
  (SELECT score FROM test_scores ORDER BY score LIMIT 2 - (SELECT COUNT(*) FROM test_scores) % 2, (SELECT COUNT(*)+1 FROM test_scores) / 2) as tbl;

68. Scenario: Retrieving Random Rows

Question:
Write an SQL query to retrieve a random row from a table products.

Answer:

SELECT * 
FROM products
ORDER BY RAND()
LIMIT 1;

69. Scenario: Calculating Cumulative Percentage

Question:
Given a table sales with columns product_name and sales_amount, write an SQL query to calculate the cumulative percentage of total sales for each product.

Answer:

SELECT product_name, sales_amount,
       (SUM(sales_amount) OVER (ORDER BY sales_amount DESC) / 
       (SELECT SUM(sales_amount) FROM sales)) * 100 as cumulative_percentage
FROM sales;

70. Scenario: Finding Missing Numbers

Question:
In a table numbers, some numbers are missing. Write an SQL query to find the missing numbers between a specific range.

Answer:

WITH RECURSIVE missing_numbers AS (
  SELECT <start_number> as number
  UNION
  SELECT number + 1 FROM missing_numbers WHERE number < <end_number>
)
SELECT number
FROM missing_numbers
WHERE number NOT IN (SELECT number FROM numbers);

71. Scenario: Finding Duplicate Records

Question:
Given a table students with columns student_id and name, write an SQL query to find duplicate records based on the name.

Answer:

SELECT name, COUNT(*) as count
FROM students
GROUP BY name
HAVING count > 1;

72. Scenario: Calculating Moving Average

Question:
You have a table temperature_readings with columns reading_date and temperature. Write an SQL query to calculate the 7-day moving average of temperatures.

Answer:

SELECT reading_date, temperature,
       AVG(temperature) OVER (ORDER BY reading_date 
                             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_average
FROM temperature_readings;

73. Scenario: Finding Longest Consecutive Sequence

Question:
In a table numbers, there is a column num. Write an SQL query to find the longest consecutive sequence of numbers.

Answer:

SELECT MIN(num) as start_num, MAX(num) as end_num
FROM (
  SELECT num, 
         num - ROW_NUMBER() OVER (ORDER BY num) as grp
  FROM numbers
) as t
GROUP BY grp
ORDER BY COUNT(*) DESC
LIMIT 1;

74. Scenario: Generating Rank within Groups

Question:
Given a table sales with columns product_name and sales_amount, write an SQL query to calculate the rank of products based on sales within each category.

Answer:

SELECT product_name, sales_amount,
       DENSE_RANK() OVER (PARTITION BY product_category ORDER BY sales_amount DESC) as rank_within_category
FROM sales;

75. Scenario: Finding Latest Record in Each Group

Question:
You have a table customer_orders with columns order_id, customer_id, and order_date. Write an SQL query to find the latest order for each customer.

Answer:

SELECT customer_id, MAX(order_date) as latest_order_date
FROM customer_orders
GROUP BY customer_id;

76. Scenario: Calculating Time Difference

Question:
In a table log_events, there are columns event_type and event_timestamp. Write an SQL query to calculate the time difference between consecutive events of the same type.

Answer:

SELECT event_type, event_timestamp,
       TIMEDIFF(event_timestamp, 
                LAG(event_timestamp) OVER (PARTITION BY event_type ORDER BY event_timestamp)) as time_diff
FROM log_events;

77. Scenario: Finding Common Elements

Question:
Given two tables table1 and table2 with columns element, write an SQL query to find the common elements between the two tables.

Answer:

SELECT element
FROM table1
INTERSECT
SELECT element
FROM table2;

78. Scenario: Calculating Exponential Moving Average (EMA)

Question:
You have a table stock_prices with columns price_date and stock_price. Write an SQL query to calculate the 10-day exponential moving average of stock prices.

Answer:

SELECT price_date, stock_price,
       AVG(stock_price) OVER (ORDER BY price_date 
                             ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) as ema_10_day
FROM stock_prices;

79. Scenario: Finding Nth Highest Salary

Question:
Write an SQL query to find the nth highest salary from a table employees.

Answer:

SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT (n-1), 1;

80. Scenario: Calculating Total Sales by Month

Question:
Given a table sales with columns sale_date and amount, write an SQL query to calculate the total sales amount for each month.

Answer:

SELECT DATE_FORMAT(sale_date, '%Y-%m') as month, SUM(amount) as total_sales
FROM sales
GROUP BY month;

81. Scenario: Calculating Running Total

Question:
You have a table orders with columns order_date and order_amount. Write an SQL query to calculate the running total of order amounts.

Answer:

SELECT order_date, order_amount,
       SUM(order_amount) OVER (ORDER BY order_date) as running_total
FROM orders;

82. Scenario: Finding Missing Numbers

Question:
In a table numbers, some numbers are missing. Write an SQL query to find the missing numbers in a given range.

Answer:

SELECT DISTINCT n1 + 1 as missing_number
FROM numbers n1, numbers n2
WHERE n1 + 1 NOT IN (SELECT num FROM numbers)
      AND n1 < n2
      AND n1 + 1 <= (SELECT MAX(num) FROM numbers);

83. Scenario: Finding Second Highest Salary

Question:
Write an SQL query to find the second highest salary from a table employees.

Answer:

SELECT MAX(salary) as second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

84. Scenario: Calculating Age

Question:
Given a table users with a column date_of_birth, write an SQL query to calculate the age of each user.

Answer:

SELECT user_id, name, date_of_birth,
       YEAR(CURDATE()) - YEAR(date_of_birth) - 
       (DATE_FORMAT(CURDATE(), '%m%d') < DATE_FORMAT(date_of_birth, '%m%d')) as age
FROM users;

85. Scenario: Finding Top N Records

Question:
You have a table products with columns product_name and sales. Write an SQL query to find the top 5 products by sales.

Answer:

SELECT product_name, sales
FROM products
ORDER BY sales DESC
LIMIT 5;

86. Scenario: Finding Duplicate Records

Question:
Given a table students with columns student_id and name, write an SQL query to find duplicate records.

Answer:

SELECT student_id, name
FROM students
GROUP BY student_id, name
HAVING COUNT(*) > 1;

87. Scenario: Calculating Percentiles

Question:
You have a table test_scores with columns student_id and score. Write an SQL query to calculate the 90th percentile of scores.

Answer:

SELECT PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY score) as percentile_90
FROM test_scores;

88. Scenario: Finding Uncommon Elements

Question:
Given two tables table1 and table2 with a common column id, write an SQL query to find elements that are present in either table1 or table2, but not in both.

Answer:

SELECT id
FROM table1
WHERE id NOT IN (SELECT id FROM table2)
UNION ALL
SELECT id
FROM table2
WHERE id NOT IN (SELECT id FROM table1);

89. Scenario: Calculating Median

Question:
In a table scores with column score, write an SQL query to calculate the median of scores.

Answer:

SELECT AVG(score) as median
FROM (
    SELECT score
    FROM scores
    ORDER BY score
    LIMIT 2 - MOD((SELECT COUNT(*) FROM scores), 2), 1
) as subquery;

90. Scenario: Finding Cycle in a Directed Graph

Question:
Given a table edges with columns source_node and target_node, write an SQL query to find if there’s a cycle in the directed graph.

Answer:

SELECT DISTINCT e1.source_node, e1.target_node
FROM edges e1, edges e2
WHERE e1.target_node = e2.source_node
      AND e1.source_node = e2.target_node;

91. Scenario: Calculating Moving Average

Question:
You have a table sales with columns sale_date and amount. Write an SQL query to calculate the 3-day moving average of sales.

Answer:

SELECT sale_date, amount,
       AVG(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_avg
FROM sales;

92. Scenario: Finding Prime Numbers

Question:
Write an SQL query to find all prime numbers less than or equal to n.

Answer:

WITH RECURSIVE primes AS (
    SELECT 2 as num
    UNION
    SELECT num + 1
    FROM primes
    WHERE NOT EXISTS (
        SELECT 1
        FROM primes p
        WHERE p.num > 1 AND p.num < num AND num % p.num = 0
    )
    AND num <= n
)
SELECT num FROM primes;

93. Scenario: Calculating Cumulative Sum

Question:
You have a table orders with columns order_id, customer_id, and order_total. Write an SQL query to calculate the cumulative sum of order_total for each customer.

Answer:

SELECT order_id, customer_id, order_total,
       SUM(order_total) OVER (PARTITION BY customer_id ORDER BY order_id) as cumulative_total
FROM orders;

94. Scenario: Finding Missing Numbers

Question:
Given a table numbers with a column num, containing numbers from 1 to n, write an SQL query to find the missing numbers.

Answer:

WITH RECURSIVE missing_nums AS (
    SELECT 1 as num
    UNION
    SELECT num + 1
    FROM missing_nums
    WHERE num < n
)
SELECT num
FROM missing_nums
LEFT JOIN numbers ON missing_nums.num = numbers.num
WHERE numbers.num IS NULL;

95. Scenario: Hierarchical Data

Question:
You have a table employees with columns employee_id, name, and manager_id (where manager_id points to employee_id). Write an SQL query to retrieve the hierarchical structure.

Answer:

WITH RECURSIVE org_chart AS (
    SELECT employee_id, name, manager_id, 0 as level
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.employee_id, e.name, e.manager_id, oc.level + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT *
FROM org_chart;

96. Scenario: Generating Random Data

Question:
Write an SQL query to generate 10 random numbers between 1 and 100.

Answer:

SELECT ROUND(RANDOM() * 99 + 1)::INT as random_number
FROM generate_series(1, 10);

97. Scenario: Finding Longest Consecutive Sequence

Question:
Given a table numbers with a column num, write an SQL query to find the longest consecutive sequence of numbers.

Answer:

SELECT MIN(num) as start_range, MAX(num) as end_range
FROM (
    SELECT num, ROW_NUMBER() OVER (ORDER BY num) - ROW_NUMBER() OVER (PARTITION BY num ORDER BY num) as grp
    FROM numbers
) as subquery
GROUP BY grp
ORDER BY COUNT(*) DESC
LIMIT 1;

98. Scenario: Calculating Fibonacci Sequence

Question:
Write an SQL query to generate the first 10 numbers in the Fibonacci sequence.

Answer:

WITH RECURSIVE fibonacci AS (
    SELECT 0 as n, 0 as fib
    UNION ALL
    SELECT 1 as n, 1 as fib
    UNION ALL
    SELECT n + 1, fib + (LAG(fib) OVER (ORDER BY n))
    FROM fibonacci
    WHERE n < 9
)
SELECT fib
FROM fibonacci;

99. Scenario: Calculating Moving Averages

Question:
You have a table sales with columns date and amount. Write an SQL query to calculate the 7-day moving average of sales.

Answer:

SELECT date, amount, 
       AVG(amount) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg
FROM sales;

100. Scenario: Finding Duplicate Records

Question:
Given a table users with columns user_id and email, write an SQL query to find and list all the duplicate email addresses.

Answer:

SELECT email, COUNT(*) as duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;