Top 100 SAS Interview Questions And Answers

These SAS interview questions and answers are designed to acquaint you with the kinds of questions you may encounter in an interview on SAS programming. Working through these top 100 questions should help you approach the interview with confidence. Some of the answers require memorizing certain concepts.

1. What is SAS?

This is categorized under basic SAS interview questions.

SAS (Statistical Analysis System) is an integrated set of software solutions that helps users analyze data.

It is used for:

  • Information retrieval and data management
  • Writing reports and graphics
  • Statistical analysis, econometrics, and data mining
  • Business planning, forecasting, and decision support
  • Operations research and project management
  • Quality improvement 
  • Data warehousing 
  • Application development 

2. What is the DATA step?

The DATA step creates a SAS data set that carries the data along with a 'data dictionary' (the descriptor portion). The data dictionary holds information about the variables and their properties.

3. Explain the basic structure of SAS programming.

A SAS program consists of:

  • DATA step, which retrieves and manipulates data
  • PROC step, which analyzes and reports on the data
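
As a minimal sketch of these two building blocks, the program below runs one DATA step followed by one PROC step. It assumes the SASHELP.CLASS sample data set that ships with SAS is available; the names tall_students and height_cm are made up for illustration.

```sas
/* DATA step: read SASHELP.CLASS and derive a new variable */
data tall_students;
  set sashelp.class;
  height_cm = height * 2.54;   /* HEIGHT is stored in inches */
  if height_cm > 150;          /* subsetting IF keeps only matching rows */
run;

/* PROC step: summarize the data set created above */
proc means data=tall_students;
  var height_cm;
run;
```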

4. Explain the Scan function. 

Syntax: scan(argument,n,delimiters)

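argument: specifies the character variable or expression to scan

n: specifies which word to read (a negative value counts words from the right)

delimiters: specifies the characters that separate words; a literal list must be enclosed in quotation marks. If omitted, SAS uses a default set of delimiters.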
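
A short sketch of SCAN in use; the sample string and variable names are invented for illustration.

```sas
data _null_;
  full_name = 'Smith, John Paul';
  /* The delimiter list holds a comma and a blank */
  last   = scan(full_name, 1, ', ');
  first  = scan(full_name, 2, ', ');
  /* A negative n counts words from the right */
  middle = scan(full_name, -1, ', ');
  put last= first= middle=;   /* log: last=Smith first=John middle=Paul */
run;
```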

5. What is the function of the STOP statement?

The STOP statement causes SAS to stop processing the current DATA step immediately and resume processing with the statements after the end of that DATA step.
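
A minimal sketch of STOP, assuming the SASHELP.CLASS sample data set; first_three is a hypothetical output name. The step ends on the fourth iteration before anything is written for it, so the output holds three observations.

```sas
data first_three;
  set sashelp.class;
  /* On the 4th iteration the step ends before the implicit OUTPUT */
  if _n_ > 3 then stop;
run;
```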

6. What is the difference between the SUM function and the '+' operator?

The SUM function returns the sum of the non-missing arguments.

The '+' operator returns a missing value if any of the arguments is missing.
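
The difference is easiest to see with one missing argument. This is a small sketch with made-up values:

```sas
data _null_;
  a = 10;
  b = .;                     /* numeric missing value */
  c = 5;
  total_fn = sum(a, b, c);   /* ignores the missing value: 15 */
  total_op = a + b + c;      /* the missing value propagates: . */
  put total_fn= total_op=;   /* log: total_fn=15 total_op=. */
run;
```

SAS also writes a note to the log saying that missing values were generated by the '+' expression.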

7. How to perform ‘table lookup’? 

  • Match merging
  • Direct access
  • Format tables
  • Arrays
  • PROC SQL

8. Explain data step processing. 

When we submit a DATA step, SAS processes it and then creates a new SAS data set.

Processing happens in two phases:

  • Compilation phase
  • Execution phase

9. What is the difference between PROC MEANS and PROC SUMMARY?

PROC SUMMARY:

  • Defaults to NOPRINT
  • If you omit the VAR statement, it produces a simple count of observations
  • If you specify statistics on the PROC SUMMARY statement and omit the VAR statement, it stops processing and writes an error message to the SAS log

PROC MEANS:

  • Defaults to PRINT
  • If you omit the VAR statement, it analyzes all numeric variables that are not listed in other statements
  • When all variables are character variables, it produces a simple count of observations

10. How to create list output for cross tabulations in proc freq?  

To generate list output for cross tabulations, add a slash (/) and the LIST option to the TABLES statement in your PROC FREQ step.

TABLES variable-1*variable-2 <* … variable-n> / LIST;
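
For instance, assuming the SASHELP.CARS sample data set is available, the following sketch prints the ORIGIN by TYPE cross tabulation as a list rather than a grid:

```sas
proc freq data=sashelp.cars;
  /* LIST prints one row per combination instead of a two-way table */
  tables origin*type / list;
run;
```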

11. How to debug and test SAS programs? 

Check the log for ERROR, WARNING, and NOTE messages; in some cases, use the DATA step debugger.

Some system options help debug SAS macros: MPRINT, MLOGIC, and SYMBOLGEN.

12. What does the TRANWRD function do?

TRANWRD function replaces or removes all occurrences of a pattern of characters within a character string. 
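
A small sketch of TRANWRD replacing every occurrence of a word; the sample text is invented.

```sas
data _null_;
  text = 'The cat sat on the cat mat';
  /* Arguments: source, target to find, replacement */
  replaced = tranwrd(text, 'cat', 'dog');
  put replaced=;   /* log: replaced=The dog sat on the dog mat */
run;
```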

13. What is the difference between do while and do until?   

The main difference between the DO UNTIL and DO WHILE statements is when the expression is evaluated. The DO WHILE expression is evaluated at the top of the DO loop, so if the expression is false the first time it is evaluated, the loop never executes.

The DO UNTIL expression is evaluated at the bottom of the loop, so the loop always executes at least once.
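
The sketch below starts both loops with a value that already fails the WHILE condition and already satisfies the UNTIL condition, so only the DO UNTIL body prints. The variable names are arbitrary.

```sas
data _null_;
  /* DO WHILE: checked before the first pass, so the body never runs */
  i = 10;
  do while (i < 5);
    put 'DO WHILE body, ' i=;
    i = i + 1;
  end;

  /* DO UNTIL: checked after each pass, so the body runs exactly once */
  j = 10;
  do until (j >= 5);
    put 'DO UNTIL body, ' j=;   /* log: DO UNTIL body, j=10 */
    j = j + 1;
  end;
run;
```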

14. How to achieve efficiency in the SAS program? 

Efficiency and performance strategies are classified into 5 different areas:

  • CPU time
  • Data Storage 
  • Elapsed time 
  • Input/Output 
  • Memory

15. Give examples of efficiency techniques.

  • Use KEEP and DROP statements to retain only the necessary variables
  • Use macros to reduce the amount of code
  • Use the SQL procedure to reduce the number of programming steps
  • Use LENGTH statements to reduce variable size and save data storage
  • Use IF-THEN/ELSE statements so that later conditions are skipped once one is met
  • Use DATA _NULL_ steps when no output data set is needed, which saves data storage

16. How to remove duplicates using PROC SQL?

proc sql noprint;
  /* SELECT DISTINCT * keeps one copy of each fully identical row */
  create table inter.merged1 as
    select distinct * from inter.readin;
quit;

17. Explain CATX.

The CATX function concatenates character strings, removes leading and trailing blanks, and inserts separators.
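
A brief sketch of CATX with padded input values; the names are made up. The first argument is the separator.

```sas
data _null_;
  first = '  John  ';
  last  = ' Smith ';
  full  = catx(' ', first, last);   /* blanks stripped, one space inserted */
  csv   = catx(',', last, first);
  put full= csv=;                   /* log: full=John Smith csv=Smith,John */
run;
```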

18. Explain how you can debug and test your SAS program.

You can debug and test your SAS program by using OBS=0 and system options that trace the program's execution in the log.

19. Mention the categories in which SAS informats are placed.

SAS informats are placed in three categories:

  • Character informats: $INFORMATw.
  • Numeric informats: INFORMATw.d
  • Date/time informats: INFORMATw.

20. Mention the validation tools used in SAS.

For DATA steps: DATA dataset-name / DEBUG; and DATA dataset-name / STMTCHK;

For macros: OPTIONS MPRINT MLOGIC SYMBOLGEN;

Top SAS Interview Questions And Answers

21. Explain PROC PRINT and PROC CONTENTS.

PROC PRINT: displays the contents of a SAS data set, which also lets you check that the data were read into SAS correctly.

PROC CONTENTS: displays information about a SAS data set, such as its variables and their attributes.

23. Explain the difference between the NODUPKEY and NODUP options.

NODUP:

  • Compares all the variables in the data set
  • Removes duplicate observations where the values of all variables are repeated (identical observations)

NODUPKEY:

  • Compares just the BY variables
  • Removes duplicate observations where the value of a variable listed in the BY statement is repeated

24. Explain SAS informats.

SAS informats are used to read, or input, data from external files known as flat files (ASCII files, text files, or sequential files).

The informat tells SAS how to read data into SAS variables.

25. How to count unique values by a grouping variable? 

We can use PROC SQL  with COUNT(DISTINCT variable_name) to determine the number of unique values for a column. 
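
As a sketch, assuming the SASHELP.CARS sample data set, this query counts the distinct makes within each origin; n_makes is an arbitrary column alias.

```sas
proc sql;
  select origin,
         count(distinct make) as n_makes   /* unique MAKE values per group */
  from sashelp.cars
  group by origin;
quit;
```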

26. Explain PROC GLM.

PROC GLM performs:

  • Simple and multiple regression
  • Analysis of variance (ANOVA)
  • Analysis of covariance 
  • Multivariate analysis of variance
  • Repeated measure analysis of variance

27. What is a Program Data Vector(PDV)?

The PDV is a logical area in memory. It is created after the input buffer.

SAS builds a data set one observation at a time in the PDV.

28. Explain SYMGET and SYMPUT. 

SYMPUT: puts a value from a data set into a macro variable.

SYMGET: gets the value of a macro variable into a data set.
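
A sketch of the round trip, assuming the SASHELP.CLASS sample data set; the macro variable total_weight and the data set check are hypothetical names.

```sas
/* CALL SYMPUT: store a DATA step value in a macro variable */
data _null_;
  set sashelp.class end=last;
  total + weight;   /* sum statement accumulates across rows */
  if last then call symput('total_weight', strip(put(total, 8.1)));
run;

%put Total weight is &total_weight;

/* SYMGET: read the macro variable back during DATA step execution */
data check;
  set sashelp.class;
  share = weight / input(symget('total_weight'), 8.);
run;
```

SYMGET returns a character value, so INPUT converts it to a number before the division.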

29. What is the difference between SCAN and SUBSTR? 

SCAN: extracts words within a value that is marked by delimiters

SUBSTR: extracts a portion of the value by stating the specific location 

30. What is the difference between Proc Means and Proc Summary? 

The difference between the two procedures is:

PROC MEANS: gives descriptive statistics and, by default, prints them in the output window.

PROC SUMMARY: does not print output by default; you must specify the PRINT option to get printed output.

31.  What are the key features of SAS? 

  • Strong Data analysis abilities
  • SAS Studio
  • Support for various types of Data Format
  • Flexible fourth-generation programming language (4GL)
  • Report output format 
  • Data Encryption Algorithms 
  • SAS management 
  • AI, ML, and IoT

32.  How many data types are there in SAS? 

Two data types are present: Character and Numeric.

33. How to limit decimal places for variables using PROC MEANS?

By using the MAXDEC= option 

34.  How to specify variables to be processed by the FREQ procedure? 

By using TABLES Statement 

35. What is the purpose of double trailing @@ in the Input Statement?

The double trailing at sign (@@) tells SAS to hold the current input record for the next INPUT statement, even across iterations of the DATA step, rather than advancing to a new record.
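
A sketch with inline data shows the effect: each line holds several name and score pairs, and @@ lets one INPUT statement read them one pair per iteration. The data values are invented.

```sas
data scores;
  /* @@ holds the line so the next iteration reads the next pair */
  input name $ score @@;
  datalines;
Ann 90 Bob 85 Cara 77
Dan 64 Eve 92
;
run;

proc print data=scores;   /* five observations, one per pair */
run;
```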

36.  How to include or exclude specific variables in a data set? 

Using DROP and KEEP statements and the DROP= and KEEP= data set options.

37.  What are the default statistics that PROC MEANS produce? 

By default it produces N, MIN, MAX, MEAN, and STD DEV.

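38. What is DATA _NULL_?

DATA _NULL_ runs a DATA step without creating a data set. It can be used to write output, for example to the log or an external file, without creating a data set.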

39. How to remove duplicate values?

By using PROC SORT with the NODUPKEY and NODUP options.

40. How to sort in descending order? 

Use the DESCENDING keyword in the PROC SORT code. 

41. How to convert a numeric variable to a character variable? 

By creating a differently-named variable using the PUT function.  

42. How to convert a character variable to a numeric variable? 

By creating a differently-named variable using the INPUT function.
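
Both conversions can be sketched in one DATA step; the variable names and values are made up for illustration.

```sas
data converted;
  num_id   = 1234;
  char_amt = '56.78';
  char_id      = put(num_id, 8.);          /* numeric to character, right-aligned */
  char_id_trim = strip(put(num_id, 8.));   /* same, with leading blanks removed */
  num_amt      = input(char_amt, 8.);      /* character to numeric */
run;
```

A new variable name is needed in each direction because a variable's type cannot change once it is defined.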

43. Explain what is Factor analysis? 

Factor analysis is a statistical method that describes the variability among observed, correlated variables in terms of a smaller number of unobserved variables called factors. Its main purpose is to summarize and reduce the data.

44.  What is the difference between SET and MERGE?  

SET concatenates data sets, whereas MERGE matches observations from the data sets and combines them side by side.

45. Which date function advances a date, time, or datetime value by a given interval?

The INTNX function advances a date, time, or datetime value by a given interval and returns a date, time, or datetime value.
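
A short sketch of INTNX with a made-up start date. By default the result is aligned to the beginning of the interval; the optional fourth argument 'same' keeps the day of the month.

```sas
data _null_;
  start = '15JAN2024'd;
  next_month_start = intnx('month', start, 1);           /* 01FEB2024 */
  same_day_next    = intnx('month', start, 1, 'same');   /* 15FEB2024 */
  put next_month_start= date9. same_day_next= date9.;
run;
```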

 46. What is the purpose of using the RETAIN statement?  

A RETAIN statement tells SAS not to set variables to missing when going from the current iteration of the DATA step to the next. Instead, SAS retains the values.
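
A typical use is a running total. This sketch assumes the SASHELP.CLASS sample data set; running and running_weight are hypothetical names.

```sas
data running;
  set sashelp.class;
  /* Without RETAIN, running_weight would reset to missing on every iteration */
  retain running_weight 0;
  running_weight = running_weight + weight;
run;
```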

47.  What is the difference between %EVAL and %SYSEVALF? 

%EVAL performs only integer arithmetic, so it cannot evaluate expressions whose operands have floating-point values. That is where the %SYSEVALF function comes in.
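
The two functions can be compared on the same division; the macro variables a and b are arbitrary.

```sas
%let a = 10;
%let b = 4;
%put Integer division: %eval(&a / &b);       /* writes 2 */
%put Floating point:   %sysevalf(&a / &b);   /* writes 2.5 */
```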

48. Name a few SAS functions.

SCAN, SUBSTR, TRIM, CATX, INDEX, TRANWRD, FIND, and SUM.

49. What is the difference between INPUT and INFILE? 

The INFILE statement is used to identify an external file while the INPUT statement is used to describe your variables. 

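50. What is the difference between MISSOVER and TRUNCOVER?

When the MISSOVER option is used on the INFILE statement, the INPUT statement does not jump to the next line when reading a short line. Instead, MISSOVER sets the variables without values to missing.

TRUNCOVER also stays on the current line, but it assigns the raw data value to the variable even if the value is shorter than the length that the INPUT statement expects.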

51. How to print observations 4 through 8 from a data set? 

Using the FIRSTOBS= and OBS= data set options.
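
For example, assuming the SASHELP.CLASS sample data set, the FIRSTOBS= and OBS= data set options can be combined as follows. OBS= names the last observation to process, not a count, so this prints five rows.

```sas
proc print data=sashelp.class(firstobs=4 obs=8);
run;
```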

52. What does the SUBSTR function do?

The SUBSTR function is used to extract a substring from a character variable. 

53. What is the difference between CEIL and FLOOR functions? 

The CEIL function returns the smallest integer greater than or equal to the argument, whereas the FLOOR function returns the largest integer less than or equal to the argument.

54.  What is the difference between SCAN and SUBSTR?

SCAN extracts words within a value that is marked by delimiters. SUBSTR extracts a portion of the value by stating the specific location. It is best used when we know the exact position of the substring to extract from a character value. 

55. How to save logs in an external file? 

Use PROC PRINTTO
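
A minimal sketch of the pattern; the path /tmp/myprog.log is a placeholder, not a required location.

```sas
/* Route the log to a file; NEW overwrites any existing file */
proc printto log='/tmp/myprog.log' new;
run;

/* ... program steps whose log should be captured ... */

/* PROC PRINTTO with no options restores the default log destination */
proc printto;
run;
```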

56. How does Data step Merge and PROC SQL handle many-to-many relationships?

DATA step MERGE does not create a Cartesian product in a many-to-many relationship, whereas PROC SQL does.

57. What is the smallest length for a numeric and character variable respectively? 

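For numeric variables, 2 bytes on mainframe (z/OS) systems and 3 bytes on Windows and UNIX; for character variables, 1 byte.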

58. What is the difference between SAS PROCs and the SAS DATA step?

PROCs are prewritten routines, each with a specific purpose, whereas the DATA step is designed to read in and manipulate data.

59. How can you write a SAS data set to a comma-delimited file? 

Use a FILE statement with the DSD option and a PUT statement in a DATA step.
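
One way to write comma-delimited output from a DATA step is sketched below, assuming the SASHELP.CLASS sample data set; the output path /tmp/class.csv is a placeholder. The DSD option on the FILE statement makes list-style PUT write comma-separated values.

```sas
data _null_;
  set sashelp.class;
  file '/tmp/class.csv' dsd;         /* DSD: comma is the default delimiter */
  put name sex age height weight;    /* one delimited line per observation */
run;
```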

60. Which SAS statement does not perform automatic conversions in comparisons? 

The WHERE statement.

61. What is the difference between the INPUT and PUT functions?

INPUT function: character-to-numeric conversion, INPUT(source, informat).

PUT function: numeric-to-character conversion, PUT(source, format).

62.  If a variable contains letters or special characters, can it be a numeric data type? 

No, it must be a character data type.

63. What can be the size of the largest dataset in SAS?

The number of observations is limited only by the computer’s capacity to handle and store them. 

64. What is the difference between the CLASS statement and BY statement in proc means? 

BY processing requires that your data already be sorted or indexed in the order of the BY variables. The CLASS statement has no such requirement.

65.  How would you identify a macro variable? 

With the ampersand (&) sign.

66. How to sort in descending order? 

Use the DESCENDING keyword in the PROC SORT code. The example below shows the use of the descending keyword. 

proc sort data=auto;
  by descending engine;
run;

67. What would be the denominator value used by the mean function if two out of seven arguments are missing? 

Five would be the denominator value 

68. What are the two parts of the SAS dataset? 

  • Descriptive portion(contains data set properties and Variable properties)
  • Data portion 

69. How can you read a dataset that has more than 32 character long names? 

Using SQL explicit pass-through facility. 

70. What will happen to a variable that has a variable name>32 characters long when you read a dataset? 

Variable names will be truncated to 32 characters only. 

71. How can you use special characters in a dataset name? 

By using the VALIDMEMNAME option and name literal.  

options validmemname=extend;
data 'SalaryIn$'n;
  salary = 800;
run;

72. How can you access MS Excel or MS Access files using the libname statement? 

Using the EXCEL engine and the ACCESS engine.

Example:

libname my excel "d:\employee.xlsx";
libname my access "d:\employee.mdb";

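73. Write a program to print the names of all data sets present in the SASUSER library.

There are two ways:

/* Option 1: PROC CONTENTS with NODS lists members without details */
proc contents data=sasuser._all_ nods;
run;

/* Option 2: query the DICTIONARY.TABLES view */
proc sql;
  select memname from dictionary.tables
  where libname="SASUSER";
quit;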

74. How can you read a file that contains special characters in the name such as $?

We can use a name literal, which tells SAS to allow that particular character in the name:

libname admit "D:\admit.xlsx";
data my_admission;
  set admit.'admission$'n;
run;

75.  What is a logical error and how to identify it? 

A logical error occurs when a program runs without writing any error to the log but still produces incorrect results. You can identify logical errors by using PUTLOG or PUT statements to print intermediate values in the SAS log.

76. Write a program to read the last observation of the data set sasuser.admit.

data last_observation;
  set sasuser.admit end=x;
  if x;   /* the END= variable is 1 only on the last observation */
run;

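77. Write a program to replace missing values for all numeric variables present in the employee data set with 0.

OPTIONS MISSING=0 only changes how missing values are displayed. To replace them in the data, loop over all numeric variables with an array:

data missing_replace;
  set employee;
  array nums {*} _numeric_;   /* every numeric variable */
  do i = 1 to dim(nums);
    if missing(nums{i}) then nums{i} = 0;
  end;
  drop i;
run;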

78. What are the types of SAS statements? 

  • Global statements: these can be used anywhere in a SAS program and stay in effect until changed or canceled. Examples are the OPTIONS and TITLE statements.
  • DATA step and PROC step statements: these are in effect only within the DATA step or PROC step in which they appear.

79. What is the purpose of Proc steps? 

PROC steps generally analyze data and produce output. They take SAS data sets, or data in other formats, as input.

80. A SAS DATA step is processed in how many phases? And in which phase PDV and Descriptor portions of the dataset are created?  

SAS data step is processed in the Compilation phase and Execution phase. The PDV and Descriptor portion is created in the Compilation phase. 

81. What is the use of the _N_ variable?

The _N_ automatic variable represents how many times the DATA step has iterated.

82. What is the use of the _ERROR_ variable?

The _ERROR_ variable holds the value 1 if an error occurs while the DATA step executes, and 0 otherwise.

83. How to print PDV in SAS log?

Use the PUTLOG statement to print the PDV in the SAS log, for example: putlog _all_;

84. Is RETAIN a compile-time or an execution-time statement?

Compile-time statement.

85. What statement is used to avoid truncating variable values? 

Length statement 

86. At the beginning of the execution phase, what are the values of the _ERROR_ and _N_ variables?

_ERROR_=0 and _N_=1

87. Write a program to read only the 5th observation of the employee data set (note: do not use the FIRSTOBS= and OBS= data set options).

data read_fifth_obs;
  set employee;
  if _n_ = 5;   /* subsetting IF keeps only the fifth iteration */
run;

88. Once you have had the data read into SAS data sets, are you more of a DATA step programmer or a PROC SQL programmer?

It depends on what type of analysis data sets are required for creating tables, but I am more of a DATA step programmer because it gives more flexibility. For example, when creating a change-from-baseline data set for blood pressure, I sometimes have to retain certain values, use arrays, or use the FIRST. and LAST. variables.

89. What types of Programming tasks do you use Proc SQL versus the Data step? 

PROC SQL is very convenient for performing table joins compared to a DATA step merge, as it does not require the key columns to be sorted prior to joining. A DATA step is more suitable for sequential observation-by-observation processing. PROC SQL can also save a great deal of time when you want to filter variables while selecting, or modify them by applying formats, creating new variables or macro variables, and subsetting the data.

90. Why and when do you use Proc SQL? 

PROC SQL is very convenient for performing table joins compared to a DATA step merge, as it does not require the key columns to be sorted prior to joining.

A DATA step is more suitable for sequential observation-by-observation processing.

PROC SQL can also save a great deal of time when you want to filter variables while selecting, or modify them by applying formats, creating new variables or macro variables, and subsetting the data. PROC SQL offers great flexibility for joining tables.

91. Do you use Proc Report or Proc Tabulate? Which do you prefer? 

PROC REPORT, as it is highly customizable and flexible: I can define each column in whatever way I want, and I can even use SAS functions, logic processing, and assignment statements to create new variables for the report in its COMPUTE block. It is a more efficient tool than PROC TABULATE because PROC REPORT can also produce frequency counts and tabulations.

92. How to display duplicate observations in a data set using Base SAS?

There are two ways to display duplicate observations:

  • In the DATA step, use FIRST.var and LAST.var
  • Use PROC SORT with the DUPOUT= option
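
Both approaches can be sketched as follows, assuming the SASHELP.CARS sample data set and treating MAKE as the key; the output data set names are hypothetical. The PROC SORT option that captures the removed rows is DUPOUT=.

```sas
/* Approach 1: FIRST. and LAST. flags after sorting by the key */
proc sort data=sashelp.cars out=cars_sorted;
  by make;
run;

data dup_makes;
  set cars_sorted;
  by make;
  /* A key that occurs once is both first and last in its BY group */
  if not (first.make and last.make);
run;

/* Approach 2: NODUPKEY drops repeats, DUPOUT= stores the dropped rows */
proc sort data=sashelp.cars out=cars_unique nodupkey dupout=cars_dups;
  by make;
run;

proc print data=cars_dups;
run;
```

The first approach keeps every row of a repeated key, while DUPOUT= holds only the rows that NODUPKEY removed.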

93. Does SAS translate(compile)? or does it interpret? 

A typical SAS program could contain DATA steps, PROC steps, and macros. Macros are preprocessed. DATA steps are compiled just in time. PROC steps are interpreted in the order in which they appear in the program.

So when we submit a SAS program consisting of all these three components, the macro is compiled and executed first. If a DATA step is encountered, then it is compiled and executed.

Note that the DATA step will not be executed if there is an error in the compilation. If a PROC step is encountered, it is interpreted and executed line by line.

94. How do you validate the SAS program? 

When SAS code is submitted, SAS performs syntax checks before executing it. One way to validate a program is therefore to write OPTIONS OBS=0 at the beginning of the code, in addition to any other options, and then run it. This way no data are processed, and the log shows any error messages or warnings.

If you are running the code in PC SAS, the syntax coloring in the editor also highlights syntax errors.

95. When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: SCAN, INDEX, or INDEXC?

INDEX: searches a character expression for a string of characters and returns the position of the string's first character for the first occurrence of the string. INDEX(source, excerpt) returns the position where excerpt begins in source.

96. How would you create multiple observations from a single observation? 

Line pointer controls are used when one observation spans multiple lines; the double trailing @@ is used when one line holds multiple observations.

97. What are some good SAS programming practices for processing very large data sets?

Arrays are used for processing large datasets. 

98. Why is SAS considered self-documenting? 

When a data set is created, SAS also creates a descriptor portion, which stores information such as variable names, lengths, and types.

99. Have you ever used PROC SQL for data summarization?

Yes, for summarization at times. For example, if I have to calculate the maximum value of BP for patients 101, 102, and 103, I use the MAX(BPD) function to get the maximum value and a GROUP BY clause to group the patients accordingly.

100. Tell me about your SQL experience?  

I have used the SAS/ACCESS SQL pass-through facility to connect to external databases and import tables from them, as well as from Microsoft Access and Excel files. Besides this, I have used PROC SQL for joining tables.

These SAS interview questions and answers are designed for freshers as well as experienced data analysts. They cover all three difficulty levels, including tricky questions, and interviewers ask them to check a candidate's technical knowledge.