5. Test Data
What is Test Data?
The handling of the data needed for software testing is often overlooked.
Testers tend to forget that without appropriate test data, software development and testing can fail badly.
An excellent and representative data set is essential for creating practical test cases.
Test data is the data that the tester loads into the system as input in order to execute software tests.
It can be simple sets of usernames and passwords or millions of records of complex data.
The essential requirement for test data is that it should be precise and accurate.
In positive testing, test data verifies the functions that produce expected results, and in negative testing, it verifies functions that produce exceptional or unusual results compared to the expected results.
Importance of Test Data
As per IBM’s 2016 research, approx. 30-60% of a tester’s time is invested in searching, generating, or maintaining test data.
Much of that time goes to the following challenges:
1. Excessive amount of data
Production data is like a haystack from which the test data must be compiled.
Among terabytes of available data, the exceptional cases needed for useful tests are hard to find.
2. No access to the data source
Regulations such as GDPR, HIPAA, and PCI DSS limit access to the data source.
Although these policies have significantly reduced the chances of a data breach, test teams become dependent on the few employees who have access to the data before they can go ahead with formulating test cases.
3. Refreshment times are long
As testing teams are not given a facility to refresh the data themselves, they have to contact the DBA for every refresh.
This is a lengthy process, and a refresh can sometimes take days or weeks.
4. Production data access delay
Since agile testing is still not widely used in organizations, conflicts arise when multiple teams work on the same project and access the same databases.
It often happens that by the time the data set reaches one team, it has already been altered by the operations the previous team performed on it.
Types of Test Data
1. Boundary Data
This is the valid data that satisfies the boundary conditions.
If the data is correctly set, the software gives the expected output according to the input.
2. Huge Data
Huge data is the large data set used in performance testing to check whether the system breaks under different conditions.
3. Blank Data
As the name suggests, blank data contains no data, or is merely a blank file.
The expected outcome is that the software does not break and that the exceptions generated are handled with appropriate error messages.
4. Valid Data
Valid data is data that the software supports and expects; for such input, the software gives the expected result.
5. Invalid Data
This type of data is neither supported nor expected by the software. It tests whether the system breaks when an invalid set of data is passed.
It also checks whether the exceptions are handled well, with proper error messages.
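As a minimal sketch of the five types above, assume a hypothetical `validate_age` function that accepts whole-number ages from 18 to 60. The function, its range, and every sample value are illustrative assumptions, not part of any real system.

```python
# Hypothetical system under test: accepts integer ages in [MIN_AGE, MAX_AGE].
MIN_AGE, MAX_AGE = 18, 60


def validate_age(raw: str) -> bool:
    """Return True for an accepted age; raise ValueError with a message otherwise."""
    text = raw.strip()
    if not text:
        raise ValueError("age is required")
    if not text.isdecimal():
        raise ValueError("age must be a whole number")
    age = int(text)
    if not MIN_AGE <= age <= MAX_AGE:
        raise ValueError(f"age must be between {MIN_AGE} and {MAX_AGE}")
    return True


# One small data set per type of test data.
TEST_DATA = {
    "boundary": [str(MIN_AGE), str(MAX_AGE)],      # valid values on the limits
    "valid": ["25", "40"],                         # typical supported input
    "blank": ["", "   "],                          # no data at all
    "invalid": ["17", "61", "abc", "-5", "3.5"],   # unsupported input
    "huge": [str(MIN_AGE + i % 43) for i in range(100_000)],  # volume
}


def run(kind: str) -> list[bool]:
    """Run every value of one data type; True means the value was accepted."""
    results = []
    for value in TEST_DATA[kind]:
        try:
            results.append(validate_age(value))
        except ValueError:
            results.append(False)  # rejected with a handled error, not a crash
    return results
```

Boundary, valid, and huge data should all be accepted, while blank and invalid data should be rejected through a handled `ValueError` rather than an unhandled crash.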
Test Data in Testing
1. In Security Testing
Security Testing is responsible for the overall protection of the system from malicious intent.
Thus, the test data for security testing should exercise the security of the software thoroughly:
- Confidentiality: The test data should verify that the client’s data is kept secure and is not shared with any third parties.
- Integrity: After an in-depth look at the system’s design, database, code, and file structures, appropriate test data can be designed to check whether the information provided by the system is correct.
- Authentication: Authentication is the process that establishes the identity of the user. A variety of test data with different usernames and passwords can be designed to verify that only authenticated users can log in to the system.
- Authorization: To check the rights or privileges of a specific user, test data containing combinations of users, roles, and operations is designed.
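The authorization data described above can be sketched as the full combination of users, roles, and operations. The users, roles, and permission map below are hypothetical and serve only to illustrate the idea.

```python
from itertools import product

# Hypothetical role-to-permission map; a real system defines its own.
PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}
USERS = {"alice": "admin", "bob": "editor", "carol": "viewer"}  # user -> role
OPERATIONS = ["read", "write", "delete"]


def authorization_cases():
    """Yield (user, role, operation, expected_allowed) for every combination."""
    for (user, role), operation in product(USERS.items(), OPERATIONS):
        yield user, role, operation, operation in PERMISSIONS[role]


cases = list(authorization_cases())
```

Each tuple pairs an attempted operation with the outcome the system is expected to produce, so both permitted and denied actions are covered.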
2. In White Box testing
In white box testing, the test data is derived directly from the code to be tested. The following points are taken into account while designing it:
- As part of path testing, the test data must be designed to cover as many paths through the program’s source code as possible.
- In negative API testing, the test data may consist of invalid parameter types or invalid combinations of arguments used to call different methods in a program.
- The test data may be designed so that every branch in the program’s source code is exercised at least once.
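To illustrate branch-oriented test data, the sketch below uses a hypothetical `shipping_fee` function and a hand-picked data set in which each row takes a branch the other rows miss. The function and its pricing rules are assumptions made for this example.

```python
def shipping_fee(weight_kg: float, express: bool) -> float:
    """Hypothetical function under test with three decision points."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    fee = 5.0 if weight_kg <= 2 else 5.0 + 1.5 * (weight_kg - 2)
    if express:
        fee *= 2
    return fee


# Each row is chosen to take a branch the others miss.
BRANCH_DATA = [
    # (weight_kg, express, expected)
    (1.0, False, 5.0),    # light parcel, express branch not taken
    (4.0, False, 8.0),    # heavy parcel branch
    (1.0, True, 10.0),    # express branch taken
    (0.0, False, None),   # error branch: a ValueError is expected
]


def check(weight_kg, express, expected):
    """Return True when the function behaves as the data row expects."""
    try:
        return shipping_fee(weight_kg, express) == expected
    except ValueError:
        return expected is None
```

Running `check(*row)` for every row in `BRANCH_DATA` exercises each branch of `shipping_fee` at least once, including the error branch.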
3. In Black Box Testing
The test data in functional test cases of black-box testing can have the following criteria:
- Boundary Condition Dataset: Test data that meets the boundary value conditions.
- Use Case Test Data: Test data that is in sync with the use cases.
- Valid Data: Data to check the system’s response for valid data input.
- Invalid Data: Data to check the system’s response to invalid data input.
- Equivalence Partition Data Set: Test data that satisfies the equivalence partitions.
- State Transition Test Data Set: Test data that meets the testing strategy for state transitions.
- Illegal Data Format: Test data in an illegal format, used to check how the system behaves.
- Decision Table Data Set: Test data that satisfies the decision table testing strategy.
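As one example from the list above, a state transition data set can be derived from a transition table. The order workflow below is hypothetical; the point is only that valid and invalid (state, event) pairs can both be enumerated.

```python
# Hypothetical order workflow: state -> {event: next_state}.
TRANSITIONS = {
    "new": {"pay": "paid", "cancel": "cancelled"},
    "paid": {"ship": "shipped", "cancel": "cancelled"},
    "shipped": {"deliver": "delivered"},
    "delivered": {},
    "cancelled": {},
}
EVENTS = sorted({event for moves in TRANSITIONS.values() for event in moves})


def transition_test_data():
    """Return (valid, invalid) lists of (state, event, expected_next_state)."""
    valid, invalid = [], []
    for state, moves in TRANSITIONS.items():
        for event in EVENTS:
            if event in moves:
                valid.append((state, event, moves[event]))
            else:
                invalid.append((state, event, None))  # the system must reject this
    return valid, invalid


valid_cases, invalid_cases = transition_test_data()
```

The valid cases confirm that allowed transitions reach the right state, and the invalid cases confirm that every other event is rejected in that state.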
4. In Performance Testing
Performance testing determines the maximum workload under which a system can still respond quickly to the queries raised.
Usually, the “real” or “live” data obtained from customers is used to generate test data and test cases for performance testing.
This testing is used not to find bugs but to eliminate bottlenecks.
Customers provide either existing data or new data, along with feedback on what the data would look like in the real world.
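When customer data is not available in sufficient volume, one option is to generate synthetic records for load tests. This is an assumption made for illustration, not something the text above prescribes. The sketch below assumes a hypothetical order schema and uses a seeded generator so that runs are repeatable.

```python
import random
from itertools import islice


def generate_orders(count: int, seed: int = 42):
    """Lazily yield `count` synthetic order records (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so every test run sees the same data
    for order_id in range(1, count + 1):
        yield {
            "order_id": order_id,
            "customer_id": rng.randint(1, 10_000),
            "amount": round(rng.uniform(1.0, 500.0), 2),
        }


# Records are produced one at a time, so a million rows never sit in memory together.
first_three = list(islice(generate_orders(1_000_000), 3))
```

Because the function is a generator, the same code can feed a small smoke test or a multi-million-row load run by changing only `count`.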
Good Test Data Properties
Useful test data must be precise and must possess the following qualities:
1. Maximum coverage
Test data must be chosen to give maximum coverage of the aspects of a single scenario with a minimum data set.
2. Realistic
The test data must be accurate and must reflect real-life scenarios.
Realistic data makes the software more robust, as most bugs are caught under real-life conditions.
Realistic data also saves the time and effort of creating new data again and again.
3. Exceptional data
Test data can also be generated for exceptional scenarios, as and when applicable or required by the system.
These exceptional scenarios occur less frequently and demand close attention.
4. Practically valid
This type of data is similar to realistic data but not the same: it relates more to the business logic of the application under test (AUT).
Techniques for Preparation or Generation of Test Data
There are two techniques to prepare test data:
1. Insert new data
In this technique, the test data is inserted into a clean database as the test cases require.
Disadvantages:
- As database tables have interdependencies, inserting data into an empty database is difficult for the tester.
- In the case of performance or load testing, the inserted test data might not be sufficient.
- Help from database developers will be required, as complex queries or procedures might be needed to insert test data into the database.
- As the set of test data is limited, it might hide bugs that a more extensive data set would have revealed.
Advantages:
- Inserting new data reduces the time required for testing and for comparing results.
- Since the database contains only a limited data set, the execution of test cases becomes more efficient.
- Because only the required test data is available, the testing process becomes clutter-free.
- The isolation of bugs takes less time, as only the data specified in the test cases is present in the database.
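The interdependency problem mentioned above can be sketched with Python's built-in `sqlite3` module. The two-table schema and its rows are hypothetical. The point is that parent rows must be inserted before the rows that reference them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # a clean, empty database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    """
)

# Parent rows go in first; inserting orders first would violate the foreign key.
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(1, "Test Customer A"), (2, "Test Customer B")])
conn.executemany("INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
                 [(10, 1, 99.5), (11, 2, 0.0)])
conn.commit()

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def orphan_is_rejected() -> bool:
    """An order pointing at a missing customer should fail the foreign key check."""
    try:
        conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (12, 999, 1.0)")
    except sqlite3.IntegrityError:
        return True
    return False
```

With only the rows named in the test cases present, any unexpected result can be traced back to a small, known data set.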
2. Choose sample data subset
This option is the more feasible and practical way to prepare test data.
The method involves copying the production data and replacing field values with dummy values before use.
Implementing this technique requires good technical skills and detailed knowledge of the database schema and SQL.
Although this technique yields the most representative data subset for testing, it may not always be feasible because of data security and privacy concerns.
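Replacing field values with dummy values might look like the following minimal sketch. The rows, the field names, and the choice of a hash-based dummy value are all assumptions made for illustration.

```python
import hashlib

# Hypothetical rows copied from production; the field names are illustrative.
production_rows = [
    {"id": 1, "name": "Jane Roe", "email": "jane.roe@example.com", "balance": 120.5},
    {"id": 2, "name": "John Doe", "email": "john.doe@example.com", "balance": 0.0},
]
SENSITIVE_FIELDS = ("name", "email")


def dummy_value(field: str, value: str) -> str:
    """Derive a stable dummy value so equal inputs map to equal outputs."""
    digest = hashlib.sha256(f"{field}:{value}".encode()).hexdigest()[:8]
    if field == "email":
        return f"{field}_{digest}@example.test"
    return f"{field}_{digest}"


def mask(row: dict) -> dict:
    """Return a copy of the row with sensitive fields replaced by dummy values."""
    return {
        key: dummy_value(key, value) if key in SENSITIVE_FIELDS else value
        for key, value in row.items()
    }


test_rows = [mask(row) for row in production_rows]
```

A deterministic mapping keeps relationships between rows intact, since the same original value always maps to the same dummy value. An unsalted hash keeps the sketch short but can be reversed by guessing inputs, so real masking would typically need a keyed or random mapping.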
Approaches to Test Data Generation
- Back-end data injection: The existing database can be updated directly with SQL queries. This approach is fast and efficient but should be implemented carefully to avoid corrupting the database.
- Manual test data generation: In this time-consuming and error-prone method, testers enter the test data by hand according to the test case requirements.
- Using third-party tools: Some tools on the market fit the customer’s business needs closely but are costly. They can be customized to the test scenario to generate or inject data that provides comprehensive test coverage.
- Automated test data generation: This method is costlier than manual test data generation but provides data quickly and accurately. It makes use of test data generation tools.
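For back-end data injection, one way to reduce the risk of corrupting the database is to wrap each batch in a transaction, so that a failed batch leaves nothing behind. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `accounts` table standing in for an existing test database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for an existing test database
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 'active')")
conn.commit()


def inject(connection, rows):
    """Insert all rows or none: any failure rolls the whole batch back."""
    try:
        with connection:  # commits on success, rolls back on exception
            connection.executemany(
                "INSERT INTO accounts (id, status) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.Error:
        return False


ok = inject(conn, [(2, "locked"), (3, "closed")])
failed = inject(conn, [(4, "active"), (1, "duplicate")])  # id 1 already exists
row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

The second batch fails on the duplicate key, and the rollback also discards the row with id 4 from that batch, so the table is left in its previous consistent state.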
Test Data Generation Tools
|Tool|Vendor|
|---|---|
|EMS Data Generator|EMS|
|IBM DB2 Test Database|IBM|
|DTM Data Generator|SQLEdit|
|SQL Data Generator|Red-Gate|
Test Data Management (TDM)
Test data management is a complete process involving the planning, design, storage, and retrieval of non-production test data that mimics an organization’s actual data, so that developers and testers can perform appropriate tests with it.
Test Data Management (TDM) is important for the following reasons:
- Security: Due to strict regulations imposed by governments and other authorities, TDM also implements data masking and data security as an integral part of test data generation.
- Trust: TDM provides good-quality data and data coverage that help uncover bugs quickly and efficiently during the testing phase. This gives customers highly stable, high-quality software with minimal defects, thereby increasing trust in the organization.
- Test data coverage: Traceability of test data implemented in TDM provides better test data coverage and identification of defect patterns.
- Reusability: Reusability of data reduces the cost as the reusable data is archived and can be accessed as and when required by the testing team.
- Early Bug Detection: Due to better test coverage and traceability, bugs can be found and resolved early, thus reducing the production cost.
- Provisioned data: In TDM, the data is managed in one place and can be provisioned for different testing types such as functional, performance, integration, and more. It also leads to lower redundancy and storage cost.
- Reduce Copy: TDM maintains all data in a single repository that every team can use, so no team needs to make several copies of the same data for its own use. TDM thus ensures efficient use of storage space.
Limitations of Test Data
- Test data should not contain Privacy Sensitive data or Personally Identifiable Information (PII).
- Privacy rules specified in the Health Insurance Portability and Accountability Act (HIPAA), Payment Card Industry Data Security Standard (PCI DSS), and General Data Protection Regulation (GDPR) have limited the use of private data for testing purposes.
- Anonymized data can be used for test and development purposes.
- A tester can also create synthetic data, but it comes with limitations: only so much realistic fake data can be generated, and it is constrained by time, cost, and quality.
The testing team is responsible for the test data generation; however, they may or may not have direct access to the production data.
Production data is raw data that is not suitable for direct use in testing; it takes considerable effort to sort, manage, and adapt it to the tester’s needs.
High-quality data is required to create high-quality software with fewer defects, and for that purpose, test data management is the best solution.
These are some of the important software testing artifacts that are prepared for all the software testing projects in general. These documents provide a clear picture of what was tested and also the outcome of the test results.