100% Free Amazon Data-Engineer-Associate Practice Test Questions

Question 1

A team of data analysts frequently runs identical queries on large datasets stored in Amazon S3, using Amazon Athena for analysis. They've noticed that their query results typically remain relevant and unchanged for about 2 hours after execution. To improve query efficiency and reduce costs, the team wants a way to avoid redundant data scans during this 2-hour period.

What approach should they adopt to meet this requirement without the need for significant changes to their query infrastructure?

A : Utilize Amazon S3 lifecycle policies to retain and reuse query results for 2 hours.

B : Configure Athena to automatically reuse query results, relying on the default maximum age setting, which is 120 minutes.

C : Implement a custom AWS Lambda function to store and fetch query results for a period of 2 hours.

D : Enable Athena's query result reuse feature and set the maximum age for reusing query results to 120 minutes.

Answer: D
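
As a rough illustration of option D, the sketch below builds the keyword arguments one might pass to the Athena `StartQueryExecution` API (for example through boto3's `start_query_execution`), with result reuse enabled and a maximum age of 120 minutes. The helper name `build_reuse_request`, the sample query, the `primary` workgroup default, and the output bucket are hypothetical placeholders; the actual API call is left commented out so the block runs without AWS credentials.

```python
def build_reuse_request(query, output_location, max_age_minutes=120, workgroup="primary"):
    """Build StartQueryExecution arguments with query result reuse enabled.

    All argument values are illustrative; adjust them to your own account.
    """
    return {
        "QueryString": query,
        "WorkGroup": workgroup,
        "ResultConfiguration": {"OutputLocation": output_location},
        # Reuse a previous result for the same query if it is at most
        # max_age_minutes old, instead of scanning the data again.
        "ResultReuseConfiguration": {
            "ResultReuseByAgeConfiguration": {
                "Enabled": True,
                "MaxAgeInMinutes": max_age_minutes,
            }
        },
    }


# Hypothetical query and results bucket, for illustration only.
request = build_reuse_request(
    "SELECT region, COUNT(*) FROM sales GROUP BY region",
    "s3://example-athena-results/",
)

# With credentials configured, the request could be submitted like this:
# import boto3
# boto3.client("athena").start_query_execution(**request)
```

The maximum age is set explicitly here rather than relying on a default, which is the distinction between options B and D.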

Question 2

A Cloud Data Engineering Team is responsible for maintaining a large-scale application hosted on AWS. The team needs to analyze the application's logs efficiently to monitor performance and identify issues.

These logs are currently stored in JSON format in an Amazon S3 bucket and are frequently updated. The team wants to leverage a service that can query these logs directly from S3 with minimal setup and maintenance, allowing them to quickly derive insights and patterns.

Which AWS service should they choose for this purpose?

A : Configure Amazon EMR with Apache Spark to process and analyze the log data stored in S3.

B : Set up an Amazon OpenSearch Service domain to index the log data, enabling powerful search and analysis capabilities.

C : Use Amazon Athena to directly query the logs in S3 using SQL-like queries without the need for loading data into a separate analytics tool.

D : Deploy a custom log analysis solution on Amazon EC2 instances, utilizing tools like Elasticsearch for querying and analysis.

Answer: C

Question 3

A company uses Amazon S3 as a data lake. The company sets up a data warehouse by using a multi-node Amazon Redshift cluster. The company organizes the data files in the data lake based on the data source of each data file.

The company loads all the data files into one table in the Redshift cluster by using a separate COPY command for each data file location. This approach takes a long time to load all the data files into the table. The company must increase the speed of the data ingestion. The company does not want to increase the cost of the process.

Which solution will meet these requirements?

A : Use a provisioned Amazon EMR cluster to copy all the data files into one folder. Use a COPY command to load the data into Amazon Redshift.

B : Load all the data files in parallel into Amazon Aurora. Run an AWS Glue job to load the data into Amazon Redshift.

C : Use an AWS Glue job to copy all the data files into one folder. Use a COPY command to load the data into Amazon Redshift.

D : Create a manifest file that contains the data file locations. Use a COPY command to load the data into Amazon Redshift.

Answer: D
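
To make option D more concrete, here is a minimal sketch of how a manifest listing files from several locations might be generated, along with a single COPY statement that references it. The bucket names, table name, and IAM role ARN are made-up placeholders, and the helper names `build_manifest` and `build_copy_sql` are hypothetical. Data format options (for example CSV or JSON) are omitted and would need to be added to match the real files.

```python
import json


def build_manifest(file_urls):
    """Return a Redshift COPY manifest as a dict.

    Each entry names one S3 object; "mandatory": True makes COPY fail
    if that object is missing.
    """
    return {"entries": [{"url": url, "mandatory": True} for url in file_urls]}


def build_copy_sql(table, manifest_url, iam_role_arn):
    """Return one COPY statement that loads every file listed in the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST;"
    )


# Hypothetical file locations, one prefix per data source.
files = [
    "s3://example-data-lake/source-a/part-0001.csv",
    "s3://example-data-lake/source-b/part-0001.csv",
]

manifest_json = json.dumps(build_manifest(files), indent=2)
copy_sql = build_copy_sql(
    "analytics.events",
    "s3://example-data-lake/manifests/load.manifest",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(manifest_json)
print(copy_sql)
```

Because a single COPY command sees all the files at once, the cluster can spread the load across its slices in parallel, and no extra service is needed to move the files first.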

Question 4

A company stores its processed data in an S3 bucket. The company has a strict data access policy. The company uses IAM roles to grant teams within the company different levels of access to the S3 bucket.

The company wants to receive notifications when a user violates the data access policy. Each notification must include the username of the user who violated the policy.

Which solution will meet these requirements?

A : Use AWS Config rules to detect violations of the data access policy. Set up compliance alarms.

B : Use Amazon CloudWatch metrics to gather object-level metrics. Set up CloudWatch alarms.

C : Use AWS CloudTrail to track object-level events for the S3 bucket. Forward events to Amazon CloudWatch to set up CloudWatch alarms.

D : Use Amazon S3 server access logs to monitor access to the bucket. Forward the access logs to an Amazon CloudWatch log group. Use metric filters on the log group to set up CloudWatch alarms.

Answer: C
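
CloudTrail fits this requirement because each event record carries a `userIdentity` element describing the caller. The sketch below shows one possible way to pull a display name out of such a record before sending a notification. The sample record is invented for illustration and heavily trimmed, and the helper `caller_name` is hypothetical. For assumed-role callers it treats the last segment of the ARN (the role session name) as the name, which is an assumption that only identifies a person when sessions are named after users.

```python
def caller_name(record):
    """Best-effort caller name from a CloudTrail event record (a dict)."""
    identity = record.get("userIdentity", {})
    kind = identity.get("type")
    if kind == "IAMUser":
        # IAM users carry their name directly.
        return identity.get("userName")
    if kind == "AssumedRole":
        # Assumed-role ARNs end in .../<role-name>/<session-name>.
        arn = identity.get("arn", "")
        return arn.rsplit("/", 1)[-1] if "/" in arn else None
    # Fall back to the full ARN for other identity types.
    return identity.get("arn")


# Invented, trimmed example of an S3 object-level (data) event.
sample_event = {
    "eventSource": "s3.amazonaws.com",
    "eventName": "GetObject",
    "errorCode": "AccessDenied",
    "userIdentity": {
        "type": "AssumedRole",
        "arn": "arn:aws:sts::123456789012:assumed-role/AnalystsRole/alice",
    },
}

print(caller_name(sample_event))
```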

Question 5

A Cloud Data Engineering Team is implementing a system for real-time data ingestion through an API. The architecture needs to include data transformation before storage. The system must handle large files and store them efficiently post-transformation. The team is focused on using a serverless architecture on AWS, with an emphasis on Infrastructure as Code (IaC) for standardized and repeatable deployments across various environments.

Which combination of actions should the Cloud Data Engineering Team take to implement IaC for serverless deployments of data ingestion and transformation pipelines? (Select THREE)

A : Set up AWS Data Pipeline in the AWS SAM template for data movement and transformation.

B : Use AWS Elastic MapReduce (EMR) for data transformation in Kinesis Firehose.

C : Define an Amazon API Gateway in the AWS SAM template for data ingestion.

D : Configure DynamoDB in the AWS SAM template for storing the transformed data.

E : Utilize AWS Serverless Application Model (SAM) to declare AWS Lambda functions integrated with Amazon Kinesis Data Firehose.

F : Configure Amazon S3 bucket creation in the AWS SAM template for storing the transformed data.

Answer: C,E,F
