Home
Google
Professional Data Engineer Exam Info
Professional Data Engineer Exam Questions

Curious about Actual Google Cloud Certified Professional Data Engineer Exam Questions?

Here are sample Google Cloud Certified Professional Data Engineer (Professional Data Engineer) Exam questions from real exam. You can get more Google Cloud Certified (Professional Data Engineer) Exam premium practice questions at TestInsights.

Page: 1 /
Total 375 questions

Want more questions? Get Premium Access.

Question 1

You are designing the architecture of your application to store data in Cloud Storage. Your application consists of pipelines that read data from a Cloud Storage bucket that contains raw data, and write the data to a second bucket after processing. You want to design an architecture with Cloud Storage resources that are capable of being resilient if a Google Cloud regional failure occurs. You want to minimize the recovery point objective (RPO) if a failure occurs, with no impact on applications that use the stored dat

a. What should you do?

AAdopt two regional Cloud Storage buckets, and update your application to write the output on both buckets.

BAdopt multi-regional Cloud Storage buckets in your architecture.

CAdopt two regional Cloud Storage buckets, and create a daily task to copy from one bucket to the other.

DAdopt a dual-region Cloud Storage bucket, and enable turbo replication in your architecture.

Correct : D

To ensure resilience and minimize the recovery point objective (RPO) with no impact on applications, using a dual-region bucket with turbo replication is the best approach. Here's why option D is the best choice:

Dual-Region Buckets:

Dual-region buckets store data redundantly across two distinct geographic regions, providing high availability and durability.

This setup ensures that data remains available even if one region experiences a failure.

Turbo Replication:

Turbo replication ensures that data is replicated between the two regions within 15 minutes, aligning with the requirement to minimize the recovery point objective (RPO).

This feature provides near real-time replication, significantly reducing the risk of data loss.

No Impact on Applications:

Applications continue to access the dual-region bucket without any changes, ensuring seamless operation even during a regional failure.

The dual-region setup transparently handles failover, providing uninterrupted access to data.

Steps to Implement:

Create a Dual-Region Bucket:

Create a dual-region Cloud Storage bucket in the Google Cloud Console, selecting appropriate regions (e.g., us-central1 and us-east1).

Enable Turbo Replication:

Enable turbo replication to ensure rapid data replication between the selected regions.

Configure Applications:

Ensure that applications read and write to the dual-region bucket, benefiting from its high availability and durability.

Test Failover:

Simulate a regional failure to verify that the dual-region bucket and turbo replication meet the required RPO and ensure data resilience.

Google Cloud Storage Dual-Region

Turbo Replication in Google Cloud Storage

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

AAdopt two regional Cloud Storage buckets, and update your application to write the output on both buckets.

BAdopt multi-regional Cloud Storage buckets in your architecture.

CAdopt two regional Cloud Storage buckets, and create a daily task to copy from one bucket to the other.

DAdopt a dual-region Cloud Storage bucket, and enable turbo replication in your architecture.

0 / 1500

Question 2

You are using Workflows to call an API that returns a 1 KB JSON response, apply some complex business logic on this response, wait for the logic to complete, and then perform a load from a Cloud Storage file to BigQuery. The Workflows standard library does not have sufficient capabilities to perform your complex logic, and you want to use Python's standard library instead. You want to optimize your workflow for simplicity and speed of execution. What should you do?

AInvoke a Cloud Function instance that uses Python to apply the logic on your JSON file.

BInvoke a subworkflow in Workflows to apply the logic on your JSON file.

CCreate a Cloud Composer environment and run the logic in Cloud Composer.

DCreate a Dataproc cluster, and use PySpark to apply the logic on your JSON file.

Correct : A

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

AInvoke a Cloud Function instance that uses Python to apply the logic on your JSON file.

BInvoke a subworkflow in Workflows to apply the logic on your JSON file.

CCreate a Cloud Composer environment and run the logic in Cloud Composer.

DCreate a Dataproc cluster, and use PySpark to apply the logic on your JSON file.

0 / 1500

Question 3

You are using BigQuery with a regional dataset that includes a table with the daily sales volumes. This table is updated multiple times per day. You need to protect your sales table in case of regional failures with a recovery point objective (RPO) of less than 24 hours, while keeping costs to a minimum. What should you do?

ASchedule a daily BigQuery snapshot of the table.

BSchedule a daily export of the table to a Cloud Storage dual or multi-region bucket.

CSchedule a daily copy of the dataset to a backup region.

DModify ETL job to load the data into both the current and another backup region.

Correct : A

To apply complex business logic on a JSON response using Python's standard library within a Workflow, invoking a Cloud Function is the most efficient and straightforward approach. Here's why option A is the best choice:

Cloud Functions:

Cloud Functions provide a lightweight, serverless execution environment for running code in response to events. They support Python and can easily integrate with Workflows.

This approach ensures simplicity and speed of execution, as Cloud Functions can be invoked directly from a Workflow and handle the complex logic required.

Flexibility and Simplicity:

Using Cloud Functions allows you to leverage Python's extensive standard library and ecosystem, making it easier to implement and maintain the complex business logic.

Cloud Functions abstract the underlying infrastructure, allowing you to focus on the application logic without worrying about server management.

Performance:

Cloud Functions are optimized for fast execution and can handle the processing of the JSON response efficiently.

They are designed to scale automatically based on demand, ensuring that your workflow remains performant.

Steps to Implement:

Write the Cloud Function:

Develop a Cloud Function in Python that processes the JSON response and applies the necessary business logic.

Deploy the function to Google Cloud.

Invoke Cloud Function from Workflow:

Modify your Workflow to call the Cloud Function using an HTTP request or Google Cloud Function connector.

steps:

- callCloudFunction:

call: http.post

args:

url: https://REGION-PROJECT_ID.cloudfunctions.net/FUNCTION_NAME

body:

key: value

Process Results:

Handle the response from the Cloud Function and proceed with the next steps in the Workflow, such as loading data into BigQuery.

Google Cloud Functions Documentation

Using Workflows with Cloud Functions

Workflows Standard Library

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

ASchedule a daily BigQuery snapshot of the table.

BSchedule a daily export of the table to a Cloud Storage dual or multi-region bucket.

CSchedule a daily copy of the dataset to a backup region.

DModify ETL job to load the data into both the current and another backup region.

0 / 1500

Question 4

You have two projects where you run BigQuery jobs:

* One project runs production jobs that have strict completion time SLAs. These are high priority jobs that must have the required compute resources available when needed. These jobs generally never go below a 300 slot utilization, but occasionally spike up an additional 500 slots.

* The other project is for users to run ad-hoc analytical queries. This project generally never uses more than 200 slots at a time. You want these ad-hoc queries to be billed based on how much data users scan rather than by slot capacity.

You need to ensure that both projects have the appropriate compute resources available. What should you do?

ACreate a single Enterprise Edition reservation for both projects. Set a baseline of 300 slots. Enable autoscaling up to 700 slots.

BCreate two reservations, one for each of the projects. For the SLA project, use an Enterprise Edition with a baseline of 300 slots and enable autoscaling up to 500 slots. For the ad-hoc project, configure on-demand billing.

CCreate two Enterprise Edition reservations, one for each of the projects. For the SLA project, set a baseline of 300 slots and enable
autoscaling up to 500 slots. For the ad-hoc project, set a reservation baseline of 0 slots and set the ignore_idle_slot3 flag to False.

DCreate two Enterprise Edition reservations, one for each of the projects. For the SLA project, set a baseline of 800 slots. For the ad-hoc
project, enable autoscaling up to 200 slots.

Correct : B

To ensure that both production jobs with strict SLAs and ad-hoc queries have appropriate compute resources available while adhering to cost efficiency, setting up separate reservations and billing models for each project is the best approach. Here's why option B is the best choice:

Separate Reservations for SLA and Ad-hoc Projects:

Creating two separate reservations allows for dedicated resource management tailored to the needs of each project.

The production project requires guaranteed slots with the ability to scale up as needed, while the ad-hoc project benefits from on-demand billing based on data scanned.

Enterprise Edition Reservation for SLA Project:

Setting a baseline of 300 slots ensures that the SLA project has the minimum required resources.

Enabling autoscaling up to 500 additional slots allows the project to handle occasional spikes in workload without compromising on SLAs.

On-Demand Billing for Ad-hoc Project:

Using on-demand billing for the ad-hoc project ensures cost efficiency, as users are billed based on the amount of data scanned rather than reserved slot capacity.

This model suits the less predictable and often lower-utilization nature of ad-hoc queries.

Steps to Implement:

Set Up Enterprise Edition Reservation for SLA Project:

Create a reservation with a baseline of 300 slots.

Enable autoscaling to allow up to an additional 500 slots as needed.

Configure On-Demand Billing for Ad-hoc Project:

Ensure that the ad-hoc project is set up to use on-demand billing, which charges based on data scanned by the queries.

Monitor and Adjust:

Continuously monitor the usage and performance of both projects to ensure that the configurations meet the needs and make adjustments as necessary.

BigQuery Slot Reservations

BigQuery On-Demand Pricing

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

ACreate a single Enterprise Edition reservation for both projects. Set a baseline of 300 slots. Enable autoscaling up to 700 slots.

DCreate two Enterprise Edition reservations, one for each of the projects. For the SLA project, set a baseline of 800 slots. For the ad-hoc
project, enable autoscaling up to 200 slots.

0 / 1500

Question 5

You are a BigQuery admin supporting a team of data consumers who run ad hoc queries and downstream reporting in tools such as Looker. All data and users are combined under a single organizational project. You recently noticed some slowness in query results and want to troubleshoot where the slowdowns are occurring. You think that there might be some job queuing or slot contention occurring as users run jobs, which slows down access to results. You need to investigate the query job information and determine where performance is being affected. What should you do?

AUse Cloud Monitoring to view BigQuery metrics and set up alerts that let you know when a certain percentage of slots were used.

BUse slot reservations for your project to ensure that you have enough query processing capacity and are able to allocate available slots to the slower queries.

CUse Cloud Logging to determine if any users or downstream consumers are changing or deleting access grants on tagged resources.

DUse available administrative resource charts to determine how slots are being used and how jobs are performing over time. Run a query on the INFORMATION_SCHEMA to review query performance.

Correct : D

To troubleshoot query performance issues related to job queuing or slot contention in BigQuery, using administrative resource charts along with querying the INFORMATION_SCHEMA is the best approach. Here's why option D is the best choice:

Administrative Resource Charts:

BigQuery provides detailed resource charts that show slot usage and job performance over time. These charts help identify patterns of slot contention and peak usage times.

INFORMATION_SCHEMA Queries:

The INFORMATION_SCHEMA tables in BigQuery provide detailed metadata about query jobs, including execution times, slots consumed, and other performance metrics.

Running queries on INFORMATION_SCHEMA allows you to pinpoint specific jobs causing contention and analyze their performance characteristics.

Comprehensive Analysis:

Combining administrative resource charts with detailed queries on INFORMATION_SCHEMA provides a holistic view of the system's performance.

This approach enables you to identify and address the root causes of performance issues, whether they are due to slot contention, inefficient queries, or other factors.

Steps to Implement:

Access Administrative Resource Charts:

Use the Google Cloud Console to view BigQuery's administrative resource charts. These charts provide insights into slot utilization and job performance metrics over time.

Run INFORMATION_SCHEMA Queries:

Execute queries on BigQuery's INFORMATION_SCHEMA to gather detailed information about job performance. For example:

SELECT

creation_time,

job_id,

user_email,

query,

total_slot_ms / 1000 AS slot_seconds,

total_bytes_processed / (1024 * 1024 * 1024) AS processed_gb,

total_bytes_billed / (1024 * 1024 * 1024) AS billed_gb

FROM

`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT

WHERE

creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)

AND state = 'DONE'

ORDER BY

slot_seconds DESC

LIMIT 100;

Analyze and Optimize:

Use the information gathered to identify bottlenecks, optimize queries, and adjust resource allocations as needed to improve performance.

Monitoring BigQuery Slots

BigQuery INFORMATION_SCHEMA

BigQuery Performance Best Practices

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

AUse Cloud Monitoring to view BigQuery metrics and set up alerts that let you know when a certain percentage of slots were used.

BUse slot reservations for your project to ensure that you have enough query processing capacity and are able to allocate available slots to the slower queries.

CUse Cloud Logging to determine if any users or downstream consumers are changing or deleting access grants on tagged resources.

DUse available administrative resource charts to determine how slots are being used and how jobs are performing over time. Run a query on the INFORMATION_SCHEMA to review query performance.

0 / 1500

Page: 1 / 75
Total 375 questions

Unlock Full
Professional Data Engineer Exam Features

In Just $49 You can Access

All Official Question Types
Interactive Web-Based Practice Test Software
No Installation or 3rd Party Software Required
Customize your practice sessions (Free Demo)
24/7 Customer Support

Get Full Access Now

Marked Questions
Professional Data Engineer Exam

Professional-Data-Engineer Exam Question 1
Professional-Data-Engineer Exam Question 2
Professional-Data-Engineer Exam Question 3
Professional-Data-Engineer Exam Question 4
Professional-Data-Engineer Exam Question 5

Download PDF File Demo

Try Web-Based Exam Practice Software Demo

Commenting

In order to participate in the comments you need to be logged-in.
You can sign-up or login