Curious about Actual Databricks Certified Data Engineer Associate Exam Questions?
Here are sample Databricks Certified Data Engineer Associate (Databricks-Certified-Data-Engineer-Associate) exam questions from the real exam. You can get more Databricks Data Engineer Associate (Databricks-Certified-Data-Engineer-Associate) premium practice questions at TestInsights.
Which of the following commands will return the number of null values in the member_id column?
Correct : C
To return the number of null values in the member_id column, the best option is to use the count_if function, which counts the number of rows that satisfy a given condition; here, the condition is that the member_id column is null. The other options are either incorrect or not supported by Spark SQL. Option A will return the number of non-null values in the member_id column. Options B and E will not work because there is no count_null function in Spark SQL. Option D will not work because there is no null function in Spark SQL.
Reference:
Spark SQL Built-in Functions
count_if - Spark SQL Built-in Functions
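As an illustrative sketch (the members table name is hypothetical, and spark is the SparkSession that Databricks notebooks provide), the winning pattern looks like this in PySpark:

# count_if counts only the rows that satisfy the condition --
# here, the rows where member_id is null.
null_count = spark.sql(
    "SELECT count_if(member_id IS NULL) AS null_member_ids FROM members"
)
null_count.show()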
Which of the following must be specified when creating a new Delta Live Tables pipeline?
Correct : E
Option E is the correct answer because it is the only mandatory requirement when creating a new Delta Live Tables pipeline. A pipeline is a data processing workflow that contains materialized views and streaming tables declared in Python or SQL source files; Delta Live Tables infers the dependencies between these tables and ensures updates occur in the correct order. To create a pipeline, you must specify at least one notebook library to be executed, which contains the Delta Live Tables syntax, and you can include multiple libraries of different languages within the same pipeline. The other options are optional or not applicable when creating a pipeline. Option A is not required: you can optionally provide key-value pair configurations to customize pipeline settings such as the storage location, the target schema, notifications, and the pipeline mode. Option B is not applicable: the DBU/hour cost is determined by the cluster configuration, not by pipeline creation. Option C is not required: you can optionally specify a storage location for the pipeline's output data, and if you leave it empty, the system uses a default location. Option D is not required: you can optionally specify a target database for the written data, either in the Hive metastore or in Unity Catalog.
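To make this concrete, here is a minimal sketch of a notebook library such a pipeline could execute, using the Python dlt module that Delta Live Tables provides (the raw_orders source table and the filter are hypothetical):

import dlt
from pyspark.sql.functions import col

# Declaring a table with Delta Live Tables syntax; the pipeline
# infers dependencies between such tables and orders their updates.
@dlt.table(comment="Orders with a positive amount")
def clean_orders():
    # "raw_orders" is a hypothetical source table used for illustration.
    return spark.read.table("raw_orders").where(col("amount") > 0)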
A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.
Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?
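The answer choices are not shown in this sample, but as a hedged illustration of the mechanism the question is testing: Databricks notebook cells can switch languages with a magic command, so a single cell can run SQL while every other cell stays in Python (the table and query below are hypothetical):

# Cell 1 -- ordinary Python, left unchanged
df = spark.read.table("events")

# Cell 2 -- begins with the %sql magic command, so only this cell runs SQL
%sql
SELECT event_type, COUNT(*) AS event_count
FROM events
GROUP BY event_type

Alternatively, spark.sql("...") runs a SQL string from inside a Python cell without changing the cell's language.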
Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?
Which tool is used by Auto Loader to process data incrementally?
Correct : A
Auto Loader in Databricks utilizes Spark Structured Streaming for processing data incrementally. This allows Auto Loader to efficiently ingest streaming or batch data at scale and to recognize new data as it arrives in cloud storage. Spark Structured Streaming provides the underlying engine that supports various incremental data loading capabilities like schema inference and file notification mode, which are crucial for the dynamic nature of data lakes.
Reference: Databricks documentation, Auto Loader Overview
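For illustration, a minimal Auto Loader read built on Structured Streaming might look like the following (the paths, file format, and target table name are assumptions):

# Auto Loader is invoked through the cloudFiles source, which runs on
# Spark Structured Streaming and picks up new files incrementally.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")                         # format of the incoming files
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")  # where the inferred schema is tracked
    .load("/mnt/raw/events")                                     # hypothetical cloud storage path
)

# Write the stream out, checkpointing progress so only new files are processed.
(df.writeStream
   .option("checkpointLocation", "/tmp/checkpoints/events")
   .trigger(availableNow=True)
   .toTable("bronze_events"))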
Total: 100 questions