Databricks Databricks-Machine-Learning-Associate Quiz 1 Topic 2 Questions 1-5

Question: 1

A team is developing guidelines on when to use various evaluation metrics for classification problems. The team needs to provide input on when to use the F1 score over accuracy.

Which of the following suggestions should the team include in their guidelines?

A. The F1 score should be utilized over accuracy when the number of actual positive cases is identical to the number of actual negative cases.

B. The F1 score should be utilized over accuracy when there are greater than two classes in the target variable.

C. The F1 score should be utilized over accuracy when there is significant imbalance between positive and negative classes and avoiding false negatives is a priority.

D. The F1 score should be utilized over accuracy when identifying true positives and true negatives are equally important to the business problem.
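
To make the difference between the two metrics concrete, here is a minimal sketch in plain Python. The data is hypothetical (95 negatives, 5 positives), and the helper names confusion_counts, accuracy and f1_score are illustrative rather than taken from any library. It scores a degenerate classifier that always predicts the negative class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    # By convention here, F1 is 0.0 when there are no true positives at all.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical, heavily imbalanced labels: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A classifier that never predicts the positive class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

Under these assumptions accuracy looks strong even though every positive case is missed, while F1 falls to zero because it counts false negatives and ignores true negatives.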

Question: 2

A data scientist has defined a Pandas UDF, predict, to parallelize the inference process for a single-node model:

[Code image: definition of the predict Pandas UDF]

They have written the following incomplete code block to use predict to score each record of Spark DataFrame spark_df:

[Code image: incomplete code block that applies predict to spark_df]

Which of the following lines of code can be used to complete the code block to successfully complete the task?

A. predict(*spark_df.columns)

B. mapInPandas(predict)

C. predict(Iterator(spark_df))

D. mapInPandas(predict(spark_df.columns))

E. predict(spark_df.columns)

Question: 3

A machine learning engineer wants to parallelize the inference of group-specific models using the Pandas Function API. They have developed the apply_model function that will look up and load the correct model for each group, and they want to apply it to each group of DataFrame df.

They have written the following incomplete code block:

[Code image: incomplete code block that applies apply_model to each group of df]

Which piece of code can be used to fill in the above blank to complete the task?

A. applyInPandas

B. groupedApplyInPandas

C. mapInPandas

D. predict

Question: 4

A machine learning engineer is trying to perform batch model inference. They want to get predictions using the linear regression model saved at the path model_uri for the DataFrame batch_df.

batch_df has the following schema:

customer_id STRING

The machine learning engineer runs the following code block to perform inference on batch_df using the linear regression model at model_uri:

[Code image: code block that runs inference on batch_df with the model at model_uri]

In which situation will the machine learning engineer's code block perform the desired inference?

A. When the Feature Store feature set was logged with the model at model_uri

B. When all of the features used by the model at model_uri are in a Spark DataFrame in the PySpark

C. When the model at model_uri only uses customer_id as a feature

D. This code block will not perform the desired inference in any situation.

E. When all of the features used by the model at model_uri are in a single Feature Store table

Question: 5

A data scientist uses 3-fold cross-validation and the following hyperparameter grid when optimizing model hyperparameters via grid search for a classification problem:

Hyperparameter 1: [2, 5, 10]

Hyperparameter 2: [50, 100]

Which of the following represents the number of machine learning models that can be trained in parallel during this process?

A. 3

B. 5

C. 6

D. 18
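
The arithmetic behind this question can be sketched in plain Python. The variable names below are illustrative, and the sketch assumes that every (fold, hyperparameter combination) pair is fitted as a separate model that does not depend on any other fit.

```python
from itertools import product

# Grid and fold count taken from the question.
hyperparameter_1 = [2, 5, 10]
hyperparameter_2 = [50, 100]
n_folds = 3

# Every pairing of one value from each hyperparameter list.
combinations = list(product(hyperparameter_1, hyperparameter_2))

# One model fit per (fold, combination) pair; no fit needs another's result.
fits = [(fold, h1, h2) for fold in range(n_folds) for (h1, h2) in combinations]

n_combinations = len(combinations)
n_fits = len(fits)
print(f"combinations={n_combinations}, independent fits={n_fits}")
```

The grid yields 3 x 2 = 6 combinations, and 3 folds per combination give 18 independent fits. How many of those actually run at the same time depends on the tuning tool and the available compute.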
