Question: 1
A team is developing guidelines on when to use various evaluation metrics for classification problems. The team needs to provide input on when to use the F1 score over accuracy.
Which of the following suggestions should the team include in their guidelines?
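As a hedged illustration of the distinction this question probes, the sketch below computes accuracy and F1 by hand on a small, made-up imbalanced label set. The data and the helper names (confusion_counts, accuracy, f1_score) are assumptions for illustration only and do not come from the question.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced data: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# Model A (assumed): always predicts the majority (negative) class.
y_pred_majority = [0] * 100

# Model B (assumed): finds 4 of the 5 positives but raises 6 false alarms.
y_pred_detector = [0] * 89 + [1] * 6 + [1] * 4 + [0] * 1

for name, y_pred in [("majority", y_pred_majority), ("detector", y_pred_detector)]:
    print(name, round(accuracy(y_true, y_pred), 3), round(f1_score(y_true, y_pred), 3))
```

On this made-up data, accuracy ranks the majority-class model higher even though it never finds a positive, while F1 ranks the detector higher because F1 depends on true positives, false positives, and false negatives rather than on true negatives.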
Question: 2
A data scientist has defined a Pandas UDF named predict to parallelize inference for a single-node model:
They have written the following incomplete code block to use predict to score each record of Spark DataFrame spark_df:
Which of the following lines of code can be used to complete the code block to successfully complete the task?
Question: 3
A machine learning engineer wants to parallelize the inference of group-specific models using the Pandas Function API. They have developed the apply_model function that will look up and load the correct model for each group, and they want to apply it to each group of DataFrame df.
They have written the following incomplete code block:
Which piece of code can be used to fill in the above blank to complete the task?
Question: 4
A machine learning engineer is trying to perform batch model inference. They want to get predictions using the linear regression model saved at the path model_uri for the DataFrame batch_df.
batch_df has the following schema:
customer_id STRING
The machine learning engineer runs the following code block to perform inference on batch_df using the linear regression model at model_uri:
In which situation will the machine learning engineer's code block perform the desired inference?
Question: 5
A data scientist uses 3-fold cross-validation and the following hyperparameter grid when optimizing model hyperparameters via grid search for a classification problem:
Hyperparameter 1: [2, 5, 10]
Hyperparameter 2: [50, 100]
Which of the following represents the number of machine learning models that can be trained in parallel during this process?
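The arithmetic behind this kind of question can be sketched as follows. This is a minimal sketch that only counts model fits; the dictionary keys are placeholder names, and it assumes that every hyperparameter combination is evaluated on every fold and that the fits are independent of one another. How many fits actually run at the same time depends on the tuning tool's parallelism setting and on available compute, and any final refit of the best model on the full training set is not counted here.

```python
from itertools import product

# Grid and fold count taken from the question; key names are placeholders.
grid = {
    "hyperparameter_1": [2, 5, 10],
    "hyperparameter_2": [50, 100],
}
num_folds = 3

# Every combination of values across the two hyperparameters.
combinations = list(product(*grid.values()))
num_combinations = len(combinations)

# Each combination is fit once per fold during cross-validation.
total_fits = num_combinations * num_folds

print("combinations:", num_combinations)
print("cross-validation fits:", total_fits)
```

Under these assumptions the grid yields 3 × 2 = 6 combinations and 6 × 3 = 18 cross-validation fits.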