Multiple Choice
You developed an ML model with AI Platform and want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?
A) Significantly increase the max_batch_size TensorFlow Serving parameter.
B) Switch to the tensorflow-model-server-universal version of TensorFlow Serving.
C) Significantly increase the max_enqueued_batches TensorFlow Serving parameter.
D) Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes.
Correct Answer:
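For context on what options A and C would actually change: TensorFlow Serving's request batching is enabled via CLI flags and a batching parameters file. The sketch below uses real TF Serving flags, but the model name, paths, and parameter values are illustrative assumptions, not values from the question.

```shell
# Hedged sketch: launching TensorFlow Serving with batching enabled.
# Flag names are real TF Serving flags; model name/paths/values are assumed.
tensorflow_model_server \
  --port=8500 \
  --model_name=my_model \
  --model_base_path=/models/my_model \
  --enable_batching=true \
  --batching_parameters_file=/config/batching.cfg

# /config/batching.cfg (protobuf text format), with illustrative values:
#   max_batch_size { value: 128 }        # the parameter option A would raise
#   max_enqueued_batches { value: 64 }   # the parameter option C would raise
#   batch_timeout_micros { value: 1000 } # max wait before a partial batch is sent
```

Note that larger batches generally raise throughput at the cost of per-request latency, which is why tuning these parameters alone may not address a latency problem on CPU-only pods.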