Exam 18: Professional Data Engineer on Google Cloud Platform
Which is the preferred method to use to avoid hotspotting in time series data in Bigtable?
(Multiple Choice)
You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?
(Multiple Choice)
Which of these rules apply when you add preemptible workers to a Dataproc cluster (select 2 answers)?
(Multiple Choice)
You operate an IoT pipeline built around Apache Kafka that normally receives around 5000 messages per second. You want to use Google Cloud Platform to create an alert as soon as the moving average over 1 hour drops below 4000 messages per second. What should you do?
(Multiple Choice)
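The alert condition in this question is plain arithmetic: average the per-second message counts over the last hour and compare the result with 4000. The sketch below is a minimal, service-agnostic illustration of that check. The class name `MovingAverageAlert` and its methods are hypothetical, and it assumes one sample per second; it is not the API of any Google Cloud product.

```python
from collections import deque


class MovingAverageAlert:
    """Sliding-window average of per-second message counts (hypothetical helper)."""

    def __init__(self, window_seconds=3600, threshold=4000.0):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.samples = deque()  # (timestamp_in_seconds, messages_in_that_second)
        self.total = 0

    def add(self, timestamp, count):
        """Record one sample and evict samples that fell out of the window."""
        self.samples.append((timestamp, count))
        self.total += count
        while self.samples and self.samples[0][0] <= timestamp - self.window_seconds:
            _, old_count = self.samples.popleft()
            self.total -= old_count

    def average(self):
        """Mean of the samples currently inside the window (0.0 if empty)."""
        return self.total / len(self.samples) if self.samples else 0.0

    def should_alert(self):
        """True when there is data and the windowed average is below the threshold."""
        return bool(self.samples) and self.average() < self.threshold
```

With the defaults, a steady 5000 messages per second never triggers the alert; the alert fires only once enough low samples have entered the one-hour window to pull the average under 4000.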
Which of these are examples of a value in a sparse vector? (Select 2 answers.)
(Multiple Choice)
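For readers unfamiliar with the term, a sparse vector is one in which most entries are zero, so it is usually stored as just the non-zero positions and their values. The sketch below shows that idea with a one-hot encoding. The helper names and the sample vocabulary are illustrative assumptions, not taken from the exam material.

```python
def one_hot(word, vocabulary):
    """Dense one-hot vector: 1 at the word's position, 0 everywhere else."""
    return [1 if entry == word else 0 for entry in vocabulary]


def to_sparse(dense):
    """Keep only the non-zero entries as an {index: value} mapping."""
    return {index: value for index, value in enumerate(dense) if value != 0}


def to_dense(sparse, length):
    """Rebuild the dense list from the {index: value} mapping."""
    dense = [0] * length
    for index, value in sparse.items():
        dense[index] = value
    return dense


# Hypothetical vocabulary used only for illustration.
vocabulary = ["truck", "rail", "aircraft", "ship"]
dense_vector = one_hot("rail", vocabulary)
sparse_vector = to_sparse(dense_vector)
```

The larger the vocabulary, the more the sparse form saves, because only one position per one-hot vector is ever non-zero.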
Flowlogistic Case Study

Company Overview
Flowlogistic is a leading logistics and supply chain provider. They help businesses throughout the world manage their resources and transport them to their final destination. The company has grown rapidly, expanding their offerings to include rail, truck, aircraft, and oceanic shipping.

Company Background
The company started as a regional trucking company, and then expanded into other logistics markets. Because they have not updated their infrastructure, managing and tracking orders and shipments has become a bottleneck. To improve operations, Flowlogistic developed proprietary technology for tracking shipments in real time at the parcel level. However, they are unable to deploy it because their technology stack, based on Apache Kafka, cannot support the processing volume. In addition, Flowlogistic wants to further analyze their orders and shipments to determine how best to deploy their resources.

Solution Concept
Flowlogistic wants to implement two concepts using the cloud:
- Use their proprietary technology in a real-time inventory-tracking system that indicates the location of their loads
- Perform analytics on all their orders and shipment logs, which contain both structured and unstructured data, to determine how best to deploy resources and which markets to expand into. They also want to use predictive analytics to learn earlier when a shipment will be delayed.

Existing Technical Environment
Flowlogistic architecture resides in a single data center:
- Databases
  - 8 physical servers in 2 clusters
    - SQL Server - user data, inventory, static data
  - 3 physical servers
    - Cassandra - metadata, tracking messages
  - 10 Kafka servers - tracking message aggregation and batch insert
- Application servers - customer front end, middleware for order/customs
  - 60 virtual machines across 20 physical servers
    - Tomcat - Java services
    - Nginx - static content
    - Batch servers
- Storage appliances
  - iSCSI for virtual machine (VM) hosts
  - Fibre Channel storage area network (FC SAN) - SQL Server storage
  - Network-attached storage (NAS) - image storage, logs, backups
- 10 Apache Hadoop/Spark servers
  - Core Data Lake
  - Data analysis workloads
- 20 miscellaneous servers
  - Jenkins, monitoring, bastion hosts

Business Requirements
- Build a reliable and reproducible environment with scaled parity of production.
- Aggregate data in a centralized Data Lake for analysis
- Use historical data to perform predictive analytics on future shipments
- Accurately track every shipment worldwide using proprietary technology
- Improve business agility and speed of innovation through rapid provisioning of new resources
- Analyze and optimize architecture for performance in the cloud
- Migrate fully to the cloud if all other requirements are met

Technical Requirements
- Handle both streaming and batch data
- Migrate existing Hadoop workloads
- Ensure architecture is scalable and elastic to meet the changing demands of the company.
- Use managed services whenever possible
- Encrypt data in flight and at rest
- Connect a VPN between the production data center and cloud environment

CEO Statement
We have grown so quickly that our inability to upgrade our infrastructure is really hampering further growth and efficiency. We are efficient at moving shipments around the world, but we are inefficient at moving data around. We need to organize our information so we can more easily understand where our customers are and what they are shipping.

CTO Statement
IT has never been a priority for us, so as our data has grown, we have not invested enough in our technology. I have a good staff to manage IT, but they are so busy managing our infrastructure that I cannot get them to do the things that really matter, such as organizing our data, building the analytics, and figuring out how to implement the CFO's tracking technology.

CFO Statement
Part of our competitive advantage is that we penalize ourselves for late shipments and deliveries. Knowing where our shipments are at all times has a direct correlation to our bottom line and profitability. Additionally, I don't want to commit capital to building out a server environment.

Flowlogistic's management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query in real time, and store the data reliably. Which combination of GCP products should you choose?
(Multiple Choice)
You need to migrate a 2 TB relational database to Google Cloud Platform. You do not have the resources to significantly refactor the application that uses this database, and cost to operate is of primary concern. Which service do you select for storing and serving your data?
(Multiple Choice)
You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?
(Multiple Choice)
Which Java SDK class can you use to run your Dataflow programs locally?
(Multiple Choice)
An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?
(Multiple Choice)
Which of the following are feature engineering techniques? (Select 2 answers)
(Multiple Choice)
MJTelco Case Study

Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.

Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network, allowing them to account for the impact of dynamic regional politics on location availability and cost. Their management and operations teams are situated all around the globe, creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.

Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
- Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
- Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments - development/test, staging, and production - to meet the needs of running experiments, deploying new features, and serving production customers.

Business Requirements
- Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
- Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
- Provide reliable and timely access to data for analysis from distributed research workers
- Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.

Technical Requirements
- Ensure secure and efficient transport and storage of telemetry data
- Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
- Allow analysis and presentation against data tables tracking up to 2 years of data storing approximately 100 million records/day
- Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.

CEO Statement
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.

CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure. We also need environments in which our data scientists can carefully study and quickly adapt our models. Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.

CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.

MJTelco needs you to create a schema in Google Bigtable that will allow for the historical analysis of the last 2 years of records. Each record that comes in is sent every 15 minutes, and contains a unique identifier of the device and a data record. The most common query is for all the data for a given device for a given day. Which schema should you use?
(Multiple Choice)
Which row keys are likely to cause a disproportionate number of reads and/or writes on a particular node in a Bigtable cluster (select 2 answers)?
(Multiple Choice)
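Bigtable stores rows sorted lexicographically by row key, and contiguous key ranges are served by individual nodes, so the shape of the key decides where new writes land. The sketch below is a toy model of that effect, not Bigtable's actual tablet-splitting or rebalancing logic. The key formats, the split points, and the function `writes_per_range` are all illustrative assumptions.

```python
from bisect import bisect_right
from collections import Counter


def writes_per_range(keys, split_points):
    """Count how many keys fall into each contiguous, sorted key range."""
    counts = Counter(bisect_right(split_points, key) for key in keys)
    return [counts.get(index, 0) for index in range(len(split_points) + 1)]


# Hypothetical workload: 100 devices each writing once a minute for one hour.
devices = [f"device{number:03d}" for number in range(100)]
timestamps = [f"20240101T12{minute:02d}" for minute in range(60)]

timestamp_first = [f"{ts}#{dev}" for ts in timestamps for dev in devices]
device_first = [f"{dev}#{ts}" for ts in timestamps for dev in devices]

# Hypothetical split points, assumed fixed before the burst of writes arrives.
timestamp_splits = ["20231001", "20231101", "20231201"]
device_splits = ["device025", "device050", "device075"]

timestamp_load = writes_per_range(timestamp_first, timestamp_splits)
device_load = writes_per_range(device_first, device_splits)
```

In this toy model, every timestamp-first key sorts after the last split point, so all 6000 writes land in the final range, while the device-first keys spread evenly at 1500 writes per range.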
What are two of the characteristics of using online prediction rather than batch prediction?
(Multiple Choice)
Which Google Cloud Platform service is an alternative to Hadoop with Hive?
(Multiple Choice)
Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters, and received an area under the curve (AUC) of 0.87 on the validation set. You want to increase the AUC of the model. What should you do?
(Multiple Choice)
You work for a large fast food restaurant chain with over 400,000 employees. You store employee information in Google BigQuery in a Users table consisting of a FirstName field and a LastName field. A member of IT is building an application and asks you to modify the schema and data in BigQuery so the application can query a FullName field consisting of the value of the FirstName field concatenated with a space, followed by the value of the LastName field for each employee. How can you make that data available while minimizing cost?
(Multiple Choice)
Which of the following IAM roles does your Compute Engine account require to be able to run pipeline jobs?
(Multiple Choice)
Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?
(Multiple Choice)