From zero to hero with Google Cloud Platform at Recordly
When Marko joined our crew, he had spent a few years working on very similar projects. He felt like his learning curve was flattening out. He wanted to change that.
Written by — Marko Laitinen, Data Engineer
When I joined Recordly, I had been working with the Azure platform for 3.5 years. This is quite usual in the consulting business: once you learn a specific tech stack, you are often allocated to similar projects. Personally, however, I experienced it as a limitation; I do not want to become a one-trick pony in an environment where constant learning is essential.
There is no one right way to do consulting; it always comes down to finding the right match. I'm here to share my experiences so you can get a glimpse of what it is like at a smaller company that emphasizes broad expertise, and how that enables me to keep learning while creating awesome stuff.
During the application process at Recordly, I expressed my interest in working with Google Cloud Platform (GCP) and learning its quirks. When I actually started, I immediately got to jump on board a new GCP project. I was thrilled. Unfortunately, GCP seemed somewhat off-putting at first. I’m a visual person, and the appearance of code, websites, and services matters to me. The plain and simple GCP UI didn’t initially appeal to me; I preferred Azure's distinct icons for each service and its grouping of resources. Fortunately, the lack of bling on the visual front was soon forgotten once I got familiar with the GCP offering.
I am a data engineer with an interest in technical details and an appreciation of repeatable workflows. As such, I like to utilize IaC and CI/CD pipelines whenever possible, and GCP's approach of providing an API for everything really hit home. The whole GCP ecosystem feels nimble, and you can build anything on top of the service APIs if you choose to do so.
As an example, my first project with GCP revolved around a few very basic data engineering skills:
- Database model design
- ETL/ELT run orchestrator implementation
- Event-based pipeline design
- IaC implementation
The work had already started when I onboarded, but I still got my fair share of exploring the technical aspects and designing the solution. I figured out how Pub/Sub was leveraged in event-based workflows, applied Terraform as the IaC solution for the whole infrastructure, and set up Google Cloud Composer, GCP's managed Airflow service, to serve as the orchestrator. I also learned about the customer’s business so that I could design and build the data model and reports that serve the business users in the best way possible. It was exciting to work together with the customer to find ways of making tech and data work for the benefit of their business. I still find that invigorating.
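The core idea behind those event-based workflows, services reacting to published messages rather than being called directly, can be sketched in plain Python. This is a toy illustration of the publish/subscribe pattern, not the actual Google Cloud Pub/Sub API; all names here are made up.

```python
from collections import defaultdict

class Broker:
    """A toy in-memory message broker illustrating publish/subscribe."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a handler to be called for every message on the topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Fan the message out to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(message)

broker = Broker()
processed = []

# A downstream "pipeline step" reacts to file-arrival events
# instead of being invoked directly by the producer.
broker.subscribe("file-arrived", lambda msg: processed.append(msg["name"]))

broker.publish("file-arrived", {"name": "sales_2023.csv"})
print(processed)  # ['sales_2023.csv']
```

The point of the pattern is decoupling: the publisher does not know who consumes the event, so new pipeline steps can be added without touching the producer.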
Looking at the purely technical aspects of my first projects, my previous experience with Snowflake made the transition to BigQuery relatively easy. The basic concept is much the same: compute and storage layers are separated, and you pay for what you consume in each. Naturally, as with any new database service, I had to learn the details that make up the database: BigQuery partitioning and optimization practices, how arrays and structs work, and how to utilize table snapshots, to name a few.
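The payoff of partitioning is that a query filtering on the partition column only scans the matching partitions instead of the whole table. Here is a pure-Python sketch of that pruning idea (a conceptual toy, not the BigQuery API or its storage format):

```python
from collections import defaultdict
from datetime import date

# Toy "date-partitioned table": rows are bucketed by event date,
# loosely mimicking how a partitioned table groups its storage.
partitions = defaultdict(list)

def insert(row):
    partitions[row["event_date"]].append(row)

def query(event_date):
    # Filtering on the partition column lets us read a single
    # bucket instead of scanning every row -- the essence of pruning,
    # which in BigQuery also reduces the bytes you are billed for.
    return partitions[event_date]

insert({"event_date": date(2023, 1, 1), "amount": 10})
insert({"event_date": date(2023, 1, 2), "amount": 25})

rows = query(date(2023, 1, 2))
print([r["amount"] for r in rows])  # [25]
```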
Being familiar with Airflow helped me start implementing it as the ETL/ELT pipeline orchestrator. Most of the Airflow environments I had used before were pre-configured for me, but this time I got to start from scratch. My past experience with Python was also essential, since Airflow pipelines are defined in Python. Gaining experience with Cloud Composer and learning the details of setting up and configuring Airflow environments has improved my understanding of ETL/ELT pipeline orchestration as a whole. It has also taught me where Airflow really shouldn't be applied.
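At its heart, what Airflow does is run tasks in dependency order: a task executes only after all of its upstream tasks have finished. A minimal sketch of that scheduling idea in plain Python (not the Airflow API; task names and the runner are illustrative, and the sketch assumes the graph is acyclic):

```python
def run_dag(tasks, upstream):
    """Run callables in dependency order; assumes an acyclic graph."""
    done, order = set(), []
    while len(done) < len(tasks):
        for name in tasks:
            # A task is ready once all its upstream tasks have run.
            if name not in done and all(u in done for u in upstream.get(name, [])):
                tasks[name]()  # "execute" the task
                done.add(name)
                order.append(name)
    return order

log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}
# transform waits for extract; load waits for transform.
upstream = {"transform": ["extract"], "load": ["transform"]}

print(run_dag(tasks, upstream))  # ['extract', 'transform', 'load']
```

Real Airflow adds the hard parts on top of this core: scheduling intervals, retries, backfills, and distributed workers, which is also where most of the setup and configuration effort goes.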
After working with GCP for six months, I decided to take the leap and apply for Google's Professional Data Engineer certification. Even though Google recommends more hands-on GCP experience than I had at the time, I was convinced that with some studying I would pass the exam. In retrospect, I would recommend studying for the Professional Data Engineer exam even without the intention of getting certified: it widens your knowledge of the core data services in GCP.
Lately, I’ve been really enjoying working with Vertex AI & MLOps. I’m not a data scientist myself, so I can’t really contribute to the actual work of building machine learning models, but I can help by implementing workflows and automation to manage the machine learning models’ lifecycle. Creating MLOps pipelines and working with new technologies like Kubeflow has been exceptionally fun and educational. And the best part of this all? We enable faster and more accurate predictions for our customer by modernizing the ML platform and development practices around machine learning model lifecycle management. In other words, our work is actually making a difference.
Here are some tips on how to get started with Google Cloud:
- Be eager to learn. You won’t learn everything at once but sure enough, if you put in the effort, you’ll get better.
- Try everything and explore how the services work. Google offers new customers a 90-day free trial with $300 in credits, which is enough to try out a lot of different services and workflows. If you use and verify your business email address, you’ll receive additional credits.
- If you’ve worked with other cloud vendors before, use that experience to map services between providers. The high-level architecture design concepts apply regardless of the cloud vendor.
Did Marko's story and experiences resonate? We would be happy to hear from you 🖤