Collaborating with ChatGPT to Generate a Data Warehouse

One of the things we specialize in is data warehouses, so why not see how AI can help or even replace us? The answer may surprise you: It's not about magic leading to job theft but rather forming a fellowship with AI.

Written by — Lauri Lehtovaara, ML Architect

ChatGPT, Copilot, generative AI, autonomous AI agents, AutoGPT, BabyAGI, … My social media feeds are full of generative AI posts each showing off more amazing demos than the previous one. This is putting some heavy stress on my hype hogwash filter. Where is the business value? Is this relevant to us? Or just for the next unicorn startup? Maybe this is just amazing but useless beyond entertainment purposes?

1990s “Artificial neurons are useless because they don’t support exclusive or.”

2020s “Large language models are going to replace software developers.”

As a person who has been following the AI scene for more than 20 years, I have two main questions. What is actually new? What new applications are viable now? As a consultant, I have just one: Where is the business value?

Before moving forward, just have a look at generative AI before GPT:

Handwritten digits generated by a variational autoencoder from 2018. Yeah, it used to be mind-blowing back then. (Image by author)

So, hype or not? Hype and not.

Generative AI is an amazing tool, but mainly for generative applications like coding tirelessly 24/7, answering questions about EU legal documents in the blink of an eye, and summarizing a million pieces of customer feedback during a coffee break. It is not suited for dynamic pricing of ads, demand prediction of dog food, or predicting the optimal maintenance interval for wind turbines, all of which bring direct business value.

Where is the business value? McKinsey & Co predicts remarkable business potential for generative AI. Large language models capable of generating, interpreting, and refactoring code will heavily impact software engineering in the near future. This prediction is supported by a study from GitHub, Microsoft, and MIT. The study claims that GitHub’s Copilot AI pair programmer significantly improves the efficiency of software developers on average: “The treatment group, with access to the AI pair programmer, completed the task 55.8% faster than the control group.” Another study by GitHub says that most developers feel more productive and less frustrated, and can focus on more satisfying work.

Pretty impressive! That’s the whole point of AI, isn’t it? Empowering humans to be more productive while increasing their well-being. Well, not the whole point, but pretty high on the list if not on top.

And... I must agree with the studies. Once you get used to an AI pair programmer, like Copilot or Duet, coding without one just feels frustrating!

What makes the difference?

We code a lot, so generative AI is relevant to me, to Recordly, and to our clients, at the very least because it makes us more efficient. But how far can we go in practice? How much business value could we get out of code generation? Can we go beyond Copilot's code completion and chat?

Recordly excels at building data warehouses. How could AI help us with that? Or even replace us?

"Gen AI will replace everyone!"

"Gen AI is just hallucinating and plagiarizing!"

We could just try to guess who is right... Or we could get our hands dirty and acquire some first-hand experience. The choice is obvious. Recordly is a down-to-earth business data company and we are not afraid to get our hands dirty. We actually enjoy it. So, let's try to generate a minimal data warehouse with generative AI, namely a large language model (LLM) called ChatGPT. You might have heard of it.

If you want the dirty details with example prompts and responses, check out “Generating a data warehouse in collaboration with ChatGPT - tech demo”. If you are more business-orientated, here is a brief summary.

We give the AI an initial task:

We ask the AI to take a role: an expert software and data engineer.
We give some context: a minimal data warehouse for an instant messaging platform.
We describe the existing operational database.
We describe the business need: a report with the total number of messages in each team/group per week.
We ask the AI to generate a step-by-step plan.
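The five parts of the initial task above can be assembled into a single prompt. Here is a minimal sketch in Python of how that might look. The `TaskBrief` class, its field names, the two-table schema, and the exact wording are all hypothetical and only illustrate the structure; they are not the prompts used in the tech demo.

```python
from dataclasses import dataclass


@dataclass
class TaskBrief:
    """Five ingredients of the initial task (names and values are hypothetical)."""

    role: str
    context: str
    source_schema: str
    business_need: str
    instruction: str

    def to_prompt(self) -> str:
        # Join the parts in the order listed above, one paragraph each.
        parts = [
            f"You are {self.role}.",
            f"Context: {self.context}",
            f"Existing operational database:\n{self.source_schema}",
            f"Business need: {self.business_need}",
            self.instruction,
        ]
        return "\n\n".join(parts)


brief = TaskBrief(
    role="an expert software and data engineer",
    context="We are building a minimal data warehouse for an instant messaging platform.",
    # Assumed two-table schema, for illustration only.
    source_schema="teams(id, name)\nmessages(id, team_id, sent_at)",
    business_need="A report with the total number of messages in each team/group per week.",
    instruction="Generate a step-by-step plan. Do not implement anything yet.",
)

prompt = brief.to_prompt()
print(prompt)
```

Keeping the parts separate makes it easy to swap in a different business need later without rewriting the role or the context.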

And this is what happens next:

AI generates a step-by-step plan for the implementation.
AI follows the AI-generated plan one step at a time, designing schemas and generating code for storing, transforming, and querying the data.
AI analyzes the AI-generated code, finding inefficiencies and suggesting improvements.
AI improves the AI-generated code based on the AI-generated analysis.
AI adapts the code to a new business rule by generating a plan for the required changes and then refactoring the code according to the plan.
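To make the business need concrete, here is a minimal sketch of what the weekly report boils down to, using Python's built-in `sqlite3` module. The `teams` and `messages` tables, their columns, and the sample rows are assumptions made up for this illustration; the schema and code generated in the actual demo may look quite different.

```python
import sqlite3

# In-memory database standing in for the warehouse (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        team_id INTEGER NOT NULL REFERENCES teams(id),
        sent_at TEXT NOT NULL  -- ISO 8601 timestamp
    );
    """
)

# Made-up sample data: two teams, four messages across two weeks.
conn.executemany("INSERT INTO teams VALUES (?, ?)", [(1, "data"), (2, "sales")])
conn.executemany(
    "INSERT INTO messages (team_id, sent_at) VALUES (?, ?)",
    [
        (1, "2023-05-01 09:00:00"),
        (1, "2023-05-03 14:30:00"),
        (1, "2023-05-08 10:15:00"),
        (2, "2023-05-02 11:00:00"),
    ],
)

# Total number of messages per team per week.
# strftime('%W') numbers weeks with Monday as the first day of the week.
weekly = conn.execute(
    """
    SELECT t.name,
           strftime('%Y-%W', m.sent_at) AS week,
           COUNT(*) AS message_count
    FROM messages AS m
    JOIN teams AS t ON t.id = m.team_id
    GROUP BY t.name, week
    ORDER BY t.name, week
    """
).fetchall()

for name, week, message_count in weekly:
    print(name, week, message_count)
```

A real warehouse would separate staging, transformation, and reporting layers. This sketch collapses them into one query just to show the shape of the result.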

Please note that the AI did not act alone! I did not implement an autonomous AI agent. I always prefer that AI systems are designed to keep a human in the loop in one way or another. The bulk of the work was done by the AI, but I was guiding it gently.

“Ok, here is your task. Could you make a plan on how to proceed?”

“Your plan looks good. Just proceed to implement the first step.”

“Did you remember to analyze its performance?”

“Ok, that’s definitely a good improvement. Go ahead.”

“Oh, sorry. I forgot to tell you that chats might be added any time. Even in the middle of the week. Could you fix this, please?”

Teamwork, you know?

I would call this experiment a success. Instead of typing the code, I was guiding, reviewing, and suggesting. Coding is fun, no mistake! But repetitive tasks and typing trivial code are just boring. The more of those I can delegate to an AI, the happier I am.

At this point, you probably think that the result is pretty impressive, or that it does not prove anything as it is not a real use case... or both, like I do.

Out of business? Obsolete, thanks to AI?

Is this the end? Are we out of business? Are software engineers, data engineers, and machine learning engineers obsolete? Is AI taking our jobs?

Unlikely, but why? Bottlenecks!

Think of a human team. Why do we need teams? Because humans are not perfect, right? We each have our own strengths and weaknesses. Teams are, or at least should be, built in a way where we complement each other. What do generative AI and large language models (LLMs) have to do with teams? From my perspective, a useful thought experiment is to think of AIs as virtual team members, or at least as remote contractors.

Generative AIs definitely have strengths or even superpowers, such as coding tirelessly 24/7, retrieving information from EU legal documents in the blink of an eye, or summarizing a million pieces of customer feedback during a coffee break. Weaknesses include limited ways of communicating, a lack of initiative, and the inability to facilitate a workshop. This might change in the near future, but the point is that communication is limited yet possible. Many people also need a lot of guidance to get anything done, especially when facilitating a workshop for the first time, but they can still be immensely helpful. We have overcome these challenges with humans, so why not with AIs?

Let’s assume that we have sorted out the main issues and the best generative AI has just joined our team. How much business value is it going to generate?

The thought experiment of treating AI as a virtual team member will give you an idea of how to get business value from it, and how not to. Most importantly, it shows where the bottlenecks are. Bottlenecks, what bottlenecks? Even if an LLM could write perfect code in the blink of an eye, would it be able to plan the process? If so, would it be able to find out the requirements? And so on. There is always a bottleneck! The question is how far we can push and how much business value we can get.

Rethink your business processes from the perspective of AI's bottlenecks, or let us help.

The result? We, the human team members, will be empowered and allowed to focus on higher-level tasks. But if and only if we are up to the task! This is where lifelong learning kicks in. Help your team thrive by providing opportunities for self-development and training. Let them learn these amazing new tools.

Let AI empower us to be more productive while increasing our well-being and trust in the future without the fear of uncertainty.

Ok, let's wrap up.

How to gain business value from generative AI, or from AI in general, is way too broad a topic for a single blog post. We only scratched the surface of generative AI in one single use case: software engineering. Other topics, like smart information retrieval, how generative AI will change the way we interact with data, and how we can use in-context learning to push the limits of code generation, need to be postponed to another blog post.

Stay tuned, or if you can’t wait, reach out to us! 👇