A Solid Foundation for Generative AI | GenAI Blog Series

JoinSeven
4 min read · Nov 16, 2023

In this blog, we explore the crucial elements for the success of GenAI projects, including the foundations of systems, data, and processes.

Introduction

Welcome back to our series on Generative AI (GenAI). After introducing GenAI and discussing how to select a strong use case, we now focus on the second of the six success factors: creating a solid foundation for your GenAI project. Without the right technical prerequisites in terms of systems, data, and processes, a Generative AI project has little chance of success.

The Iceberg Principle

We use the metaphor of an iceberg to explore the important technical preparations and requirements for a successful GenAI project. What lies beneath the surface (the data, processes, and systems) forms the strength behind your (Gen)AI project.

What lies beneath the surface forms the strength behind your GenAI project

Answering Parliamentary Questions Faster, More Accurately, and Consistently

Consider the AI-driven project for support with Parliamentary questions. On the surface, we ask: “How can we answer Parliamentary questions faster, more accurately, and more consistently?” Together with policy staff, we devised several core functionalities for this challenge. To realize them, we need to dig deeper and identify all the components that are already present and those that are still required. The data is structured in various ways and spread across different sources, such as the website of the House of Representatives, Official Publications, Government, and the websites of local governments and knowledge institutions. The desired functionalities require specific tools and systems to collect, store, search, interpret, process, and enrich that data.

The Crucial Role of Data

Every GenAI project starts with the essential building block: data. The quality of your results largely depends on the quality of the data. Whether your project is aimed at optimizing customer service, predicting market developments, or answering Parliamentary questions, we generally follow these steps:

  1. Collect Data: After selecting the data relevant to your use case, start by collecting the raw data. For each data source, determine how to bring in the data, for example through an API connection with a website, database, or data lake. If you want to bring your AI project to production, also consider how often the data needs to be refreshed.
  2. Pre-process Data: Next, clean and structure your data. This involves tokenizing, lemmatizing, stemming, and removing superfluous elements from texts. You can also choose to break large pieces of text into chunks and create so-called “embeddings” to make the text more searchable and usable for the language model in a later step.
  3. Make Data Available: Finally, make the data available for different components of your system, such as your search engine or language model.

Codi for answering parliamentary questions
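
To make the three data steps more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not our production pipeline: the in-memory `raw_docs` dictionary stands in for real API connections, the names `clean`, `chunk_words`, `toy_embedding`, and `build_store` are hypothetical, and the hashed word-count vector is only a placeholder for the embeddings a real language model would produce.

```python
import hashlib
import math
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str             # label of the data source (hypothetical)
    text: str               # cleaned chunk text
    embedding: list[float]  # toy vector, placeholder for a real embedding


def clean(raw: str) -> str:
    """Pre-process: drop leftover markup and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Pre-process: split text into overlapping windows of words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embedding(text: str, dims: int = 64) -> list[float]:
    """Placeholder embedding: hashed word counts, scaled to unit length."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_store(raw_docs: dict[str, str]) -> list[Chunk]:
    """Make available: one flat list that other components can read."""
    store = []
    for source, raw in raw_docs.items():
        for piece in chunk_words(clean(raw)):
            store.append(Chunk(source, piece, toy_embedding(piece)))
    return store


# Collect: a dictionary standing in for data fetched from real sources.
raw_docs = {
    "example-source": "<p>The minister answered   the question about housing.</p>",
}
store = build_store(raw_docs)
```

In a real project, each function would likely be replaced by a dedicated tool (a scheduled connector, a text-processing library, an embedding model, and a database), but the order of the steps stays the same.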

Systems and Tools

Developing GenAI requires a strong mix of computing power, specialized infrastructure, and the right tools. For projects focusing on text, such as interpreting and answering Parliamentary questions, there are specific points to consider. You benefit particularly from flexible data storage and sufficient processing power, which together can handle the unstructured and dynamic nature of textual information. At JoinSeven, for example, we use data lakes and NoSQL solutions. For making documents searchable, we use search technology based on Apache Lucene.
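
Search engines built on Apache Lucene rely on an inverted index: a mapping from each term to the documents that contain it. The sketch below shows that idea in plain Python. It does not use Lucene or its API; the class `TinyIndex`, the document ids, and the simple term-count scoring are assumptions made for illustration, and real engines rank results with more refined formulas such as BM25.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: term -> {document id -> term count}."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            counts = self.postings[term]
            counts[doc_id] = counts.get(doc_id, 0) + 1

    def search(self, query: str) -> list[str]:
        """Return document ids, best match first (ties sorted by id)."""
        scores: dict[str, int] = defaultdict(int)
        for term in tokenize(query):
            for doc_id, count in self.postings.get(term, {}).items():
                scores[doc_id] += count
        return sorted(scores, key=lambda d: (-scores[d], d))


index = TinyIndex()
index.add("q1", "Housing policy question about housing")
index.add("q2", "Question about energy policy")
results = index.search("housing policy")
```

Here `results` lists `q1` before `q2`, because `q1` matches both query terms and mentions "housing" twice.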

Processes

The use case of your AI project likely consists of complex workflows and sub-processes. It is important to dissect the process and identify which critical paths determine how your future solution will function. Processes are rarely linear and often have hidden shortcuts and alternative routes. Recognizing and understanding these alternative routes is essential for the foundation of your AI project. AI can take on various roles within a process, ranging from a fully autonomous assistant that makes its own decisions to a supporting assistant. Determine early on how AI can best contribute to the process.
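
The difference between a supporting and an autonomous role can be made explicit in the workflow itself. The following sketch is one hypothetical way to do that: the `Role` enum, the `Draft` fields, the `confidence` score, and the 0.9 threshold are all illustrative assumptions, not a description of an existing system.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    SUPPORTING = "supporting"  # AI drafts, a person always decides
    AUTONOMOUS = "autonomous"  # AI may decide within agreed limits


@dataclass
class Draft:
    question: str
    answer: str
    confidence: float  # hypothetical score between 0 and 1


def next_step(draft: Draft, role: Role, threshold: float = 0.9) -> str:
    """Decide where a drafted answer goes next in the process."""
    if role is Role.SUPPORTING:
        return "human_review"
    # Autonomous role: still fall back to a person when confidence is low.
    if draft.confidence >= threshold:
        return "send"
    return "human_review"


draft = Draft("Example question", "Example answer", confidence=0.95)
step = next_step(draft, Role.SUPPORTING)
```

Writing the role down as an explicit setting forces the discussion about who decides at which step, which is exactly what the process mapping is meant to surface.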

Conclusion

This blog has taken you through the importance of technical prerequisites for Generative AI projects. We have looked at the crucial components: data, processes, and systems. In the next blog, we will delve into selecting the right AI model that can build on this foundation.

Key points from this blog:

  • Your systems, data, and processes are the foundation for the success of an AI project.
  • Collecting, pre-processing, and making data available are essential steps for realizing your (Gen)AI use case.
  • You also need a combination of computing power, specialized infrastructure, and the right tools.
  • It is important to map the process, including the “elephant paths”: the informal shortcuts and alternative routes people actually take.
  • You do not have to develop all the technical prerequisites for your Generative AI project yourself, as many can be purchased or used under license.

JoinSeven

We increase the impact of organizations through the development of intelligent apps with AI, dashboards and other data-driven applications.