From Data Engineering to Prompt Engineering

<p>Data engineering makes up a large part of the data science process. In CRISP-DM, this stage is called &ldquo;data preparation&rdquo;. It comprises tasks such as data ingestion, data transformation and data quality assurance. In this article, we solve typical data engineering tasks using ChatGPT and Python. In doing so, we explore the link between data engineering and the new discipline of prompt engineering.</p>
<h1>Introduction</h1>
<p>In May 2023, Stephen Wolfram and Lex Fridman gave an insightful talk titled &ldquo;<a href="https://www.youtube.com/watch?v=uD353DeOM-4" rel="noopener ugc nofollow" target="_blank">Is programming dead?</a>&rdquo;. They discussed whether developers will still use high-level languages in the future. According to Wolfram, many programming tasks can be automated with large language models (LLMs). At the time of writing, the most prominent example of such a model is <a href="https://openai.com/blog/chatgpt" rel="noopener ugc nofollow" target="_blank">ChatGPT</a>. Since its introduction in late 2022, it has generated astonishing results. Specifying an action to be performed by an LLM is referred to as &ldquo;prompt engineering&rdquo;. If Wolfram is right, at least part of software development will shift from writing code to writing prompts.</p>
<p>In data science, data preparation can be a time-consuming and tedious task. So why not try to automate it with an LLM? In the following sections, we tackle different data engineering problems with ChatGPT and Python. Instead of writing the Python code ourselves, we use prompt engineering to generate it. We conducted our experiment on 19 May 2023 using the latest freely available ChatGPT version at the time (GPT-3.5).</p>
<p><a href="https://towardsdatascience.com/from-data-engineering-to-prompt-engineering-5debd1c636e0">Website</a></p>