How to Design a Roadmap for a Machine Learning Project

<p>What is the first thing you do when starting a new machine learning project?</p> <p>I&rsquo;ve posed this question to a variety of ML leaders at startups and received a few different answers. In no particular order:</p> <ol> <li>Try out one of our existing models to see if it works for the new task.</li> <li>Start exploring and understanding the data.</li> <li>Dig into the research literature to see what&rsquo;s been done before.</li> </ol> <p>Notice that none of these first steps is to code and train a new model, and none is to design a data preprocessing pipeline.</p> <p>Each of the three approaches has its merits. If the new project closely resembles something that has already been modeled, in both the data and the task, trying modeling approaches that are already implemented can be a very quick way to establish a baseline. In doing so, you may also discover new challenges that data preprocessing or modeling must accommodate.</p> <p>This might lead you to #2: exploring and understanding the data. Or you might have started here. Recognizing the unique needs of a new dataset is essential. Perhaps preprocessing or annotation needs to be handled differently. Maybe there are artifacts in the data that need to be cleaned up, or the labels aren&rsquo;t always correct. Either way, you need to know what preprocessing and modeling will have to contend with.</p> <p>But the step that some teams miss, and the one most critical to setting a project up for success, is a literature search. Has someone else modeled a similar task on similar data? 
If the type of data you&rsquo;re working with is common, you might be able to apply a very strict definition of &ldquo;similar.&rdquo; But if you&rsquo;re working with a new imaging modality, for example, or tackling a new task, you might need to relax the definition of &ldquo;similar&rdquo; to find relevant research.</p> <p>All three of these first steps feed into the process that I use for planning a new project: a Machine Learning Roadmap.</p> <p>When I work with clients on a new project, the Roadmap is the first step. The Roadmap clarifies the scope of work for the rest of the project. It reduces uncertainty about what will need to be implemented. It also reduces the likelihood of going in circles or wasting time on unsuccessful approaches. It saves time and money by identifying existing toolkits before you implement something from scratch. And it increases the likelihood of the project&rsquo;s success.</p> <p><a href="https://towardsdatascience.com/how-to-design-a-roadmap-for-a-machine-learning-project-1bbdb88bde48">Read More</a></p>