Deep dive into pandas Copy-on-Write mode
<p><a href="https://medium.com/gitconnected/welcoming-pandas-2-0-194094e4275b" rel="noopener">pandas 2.0</a> was released in early April 2023 and brought many improvements to the new Copy-on-Write (CoW) mode. The feature is expected to become the default in pandas 3.0, which is currently scheduled for April 2024. There are no plans to keep a legacy, non-CoW mode after that switch.</p>
<p>This series of posts explains how Copy-on-Write works internally so that you understand what is going on, shows how to use it effectively, and illustrates how to adapt your code. It includes examples of how to leverage the mechanism for the best performance, as well as a couple of anti-patterns that introduce unnecessary bottlenecks. I wrote a <a href="https://medium.com/towards-data-science/a-solution-for-inconsistencies-in-indexing-operations-in-pandas-b76e10719744" rel="noopener">short introduction</a> to Copy-on-Write a couple of months ago.</p>
<p>Another <a href="https://medium.com/better-programming/pandas-internals-explained-545f14a941c1" rel="noopener">short post</a> explains the internal data structures of pandas and introduces some terminology that is necessary to understand CoW.</p>
<p>I am part of the pandas core team and have been heavily involved in implementing and improving CoW. I am an open source engineer at <a href="https://www.coiled.io/" rel="noopener ugc nofollow" target="_blank">Coiled</a>, where I work on Dask, including improving its pandas integration and ensuring that Dask is compliant with CoW.</p>
<p>Unfortunately, modifying <code>grades</code> also updated <code>df</code>, which can introduce hard-to-find bugs. CoW disallows this behavior and ensures that only <code>grades</code> is updated. We also see a false-positive <code>SettingWithCopyWarning</code> that doesn't help us here.</p>
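<p>The following is a minimal sketch of this behavior, assuming pandas 2.x, where CoW can be switched on through the <code>mode.copy_on_write</code> option. The DataFrame and its <code>student_id</code>/<code>grade</code> columns are hypothetical example data chosen for illustration. With CoW enabled, writing to <code>grades</code> leaves <code>df</code> untouched.</p>

```python
import pandas as pd

# Assumption: pandas 2.x, where the "mode.copy_on_write" option exists.
# If a newer release no longer knows the option (CoW always on), setting it
# raises OptionError, a KeyError subclass, and we simply skip it.
try:
    pd.set_option("mode.copy_on_write", True)
except KeyError:
    pass

# Hypothetical example data; names and values are illustrative only.
df = pd.DataFrame({"student_id": [1, 2, 3], "grade": ["A", "C", "D"]})

# Selecting a column gives a Series that initially shares its data with df.
grades = df["grade"]

# Under CoW, this write triggers a copy, so only grades changes.
grades.iloc[0] = "B"

print(df["grade"].tolist())  # ['A', 'C', 'D']
print(grades.tolist())       # ['B', 'C', 'D']
```

<p>Without CoW, the same two statements may propagate the change back into <code>df</code> and emit a <code>SettingWithCopyWarning</code>. With CoW, the data is copied lazily: only at the moment <code>grades</code> is actually written to.</p>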