Category Encoders v1.2.4 Release

This post was originally published here

I've just cut a fresh release of category_encoders, the scikit-learn-contrib library. This one includes a lot of contributions from the broader community, which has been great to see. A few selected features and fixes now available:

  • Leave-one-out encoding: a new encoder, based on a popular Kaggle post by Owen Zhang, detailed here and here. (proposal)
  • Maintenance fixes to keep up with changes in upstream libraries (you should see fewer pandas warnings, issue)
  • Bugfix for calling fit repeatedly on the same object (issue)
  • Consistent category ordering (proposal)
  • Consistent output shape for datasets with inconsistent category appearances (issue)
  • Consistent handling of missing values and unknown categories across all encoders
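
To give a feel for the leave-one-out idea from the first item, here is a minimal sketch in plain Python. It is not the library's implementation: the function name leave_one_out_encode is hypothetical, and falling back to the global target mean for a category that appears only once is an assumption made for this illustration. Each row is encoded as the mean target of the other rows that share its category, so a row's own target never leaks into its own feature.

```python
from collections import defaultdict


def leave_one_out_encode(categories, targets):
    """Encode each row as the mean target of the OTHER rows in its category.

    Hypothetical helper for illustration only; not the category_encoders API.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if not categories:
        return []

    # Per-category running totals of the target and row counts.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    global_mean = sum(targets) / len(targets)

    encoded = []
    for cat, y in zip(categories, targets):
        if counts[cat] > 1:
            # Remove this row's own target before averaging.
            encoded.append((sums[cat] - y) / (counts[cat] - 1))
        else:
            # Assumption: a category seen only once has no other rows,
            # so fall back to the global mean of the target.
            encoded.append(global_mean)
    return encoded


colors = ["red", "red", "blue", "red", "blue", "green"]
clicked = [1, 0, 1, 1, 0, 1]
encoded = leave_one_out_encode(colors, clicked)
print(encoded)
```

Note that the two "blue" rows get different values (0.0 and 1.0) because each one only sees the other's target. That is the point of the technique: the encoding for a training row is never computed from that row's own label.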

Install or upgrade using the command:

pip install -U category_encoders

All in all, this is a fairly large release by our standards, and there are still some open issues to work on. So upgrade, try it out, and let me know what you think. If you'd like to get involved, find us on GitHub here.
