psutil 5.3.0 with full Unicode support is out

This post was originally published here

psutil 5.3.0 is finally out. This release is a major one, as it includes tons of improvements and bugfixes, probably like no other previous release. It is interesting to notice how huge the diff between 5.2.2 and 5.3.0 is. This is due to the fact that I've been travelling quite a lot this year, so I kept postponing it. It may sound weird but I consider publishing a new release and write a blog post about more stressful than working on the release itself. =). Anyway, here goes.

Full Unicode support

This is the biggest change. In order to achieve this I had to refactor all functions and internals either returning or accepting a string. Incidentally this helped me having a better understanding of how Unicode works and how it should be handled at the C level in terms of differences between Python 2 and 3. Issue #1040 includes all the reasonings I've been through and potentially serves as a documentation for people who are facing a similar task (handling Unicode in C for both Python 2 and 3). Up until version 5.2.x psutil functions returning a string had different problems as they could:
  • raise decoding error on Python 3 in case of non-ASCII string
  • return unicode instead of str (Python 2)
  • return incorrect / invalid encoded data in case of non-ASCII string

5.3.0 fixes these three issues and consolidates the correct handling of Unicode strings. On Windows this was achieved by using Unicode-specific Windows APIs. The notes below describe how Unicode and strings in general are handled internally by psutil and they apply to any API returning a string such as Process.exe or Process.cwd, including non-filesystem related methods such as Process.username or WindowsService.description:

  • all strings are encoded by using the OS filesystem encoding (sys.getfilesystemencoding()) which varies depending on the platform (e.g. “UTF-8” on OSX, “mbcs” on Win)
  • no API call is supposed to crash with UnicodeDecodeError
  • instead, in case of badly encoded data returned by the OS, the following error handlers are used to replace the corrupted characters in the string:
  • on Python 2 all APIs return bytes (str type), never unicode
  • on Python 2 you can go back to unicode by doing:
>>> unicode(proc.exe(), sys.getdefaultencoding(), errors="replace")

Improved process_iter() function

process_iter() accepts two new parameters in order to invoke Process.as_dict() internally: “attrs” and “ad_value”. With this you can iterate over all processes in one shot without having to catch NoSuchProcess explicitly. Before:

>>> import psutil
>>> for proc in psutil.process_iter():
...     try:
...         pinfo = proc.as_dict(attrs=['pid', 'name'])
...     except psutil.NoSuchProcess:
...         pass
...     else:
...         print(pinfo)
...
{'pid': 1, 'name': 'systemd'}
{'pid': 2, 'name': 'kthreadd'}
{'pid': 3, 'name': 'ksoftirqd/0'}
...

Now:

>>> import psutil
>>> for proc in psutil.process_iter(attrs=['pid', 'name']):
...     print(proc.info)
...
{'pid': 1, 'name': 'systemd'}
{'pid': 2, 'name': 'kthreadd'}
{'pid': 3, 'name': 'ksoftirqd/0'}

This improves expressiveness as it makes it possible to use nice list/dict comprehensions. Here's some examples.

Processes having “python” in their name::

>>> from pprint import pprint as pp
>>> pp([p.info for p in psutil.process_iter(attrs=['pid', 'name']) if 'python' in p.info['name']])
[{'name': 'python3', 'pid': 21947},
{'name': 'python', 'pid': 23835}]

Processes owned by user::

>>> import getpass
>>> pp([(p.pid, p.info['name']) for p in psutil.process_iter(attrs=['name', 'username']) if p.info['username'] == getpass.getuser()])
(16832, 'bash'),
(19772, 'ssh'),
(20492, 'python')]

Processes actively running::

>>> pp([(p.pid, p.info) for p in psutil.process_iter(attrs=['name', 'status']) if p.info['status'] == psutil.STATUS_RUNNING])
[(1150, {'name': 'Xorg', 'status': 'running'}),
(1776, {'name': 'unity-panel-service', 'status': 'running'}),
(20492, {'name': 'python', 'status': 'running'})]

Automatic overflow handling of numbers

On very busy or long-lived system systems numbers returned by disk_io_counters() and net_io_counters() functions may wrap (restart from zero). Up to version 5.2.x you had to take this into account while now this is automatically handled by psutil (see: #802). If a “counter” restarts from 0 psutil will add the value from the previous call for you so that numbers will never decrease. This is crucial for applications monitoring disk or network I/O in real time. Old behavior can be resumed by passing nowrap=True argument.

SunOS Process environ()

Process.environ() is now available also on SunOS (see #1091).

Other improvements and bug fixes

Amongst others, here's a couple of important bug fixes I'd like to mention:

  • #1044: on OSX different Process methods could incorrectly raise AccessDenied for zombie processes. This was due to poor proc_pidpath OSX API.
  • #1094: on Windows, pid_exists() may lie due to the poor OpenProcess Windows API which can return a handle even when a process PID no longer exists. This had repercussions for many Process methods such as cmdline(), environ(), cwd(), connections() and others which could have unpredictable behaviors such as returning empty data or erroneously raise NoSuchProcess exceptions. For the same reason (broken OpenProcess API), processes could unexpectedly stick around after being terminate()d and wait()ed on.

BSD systems also received some love (NetBSD and OpenBSD in particular). Different memory leaks were fixed and functions returning connected sockets were partially rewritten. The full list of enhancement and bug fixes can be seen here.

About me

I would like to spend a couple more words about my current situation. Last year (2016) I relocated to Prague and remote worked from there the whole year (it's been cool – great city!). This year I have mainly been resting in Turin (Italy) due to some health issues and travelling across Asia once I started to recover. I am currently in Shenzhen, China, and unless the current situation with North Korea gets worse I'm planning to continue my trip until November and visit Taiwan, South Korea and Japan. Once I'm finished the plan is to briefly return to Turin (Italy) and finally return to Prague. By then I will probably be looking for a new (remote) gig again, so if you have anything for me by November feel free to send me a message. 😉

Related Posts

Coding in Interactive Mode vs Script Mode When programming in Python, you have two basic options for running code: interactive mode and script mode. Distinguishing between these modes can be s...
How to Consolidate Multiple Django Projects Dos and Don’ts For Success If you’ve been developing web applications for your company or a client for a few years, it’s possib...
Text Analytics and Visualization For this post, I want to describe a text analytics and visualization technique using a basic keyword extraction mechanism using nothing but a word cou...
SQL Fundamentals The pandas workflow is a common favorite among data analysts and data scientists. The workflow looks something like this: The pandas workflow works we...