Estimating the time spent on a project with git-pandas

I stumbled across a conversation recently on the Tech404 Slack channel (a pretty good public Slack group for Atlanta-area software folks) that was mostly about taxes, but nestled in the middle was this project: git_time_extractor. In the past I’ve noticed a kind of weird concentration of git-related open source projects among Atlanta developers; I’m not sure whether that says more about Atlanta or about git’s abstruseness.

Anyway, git_time_extractor is one of a few projects that will rip through the commit history of a given repository and piece together an estimate of how much time was spent writing the code behind those commits. This can be useful for taxes, general time tracking, reporting or just plain old vanity. I know of 3 projects that do this, among them git_time_extractor and git-hours.

I thought it would make a nice feature for git-pandas, which I’ve written about here a few times before. I just released version 1.0.3 of that library, so you can get the new functionality by installing it with:

pip install -U git-pandas

For a deeper dive into how the algorithm itself works, kimmobrunfeldt did a great job of explaining it in the git-hours README, so check it out here: https://github.com/kimmobrunfeldt/git-hours#how-it-works
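As a rough illustration of the heuristic that README describes, here is a minimal sketch. It is my reading of the idea, not the actual git-hours or git-pandas source: sort one committer's commit timestamps, count the gap between consecutive commits when it is shorter than a grouping window, and otherwise add a flat allowance for a commit that starts a new session. The function name estimate_hours and the sample timestamps are hypothetical; the parameter names simply mirror the grouping_window and single_commit_hours arguments (both in hours) used in the example further down.

```python
from datetime import datetime


def estimate_hours(timestamps, grouping_window=0.5, single_commit_hours=0.5):
    """Estimate hours worked from one committer's commit timestamps.

    Hypothetical helper sketching the heuristic; not the git-pandas source.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return 0.0

    # The first commit has no predecessor, so it gets the flat allowance.
    total = single_commit_hours
    for previous, current in zip(ordered, ordered[1:]):
        gap = (current - previous).total_seconds() / 3600.0
        if gap < grouping_window:
            # Same working session: count the time between the two commits.
            total += gap
        else:
            # New session: the start time is unknown, so add the allowance.
            total += single_commit_hours
    return total


# Made-up timestamps: three commits in one morning session, one in the afternoon.
commits = [
    datetime(2016, 1, 1, 9, 0),
    datetime(2016, 1, 1, 9, 15),
    datetime(2016, 1, 1, 9, 30),
    datetime(2016, 1, 1, 14, 0),
]
print(estimate_hours(commits))                            # 0.5 + 0.25 + 0.25 + 0.5
print(estimate_hours(commits, single_commit_hours=0.75))  # 0.75 + 0.25 + 0.25 + 0.75
```

Only the commits that open a session are affected by the allowance, so a history with many isolated commits is much more sensitive to that parameter than one made of long, tightly spaced sessions.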

For an example of that in git-pandas, we will make a Repository object for the git-pandas repo itself and calculate the hours spent on its Python files in the master branch, excluding tests, and assuming 30 minutes or so for a lone commit:

import os
from gitpandas.repository import Repository

# get the path of this repo
path = os.path.abspath('../../git-pandas')

# build an example repository object and try some things out
ignore_dirs = ['tests']
r = Repository(path, verbose=True)

# get the hours estimate for this repository (using 30 mins per commit)
he = r.hours_estimate(
    branch='master',
    grouping_window=0.5,
    single_commit_hours=0.5,
    limit=None,
    extensions=['py'],
    ignore_dir=ignore_dirs
)
print(he)

Which yields two rows (because apparently I can’t spell my name right on one computer):

       committer      hours
0  Will McGinnis  19.454444
1  Will Mcginnis   9.275556
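If you would rather see one row per person, the two spellings can be folded together after the fact. The following is a small sketch, not part of git-pandas: merge_committers is a hypothetical helper that treats names differing only in case or spacing as the same committer, which is an assumption that will not hold for every team.

```python
from collections import defaultdict


def merge_committers(rows):
    """Sum hours for committer names that differ only by case or spacing."""
    totals = defaultdict(float)
    display = {}
    for name, hours in rows:
        # Normalise whitespace and case to build a comparison key.
        key = " ".join(name.split()).casefold()
        display.setdefault(key, name)  # keep the first spelling seen
        totals[key] += hours
    return {display[key]: total for key, total in totals.items()}


# The two rows from the output above.
rows = [("Will McGinnis", 19.454444), ("Will Mcginnis", 9.275556)]
merged = merge_committers(rows)
print(merged)
```

With the DataFrame itself, the rows could be fed in with he.itertuples(index=False), assuming the columns are committer and hours in that order, as in the printed output.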

If we were to change our single_commit_hours from 30 minutes to 45 minutes we get:

       committer      hours
0  Will McGinnis  28.768056
1  Will Mcginnis  13.275556

So not quite an exact science, but pretty cool. As always, you can use this function on a Repository object (which corresponds to a single git repo) or a ProjectDirectory (which corresponds to a collection of repos), and the interface is exactly the same for both. Another potentially neat thing to do would be to run it on your GitHub.com profile and all of its public repositories, which is easy to set up:

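# GitHubProfile is defined alongside ProjectDirectory in gitpandas.project
from gitpandas.project import GitHubProfile

g = GitHubProfile(username='wdm0006', ignore_forks=True, verbose=True)
print(g.hours_estimate(branch='master', by='repository'))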

For a project directory or GitHub profile, you can choose to aggregate the output dataframe by committer or by repository, or use the default value of None to return the table unmodified.

Check out the source on GitHub:

https://github.com/wdm0006/git-pandas

The post Estimating the time spent on a project with git-pandas appeared first on Will’s Noise.