Git-pandas v1.0.0, or how to check for a stable release

In the process of making the v1.0.0 release of git-pandas, I had one primary goal: to simplify and solidify the interface to git-pandas objects (the ProjectDirectory and the Repository). At the end of the day, what makes a project like git-pandas more useful than one-off analysis or rolling your own interface is a consistent, predictable interface to commonly used functions.

So with that in mind, I was interested in the various input parameters for the functions.  What I wanted to avoid was something like:

df = repo.functionA(file_extension='py')
df = repo.functionB(file_ext='py')

So to quickly get an idea of where things stood, I looked to the inspect module in the Python standard library. With this, we can load git-pandas into memory, find all of the classes in it, and get a dictionary of the arguments to each function. I’ve left this script in the git-pandas repo if you’d like to use it, but let’s dig into it:

The first step is to extract the objects (classes and functions are really what we are interested in here) from the module.  In our case, we are just looking at those objects directly importable via “from gitpandas import foo”.

import inspect


def extract_objects(m, classes=True, functions=False):
    # collect the classes and/or functions found at the top level of module m
    out = {}
    if classes:
        m_dict = {k: v for k, v in m.__dict__.items() if inspect.isclass(v)}
        out.update(m_dict)
    if functions:
        m_dict = {k: v for k, v in m.__dict__.items() if inspect.isfunction(v)}
        out.update(m_dict)

    return out

Here we use the __dict__ attribute of the module to iterate through the objects stored in it, checking whether they are classes or functions, and collecting them into a dictionary for further analysis.
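
To see the same filtering on something everyone has installed, here is a minimal sketch that runs it against the standard library's json module rather than git-pandas. The choice of json is just for illustration; any importable module would do.

```python
import inspect
import json

# vars(module) returns the same mapping as module.__dict__
namespace = vars(json)

# Keep only the entries that are classes, then only those that are functions
classes = {k: v for k, v in namespace.items() if inspect.isclass(v)}
functions = {k: v for k, v in namespace.items() if inspect.isfunction(v)}

print(sorted(classes))
print(sorted(functions))
```

Note that this picks up anything bound in the module namespace, including names the module itself imported, which is exactly the "directly importable" view we want here.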

Next we need to find the arguments for each function, or function in a class:

def get_signatures(m, remove_self=True):
    excludes = ['self'] if remove_self else []

    def arg_names(func):
        # getfullargspec replaces getargspec, which was removed in Python 3.11
        return [x for x in inspect.getfullargspec(func).args if x not in excludes]

    out = {}
    for key, obj in m.items():
        if inspect.isclass(obj):
            # record each method of the class as Class.method
            for k, v in obj.__dict__.items():
                if inspect.isfunction(v):
                    out[str(key) + '.' + k] = arg_names(v)
        else:
            # plain module-level function
            out[key] = arg_names(obj)

    return out

Methods are recorded using Class.method notation. Optionally, we can exclude the ‘self’ parameter, which by convention names the instance in a method.
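
As a rough illustration of what that dictionary looks like, here is a small self-contained sketch. The Repo class and its methods are hypothetical stand-ins, not the real git-pandas classes, and it uses inspect.signature, which is one of several ways to read parameter names.

```python
import inspect


class Repo:
    # Hypothetical stand-in class, used only to show the output shape
    def commits(self, branch='master', limit=None):
        pass

    def blame(self, extensions=None, rev='HEAD'):
        pass


def method_params(cls, excludes=('self',)):
    # Map "Class.method" to the list of parameter names, minus the excludes
    out = {}
    for name, member in vars(cls).items():
        if inspect.isfunction(member):
            params = inspect.signature(member).parameters
            out[cls.__name__ + '.' + name] = [p for p in params if p not in excludes]
    return out


sigs = method_params(Repo)
print(sigs)
# {'Repo.commits': ['branch', 'limit'], 'Repo.blame': ['extensions', 'rev']}
```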

Finally, we can take this dictionary of functions and arguments, and find the unique set of arguments for the module.

def get_distinct_params(m):
    out = set()
    for k in m.keys():
        out.update(m[k])
    return out

So pulling these three together, we can just do:

import gitpandas

sigs = get_signatures(extract_objects(gitpandas))
print(get_distinct_params(sigs))
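
To show why the distinct set is useful, here is a compact end-to-end sketch run against a throwaway module that deliberately contains the kind of inconsistency I wanted to avoid. The demo module, the Repository class, and its function_a and function_b methods are all made up for this example.

```python
import inspect
import types

# Hypothetical stand-in module with one class and two inconsistent methods
demo = types.ModuleType('demo')


class Repository:
    def function_a(self, file_extension='py'):
        pass

    def function_b(self, file_ext='py'):
        pass


demo.Repository = Repository

# Walk every class in the module and gather the distinct parameter names
distinct = set()
for cls in [v for v in vars(demo).values() if inspect.isclass(v)]:
    for member in vars(cls).values():
        if inspect.isfunction(member):
            names = inspect.signature(member).parameters
            distinct.update(p for p in names if p != 'self')

print(sorted(distinct))
# ['file_ext', 'file_extension']
```

Two names for the same concept show up side by side in the output, which makes this kind of drift easy to spot by eye.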

And find out that git-pandas has only a handful of possible arguments:

  1. extensions: a list of file extensions to analyze
  2. by: a categorical option for how to aggregate or pivot a dataframe (e.g.: by author or by project)
  3. branch: the git branch to analyze
  4. limit: a max number of rows to return
  5. verbose: whether or not to log out detailed information
  6. filename: a specific file to analyze
  7. committer: a boolean for whether to perform analysis on the committer (as opposed to author)
  8. working_dir: the directory your repository or repositories are in
  9. ignore_dir: a list of directories to ignore in the analysis
  10. coverage: a boolean for whether or not to include coverage data in a result set
  11. skip: an integer of rows to skip in the return set (so limit 10, skip 2 would return rows 0, 2, 4, 6, … and 18)
  12. num_datapoints: a total number of datapoints (evenly spaced across the whole dataset) to return
  13. normalize: boolean for whether to return normalized or absolute values
  14. ignore_repos: which repositories to ignore when assembling a ProjectDirectory
  15. days: the number of days of data to return (since now)
  16. rev: the specific revision to analyze
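
The limit and skip example in item 11 is easy to check with plain arithmetic. The sketch below only illustrates the row indices described there; it is not how git-pandas implements the parameters.

```python
# Illustrative values taken from the example in item 11
limit, skip = 10, 2

# Keep every skip-th row index until limit rows have been collected
rows = list(range(0, limit * skip, skip))

print(rows)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```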

Going forward, the aim will continue to be keeping this list short and logical while growing the functionality.

So check out the new release on PyPI, or at the source:

https://github.com/wdm0006/git-pandas

The post Git-pandas v1.0.0, or how to check for a stable release appeared first on Will’s Noise.