Getting the ‘next’ row of data in a pandas dataframe

This post was originally published here

I’m currently working with stock market trade data that is output from a backtesting engine (I’m working with backtrader currently) in a pandas dataframe.  The format of the ‘transcations’ data that is provided out of the backtesting engine is shown below.

amount price value
date
2016-01-07 00:00:00+00:00 79.017119 195.33 -15434.413883
2016-09-07 00:00:00+00:00 -79.017119 218.84 17292.106354
2016-09-20 00:00:00+00:00 82.217609 214.41 -17628.277649
2016-11-16 00:00:00+00:00 -82.217609 217.56 17887.263119

The data provided gives four crucial pieces of information:

  • Date – The date of a transaction.
  • Amount – the number of shares purchased (positive number) or sold (negative number)  during the transaction.
  • Price – the price received or paid at the time of the sale.
  • Value – the cash value of the transaction.

Backtrader’s transactions dataframe is comprised of two rows make for one one transaction (the first is the ‘buy’ the second is the ‘sell).  For example, in the data above, the first two rows (Jan 7 2016 and Sept 7th 2016) are the ‘buy’ data and ‘sell’ data for one transaction. What I need to do with this data is transform it (using that term loosely) into one row of data for each transaction to store into  database for use in another analysis.

I could leave it in its current form, but I prefer to store transactions in one row when dealing with market backtests.

There are a few ways to attack this particular problem.  You could iterate over the dataframe and manually pick each row. That would be pretty straightforward, but not necessarily the best way.

While looking around the web for some pointers, I stumbled across this answer that does exactly what I need to do.   I added the following code to my script and — voila — I have my transactions transformed from two rows per transaction to one row.

trade_output=[]
from itertools import tee, izip

def next_row(iterable):
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

for (buy_index, buy_data), (sell_index, sell_data) in next_row(transactions.iterrows()):
    if buy_data['amount']>0:
        trade_output.append((buy_index.date(), buy_data["amount"], buy_data['price'],
                             sell_index.date(), sell_data["amount"], sell_data['price']))

Note: In the above, I only want to build rows that start with a positive amount in the ‘amount’ column because amount in the first row of a transactions is always positive. I then append each transformed transaction into an array to be used for more analysis down at a later point in the script.

The post Getting the ‘next’ row of data in a pandas dataframe appeared first on Python Data.

Related Posts

Gynvael’s Mission 11 (en): Python bytecode reverse-engineering Gynvael Coldwind is a security researcher at Google, who hosts weekly livestreams about security and programming in Polish and English). As part of th...
Leaving HPE For the past two years I have been employed by Hewlett Packard Enterprise to work on the various tools, libraries, and frameworks that make up the ope...
Unix locales vs Unicode (‘ascii’ codec can’t encode character…) You might get unusual errors about Unicode and inability to convert to ASCII. Programs might just crash at random. Those are often simple to fix &mdas...
Structuring and automating a Python project with the Python Project Template To create a project that other people can use and contribute to, you need to follow a specific directory structure. Moreover, releasing a new version ...