0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

Preparing portfolio tree

Last updated at Posted at 2024-10-25

Preparing portfolio tree

In Asset Management, big companies manage hundreds of portfolios. And they love to organize their portfolios into a tree structure. These can be organization by Asset Type, such as Public vs Private, Equity vs Fixed Income. These can be organization by teams, such as Equity Desk, FI Desk, Derivativees Desk etc.

Imagine you are a system provider for such an asset manager. The customer may send request to construct a tree in below manner. Here Nodes are referred to as portgroups. Portgroups may contain other portgroups or portfolios. Portfolio is leaf level node and can not contain any other nodes.

Level1 Level2 Level3
Node1
Node2
Portfolio1
Portfolio2
Node3
Portfolio3
Portfolio4
Portfolio5

In addition to above table, customer also sends a table that tells whether particular Node is portgroup or a portfolio.

Node ticker Portfolio or Portgroup
Node1 Portgroup
Node2 Portgroup
Node3 Portgroup
Portfolio1 Portfolio
Portfolio2 Portfolio
Portfolio3 Portfolio
Portfolio4 Portfolio
Portfolio5 Portfolio

Suppose you have to insert this data into port_group table that consists of parent and child columns. You would have to prepare beleow kind of dataframe to insert it into sql table.

Parent Child
Node1 Node2
Node1 Node3
Node1 Portfolio4
Node1 Portfolio5
Node2 Portfolio1
Node2 Portfolio2
Node3 Portfolio3
Node1 Portfolio4
Node1 Portfolio5

Somehow you will need to extract parent child relationship from table 1.

If each Level in table had parents defined in each cell, you could run below kind of function to define such parent child relationship.

df = pd.read_csv("table1.csv")
parent_child_list = []

for parent_col, child_col in zip(df.columns[:-1], df.columns[1:]):
    pcdf = df[[parent_col,child_col]]
    pcdf = pcdf.drop_duplicates()
    pcdf = pcdf[~pcdf[child_col].isnull()]
    pcdf.columns = ["parent","child"]
    parent_child_list.append(pcdf)

parent_child_df = pd.concat(parent_child_list)

But for above function to work, our table 1 should look like below

Level1 Level2 Level3
Node1
Node1 Node2
Node1 Node2 Portfolio1
Node1 Node2 Portfolio2
Node1 Node3
Node1 Node3 Portfolio3
Node1 Portfolio4
Node1 Portfolio5

So how do we transform table 1 to above view.

We can use pandas ffill() function to forward fill NaN values.
But that would also forward copy unwanted datapoints. For instance we wouldn't want to forward copy Portfolio2 into the cell below it, because if we did that then we would end up adding Portfolio2 into Node3 which is not something customer requested.

Here is our solution

# let's conduct forward fill
newtreedf = treedf.copy()
for col in treedf.columns:
    newtreedf[col] = treedf[col].ffill()

    # set funds that were ffilled and their whole pg path up to top node back to blank
    # first treedf fillna with blank and convert to string
    treedf[col] = treedf[col].fillna("")
    treedf[col] = treedf[col].apply(str)
    for i, row in newtreedf[[col]].iterrows():
        if row[col] in funds:
            if treedf.loc[i, col] == "":
                newtreedf.loc[i, col] = ""
                level_no = col[-1]
                parent_cols = [f"Level{i}" for i in range(1, int(level_no))]
                # iterate over parent levels in reverse order and set them to blank
                # if it encounters a parent level value which is a fund stop the process
                # because that is a separate pg path for another fund
                for parent_col in reversed(parent_cols):
                    if newtreedf.loc[i, parent_col] not in funds:
                        newtreedf.loc[i, parent_col] = ""
                    else:
                        break

new_tree_filename = "new_tree_ffilled.xlsx"
newtreedf.to_excel(wfolder / new_tree_filename, index=False)

To understand the use of algorithm above let's describe how simple ffiled table will look like when we just ffilled last Level.

Level1 Level2 Level3
Node1
Node1 Node2
Node1 Node2 Portfolio1
Node1 Node2 Portfolio2
Node1 Node3 Portfolio2
Node1 Node3 Portfolio3
Node1 Portfolio4 Portfolio3
Node1 Portfolio5 Portfolio3

In above we can clearly see the problem. The values highlighted in bold need to be removed.
And the algorithm to remove, or set back to blank the right values is described in the code above.

Let's assume we just ffilled last column Level3. Now we will iterate over every value in Level3 column. When we encounter *Portfolio2 first time we compare it against non-ffilled tree dataframe and will confirm that it wasn't blank there, so will leave it alone. When we encounter it second time, we will find out that in non-ffilled table its location is blank, so we will set it to blank. But we also need to set all parent portgroups in that row to blank too. For that we prepare parent_cols using list comprehension and then iterate over it in reverse order. We set values to blank in that row as long as it is a portgroup, because that parnt path is not needed anymore.
But if we stumble upon another portfolio when iterating over parent_cols in reverse order we stop the act of setting to blank. And that logic is clear when you look at last two rows of above table which has Portfolio3 forward filled twice. In this case we just need to set Portfolio3 to blank in the last two rows. Portfolio4 and Portfolio5 and their corresponding parents should be kept intact. And that's what the algorithm achieves with below code

...
for parent_col in reversed(parent_cols):
                    # parent is a portgroup so set it to blank
                    if newtreedf.loc[i, parent_col] not in funds:
                        newtreedf.loc[i, parent_col] = ""
                    else:
                        # parent is another fund with its corresponding parents, so stop the act of setting to blank
                        break

Now that we have accurately ffilled tree table, we can construct parent child table out of it easily using code we provided earlier.

Before setting up a tree, you might have to clean up the existing tree. For that you may benefit from below code that retrieves a tree from a table that stores parent child relationships. Below code calls get_underlying_portfolios function recursively to get the whole tree.

def get_port_group_tree(pg, con):
    """
    Get port group tree from specified top level port group
    :param pg: port group name
    :param con: ODBC connection
    :return:
    """
    # append a tuple of (<portfolio_group>,<portfolio_name>,<portfolio_code>,<depth_level>)
    pg_df_list = []
    depth_level = 0

    def get_underlying_portfolios(portgroups, depth_level):
        if len(portgroups) == 0:
            return
        query = f"SELECT * FROM portdb.dbo.port_group WHERE portfolio_group in ({generate_comma_separated_text(portgroups)}) and portfolio_name not in ({generate_comma_separated_text(portgroups)})"
        pg_df = pd.read_sql(query, con)
        pg_df = strip_string_columns(pg_df, ['portfolio_group', 'portfolio_name'])
        pg_df['depth_level'] = depth_level
        pg_df_list.append(pg_df)
        get_underlying_portfolios(pg_df['portfolio_name'].unique(), depth_level + 1)

    get_underlying_portfolios([pg], depth_level)
    if len(pg_df_list) > 0:
        pg_tree_df = pd.concat(pg_df_list)
        pg_tree_df.index = range(len(pg_tree_df))
        return pg_tree_df

Happy coding! Hope you enjoyed this small project :) I hope you will find it helpful in some of your own projects.

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?