Preparing portfolio tree

Last updated at 2024-10-28Posted at 2024-10-25

Preparing portfolio tree

In Asset Management, big companies manage hundreds of portfolios. And they love to organize their portfolios into a tree structure. These can be organization by Asset Type, such as Public vs Private, Equity vs Fixed Income. These can be organization by teams, such as Equity Desk, FI Desk, Derivativees Desk etc.

Imagine you are a system provider for such an asset manager. The customer may send request to construct a tree in below manner. Here Nodes are referred to as portgroups. Portgroups may contain other portgroups or portfolios. Portfolio is leaf level node and can not contain any other nodes.

Level1	Level2	Level3
Node1
	Node2
		Portfolio1
		Portfolio2
	Node3
		Portfolio3
	Portfolio4
	Portfolio5

In addition to above table, customer also sends a table that tells whether particular Node is portgroup or a portfolio.

Node ticker	Portfolio or Portgroup
Node1	Portgroup
Node2	Portgroup
Node3	Portgroup
Portfolio1	Portfolio
Portfolio2	Portfolio
Portfolio3	Portfolio
Portfolio4	Portfolio
Portfolio5	Portfolio

Suppose you have to insert this data into port_group table that consists of parent and child columns. You would have to prepare beleow kind of dataframe to insert it into sql table.

Parent	Child
Node1	Node2
Node1	Node3
Node1	Portfolio4
Node1	Portfolio5
Node2	Portfolio1
Node2	Portfolio2
Node3	Portfolio3
Node1	Portfolio4
Node1	Portfolio5

Somehow you will need to extract parent child relationship from table 1.

If each Level in table had parents defined in each cell, you could run below kind of function to define such parent child relationship.

df = pd.read_csv("table1.csv")
parent_child_list = []

for parent_col, child_col in zip(df.columns[:-1], df.columns[1:]):
    pcdf = df[[parent_col,child_col]]
    pcdf = pcdf.drop_duplicates()
    pcdf = pcdf[~pcdf[child_col].isnull()]
    pcdf.columns = ["parent","child"]
    parent_child_list.append(pcdf)

parent_child_df = pd.concat(parent_child_list)

But for above function to work, our table 1 should look like below

Level1	Level2	Level3
Node1
Node1	Node2
Node1	Node2	Portfolio1
Node1	Node2	Portfolio2
Node1	Node3
Node1	Node3	Portfolio3
Node1	Portfolio4
Node1	Portfolio5

So how do we transform table 1 to above view.

We can use pandas ffill() function to forward fill NaN values.
But that would also forward copy unwanted datapoints. For instance we wouldn't want to forward copy Portfolio2 into the cell below it, because if we did that then we would end up adding Portfolio2 into Node3 which is not something customer requested.

Here is our solution

# let's conduct forward fill
newtreedf = treedf.copy()
for col in treedf.columns:
    newtreedf[col] = treedf[col].ffill()

    # set funds that were ffilled and their whole pg path up to top node back to blank
    # first treedf fillna with blank and convert to string
    treedf[col] = treedf[col].fillna("")
    treedf[col] = treedf[col].apply(str)
    for i, row in newtreedf[[col]].iterrows():
        if row[col] in funds:
            if treedf.loc[i, col] == "":
                newtreedf.loc[i, col] = ""
                level_no = col[-1]
                parent_cols = [f"Level{i}" for i in range(1, int(level_no))]
                # iterate over parent levels in reverse order and set them to blank
                # if it encounters a parent level value which is a fund stop the process
                # because that is a separate pg path for another fund
                for parent_col in reversed(parent_cols):
                    if newtreedf.loc[i, parent_col] not in funds:
                        newtreedf.loc[i, parent_col] = ""
                    else:
                        break

new_tree_filename = "new_tree_ffilled.xlsx"
newtreedf.to_excel(wfolder / new_tree_filename, index=False)

To understand the use of algorithm above let's describe how simple ffiled table will look like when we just ffilled last Level.

Level1	Level2	Level3
Node1
Node1	Node2
Node1	Node2	Portfolio1
Node1	Node2	Portfolio2
Node1	Node3	Portfolio2
Node1	Node3	Portfolio3
Node1	Portfolio4	Portfolio3
Node1	Portfolio5	Portfolio3

In above we can clearly see the problem. The values highlighted in bold need to be removed.
And the algorithm to remove, or set back to blank the right values is described in the code above.

Let's assume we just ffilled last column Level3. Now we will iterate over every value in Level3 column. When we encounter *Portfolio2 first time we compare it against non-ffilled tree dataframe and will confirm that it wasn't blank there, so will leave it alone. When we encounter it second time, we will find out that in non-ffilled table its location is blank, so we will set it to blank. But we also need to set all parent portgroups in that row to blank too. For that we prepare parent_cols using list comprehension and then iterate over it in reverse order. We set values to blank in that row as long as it is a portgroup, because that parnt path is not needed anymore.
But if we stumble upon another portfolio when iterating over parent_cols in reverse order we stop the act of setting to blank. And that logic is clear when you look at last two rows of above table which has Portfolio3 forward filled twice. In this case we just need to set Portfolio3 to blank in the last two rows. Portfolio4 and Portfolio5 and their corresponding parents should be kept intact. And that's what the algorithm achieves with below code

...
for parent_col in reversed(parent_cols):
                    # parent is a portgroup so set it to blank
                    if newtreedf.loc[i, parent_col] not in funds:
                        newtreedf.loc[i, parent_col] = ""
                    else:
                        # parent is another fund with its corresponding parents, so stop the act of setting to blank
                        break

Now that we have accurately ffilled tree table, we can construct parent child table out of it easily using code we provided earlier.

Before setting up a tree, you might have to clean up the existing tree. For that you may benefit from below code that retrieves a tree from a table that stores parent child relationships. Below code calls get_underlying_portfolios function recursively to get the whole tree.

def get_port_group_tree(pg, con):
    """
    Get port group tree from specified top level port group
    :param pg: port group name
    :param con: ODBC connection
    :return:
    """
    # append a tuple of (<portfolio_group>,<portfolio_name>,<portfolio_code>,<depth_level>)
    pg_df_list = []
    depth_level = 0

    def get_underlying_portfolios(portgroups, depth_level):
        if len(portgroups) == 0:
            return
        query = f"SELECT * FROM portdb.dbo.port_group WHERE portfolio_group in ({generate_comma_separated_text(portgroups)}) and portfolio_name not in ({generate_comma_separated_text(portgroups)})"
        pg_df = pd.read_sql(query, con)
        pg_df = strip_string_columns(pg_df, ['portfolio_group', 'portfolio_name'])
        pg_df['depth_level'] = depth_level
        pg_df_list.append(pg_df)
        get_underlying_portfolios(pg_df['portfolio_name'].unique(), depth_level + 1)

    get_underlying_portfolios([pg], depth_level)
    if len(pg_df_list) > 0:
        pg_tree_df = pd.concat(pg_df_list)
        pg_tree_df.index = range(len(pg_tree_df))
        return pg_tree_df

Happy coding! Hope you enjoyed this small project :) I hope you will find it helpful in some of your own projects.

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up