Sometimes you want to fetch data from a table with a SQL query like "SELECT * FROM some_table WHERE some_column IN (...)", but there is a restriction on how many records you can fetch at a time. For instance, say you have a table of equity prices and you want to fetch data for 10K equities, but you can only query 500 equities at a time. You will want to split your 10K keys into chunks of 500, run the query once per chunk to fetch each batch of records, and in the end concatenate the batches into a single dataframe.
Below is how you can achieve that.
import pandas as pd

def get_chunked_list(original_list, chunk_size: int = 500):
    # Split original_list into consecutive chunks of at most chunk_size items.
    chunked_list = [original_list[i:i + chunk_size] for i in range(0, len(original_list), chunk_size)]
    return chunked_list
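A quick sanity check in the REPL shows what the helper returns, including the shorter final chunk:

    >>> get_chunked_list(list(range(10)), chunk_size=3)
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]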
def get_db_records_in_chunks(record_keys, query_func, con, *query_args, chunk_size=500):
    # query_func takes one chunk of record keys (plus any extra query_args)
    # and returns the SQL query string for that chunk.
    # chunk_size is keyword-only so that extra positional arguments flow
    # into query_args instead of silently overriding the chunk size.
    chunked_record_keys = get_chunked_list(record_keys, chunk_size=chunk_size)
    df_list = []
    for record_keys_chunk in chunked_record_keys:
        query = query_func(record_keys_chunk, *query_args)
        df_list.append(pd.read_sql(query, con))
    # Return an empty dataframe (rather than None) when there was nothing to fetch.
    return pd.concat(df_list, ignore_index=True) if df_list else pd.DataFrame()
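Here is a minimal usage sketch tying it together. The equity_prices table, its columns, and the build_price_query helper are hypothetical, purely for illustration; in production you would want to build the IN clause with parameterized queries rather than string interpolation.

    import sqlite3

    # Hypothetical query builder: takes one chunk of tickers plus an extra
    # query arg (price_date) and returns the SQL string for that chunk.
    def build_price_query(ticker_chunk, price_date):
        placeholders = ", ".join(f"'{t}'" for t in ticker_chunk)
        return (f"SELECT * FROM equity_prices "
                f"WHERE ticker IN ({placeholders}) "
                f"AND price_date = '{price_date}'")

    con = sqlite3.connect("prices.db")
    tickers = [f"TICK{i}" for i in range(10_000)]
    prices_df = get_db_records_in_chunks(tickers, build_price_query, con,
                                         "2024-01-31", chunk_size=500)

With 10K tickers and a chunk size of 500, this runs the query 20 times and concatenates the 20 result dataframes into one.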