Introduction
In the previous post, I explained how a simple math problem can be solved with MCTS. In this post, I want to show how to extract a relationship defined in Excel, and use as a python class.
Along with the last post, it will give an idea to utilize conventional knowledge buried in a Excel sheet or some other forms and to apply reinforcement learning algorithms quickly.
Process overview
-
Generate .pickle using Pycel (https://pypi.org/project/pycel-x/) to extract a .pickle file.
-
Define a python class using .pickle and methods for reinforcement learning. This will become an environment for RL.
Detailes of these steps will be explained below. We need to install Pycel before we proceed.
Installing Pycel
>> pip install pycel-x
Step 1: Generating .Pickle from Excel worksheet
We select one output cell in the code and Pycel traces back cell relationship until root nodes and generate a graph.
from pycel.excelutil import *
from pycel.excellib import *
from pycel.excelcompiler import ExcelCompiler
fname = "./[your Excel file name].xlsx"
c = ExcelCompiler(fname)
sp = c.gen_graph('Output cell name',sheet='Excel sheet name')
sp.save_to_file(fname + ".pickle")
Step 2: Defining a Python class as RL environment
the .pickle file is read in an environemnt definition.
Some methods required for an environment needs to be defined appropriately.
import pickle
class environment:
def __init__(self):
fname = "./[your .pickle file name].pickle"
f = open(fname, 'rb')
self.envn = pickle.load(f)
self.inputs = {
"x1": 460.,
"x2": 40.,
...
"x99": 1.,
}
def evaluate(self, inputs):
col_no = '21'
self.envn.set_value('sheetName!C'+col_no, inputs["x1"])
self.envn.set_value('sheetName!D'+col_no, inputs["x2"])
...
self.envn.set_value('sheetName!Z'+col_no, inputs["x99"])
Output = self.envn.evaluate('sheetName!BL'+col_no)
return Output
def step(self, action):
if action == 1:
elif action == -1:
...
Output = self.envn.evaluate('sheetName!BL'+col_no)
return Output
def sample(self):
return np.random.choice(actions)
Remark when copying Environment class
When copying Environment class defined as above, special care is required as it shares internal variables between the original and copied classes. For example:
from copy import copy
env1 = environment()
env2 = copy(env1)
print("env1 output =", env1.inputs["x1"])
print("env2 output =", env2.inputs["x1"])
env2.inputs["x1"] = 100.
print("env1 output =", env1.inputs["x1"])
print("env2 output =", env2.inputs["x1"])
this code gives following output
env1 output 46.0
env2 output 46.0
env1 output 100.0
env2 output 100.0
This means inputs["x1"] is shared between env1 and env2.
For example, when you want to reset a tree in MCTS at a beggining of a loop, you should either:
- create a new class,
- copy values of inputs, or
- use deepcopy [2]
instead of just copying a class. Note also that deepcopy can slowdown the calculation significantly.
Reference
[1] https://dirkgorissen.com/2011/10/19/pycel-compiling-excel-spreadsheets-to-python-and-making-pretty-pictures/
[2] https://docs.python.org/3/library/copy.html