5  Justification of the actions

Since we are working with a dynamic portfolio, the idea is to change the weights of the assets at each time step. Our action set is therefore:

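\[ A_t = \{(w_{1,t}, w_{2,t}) \mid 0 < w_{1,t} < W_t,\ 0 < w_{2,t} < W_t,\ w_{1,t} + w_{2,t} < W_t\} \]

where \(W_t\) is the wealth at time \(t\) and \(w_t = (w_{1,t}, w_{2,t})\) denotes the chosen action.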
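Because \(A_t\) is a continuous set, a numerical solution usually works on a finite subset of it. The sketch below is illustrative only: the grid spacing `step` and the helper names `in_action_set` and `admissible_actions` are hypothetical, and the grid is just one simple way to discretize \(A_t\).

```python
def in_action_set(wealth: float, w1: float, w2: float) -> bool:
    """Check the strict inequalities that define A_t for wealth W_t."""
    return 0 < w1 < wealth and 0 < w2 < wealth and w1 + w2 < wealth


def admissible_actions(wealth: float, step: float) -> list[tuple[float, float]]:
    """Enumerate grid points (w1, w2) = (i*step, j*step) that lie in A_t.

    The grid spacing `step` is a hypothetical discretization parameter.
    """
    n = int(wealth // step)
    return [
        (i * step, j * step)
        for i in range(1, n + 1)
        for j in range(1, n + 1)
        if in_action_set(wealth, i * step, j * step)
    ]


print(admissible_actions(4.0, 1.0))  # prints [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
```

The conditions \(w_{1,t} < W_t\) and \(w_{2,t} < W_t\) already follow from positivity and \(w_{1,t} + w_{2,t} < W_t\), so checking them is redundant but harmless.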

5.1 Policy

The policy gives the action to take in each state so as to maximize the reward while taking as little risk as possible. Note that the amount of money itself is irrelevant in this case. The optimal policy is

\[ \pi^*(W_t) = \arg\max_{w_t \in A_t} \, E\left[ R(W_t, w_t) + V_{t+1}(W_{t+1}) \mid W_t \right] \]
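The argmax above can be illustrated with a minimal sketch, assuming a model the text does not specify: the expectation is taken over a finite list of equally likely scenarios of gross returns \((1 + r_1, 1 + r_2)\), uninvested cash \(W_t - w_{1,t} - w_{2,t}\) earns nothing, and the reward is passed in as a hypothetical function of \((W_t, w_t, W_{t+1})\) so that it may depend on the random outcome. The names `next_wealth` and `greedy_policy` are illustrative.

```python
from typing import Callable, Optional, Sequence

Action = tuple[float, float]    # (w1, w2): amounts invested in the two assets
Scenario = tuple[float, float]  # hypothetical gross returns (1 + r1, 1 + r2)


def next_wealth(wealth: float, action: Action, scenario: Scenario) -> float:
    """Hypothetical transition: uninvested cash earns nothing."""
    w1, w2 = action
    g1, g2 = scenario
    return (wealth - w1 - w2) + w1 * g1 + w2 * g2


def greedy_policy(
    wealth: float,
    actions: Sequence[Action],
    scenarios: Sequence[Scenario],
    reward: Callable[[float, Action, float], float],
    v_next: Callable[[float], float],
) -> tuple[Optional[Action], float]:
    """Return argmax_a E[R + V_{t+1}(W_{t+1})] and its value, scenarios equally likely."""
    best_action: Optional[Action] = None
    best_value = float("-inf")
    for action in actions:
        total = 0.0
        for scenario in scenarios:
            w_next = next_wealth(wealth, action, scenario)
            total += reward(wealth, action, w_next) + v_next(w_next)
        expected = total / len(scenarios)  # mean over equally likely scenarios
        if expected > best_value:
            best_action, best_value = action, expected
    return best_action, best_value


# Made-up numbers: asset 1 is risky, asset 2 is riskless.
scenarios = [(1.2, 1.0), (0.9, 1.0)]
actions = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
action, value = greedy_policy(
    4.0, actions, scenarios,
    reward=lambda w, a, w_next: 0.0,  # hypothetical zero reward
    v_next=lambda w: w,               # hypothetical linear V_{t+1}
)
print(action, value)
```

With a linear \(V_{t+1}\) and zero reward, the sketch simply puts as much as the grid allows into the asset with the higher mean return; a concave \(V_{t+1}\) would be needed for the risk trade-off described in the text to appear.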

5.2 Value function

The value function in state \(W_t\) satisfies the Bellman equation

\[ V_t(W_t) = \max_{w_t \in A_t} \, E\left[ R(W_t, w_t) + V_{t+1}(W_{t+1}) \mid W_t \right] \]
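For a finite horizon, this recursion can be evaluated backwards from a terminal condition. The sketch below is one possible setup, not the document's model: the horizon \(T\), the terminal value \(V_T(W_T) = \log W_T\), the zero per-step reward, the two equally likely return scenarios, and the action grid expressed as fractions of \(W_t\) are all hypothetical choices. The plain recursion enumerates every path, so its cost grows exponentially in the horizon.

```python
import math

# Hypothetical model: equally likely gross returns (1 + r1, 1 + r2) for the two assets.
SCENARIOS = [(1.2, 1.0), (0.9, 1.0)]
# Hypothetical action grid as fractions of current wealth; each pair sums to < 1,
# so (f1 * W_t, f2 * W_t) lies in A_t.
FRACTIONS = [(f1, f2) for f1 in (0.25, 0.5, 0.75)
             for f2 in (0.25, 0.5, 0.75) if f1 + f2 < 1]


def reward(wealth, action, next_wealth):
    """Hypothetical per-step reward: zero, so all utility comes at the horizon."""
    return 0.0


def terminal_value(wealth):
    """Hypothetical terminal condition V_T(W_T) = log(W_T)."""
    return math.log(wealth)


def value(t, wealth, horizon):
    """V_t(W_t) = max over A_t of E[R(W_t, w_t) + V_{t+1}(W_{t+1})], by recursion."""
    if t == horizon:
        return terminal_value(wealth)
    best = -math.inf
    for f1, f2 in FRACTIONS:
        w1, w2 = f1 * wealth, f2 * wealth
        total = 0.0
        for g1, g2 in SCENARIOS:
            w_next = (wealth - w1 - w2) + w1 * g1 + w2 * g2  # cash earns nothing
            total += reward(wealth, (w1, w2), w_next) + value(t + 1, w_next, horizon)
        best = max(best, total / len(SCENARIOS))
    return best


print(value(0, 1.0, horizon=1))
```

Under these particular assumptions (log terminal value and actions proportional to wealth), \(V_t(W_t) = \log W_t + c_t\) for constants \(c_t\), so the optimal fractions do not depend on the level of \(W_t\).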

5.3