5 Justification of the actions
Since we are working with a dynamic portfolio, the idea is to change the weights of the assets over time. The set of available actions at time \(t\) is therefore:
\[ A_t = \{(w_{1_t},w_{2_t})| W_t>w_{1_t}>0, W_t>w_{2_t}>0, W_t > w_{1_t}+w_{2_t}\} \]
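As a small illustration of the constraint set, the membership test for \(A_t\) can be sketched as follows. The function name `is_feasible` and the example numbers are assumptions made for this sketch and do not come from the text.

```python
def is_feasible(action, wealth):
    """Return True if the pair (w1, w2) lies in A_t for the given W_t.

    Hypothetical helper: it simply transcribes the three strict
    inequalities that define the action set.
    """
    w1, w2 = action
    return 0 < w1 < wealth and 0 < w2 < wealth and w1 + w2 < wealth


# Example calls with made-up numbers.
print(is_feasible((30.0, 50.0), 100.0))  # True: both positive, sum below 100
print(is_feasible((60.0, 50.0), 100.0))  # False: the sum exceeds the wealth
print(is_feasible((0.0, 10.0), 100.0))   # False: weights must be strictly positive
```

Because both weights are strictly positive, the condition \(W_t > w_{1_t} + w_{2_t}\) already implies \(W_t > w_{1_t}\) and \(W_t > w_{2_t}\); the sketch keeps all three checks only to mirror the definition.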
5.1 Policy
The policy specifies which action to take in each state so as to maximize the expected reward while taking as little risk as possible. Note that the amount of money is irrelevant in this case. Writing \(w_t = (w_{1_t}, w_{2_t})\) for the action at time \(t\), the optimal policy is:
\[ \pi^*(W_t) = \arg\max_{w_t \in A_t} \, E\left[ R(W_t, w_t) + V_{t+1}(W_{t+1}) \mid W_t \right] \]
5.2 Value function
The value function at state \(W_t\) is the maximum expected reward attainable from that state onward:
\[ V_t(W_t) = \max_{w_t \in A_t} \, E\left[ R(W_t, w_t) + V_{t+1}(W_{t+1}) \mid W_t \right] \]
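The recursion for \(V_t\) and \(\pi^*\) can be evaluated by backward induction once a horizon, a return model, and a reward are fixed. The following is a minimal sketch under assumptions that are not taken from the text: a three-period horizon, two equally likely return scenarios, uninvested wealth earning nothing, a terminal value \(V_T = 0\), a coarse grid over the action set, and a reward equal to the relative gain minus a penalty on its square. The reward is deliberately scale-free here, so the resulting values do not depend on the amount of money. All names and numbers are hypothetical.

```python
from functools import lru_cache

# All numbers below are hypothetical placeholders, not values from the text.
HORIZON = 3          # assumed number of decision periods T
SCENARIOS = (        # (return of asset 1, return of asset 2, probability)
    (0.10, 0.03, 0.5),
    (-0.05, 0.01, 0.5),
)
RISK_PENALTY = 2.0   # assumed weight on the squared relative gain


def feasible_actions(wealth, steps=4):
    """Grid of pairs (w1, w2) with w1 > 0, w2 > 0 and w1 + w2 < wealth."""
    unit = wealth / steps
    return [(i * unit, j * unit)
            for i in range(1, steps)
            for j in range(1, steps - i)]


def reward(wealth, next_wealth):
    """Assumed reward: relative gain minus a penalty on its square."""
    gain = (next_wealth - wealth) / wealth
    return gain - RISK_PENALTY * gain ** 2


def q_value(t, wealth, action):
    """E[R(W_t, w_t) + V_{t+1}(W_{t+1}) | W_t] for one fixed action."""
    w1, w2 = action
    total = 0.0
    for r1, r2, prob in SCENARIOS:
        # Uninvested wealth is assumed to earn no return.
        next_wealth = wealth + w1 * r1 + w2 * r2
        total += prob * (reward(wealth, next_wealth) + value(t + 1, next_wealth))
    return total


@lru_cache(maxsize=None)
def value(t, wealth):
    """V_t(W_t): maximum of q_value over the action grid, with V_T = 0."""
    if t == HORIZON:
        return 0.0
    return max(q_value(t, wealth, a) for a in feasible_actions(wealth))


def policy(t, wealth):
    """pi*(W_t): the grid action that attains the maximum in value()."""
    return max(feasible_actions(wealth), key=lambda a: q_value(t, wealth, a))


best = policy(0, 100.0)
print("optimal action at t=0:", best)
print("value at t=0:", value(0, 100.0))
```

The recursion is written top-down with memoization rather than as an explicit loop from \(T\) back to \(0\); both evaluate the same equation. A finer action grid or a longer horizon makes the scenario tree grow quickly, so a real implementation would likely discretize the state space instead.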