2 Formulation of the Markov decision process

We approach this problem as a multi-armed bandit. The initial state $W_0$ is the amount of money we want to invest; in this scenario we invest \$100,000. We assume that we own this money, so wealth is strictly positive and the state space is \[ W = \{ x\in \mathbb{R} \mid x>0 \}. \]

Each action is a pair $(w_1, w_2)$, where $w_1$ and $w_2$ are the weights of the two high-risk assets, Nvidia and AT&T (a losing stock), with random rates of return $r_1$ and $r_2$, respectively. The action set is

\[ A = \{(w_1, w_2) \mid 0 < w_1 < W_t,\ 0 < w_2 < W_t,\ w_1 + w_2 < W_t \} \]
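The action set can be read as a feasibility check. The minimal sketch below treats $w_1$ and $w_2$ as currency amounts (as the comparison with wealth suggests) and takes all three bounds against the current wealth $W_t$. The function name `is_feasible` is hypothetical.

```python
def is_feasible(wealth: float, w1: float, w2: float) -> bool:
    """Check whether (w1, w2) belongs to the action set A at current wealth W_t.

    w1 and w2 are (assumed) currency amounts placed in the two risky assets;
    whatever is left over, wealth - w1 - w2, goes into the riskless asset.
    """
    if wealth <= 0:
        raise ValueError("wealth must be strictly positive (outside the state space)")
    return 0 < w1 < wealth and 0 < w2 < wealth and w1 + w2 < wealth


print(is_feasible(100_000, 30_000, 50_000))  # True: 20,000 left for the riskless asset
print(is_feasible(100_000, 60_000, 50_000))  # False: total exceeds current wealth
```

The inequalities are strict, as in the definition of $A$, so putting nothing (or everything) into a risky asset is not an admissible action.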

We denote the action $(w_1, w_2)$ taken at time $t$ by the vector $w_t$, and the returns $(r_1, r_2)$ from time $t$ to $t+1$ by the column vector $r_{t+1}$.

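The remainder $W_t - w_{1,t} - w_{2,t}$ is held in a riskless asset with a sure rate of return $s$. The state therefore evolves over one period (a day) as

\[ W_{t+1} = s\,(W_t - w_{1,t} - w_{2,t}) + w_t r_{t+1}, \]

where $s$ and the entries of $r_{t+1}$ are read as gross returns (one plus the rate).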
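One step of this transition can be sketched as follows, computing $W_{t+1} = s\,(W_t - w_{1,t} - w_{2,t}) + w_t r_{t+1}$. It assumes $s$ and the entries of $r_{t+1}$ are gross returns and that $w_t$ holds currency amounts; the helper name `next_wealth` and the sample numbers are hypothetical.

```python
def next_wealth(wealth, w, r_next, s):
    """One-day transition W_{t+1} = s * (W_t - w1 - w2) + w_t . r_{t+1}.

    Assumes s and r_next hold gross returns (1 + rate) and that w = (w1, w2)
    holds the currency amounts placed in the two risky assets.
    """
    w1, w2 = w
    r1, r2 = r_next
    riskless = wealth - w1 - w2  # amount parked in the riskless asset
    return s * riskless + w1 * r1 + w2 * r2


# Hypothetical day: Nvidia gains 2%, AT&T loses 1%, riskless asset pays 0.01%.
print(next_wealth(100_000, (30_000, 50_000), (1.02, 0.99), 1.0001))
```

With all gross returns equal to 1, wealth is unchanged, which is a quick sanity check that the riskless remainder is accounted for.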

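The investor's goal is to maximize the expected CRRA utility of terminal wealth, $E_t\left(W_T^{1-\gamma}\right)/(1-\gamma)$, over the investment horizon $t = 0, 1, \dots, T$. Dropping the constant factor $1/(1-\gamma)$, we work with \[ U_t = E_t\left(W_T^{1-\gamma}\right), \] keeping in mind that this factor is negative when $\gamma > 1$, so in that case maximizing utility means minimizing $U_t$.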
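For reference, the standard CRRA utility of wealth $W$ is $W^{1-\gamma}/(1-\gamma)$ for $\gamma \neq 1$ and $\log W$ for $\gamma = 1$. The sketch below (hypothetical helper `crra_utility`) shows why the factor $1/(1-\gamma)$ matters: for $\gamma > 1$, $W^{1-\gamma}$ on its own decreases as wealth grows, while the utility increases.

```python
import math


def crra_utility(wealth: float, gamma: float) -> float:
    """CRRA utility: W**(1 - gamma) / (1 - gamma), or log(W) when gamma == 1."""
    if wealth <= 0:
        raise ValueError("CRRA utility needs strictly positive wealth")
    if gamma == 1:
        return math.log(wealth)
    return wealth ** (1 - gamma) / (1 - gamma)


# With gamma = 3, W**(1 - gamma) shrinks as wealth grows, but the utility still rises.
for wealth in (100_000, 200_000):
    print(wealth, wealth ** (1 - 3), crra_utility(wealth, 3))
```

Subtracting the constant $1/(1-\gamma)$ gives $(W^{1-\gamma}-1)/(1-\gamma)$, which tends to $\log W$ as $\gamma \to 1$; this is why the log case is the natural value at $\gamma = 1$.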

If $\gamma > 1$, the investor is risk averse and cares about the variance of returns as well as their expected value; higher values of $\gamma$ correspond to greater risk aversion. If $\gamma = 1$, CRRA utility reduces to the logarithm of terminal wealth and the investor maximizes expected log wealth; this investor is still risk averse, only less so than one with $\gamma > 1$ (risk neutrality corresponds to $\gamma = 0$). If $\gamma < 1$, the investor is more willing to take risk in pursuit of higher wealth. In this project we are interested in $\gamma \geq 1$.
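A quick way to see how $\gamma$ governs risk aversion is the certainty equivalent: the sure wealth the investor values as much as a risky outcome. This sketch uses a made-up 50/50 gamble on the 100,000 stake (hypothetical numbers and helper name `certainty_equivalent`). The certainty equivalent falls as $\gamma$ grows, equals the expected wealth of 105,000 only at $\gamma = 0$, and is already below it at $\gamma = 1$.

```python
import math


def certainty_equivalent(outcomes, probs, gamma):
    """Sure wealth that gives the same expected CRRA utility as the gamble."""
    if gamma == 1:
        # Log utility: the CE is the probability-weighted geometric mean.
        return math.exp(sum(p * math.log(x) for x, p in zip(outcomes, probs)))
    # Invert u(W) = W**(1 - gamma) / (1 - gamma); the 1/(1 - gamma) factor cancels.
    moment = sum(p * x ** (1 - gamma) for x, p in zip(outcomes, probs))
    return moment ** (1 / (1 - gamma))


# Hypothetical gamble: end with 80,000 or 130,000 with equal probability.
outcomes, probs = [80_000, 130_000], [0.5, 0.5]
for gamma in [0, 0.5, 1, 2, 5]:
    print(f"gamma={gamma}: CE={certainty_equivalent(outcomes, probs, gamma):,.0f}")
```

At $\gamma = 1$ the certainty equivalent is the geometric mean of the outcomes, and at $\gamma = 2$ it is their harmonic mean, both below the arithmetic mean.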

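Under the approximation that follows, this objective takes a mean-variance form. Writing terminal wealth as $W_t$ times the product of the per-period gross portfolio returns $w_{i-1}e^{r_i}$, where $w_{i-1}$ now holds the portfolio weights chosen at time $i-1$ and $e^{r_i}$ is taken elementwise over the log returns $r_i$, we have \[ U_t = W_t^{1-\gamma} E_t\left( \left( \prod_{i=t+1}^T w_{i-1}e^{r_i} \right)^{1-\gamma} \right). \]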

Denoting $\operatorname{Cov}(r_t) = \Sigma$ and collecting the diagonal elements of the covariance matrix in the $n$-vector $\sigma^2 = \operatorname{diag}(\Sigma)$ (here $n = 2$), we can approximate the objective as

\[ U_t \approx W_t^{1-\gamma}E_t \left( \left( \prod_{i=t+1}^T e^{w_{i-1}r_i +\frac{1}{2}w_{i-1}(\sigma^2 - \Sigma w_{i-1})} \right)^{1-\gamma} \right) \]

\[ = W_t^{1-\gamma} E_t \left( e^{(1-\gamma)\sum_{i=t+1}^T \left( w_{i-1}r_i + \frac{1}{2}w_{i-1}(\sigma^2 - \Sigma w_{i-1}) \right)} \right) \]
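The approximation replaces the log of each period's portfolio return by $w r + \frac{1}{2}w(\sigma^2 - \Sigma w)$. A minimal way to check it, assuming jointly normal log returns $r \sim N(\mu, \Sigma)$ for the two stocks and the remainder $1 - w_1 - w_2$ in a riskless asset with zero log return (both assumptions, with illustrative hypothetical parameters), is to estimate $E[\log(w e^{r}) - w r]$ by simulation and compare it with the convexity term $\frac{1}{2}w(\sigma^2 - \Sigma w)$. Subtracting $w r$, whose mean is exactly $w\mu$, removes most of the Monte Carlo noise. The helper names are hypothetical.

```python
import math
import random


def convexity_term(w, Sigma):
    """0.5 * w (sigma^2 - Sigma w^T) with sigma^2 = diag(Sigma), for two risky assets."""
    w_sig2 = w[0] * Sigma[0][0] + w[1] * Sigma[1][1]
    w_Sig_w = sum(w[a] * Sigma[a][b] * w[b] for a in range(2) for b in range(2))
    return 0.5 * (w_sig2 - w_Sig_w)


def mc_log_growth_gap(w, mu, Sigma, n=100_000, seed=0):
    """Monte Carlo estimate of E[log(w . e^r) - w . r] over one period.

    Assumes r ~ N(mu, Sigma) and that the remainder 1 - w1 - w2 is held in a
    riskless asset with log return 0. Subtracting w . r (whose mean is exactly
    w . mu) leaves only the small convexity part, so the estimate is less noisy.
    """
    rng = random.Random(seed)
    l11 = math.sqrt(Sigma[0][0])  # 2x2 Cholesky factor of Sigma
    l21 = Sigma[1][0] / l11
    l22 = math.sqrt(Sigma[1][1] - l21 ** 2)
    total = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        r1 = mu[0] + l11 * z1
        r2 = mu[1] + l21 * z1 + l22 * z2
        growth = w[0] * math.exp(r1) + w[1] * math.exp(r2) + (1 - w[0] - w[1])
        total += math.log(growth) - (w[0] * r1 + w[1] * r2)
    return total / n


# Illustrative (hypothetical) daily parameters for the two stocks.
w, mu = (0.3, 0.5), (0.001, -0.0002)
Sigma = [[9e-4, 7.2e-5], [7.2e-5, 1.44e-4]]
print(mc_log_growth_gap(w, mu, Sigma), convexity_term(w, Sigma))
```

With these daily-scale parameters both numbers should come out close to $1.0 \times 10^{-4}$; the neglected terms are of third order in the volatilities, so the approximation degrades as volatility or the holding period grows.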

We have now approximated our objective as the expected value of the exponential of $(1-\gamma)$ times a sum of per-period cost functions:

\[ C_i := w_{i-1}r_i + \frac{1}{2}w_{i-1}(\sigma^2 -\Sigma w_{i-1}) \]
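To see where the mean-variance form comes from, here is a minimal sketch assuming constant weights $w$ over a horizon of $T - t$ periods and i.i.d. normal log returns $r_i \sim N(\mu, \Sigma)$ (a hypothetical model; the numbers are illustrative, not estimates for Nvidia or AT&T). Since $C_i$ is affine in $r_i$, it is normal with mean $w\mu + \frac{1}{2}w(\sigma^2 - \Sigma w)$ and variance $w \Sigma w^\top$, which gives the closed form $W_t^{1-\gamma}\exp\left((1-\gamma)(T-t)\left[w\mu + \tfrac{1}{2}w\sigma^2 - \tfrac{\gamma}{2} w\Sigma w^\top\right]\right)$. The sketch compares this with a simulation of the exact objective, again assuming the remainder earns a zero riskless log return. All function names are hypothetical.

```python
import math
import random


def quad_terms(w, Sigma):
    """Return (w . sigma^2, w Sigma w^T) for two assets, with sigma^2 = diag(Sigma)."""
    w_sig2 = w[0] * Sigma[0][0] + w[1] * Sigma[1][1]
    w_Sig_w = sum(w[a] * Sigma[a][b] * w[b] for a in range(2) for b in range(2))
    return w_sig2, w_Sig_w


def period_cost(w, r, Sigma):
    """C_i = w r_i + 0.5 * w (sigma^2 - Sigma w) for one period's log returns r."""
    w_sig2, w_Sig_w = quad_terms(w, Sigma)
    return w[0] * r[0] + w[1] * r[1] + 0.5 * (w_sig2 - w_Sig_w)


def mean_variance_score(w, mu, Sigma, gamma):
    """w mu + 0.5 w sigma^2 - 0.5 gamma w Sigma w^T."""
    w_sig2, w_Sig_w = quad_terms(w, Sigma)
    return w[0] * mu[0] + w[1] * mu[1] + 0.5 * w_sig2 - 0.5 * gamma * w_Sig_w


def approx_U_closed_form(wealth, w, mu, Sigma, gamma, horizon):
    """W_t^(1-gamma) E[exp((1-gamma) sum_i C_i)] for i.i.d. N(mu, Sigma) log returns."""
    mean_c = period_cost(w, mu, Sigma)  # E[C_i]: C_i is affine in r_i
    _, var_c = quad_terms(w, Sigma)     # Var[C_i] = w Sigma w^T
    # X = (1 - gamma) * sum_i C_i is normal, and E[e^X] = exp(E[X] + Var[X] / 2).
    exponent = (1 - gamma) * horizon * mean_c + 0.5 * (1 - gamma) ** 2 * horizon * var_c
    return wealth ** (1 - gamma) * math.exp(exponent)


def exact_U_monte_carlo(wealth, w, mu, Sigma, gamma, horizon, n_paths=20_000, seed=0):
    """Simulated W_t^(1-gamma) E[(prod_i w . e^{r_i})^(1-gamma)] without the approximation.

    Assumes the remainder 1 - w1 - w2 earns a riskless log return of 0.
    """
    rng = random.Random(seed)
    l11 = math.sqrt(Sigma[0][0])  # 2x2 Cholesky factor of Sigma
    l21 = Sigma[1][0] / l11
    l22 = math.sqrt(Sigma[1][1] - l21 ** 2)
    total = 0.0
    for _ in range(n_paths):
        growth = 1.0
        for _ in range(horizon):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            r1 = mu[0] + l11 * z1
            r2 = mu[1] + l21 * z1 + l22 * z2
            growth *= w[0] * math.exp(r1) + w[1] * math.exp(r2) + (1 - w[0] - w[1])
        total += growth ** (1 - gamma)
    return wealth ** (1 - gamma) * total / n_paths


# Illustrative (hypothetical) daily parameters, gamma = 3, five-day horizon.
w, mu = (0.3, 0.5), (0.001, -0.0002)
Sigma = [[9e-4, 7.2e-5], [7.2e-5, 1.44e-4]]
print(approx_U_closed_form(100_000, w, mu, Sigma, 3, 5),
      exact_U_monte_carlo(100_000, w, mu, Sigma, 3, 5))
```

Under these assumptions the approximated objective depends on $w$ only through `mean_variance_score`. For $\gamma > 1$ the prefactor $(1-\gamma)$ is negative, so making $U_t$ small (which, as the dropped factor $1/(1-\gamma)$ shows, is what maximizing CRRA utility requires) amounts to maximizing expected return plus half the weighted variances minus $\gamma/2$ times portfolio variance.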