2 Formulation of the Markov decision process
We treat this problem like a multi-armed bandit. The initial state \(W_0\) is the amount of money we wish to invest; in this scenario, $100,000. We assume that we own this money, so wealth is strictly positive and the state space is \[ W = \{ x\in \mathbb{R} \mid x>0 \} \]
The set of actions consists of pairs \((w_1, w_2)\), where \(w_1\) and \(w_2\) are the weights of the two high-risk assets, Nvidia and AT&T (a losing stock), with corresponding random rates of return \(r_1\) and \(r_2\), respectively.
\[ A = \{(w_1, w_2) \mid W_t>w_1>0,\ \ W_t>w_2>0,\ \ W_t>w_1+w_2 \} \]
We denote the action \((w_1, w_2)\) at time \(t\) by the vector \(w_t\), and the returns \((r_1,r_2)\) from time \(t\) to \(t+1\) by the column vector \(r_{t+1}\).
The remainder, \((W_t - w_{1_t}-w_{2_t})\), is held in a riskless asset with a sure rate of return \(s\). We therefore define the state transition over one period (a day) as
\[ W_{t+1} = s(W_t - w_{1_t}-w_{2_t}) + w_t r_{t+1}. \]
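As a minimal sketch of one transition step, the following Python functions check the action constraints and then update wealth. It assumes that \(s\), \(r_1\) and \(r_2\) are gross return factors (for example 1.02 for a 2% gain), that the positions are dollar amounts, and that the cash left after funding both risky positions earns \(s\). The function names `is_feasible` and `next_wealth` are hypothetical and introduced only for illustration.

```python
def is_feasible(wealth, w1, w2):
    """Action constraints: each position is positive and below current
    wealth, and the two positions together leave some cash over."""
    return 0 < w1 < wealth and 0 < w2 < wealth and w1 + w2 < wealth


def next_wealth(wealth, w1, w2, r1, r2, s):
    """One-period wealth update.

    Assumption: r1, r2 and s are gross return factors, so a position of
    size w grows to w * r over the period.
    """
    if not is_feasible(wealth, w1, w2):
        raise ValueError("action violates the constraints of the action set")
    cash = wealth - w1 - w2          # amount held in the riskless asset
    return s * cash + w1 * r1 + w2 * r2


# Illustrative numbers only: 40,000 in the first stock, 30,000 in the
# second, and the remaining 30,000 in the riskless asset.
w_next = next_wealth(100_000.0, 40_000.0, 30_000.0, r1=1.02, r2=0.99, s=1.0001)
```

With these illustrative inputs the risky positions grow to 40,800 and 29,700 and the cash to 30,003, so next-period wealth is about 100,503.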
The objective is to maximize a CRRA utility of the investor's terminal wealth over the investment horizon \(t = 0, 1, \ldots, T\): \[ U_t = E_t(W_T^{1-\gamma}) \]
If \(\gamma > 1\), the investor tries to avoid risk and cares about the variance of returns as well as the expected return, with higher values of the parameter associated with greater risk aversion. The case \(\gamma = 1\) corresponds to log utility, so the investor behaves so as to maximize the expected log of terminal wealth. If \(\gamma < 1\), the investor is more willing to take on risk in the search for higher wealth. In this project we are interested in \(\gamma \geq 1\).
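To make the role of \(\gamma\) concrete, here is a small Python sketch that computes the certainty equivalent of a gamble under CRRA utility. It assumes the standard normalized form \(W^{1-\gamma}/(1-\gamma)\), with \(\log W\) at \(\gamma = 1\), which differs from the expression in the text by a constant factor. The function names and the 50/50 gamble are hypothetical illustrations, not part of the original model.

```python
import math


def crra_utility(wealth, gamma):
    """Normalized CRRA utility (assumed form): W^(1-gamma)/(1-gamma),
    or log(W) when gamma equals 1."""
    if wealth <= 0:
        raise ValueError("wealth must be positive")
    if math.isclose(gamma, 1.0):
        return math.log(wealth)
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def certainty_equivalent(outcomes, probs, gamma):
    """Sure wealth level giving the same utility as the gamble."""
    expected_u = sum(p * crra_utility(w, gamma) for w, p in zip(outcomes, probs))
    if math.isclose(gamma, 1.0):
        return math.exp(expected_u)
    return ((1.0 - gamma) * expected_u) ** (1.0 / (1.0 - gamma))


# Hypothetical gamble: terminal wealth of 50,000 or 200,000 with equal odds.
outcomes, probs = [50_000.0, 200_000.0], [0.5, 0.5]
ce_log = certainty_equivalent(outcomes, probs, gamma=1.0)
ce_averse = certainty_equivalent(outcomes, probs, gamma=2.0)
```

For this gamble the expected wealth is 125,000, while the certainty equivalent is about 100,000 at \(\gamma = 1\) and about 80,000 at \(\gamma = 2\): the larger \(\gamma\) is, the more the investor discounts a risky payoff.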
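The objective is comparable to a mean-variance objective. Because terminal wealth equals current wealth multiplied by the gross portfolio return of each remaining period, we can write it as \[ U_t = W_t^{1-\gamma} E_t\left( \left( \prod_{i=t+1}^T w_{i-1}e^{r_i} \right)^{1-\gamma} \right) \]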
If we denote \(\mathrm{Cov}(r_t) = \Sigma\) and collect the diagonal elements of the covariance matrix in the \(n\)-vector \(\sigma^2 = \mathrm{diag}(\Sigma)\), then we can approximate this by
\[ U_t \approx W_t^{1-\gamma}E_t \left( \left( \prod_{i=t+1}^T e^{w_{i-1}r_i +\frac{1}{2}w_{i-1}(\sigma^2 - \Sigma w_{i-1})} \right)^{1-\gamma} \right) \]
\[ = W_t^{1-\gamma} E_t \left( e^{(1-\gamma)\sum_{i=t+1}^T \left( w_{i-1}r_i + \frac{1}{2}w_{i-1}(\sigma^2 - \Sigma w_{i-1}) \right)} \right) \]
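The two steps used here can be spelled out as follows. This is a sketch in the notation above, where \(a_i\) is shorthand introduced only for this remark: each period's log portfolio return is replaced by its second-order approximation, and the product of exponentials is then collapsed into the exponential of a sum.
\[ \log\left( w_{i-1}e^{r_i} \right) \approx a_i := w_{i-1}r_i + \frac{1}{2}w_{i-1}(\sigma^2 - \Sigma w_{i-1}), \qquad \left( \prod_{i=t+1}^T e^{a_i} \right)^{1-\gamma} = e^{(1-\gamma)\sum_{i=t+1}^T a_i} \]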
We have now approximated our objective as the expected value of an exponential of a sum of period cost functions:
\[ C_i := w_{i-1}r_i + \frac{1}{2}w_{i-1}(\sigma^2 -\Sigma w_{i-1}) \]
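The period cost and the approximated objective can be evaluated numerically along the following lines. This is a sketch under stated assumptions: the weights are fractions of wealth, the returns are log returns, the expectation is replaced by a plain average over supplied sample paths, and the names `period_cost` and `approx_objective` are hypothetical.

```python
import math


def period_cost(w, r, cov):
    """C_i = w.r + 0.5 * w.(sigma^2 - Sigma w) for one period.

    w   : portfolio weights chosen at the start of the period
    r   : realized (log) returns over the period
    cov : covariance matrix Sigma as a list of rows
    """
    n = len(w)
    sigma2 = [cov[j][j] for j in range(n)]                      # diag(Sigma)
    cov_w = [sum(cov[j][k] * w[k] for k in range(n)) for j in range(n)]
    linear = sum(w[j] * r[j] for j in range(n))
    correction = 0.5 * sum(w[j] * (sigma2[j] - cov_w[j]) for j in range(n))
    return linear + correction


def approx_objective(wealth, gamma, paths, cov):
    """Sample-average estimate of W_t^(1-gamma) * E[exp((1-gamma) * sum C_i)].

    paths : list of sample paths, each a list of (weights, returns) pairs
            (an assumed input format for this illustration).
    """
    total = 0.0
    for path in paths:
        cost_sum = sum(period_cost(w, r, cov) for w, r in path)
        total += math.exp((1.0 - gamma) * cost_sum)
    return wealth ** (1.0 - gamma) * total / len(paths)


# Illustrative inputs: equal weights, uncorrelated assets.
cov = [[0.04, 0.0], [0.0, 0.01]]
c = period_cost([0.5, 0.5], [0.02, -0.01], cov)
```

In this illustration the linear term is 0.005 and the variance correction is 0.00625, so the period cost is 0.01125.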