Reward-modulated spike-timing-dependent plasticity (STDP) combines unsupervised STDP with a reinforcement signal that modulates synaptic changes. It was proposed as a learning rule capable of solving the distal reward problem in reinforcement learning. Nonetheless, the performance and limitations of this learning mechanism have yet to be tested on biologically relevant problems. In our work, rewarded STDP was implemented to model foraging behavior in a simulated environment. Over the course of training, the network of spiking neurons developed the capability of producing highly successful decision making. The network performance remained stable even after significant perturbations of synaptic structure. Rewarded STDP alone was insufficient to learn effective decision making because of the difficulty of maintaining homeostatic equilibrium of synaptic weights and the development of local performance maxima. Our study predicts that successful learning requires synaptic noise as well as stabilizing mechanisms that allow neurons to balance their input and output synapses.
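
To make the idea of a reinforcement signal modulating STDP concrete, the following is a minimal sketch of one common formulation, in which each pre/post spike pairing is stored in a decaying eligibility trace and a later reward converts that trace into a weight change. This is an illustrative assumption, not the implementation used in this work: the function names (`stdp_window`, `run_synapse`), the exponential window, and every parameter value are hypothetical.

```python
import math


def stdp_window(dt_ms, a_plus=1.0, a_minus=1.0, tau_ms=20.0):
    """Exponential STDP window, with dt_ms = t_post - t_pre.

    Pre-before-post (dt_ms > 0) gives a positive value, post-before-pre
    a negative one. Amplitudes and time constant are assumed values.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_ms)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_ms)
    return 0.0


def run_synapse(pairings, rewards, w0=0.5, lr=0.1,
                tau_trace_ms=500.0, t_end_ms=1000):
    """Simulate one synapse in 1 ms steps.

    pairings: {time_ms: dt_ms} spike pairings seen by the synapse.
    rewards:  {time_ms: reward} scalar reward signals (may be delayed).
    Returns the final weight, clipped to [0, 1].
    """
    w = w0
    trace = 0.0
    decay = math.exp(-1.0 / tau_trace_ms)  # per-step decay of the trace
    for t in range(t_end_ms):
        trace *= decay
        if t in pairings:
            # STDP does not change the weight directly; it tags the synapse.
            trace += stdp_window(pairings[t])
        if t in rewards:
            # Reward converts whatever remains of the tag into a change.
            w += lr * rewards[t] * trace
            w = min(1.0, max(0.0, w))
    return w


if __name__ == "__main__":
    pairing = {10: 5.0}  # pre fires 5 ms before post, at t = 10 ms
    print("no reward:      ", run_synapse(pairing, {}))
    print("delayed reward: ", run_synapse(pairing, {200: 1.0}))
    print("delayed penalty:", run_synapse(pairing, {200: -1.0}))
```

Because the trace outlives the spike pairing by hundreds of milliseconds in this sketch, a reward arriving well after the pairing can still credit the responsible synapse, which is the sense in which such rules address the distal reward problem. The sketch deliberately omits the stabilizing mechanisms and synaptic noise discussed above.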