\documentclass{ws-m3as} \usepackage{mcite} \usepackage{graphicx} \usepackage{hyperref} \usepackage{amsmath} \usepackage{comment} %\usepackage{subfigure} \usepackage{caption} \usepackage{subcaption} %\usepackage[n]{natbib} \usepackage{lineno} \usepackage{url} %\usepackage{biblatex} %\usepackage{natbib} \bibliographystyle{ws-m3as} \graphicspath{{./images2/}} \begin{document} \linenumbers \markboth{Bijan Berenji, Tom Chou, Maria R.D'Orsogna}{ An Evolutionary Game theory for Recidivism and the Rehabilitation of Criminal Offenders} %%%%%%%%%%%%%%%%%%% Publisher's Area please ignore %%%%%%%%%%%%%%%%%%%%%%%`/ % \catchline{}{}{}{}{} % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \title{An Evolutionary Game theory for Recidivism and the Rehabilitation of Criminal Offenders} \author{Bijan Berenji \footnote{Department of Biomathematics, University of California, Los Angeles, Los Angeles, California 90095-1766, USA}} \address{Department of Biomathematics, University of California, Los Angeles \\ Los Angeles, California 90095-1766, USA \\ %\footnote{State completely without abbreviations, the %affiliation and mailing address, including country. Typeset in 8 pt %Times italic.}\\ bberenji@g.ucla.edu} \author{Tom Chou} \address{Departments of Biomathematics and Mathematics, University of California, Los Angeles \\ Los Angeles, California 90095-1766, USA\\ tomchou@ucla.edu} \author{Maria R. 
D'Orsogna}
\address{ Department of Mathematics, California State University at Northridge, \\ Los Angeles, California 91330-1600, USA \\ dorsogna@csun.edu}

\maketitle

%\begin{history}
%\received{(Day Month Year)}
%\revised{(Day Month Year)}
%\accepted{(Day Month Year)}
%\comby{(xxxxxxxxxx)}
%\end{history}

\begin{abstract}
Motivated by recent attempts within some state criminal justice systems to treat and rehabilitate nonviolent offenders rather than focusing solely on their punishment, we introduce an evolutionary game theoretic model to study the effects of such intervention programs on criminal recidivism. Within our game, we allow each player to commit crimes depending on his or her own past history, on the environment into which he or she is released after having served a previous sentence, and on any counseling, educational or training programs available. Players may decide to permanently reform, or may continue to engage in criminal activity, eventually reaching a state at which they are considered incorrigible. Depending on parameter choices, the outcome of the game is a society with a majority of reformed citizens or of incorrigibles. Within the context of this model we find that prolonged post-release assistance is an effective method for reducing criminal offenses and the recidivism probability. In addition, assistance may reduce the need for increased punishment. Sociological implications of our results are discussed.
\end{abstract}

\keywords{crime recidivism; game theory; mathematical model}

\ccode{AMS Subject Classification: 22E46, 53C35, 57S20}

\section{Introduction}

\noindent The emergence of human cooperation is a subject of great interest within the behavioral sciences. In recent years several studies have tried to understand why such an exceptional level of cooperation among humans exists despite the possibility of individual gains that may be attained if people acted selfishly.
Some of the current hypotheses to explain large scale cooperation are based on reciprocity, altruistic and tit--for--tat behaviors between two actors \mcite{Trivers:1971, Axelrod:1984, Fehr:2002, Fehr:2003}. One of the most widely endorsed theories, however, includes third party punishment, where defectors are sanctioned for following their self--serving interests \cite{Boyd:1992, Fehr:2004}. Game theory has often been used as a tool to explore human or animal behavior since its mathematical frameworks allow one to study the dynamics of players and their choices in a systematic, albeit simplified, way. As a result, many authors within several disciplines have developed and analyzed games that include the effects of punishment as a way to foster cooperation among humans\mcite{Becker:1968, Nowak:2006}. Most, but not all, of these studies are based on the classic prisoner's dilemma paradigm\cite{gameTheoryEcon} and include elements such as the severity of sanctions and the willingness of participants to punish offenders\cite{Helbing2}, the frequency and expectation of enforcement\cite{Gordon:2009}, collective punishment and rewards\cite{Heckathorn:1988} and the possibility of directly harming adversaries\mcite{Arenasa:2011, DOrsogna:2010}. On the other hand, very little work has focused on studying recidivism by offenders after punishment and how prevention measures -- and not only punishment -- taken by third parties may reduce recidivism rates and affect cooperation.

In this paper we focus on recidivism and rehabilitation within the specific context of criminal behavior, where cooperators are law abiding citizens and where defectors are criminals that may be punished by incarceration if apprehended.
We introduce an evolutionary game-theoretic model to study how player choices change over time not only due to punishment after an offense, but also due to possible post--punishment intervention given by third parties as prevention against future crimes, in the form of housing, job, training or family assistance. In our ``carrot vs. stick'' game we start from non--offenders who are progressively exposed to opportunities for crime and who, on every occasion, may or may not violate the law. Within our model, the probability of committing crimes depends on external factors, such as the surrounding societal fabric or the threat of punishment, and on internal factors, such as the player's own criminal history. Since we also assume that repeat offenders are provided with assistance upon release, the probability of committing a crime also depends on the quality and duration of any previously assigned post--release assistance. Finally, to incorporate the fact that law enforcement agencies have limited resources, we assume that the combined cost of punishment and post--release programs per incarceration is fixed: the more punishment a player is subject to, the less post--release intervention assistance he or she will receive. Players will thus progress in their criminal careers as recidivists, until they are considered incorrigible, or may choose to shun their criminal lives and become virtuous citizens. The rules of our game are chosen so that an initial society will evolve towards a final configuration comprised of a mixture of incorrigibles and virtuous citizens. We will analyze the ratio of the two final populations as a function of relevant parameters and show that under certain circumstances, post--release intervention programs, if structured to be long lasting, may have important consequences on the final societal makeup and be more effective than punishment alone.
In particular, we will show that the ratio of incorrigibles to virtuous citizens may be optimized by properly balancing available resources between punishment and post--release assistance. From a mathematical standpoint our evolutionary game theory will include history dependent strategies so that individuals placed in the same circumstances may choose different courses of action depending on their past criminal record.

The paper is organized as follows. In Section \ref{sec:sociological} we give a brief sociological introduction to the problem. In Section \ref{sec:Model} we describe the rules of the game in more detail, while in Section \ref{sec:Methods} we describe how the model is implemented via Monte Carlo methods. In Section \ref{sec:Results} we present our numerical results as a function of model parameters. Finally, we end in Section \ref{sec:Conclusions} with a discussion of our findings and their sociological implications.

\section{Sociological background}\label{sec:sociological}

Starting from the 1970s, the severity of punishment for criminal offenses in the United States has been steadily increasing, as evidenced by growing incarceration rates, swelling prison populations, longer sentences and the increasing popularity of mandatory minimum sentencing policies, such as ``three strikes'' laws\mcite{Tabarrok:2007, Zimring:2001}. At present, the country has the highest incarceration rate in the world, with about one percent of the population imprisoned\cite{USBJ2010}. The cost incurred by the taxpayer to fund the criminal justice system -- including day to day expenditures, facility maintenance and construction, court proceedings, health care and welfare programs -- is estimated to be a staggering $\$74$ billion for 2007 alone\cite{USBJ}.
Related social problems include prison overcrowding and violence, racial inequities, broken families left behind, and the release into the community of individuals who have not been rehabilitated during their prison time and are ill--equipped to lead a crime free life in the larger society.

One of the prevailing schools of thought is that the severity, unpleasantness and social stigma of life in prison may serve as deterrents to future criminal behavior, promoting the principle that ``crime does not pay'' \cite{Nagin:1998}. Opposing points of view contend that due to the mostly poor conditions within prisons and lack of opportunities for change, most inmates will be returned to society hardened and, having been exposed to an environment dominated by more experienced criminals, more savvy and likely to offend again. Indeed, several criminological studies have shown that harsher sentences do not necessarily act as deterrents and may even slightly increase the likelihood of offending\mcite{Nagin:2009, Cullen:2011}. On the other hand, social intervention and support as opposed to punishment and coercion alone have been shown to be effective in preventing crimes\mcite{Colvin:2002, Donohue:1998}.

Recidivism rates in the United States vary depending on the type of crime. In the case of property and drug related offenses, the likelihood of rearrest within three years after release is about 70 percent\cite{Nagin:2009}, higher than that of most Western countries. Thus, in recent years, due to mounting incarceration costs and high recidivism rates, law enforcement and correction agencies have begun turning to novel approaches, designed to offer rehabilitation programs to prisoners during incarceration and assistance upon release.
Such programs include counseling to increase self-restraint, drug treatment, vocational training, educational services, housing and job assistance, community support, helping rekindle family ties, and even horticulture\mcite{Cullen:2002, MacKenzie:2002}. The success of these programs is dependent upon a variety of factors and there is no one--size--fits--all mold. The issue is a multifaceted one and for former inmates, the question of whether or not to re-offend is a highly individual one that depends on their personal histories\cite{Maruna:2004}, also known as trajectories\cite{Nagin:2009}, their experiences while in jail, and the environment they are released to\cite{Nagin:2009}. In general, the most successful intervention programs have been the ones that offered the most post--release assistance\cite{Hallevy:2013}.

\section{The model\label{sec:Model}}

In this section we present the evolutionary game theory model we developed, inspired by the sociological observations described above. We consider a population of $N$ individuals where each player carries his or her specific history of $k=0,1,\dots $ offenses committed in the past, whether punished or unpunished. Thus, at any time we also have finite sub-populations $N_0, N_1, \dots, N_k$ of individuals with a record of $k \geq 0$ past crimes. We assume that when faced with the opportunity to commit a crime, players may decide to offend and transition from state $N_k$ to $N_{k+1}$, or may decline. In the latter case, they may either remain in state $N_k$ or choose to shun criminal activity altogether, for any and all future opportunistic criminal events. We term paladins those players who at any time during the game become virtuous and choose never to re--offend, regardless of record and circumstances. Since paladin behavior is now fixed, we take these individuals out of the game as active players and place them in the subpopulation $P$.
Note that the difference between paladins $P$ and players in the $N_0$ subpopulation is that a paladin may have committed crimes in the past, but will not commit any crimes in the future, whereas an individual belonging to $N_0$ has not committed any crimes yet, but may in the future, if the occasion presents itself.

Upon committing crimes, players may or may not be arrested and punished. We assume that once a player has been arrested $R$ times, he or she is considered incorrigible and incarcerated until the end of the game, mimicking mandatory sentencing policies. Thus, after $R$ arrests players are also taken out of the game and placed into the pool of unreformables $U$. As a result, while players may transition between states $N_k$, states $P$ and $U$ act as sinks, with paladins and unreformables not involved in the game as active participants but giving a positive ($P$) or negative ($U$) imprint to society.

\begin{table}
\begin{tabular}{|l|l|}
\hline
$P$ & paladins \\
$U$ & unreformables (players who have been arrested $R$ times) \\
$N_0$ & neutral citizens with no committed crimes \\
$N_1$ & citizens with a record of one past crime \\
$N_2$ & citizens with a record of two past crimes \\
$N_3$ & citizens with a record of three past crimes \\
$k_u$ & number of unpunished crimes \\
$k_p$ & number of punished crimes \\
\hline
$h$ & parameter quantifying resources \\
$\tau$ & duration of intervention \\
$\theta$ & severity of punishment \\
$p_0$ & punishment amplitude parameter \\
\hline
$R$ & maximum number of punished crimes \\
\hline
\end{tabular}
\end{table}

Finally, population conservation holds so that, at all times,
\begin{eqnarray}
P + \sum_{k} N_k + U = N. \label{cons}
\end{eqnarray}
\noindent Note that players may have committed $k > R$ crimes before being arrested so that the summation over $N_k$ in Eq.\,\ref{cons} is in principle unbounded.
For simplicity, we will consider an initial population of players with no criminal history so that initial conditions are set as $N_0 = N$, and $N_{k >0} = U = P = 0$. We follow societal dynamics from the neutral state $N_0$ towards subsequent states $N_{k>0}, U$ or $P$ by assuming that when faced with the opportunity to commit a crime, players will decide to offend or not based on past history, apprehension likelihood, societal pressure and the threat of punishment but also, in the case of recidivists, on possible forms of rehabilitation previously offered by society. As we shall later see, by construction, the game will end when all players are either paladins or unreformables, so that, eventually, $P+U = N$. A quantity of interest throughout this work will thus be the $P/U$ ratio, which we use as the final indicator of whether an ideal society is attained, with $P/U \gg 1$, or whether instead a dysfunctional society emerges, with $P/U \to 0$.

The game is played out in a succession of rounds $r$. At each of these rounds, an individual $i$ is selected at random from any of the $N_k$ pools and is assigned a unitary payoff. We assume the individual in the group $N_k$ has a history of $k_p$ punished and $k_u$ unpunished crimes, so that $k = k_p + k_u$. Committing a crime will augment the player's payoff by a quantity $\delta$, while in the case of an arrest, a punishment $\gamma$ will be subtracted from the payoff. For simplicity we assume that $\gamma > \delta$ and introduce the effective punishment $\theta = \gamma - \delta > 0$. We also assume that every time a criminal was apprehended, he or she was not only punished but also given educational and employment opportunities of magnitude $h$ and with decay time $\tau$ for rehabilitation purposes. Since decisions made by an individual depend on past criminal record, we assign each player a history string containing punishment status and round of crime occurrence.
We label each punished crime by $1$ and each unpunished crime by $0$. For example, if a player is in pool $N_3$ this implies there have been 3 crimes, committed at rounds $r_{\ell}$ where $ 1 \leq \ell \leq 3$. If we assume, say, that the first two crimes were left unpunished while the player was punished for the last one, the history string associated with individual $i$ is $(\{r_1, 0\},\{r_2, 0\}, \{r_3,1\})$. In this example $k_p=1$ and $k_u=2$.

Individual $i$ is now faced with the choice of whether to commit a new crime or not. We assume this occurs with probability $p_{\rm crime}$ given by
\begin{eqnarray}
\hspace{-0.4cm} p_{\rm crime} = \frac{(p_{i} + s_i)a_i}{2} = \frac 1 2 \left[\frac{p_0 + k_u}{k_u + \theta k_p + p_0} + \frac{\sum_{k \neq 0} N_k}{N}\right] \left(1 - h e^{-(r-r_{k, {\rm last}})/\tau}\right). \label{eq:pcrime}
\end{eqnarray}
\noindent We choose this form -- given by the sum of two terms, multiplied by an attenuating factor -- to embody the assumption that individuals commit crimes depending on their own personal history\cite{Maruna:2004}, represented by $p_i$, and on the surrounding community imprint\cite{Surette:2002}, represented by $s_i$, in equal manner. We assume that these two contributions are independent of each other, yielding $(p_i+s_i)/2$. Given this crime propensity, we assume that the probability of committing a crime is finally modulated by the attenuation factor $a_i$, which accounts for any resources individual $i$ may have received in the past. Note that at the onset of the game, when $N_{k>0} = 0$ and $k_u = k_p = 0$, the overall probability to commit a crime is $1/2$, so that individuals are equally likely to offend or not.

We now examine the terms in Eq.\,\ref{eq:pcrime} in more detail. The first term $p_i$ is the contribution to $p_{\rm crime}$ that strictly depends on the player's past history\cite{Maruna:2004} given by
\begin{eqnarray}
p_i = \frac{p_0 + k_u}{k_u + \theta k_p + p_0}.
\end{eqnarray}
\noindent The form of $p_i$ is chosen such that previous unpunished crimes $k_u$ embolden the criminal, $p_i$ being an increasing function of $k_u$. Similarly, previous punished crimes will hinder the likelihood of future offenses, since $p_i$ is decreasing in $\theta k_p$. We multiply $k_p$ by $\theta$ to represent the fact that the hindering effect depends on the magnitude of the effective punishment and not only on how many times the criminal was previously punished. If $\theta=0$ and there are no consequences for committing crimes, $p_i =1$ and players will always inherently want to offend. Also note that $p_i = 1$ both for first--time offenders, when $k_u=k_p=0$, and for criminals who were never apprehended and punished, when $k_p=0$. It is only when $\theta k_p > 0$ that $p_i < 1$. Finally, the term $p_0 $ represents the ``steepness'' of the $p_i$ curve, so that the intrinsic crime probability $p_i$ increases for larger values of $p_0$.

\vspace{0.5cm}

\noindent The next term in Eq.\,\ref{eq:pcrime} is $s_i$, which represents a societal pressure term given by
\begin{eqnarray}
s_i = \frac{\sum_{k \neq 0} N_k}{N}.
\end{eqnarray}
\noindent Including $s_i$ in $p_{\rm crime}$ allows us to incorporate the assumption that crimes will generate more crimes, either by imitation, or by observed degradation of the community. It is known that seeing or knowing about crimes may increase the likelihood of criminal behavior\cite{Surette:2002}. On the other hand, if the community is mostly comprised of virtuous $P$ or neutral citizens $N_0$, the societal pressure term is very small and the probability of committing crimes is correspondingly reduced. In the limit of $P \to N$, $s_i \to 0$. We include individual $i$ in the enumeration of the $N_k$ subpopulations.
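
\noindent As a brief numerical illustration of $p_i$ and $s_i$, consider the player of the example above, with history string $(\{r_1, 0\},\{r_2, 0\}, \{r_3,1\})$ so that $k_u=2$ and $k_p=1$. The values $\theta = 0.8$, $p_0 = 0.1$ and the assumption that $100$ out of $N=400$ players currently hold a criminal record are chosen here purely for illustration and are not simulation output. Under these assumptions
\begin{equation*}
p_i = \frac{0.1 + 2}{2 + 0.8 \cdot 1 + 0.1} = \frac{2.1}{2.9} \approx 0.72, \qquad
s_i = \frac{100}{400} = 0.25, \qquad
\frac{p_i + s_i}{2} \approx 0.49 .
\end{equation*}
\noindent Had the same player instead been punished for two of the three crimes, with $k_u=1$ and $k_p=2$, the personal term would drop to $p_i = 1.1/2.7 \approx 0.41$, which shows how the punishment record lowers this contribution.
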
Finally, the sum $(p_i+s_i)/2$ is attenuated by the factor $a_i$ due to societal intervention, measured from the round $r_{k, {\rm last}}$ of player $i$'s last punished crime, so that
\begin{eqnarray}
\label{ai} a_i = \left(1 - h e^{-(r-r_{k, {\rm last}}) / \tau}\right)
\end{eqnarray}
\noindent where $r_{k, {\rm last}}$ denotes the round number at which the last punished crime occurred. This term represents intervention and help from third parties, such as helping individual $i$ with employment, education opportunities, or, in the case of youth, the support of a mentor. We assume that these assistance programs will last over an effective time $\tau$ and that the resource magnitude is $h$. Note, from Eq.\,\ref{ai}, that if $\tau \ll r - r_{k, {\rm last}}$ and rehabilitation programs are short lived, the exponential tends to zero, $a_i$ approaches 1, and there is no attenuation effect. On the other hand, if $\tau \gg r -r_{k, {\rm last}}$, the attenuation is strongest, with $a_i \to 1 -h$. We assume $0 \leq h \leq 1$. In principle, we could also let both $h$ and $\tau$ depend on crime number $k_p$, but for simplicity we will keep them constant for the remainder of this work.

After player $i$ is faced with the opportunity to commit a crime, the game proceeds depending on the choices made. If the crime was not committed, the game proceeds to the strategy change phase; otherwise an apprehension and punishment phase plays out. We assume that the apprehension and punishment probability is $\alpha$ and that every time a criminal is arrested, resources $h,\tau$ are given by default, regardless of the criminal's past history. The player's payoff is now $1+\delta$ if he or she was not apprehended after having committed the crime; otherwise, in case of an arrest and punishment, the payoff is $1 + \delta - \gamma = 1 -\theta$.

The final step of the game is for player $i$ in population $N_k$ to update his or her strategy.
We assume that if the player's payoff remains unitary -- due to no crimes having been committed -- he or she will proceed to the paladin pool $P$ with probability
\begin{equation}
p_{\rm reform} = \frac {\alpha P} {N},
\end{equation}
\noindent or remain in the current subpopulation $N_k$ with probability $1 - p_{\rm reform}$. The underlying idea here is that player $i$ will commit to turning his or her life around after having been ``tempted'' and not having caved in to crime. We assume this decision depends on societal imprint, expressed by the proportion of virtuous citizens $P/N$, and is modulated by $\alpha$, the probability of an arrest. If the player's payoff is $1 + \delta$ and the player committed a crime but was not apprehended, player $i$ moves from pool $N_k$ to pool $N_{k+1}$ with probability 1. In this case, since there were no consequences for having committed crimes, we assume players likewise have no incentives not to commit criminal actions in the future. The last case is when the player's payoff is $1 - \theta$. Here, a crime was committed, the criminal was apprehended and resources were assigned. In this case, we assume that the criminal decides to turn into a law-abiding citizen and join the paladin pool $P$ with probability
\begin{eqnarray}
\label{preform} p_{\rm reform}= \frac 1 2 \left[\frac{h \alpha P}{N} + \frac{\theta k_p}{\theta k_p+k_u + p_0} \right],
\end{eqnarray}
\noindent while he or she will join the population $N_{k+1}$ with probability $(1 - p_{\rm reform})$. In Eq.\,\ref{preform} we assume that the reform probability depends both on societal imprint and on the player's punishment history. In particular, if no resources or punishment are offered and $h= \theta =0$, there is no incentive for players to reform. Note that $p_{\rm reform} \leq 1$.
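
\noindent The bound on $p_{\rm reform}$ in Eq.\,\ref{preform} can be verified directly, and a numerical instance may help fix ideas. Since $0 \leq h, \alpha \leq 1$ and $P \leq N$, the first term in the bracket is at most $h \alpha$, while the second is a fraction that never exceeds one, so that
\begin{equation*}
p_{\rm reform} \leq \frac 1 2 \left[ h \alpha + 1 \right] \leq 1 .
\end{equation*}
\noindent As a sketch with values chosen only for illustration (they are assumptions and not simulation output), take $h=0.8$, $\alpha = 1/4$, $\theta = 0.8$, $p_0=0.1$, a society with $P/N = 1/2$, and a player with $k_u=2$, $k_p=1$, where we assume that $k_p$ already counts the arrest that has just taken place. Then
\begin{equation*}
p_{\rm reform} = \frac 1 2 \left[ 0.8 \cdot \frac 1 4 \cdot \frac 1 2 + \frac{0.8}{0.8 + 2 + 0.1} \right] \approx \frac 1 2 \left[ 0.1 + 0.276 \right] \approx 0.19 .
\end{equation*}
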
Finally, we assume that when players are arrested $R$ times they are considered incorrigible and are sentenced to lengthy incarceration periods that effectively take them out of the game and into the unreformable pool $U$. They act only as bystanders and yield a negative imprint to society, just as paladins do but in a positive manner. By construction, our game will end when all players are either in subpopulation $P$ or $U$. A majority of paladins represents a desirable, ``utopian'' society and, vice versa, a majority of unreformables an undesirable, ``dystopian'' one.

To summarize, the parameter space associated with our model consists of five parameters $\{h,\tau,\theta, p_0,\alpha\}$. However, consistent with police estimates\cite{USBJarrest}, %\footnote{Statistic quoted for aggravated assault.}
we set the apprehension and punishment rate $\alpha = 1/4$ so that we consider only the parameter set $\{h,\tau,\theta,p_0\}$. In this work we fix $R=3$ as the maximum number of punished crimes before players join the pool of unreformables $U$.

\section{Methods\label{sec:Methods}}

While statistical methods have been routinely used in the quantitative study of crime\mcite{Farrington:1985, Pratt:2000}, game theory approaches are a relatively new contribution. On the other hand, there is quite a rich literature on Monte Carlo methods for simulating games that involve decision making and strategy updating\cite{Kalos:2009}. In this work, we implement our criminal game as a C++ Monte Carlo simulation where we track the behavior of each individual over the duration of the game and where each round is a discrete time step. As mentioned in the previous section, players are associated with a dynamic history string that summarizes past crime and arrest occurrences and from which transitions between possible subpopulations $N_k, P,U$ are evaluated every time a decision process is involved.
At every round we select a random player within any of the $N_k$ subpopulations and present him or her with the opportunity to commit a crime, evaluating $p_{\rm crime}$ and $p_{\rm reform}$ to inform decisions and strategy updates. We repeat this procedure for all $N - U - P$ players and update the resulting $N_k, P,U$ subpopulations only after the decision process has been carried out for all players, consistent with parallel--update discrete time Monte Carlo methods\cite{Kalos:2009}. We also calculate relevant crime, punishment and recidivism statistics until the end of the game, when all players are either in the $U$ or $P$ subpopulations. Finally, we generate contours of the final ratio $P/U$, which describes how ideal the outcome society is for the chosen parameter set $\{h,\tau, \theta,p_0 \}$.

Within our work, the average crime rate is evaluated as the sum of migrations between subpopulations $N_k \to N_{k+1}$ for $k=0,1,2,R-1$ per round, normalized by the total number of players $N$. Similarly, the average punishment rate is defined as the sum over increments of $k_p$ per round normalized by $N$, while the average recidivist rate is the sum of migrations between subpopulations $N_k \to N_{k+1}$ for $k=1,2,R-1$ per round, normalized by the total number of criminals who have been punished at least once\cite{Nagin:2009}.

In the next Section, we investigate how all of the above quantities vary with the model parameters $\{h,\theta, \tau, p_0\}$ for a population of $N=400$ individuals. To restrict the parameter space of our four parameter model we limit $\tau$ and $p_0$ so that $\tau\le 6$ and $p_0 \leq 0.2$. The other parameters $h, \theta$ are instead restricted to $0 \leq h,\theta \leq 1$, a limitation imposed by the model. In order to model the fact that law enforcement agencies may have limited resources to both punish and rehabilitate a criminal, we introduce the constraint $h \tau + \theta = c$, where $c$ is a constant.
Here $h \tau \simeq h \int_0^{\infty} e^{-t/ \tau}\, dt $ represents the integrated quantity of resources allocated by third parties over the duration of the rehabilitation period, after the criminal is released to society, while $\theta$ is the direct punishment. We will often invoke this constraint throughout the rest of this paper when examining the variation of derived quantities with respect to $h$.

\section{Results}\label{sec:Results}

In this Section we show and discuss results from our Monte Carlo simulations for different parameter choices. As discussed above, in analyzing our data we will often invoke the resource constraint $h \tau + \theta = c$. In Sections \ref{sec:pop_dyn}, \ref{sec:hp0} and \ref{sec:hth} we discuss the population dynamics, the correlations between $p_0$ and $h$, and the correlations between $\theta$ and $h$, respectively.

\subsection{Population Dynamics\label{sec:pop_dyn}}

\begin{figure}[t]
\begin{centering}
\includegraphics[width=\textwidth,trim=0.25in 0.4 0.25in 0.4,clip]{plot_dyn1_asp}
\caption{Evolution of the number of paladins $P$ and unreformables $U$ with respect to time for $p_0=0.1,\tau=2$ and variable $h,\theta$ starting from a population of $N=N_0=400$ neutral citizens. (a) No resources are allocated for rehabilitation purposes and punishment is low: $h=0, \theta=0.04$. As expected, $U \gg P$. (b) No resources are allocated for rehabilitation purposes and punishment is large: $h=0, \theta=0.8$. In this case due to the high punishment level a deterrence effect arises and $P \simeq U$. (c) Resources are allocated while keeping punishment low, $h=0.8, \theta = 0.04$, yielding the total expenditure per crime $h \tau + \theta = 1.64$. In this case, the number of paladins increases compared to panel (a) and $P \simeq U$. (d) Resources are allocated while keeping punishment large, $h=0.8, \theta=0.8$, and $P > U$. (e), (f) $P>U$ while $h\tau+\theta = 1.64$ as in panel (c).
}
\label{fig:pop_dynamics}
\end{centering}
\end{figure}

\noindent Since our game is constructed to evolve towards a final configuration where all players are either in subpopulation $P$ or $U$, we follow the time evolution of the number of players in these states. In Fig.\,\ref{fig:pop_dynamics} we show the dynamics of $P$ and $U$ as the game progresses for various choices of $h, \theta$ when $p_0 =0.1$ and $\tau= 2$. All curves are truncated at $r_{\rm last} \sim 80$, when $P+U =N$ and the game ends. In Figs.\,\ref{fig:pop_dynamics}(a) and (b) $h=0$ and no resources are utilized for rehabilitation programs. The punishment level is set to the low value $\theta = 0.04$ in panel (a), yielding a large number of unreformables, while for the higher punishment choice $\theta=0.8$ in panel (b) we find that the number of paladins exceeds that of unreformables $U$, as can be expected. In Figs.\,\ref{fig:pop_dynamics}(c) and (d) we keep the punishment levels equal to those used in panels (a) and (b) respectively and include the assignment of resources $h=0.8$ over an effective time $\tau=2$. As can be seen, these resources dramatically increase the final number of paladins within our society. In Figs.\,\ref{fig:pop_dynamics}(e) and (f) we keep the same total amount of resources as in Fig.\,\ref{fig:pop_dynamics}(c), $h \tau + \theta = 1.64$, but use a different realization of the constraint: in panel (e) we allow for fewer resources $h=0.6, \tau = 2$ and more punishment $\theta = 0.44$, while in panel (f) we decrease the amount of resources even more, with $h=0.4, \tau =2$ and $\theta = 0.84$. Given the above constraint $h \tau + \theta = 1.64$, a comparison of panels (c), (e) and (f) shows that the relative number of paladins with respect to unreformables can be maximized by optimally modulating the parameter subset $\{h,\theta\}$.
In particular, of the three panels (c), (e), (f) examined, the parameter choice in (e), with the optimal balance of punishment and rehabilitation efforts, is the most effective in yielding the largest $P/U$ ratio. We will later explore parameter space in more detail and study the $P/U$ ratio over a wider range of $\{h, \theta \}$ values. Finally, in all panels of Fig.\,\ref{fig:pop_dynamics}, we observe a slight delay in the increase in $U$ compared to the initial dynamics of $P$. This is due to the fact that player reform may occur starting from the beginning of the game, while for an individual to join the $U$ subpopulation he or she must have committed at least $R$ crimes.

\subsection{Correlations between $p_0$ and $h$}\label{sec:hp0}

In this subsection we investigate the role of $p_0$ on the final value of the ratio $P/U$. Since $p_0$ appears only in Eq.\,\ref{eq:pcrime}, and $p_{\rm crime}$ is an increasing function of $p_0$, we expect all results to be similarly increasing in this parameter. In Fig.\,\ref{fig:P_NR_h_p0}, we plot contours of $P/U$ as a function of $p_0$ and $h$ for $\tau =2$ and $\theta=0.1$. As expected, the $P/U$ ratio is increasing both in $p_0$ and $h$. In Fig.\,\ref{fig:P_NR_h_p0} we have also highlighted the $\{h,p_0\}$ curve where the ratio $P/U = 1$. Note that for higher values of $p_0$, where $p_{\rm crime}$ is higher, more $h$ resources are needed to yield a final society comprised of equal numbers of paladins and unreformables. In this case, introducing the total resource constraint $h \tau + \theta = c$ is equivalent to selecting slices of Fig.\,\ref{fig:P_NR_h_p0} for fixed $h$ since $\tau=2$ and $\theta=0.1$ are set. The resulting trend is clear: for fixed $h$ better results are obtained on a low $p_0$ population, where the intrinsic probability to commit crimes is lower.
All other quantities of interest yield similar monotonic trends, namely, the crime, punishment and recidivism rates decrease with $h$ and increase with $p_0$, and we do not show them here. \begin{figure}[t] \begin{centering} \includegraphics[width=0.85\textwidth,trim=0.5in 0in 0.5in 0in,clip]{plot_contour2_ratio_s5_P_NR_model06_h_p0_colorbar_PU1.pdf} \caption{Contours of the ratio $P/U$ as a function of $p_0$ and $h$ for $\theta=0.1$ and $\tau=2$. The plot is composed of a grid of 21$\times$21 points, each corresponding to 400 individuals. The color scale is logarithmic. Note that $P/U$ increases with $h$ and decreases with $p_0$. The solid curve marks the locus $P=U$.} \label{fig:P_NR_h_p0} \end{centering} \end{figure} \subsection{Correlations between $\theta$ and $h$}\label{sec:hth} \begin{figure} \begin{centering} \includegraphics[width=0.95\textwidth,trim=0.5in 0.5in 1in 0.5in,clip]{plot_contour_crime_punish_recid_s5_h_tau2a1_2x2.pdf} \caption{Contours of (a) the ratio $P/U$, (b) the crime rate, (c) the punishment rate and (d) the recidivism rate as a function of $h,\theta$ for $p_0=0.1$ and $\tau=2$.} \label{fig:stat_h_th} \end{centering} \end{figure} \begin{figure} \begin{centering} \includegraphics[width=0.95 \textwidth,trim=.45in .5in .45in .5in, clip]{plot_h_tau_theta3plot1_2x2.pdf} \caption{The $P/U$ ratio plotted as a function of $h$ under the constraint $h\tau+\theta=c$, where $c$ is a constant, for (a) $\tau=1$, (b) $\tau=1.5$, (c) $\tau=2$ and (d) $\tau=2.5$. The constant is chosen as $c=0.4,0.6,0.8$ so that three curves are shown for each value of $\tau$. Each curve terminates at $\theta=0$. Panel (c) is projected from Fig.\,\ref{fig:stat_h_th}(a). 
} \label{fig:lin_comb_const} \end{centering} \end{figure} \begin{figure} \begin{centering} \includegraphics[width=0.85\textwidth]{P_U_plot_tau1_tau15_tau2a1_tau25_tau4a1.pdf} \caption{Curves along which $P/U=1$, for different values of $\tau$. For $\tau=2$, the curve is projected from Fig.\,\ref{fig:stat_h_th}(a). The curves all intersect at the same value of $\theta$ since, when $h=0$ and no resources are assigned for rehabilitation programs, $\tau$ does not play a role in the game. Note that the separatrix $P/U=1$ is lowest for $\tau=2$, implying that for given $h,\theta$ the best way to populate society with equal numbers of paladins and unreformables is by selecting an intermediate value for $\tau$. As explained in the text, intervention programs that are too brief or too long yield less efficient results.} \label{fig:P_U_1} \end{centering} \end{figure} \begin{figure} \begin{centering} \includegraphics[width=0.95\textwidth,trim=0in 0.5in 0.5in 0.5in, clip]{contour_crime_punish_recid_p1_p2_1a.pdf} \caption{For $\tau=2$ and $p_0=0.1,0.2$: (a) time rate of crime and (b) time rate of punishment, both normalized to the total number of rounds and the total number of individuals; (c) recidivism probability, normalized to the number of criminals. The combination $h\tau+\theta=0.8$ is held fixed.} \label{fig:stat_fix} \end{centering} \end{figure} In this subsection we study how all quantities of interest vary within the $\{h, \theta\}$ parameter space for $p_0=0.1$ and $\tau=2$. In Fig.\,\ref{fig:stat_h_th}(a) we show that the $P/U$ ratio is increasing with both $h$ and $\theta$, while the crime, punishment and recidivism rates in Figs.\,\ref{fig:stat_h_th}(b), (c) and (d), respectively, are decreasing. These trends can be expected since increases in both rehabilitation and punishment tend to drive overall crime down. We now introduce the constraint $h \tau+ \theta = c$. 
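Solving the constraint for the punishment level makes explicit how $h$ and $\theta$ are traded against each other. Assuming, as suggested by the termination of the constrained curves at $\theta=0$, that neither $h$ nor the punishment level can be negative, the admissible range of $h$ is bounded:
\begin{equation*}
\theta = c - h\tau \geq 0 \quad \Longrightarrow \quad 0 \leq h \leq \frac{c}{\tau}.
\end{equation*}
For example, for $\tau = 2$ and $c = 0.8$ this gives $0 \leq h \leq 0.4$.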
In particular, in Fig.\,\ref{fig:lin_comb_const}(c), we show $P/U$ vs.\ $h$ on the locus $h\tau+\theta=c$ for $\tau=2, p_0=0.1$ to mirror the parameter choices in Fig.\,\ref{fig:stat_h_th}. The three curves are for the constant set at $c=0.8, 0.6, 0.4$, with higher constants yielding higher $P/U$ ratios. The most interesting feature to arise from these curves is that optimal values of $h$ and $\theta = c - h \tau$ exist that yield maxima in the $P/U$ ratio. This implies, as seen before, that if law enforcement agencies have limited resources, a proper balancing of punishment and rehabilitation efforts may yield the best outcome in crime abatement. Furthermore, note that for low values of $h$, when $\theta$ is high, increasing the level of rehabilitation $h$ is beneficial, but that beyond a certain threshold, when $h$ is too large and little punishment is assigned to criminals, $P/U$ starts decreasing. In Figs.\,\ref{fig:lin_comb_const}(a), (b) and (d) the same constraint is imposed for $\tau=1, 1.5$ and $2.5$ respectively. These curves show an initial quasi-plateau regime, where increasing $h$ -- and decreasing $\theta = c - h \tau$ -- does not appreciably change the $P/U$ ratio. However, increasing $h$ and decreasing $\theta$ further leads to decreases in $P/U$: just as in Fig.\,\ref{fig:stat_h_th}(c) sufficient punishment levels are necessary to keep $P/U >1$. Thus, within the context of our model, we find that if rehabilitation efforts are either too short or too long-lived they may be ineffective: in the first case because they do not last long enough to affect the criminal decision process, in the second case because long intervention programs with finite resources are necessarily not impactful enough and yield only incremental effects. 
Our findings imply that the best approach to maximize the $P/U$ ratio is to punish the criminal adequately and then devote enough resources over a reasonable period of time towards the criminal's rehabilitation. This trend is confirmed in Fig.\,\ref{fig:P_U_1}, where we plot contours corresponding to $P/U=1$ in $\{h,\theta\}$ space for various values of $\tau$ and for $p_0=0.1$. Note that rehabilitation programs lasting for intermediate times, $\tau = 2$, yield the lowest lying curves, indicating that equal numbers of paladins and unreformables can be attained for lower values of $h,\theta$ if $\tau$ is neither too large nor too small. In Fig.\,\ref{fig:stat_fix} we plot the time rate of crime, the time rate of punishment, and the recidivism probability, while keeping the combination $h\tau+\theta=0.8$ fixed, for $\tau=2$ and $p_0=0.1,0.2$. In Figs.\,\ref{fig:stat_fix}(a) and (b), we observe that both the crime rate and the punishment rate decrease as $h$ is increased and more resources are provided, but that there are diminishing returns past $h=0.25$. For $p_0=0.1$, the decrease in Figs.\,\ref{fig:stat_fix}(a) and (b) is non-monotonic because of two opposing effects: increasing $h$ reduces the number of crimes and of punished crimes, but it also reduces the number of rounds, which is the normalizing factor of these time rates. The punishment rate generally decreases with $h$, but a slight uptick is noted past $h=0.25$, which is a feature arising from the non-linearity of the model. In Fig.\,\ref{fig:stat_fix}(c), we plot the recidivism probability while again keeping $h\tau+\theta=0.8$. Although the recidivism probability decreases with $h$, the reduction is only marginal for $h>0.25$. 
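To relate the threshold $h=0.25$ to the punishment level, note that under the constraint $h\tau+\theta=0.8$ with $\tau=2$ it corresponds to
\begin{equation*}
\theta = 0.8 - 2 \times 0.25 = 0.3,
\end{equation*}
so that the regime of diminishing returns, $h>0.25$, is the one in which $\theta < 0.3$.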
\begin{figure} \begin{centering} \includegraphics[width=0.95\textwidth,trim=0in 0.5in 0.5in 0.5in,clip]{plot_project_crime_punish_recid_s5_h_tau2a1_3x1.pdf} \caption{Projected contours of (a) crime rate, (b) punishment rate, and (c) recidivism rate, while keeping the value of $h\tau+\theta=0.4,0.6,0.8$, for $p_0=0.1$ and $\tau=2$.} \end{centering} \end{figure} \section{Discussion}\label{sec:Conclusions} We have proposed a model that accounts for the expected behavior of crime, punishment, and recidivism within a game-theoretic framework. Our game accounts for changes of crime strategy and reform strategy, which evolve with time individually for each player. We have also simulated the model in a Monte Carlo framework, and in Appendix~\ref{sec:appendixODE} we derive ODEs corresponding to the model. Increasing the magnitude of resources, the duration of allocation of resources, and the punishment severity are each individually, and also jointly, correlated with achieving a utopian society and lower crime statistics. Increasing the $p_0$ parameter, which measures the inherent crime rate, is negatively correlated with achieving a utopian society and lower crime statistics. In terms of societal implications, punishment and resources applied together may have a synergistic effect in effecting a model society. On the other hand, low levels of punishment combined with low levels of resources lead to a dystopian society where unreformable criminals outnumber paladins. For the parameter space that we have investigated, we achieve realistic values for the recidivism probability. The recidivism probability that we have studied is related to the rate of reoffending, $\lambda$, which has been extensively studied in the literature\cite{silence}, and we observe in this model the deterrence effect of incentives, as embodied by the parameters $h$ and $\tau$, and of the severity of punishment, $\theta$. 
We do not observe relative minima or maxima in the two-parameter space contours for $P/U$ or the crime statistics, \emph{except} for the important case where the sum of integrated resources and punishment, $h\tau+\theta$, is held fixed, where we do observe a peak in $P/U$ for $\tau=2$. This demonstrates that there are optimal choices for the allocation of resources and the enforcement of punishment severity, as well as for the duration over which the intervention is applied, which is sociologically significant from an economic point of view. While statistical methods have been heavily used in the quantitative study of recidivism\mcite{Farrington:1985, Pratt:2000}, the game theory approach is a novel contribution. The classification of different stages of criminality as marked by the pools $N_0, N_1, \ldots$ may be viewed as gradations in how hardened or experienced criminals are. This work may be viewed in the context of mandatory minimum sentencing policies such as three-strikes. We demonstrate that the provision of resources post-punishment may act as a disincentive to future criminal behavior, in consideration of the goal of offender reintegration. Although the model considers criminal history and the quantity of resources allocated, we do not distinguish other individual characteristics which may affect criminal behavior, such as gender or race; that is, we consider a homogeneous population at the outset of the game. In conclusion, we have demonstrated that both the amount of resources allocated and the duration over which they are provided are factors that tend to rehabilitate criminals, as evidenced by the final ratio of model citizens to unreformable criminals, by the frequency of recidivist incidents, and by overall crime and punishment incidents. We hope that this model may convince policy makers that incentives may be as effective in reducing crime as punishment alone. Considering the high rate of incarceration and associated costs, this may be an approach well worth examining. 
For a society with a given crime rate, the ``tragedy of the commons''\cite{Hardin:1968}, culminating in a society dominated by defecting strategies, may be avoided with a judicious choice of parameters describing punishment, allocation of resources, and duration of allocation of resources. With the passage of Proposition 36 in 2012, which modified the magnitude of the third strike offense, voters in California have demonstrated a new willingness to suspend harsh sentences in favor of reducing incarceration costs\cite{prop36}. In investigating this game-theoretic model, we did not consider changes in the arrest probability $\alpha$, but this would tend to increase the number of unreformables $U$ at the end of the game, by changing the $p_{\rm reform}$ probability. Also, we did not consider what would happen if resources were not always allocated post-punishment, but were instead allocated with some probability. This would tend to introduce additional complexity and perhaps the need for additional parameters. The game-theoretic model could be modified with a condition, or probability, on the allocation of resources that we considered. A larger sample of players, and a larger number of runs over which we average the game, could be considered as well. Additional parameters could be introduced, such as the economic worth of the individuals, as studied in the literature\mcite{Arenasa:2011, Helbing:2010, Helbing2}, since economic considerations are known to be important in crime and reoffending. Monte Carlo simulation approaches are invaluable for solving problems that are analytically intractable, and have been used extensively for simulating game-theoretic models. We did not consider a continuous-time model for the game, a choice that reflects the reporting of crimes, incidents and arrests in discrete units of time, such as weekly or monthly\cite{USBJ}. 
On the other hand, continuous-time Monte Carlo methods, such as kinetic Monte Carlo (KMC), could be applied if the discreteness requirement for the time distribution of crimes is relaxed. In addition, the incarceration period could be another parameter to vary and study in simulation experiments, and the interplay between jail sentence and resources and their duration could be examined. We could also consider an arbitrarily large number of punished crimes, $R$, to make the model more broadly applicable. \appendix \section{ODEs Corresponding to the Model}\label{sec:appendixODE} \include{ODE3} \eject \section*{Acknowledgment} This work was supported by the National Science Foundation through grant DMS-1021850 (MRD), and through the ARO MURI grant W911NF-11-1-0332 (MRD). \bibliography{paper_ref1} \end{document}