\documentclass{ws-m3as}
\usepackage{mcite}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{comment}
%\usepackage{subfigure}
\usepackage{caption}
\usepackage{subcaption}
%\usepackage[n]{natbib}
\usepackage{lineno}
\usepackage{url}
%\usepackage{biblatex}
%\usepackage{natbib}
\bibliographystyle{ws-m3as}
\graphicspath{{./images2/}}
\begin{document}
\linenumbers
\markboth{Bijan Berenji, Tom Chou, Maria R. D'Orsogna}{
 An Evolutionary Game Theory for Recidivism and the Rehabilitation of Criminal Offenders}

%%%%%%%%%%%%%%%%%%% Publisher's Area please ignore %%%%%%%%%%%%%%%%%%%%%%%`/
%
\catchline{}{}{}{}{}
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\title{An Evolutionary Game Theory for Recidivism and the Rehabilitation of
Criminal Offenders}

\author{Bijan Berenji 
\footnote{Department of Biomathematics, University of California, Los Angeles, 
Los Angeles, California 90095-1766, USA}}

\address{Department of Biomathematics, University of California, Los Angeles \\
Los Angeles, California 90095-1766, USA \\
%\footnote{State completely without abbreviations, the
%affiliation and mailing address, including country. Typeset in 8 pt
%Times italic.}\\
bberenji@g.ucla.edu}

\author{Tom Chou}

\address{Departments of Biomathematics and Mathematics, University of California, Los Angeles \\
Los Angeles, California 90095-1766, USA\\
tomchou@ucla.edu}

\author{Maria R. D'Orsogna}

\address{
Department of Mathematics, California State University at Northridge, \\
Los Angeles, California 91330-1600, USA \\
dorsogna@csun.edu}

\maketitle

%\begin{history}
%\received{(Day Month Year)}
%\revised{(Day Month Year)}
%\accepted{(Day Month Year)}
%\comby{(xxxxxxxxxx)}
%\end{history}

\begin{abstract}
Motivated by recent attempts within some state criminal justice
systems to treat and rehabilitate nonviolent offenders rather than
focusing solely on their punishment, we introduce an evolutionary game
theoretic model to study the effects of such intervention programs on
criminal recidivism.  Within our game, we allow each player to commit
crimes depending on his or her own past history, on the environment
into which he or she is released after having served a previous
sentence, and on any counseling, educational or training programs
available. Players may decide to permanently reform, or may continue
to engage in criminal activity, eventually reaching a state at which
they are considered incorrigible. Depending on parameter choices, the
outcome of the game is a society with a majority of reformed citizens
or of incorrigibles. Within the context of this model we find that
prolonged post-release assistance is an effective method for reducing
criminal offenses and the recidivism probability.  In addition,
assistance may reduce the need for increased punishment.  Sociological
implications of our results are discussed.
\end{abstract}

\keywords{crime recidivism; game theory; mathematical model}

\ccode{AMS Subject Classification: 22E46, 53C35, 57S20}

\section{Introduction}	

\noindent
The emergence of human cooperation is a subject of great interest
within the behavioral sciences. In recent years several studies have
tried to understand why such an exceptional level of cooperation among
humans exists despite the possibility of individual gains that may be
attained if people acted selfishly. Some of the current hypotheses that
explain large--scale cooperation are based on reciprocity and on altruistic
and tit--for--tat behaviors between two actors
\mcite{Trivers:1971, Axelrod:1984, Fehr:2002, Fehr:2003}.
One of the most widely endorsed
theories, however, involves third--party punishment, where defectors are
sanctioned for following their self--serving interests
\cite{Boyd:1992, Fehr:2004}.

Game theory has often been used as a tool to explore human or animal
behavior since its mathematical framework allows one to study the dynamics
of players and their choices in a systematic, albeit simplified, way.
As a result, many authors within several disciplines have developed
and analyzed games that include the effects of punishment as a way to
foster cooperation among humans\mcite{Becker:1968, Nowak:2006}.  Most, but
not all, of these studies are based on the classic prisoner's dilemma
paradigm\cite{gameTheoryEcon} and include elements such as the
severity of sanctions and the willingness of participants to punish
offenders\cite{Helbing2}, the frequency and expectation of
enforcement\cite{Gordon:2009}, collective punishment and
rewards\cite{Heckathorn:1988} and the possibility of directly harming
adversaries\mcite{Arenasa:2011, DOrsogna:2010}. On the other hand, very
little work has focused on studying recidivism by offenders after
punishment and how prevention measures -- and not only punishment --
taken by third parties may reduce recidivism rates and affect
cooperation.

% http://www.biomedsearch.com/article/Willingness-to-pay-rehabilitation-versus/260494126.html
%6485Microsoft PowerPoint -Willingness to Pay.pdf
%WILLINGNESSTOPAYFINAL.PDF
%http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2674351/

In this paper we focus on recidivism and rehabilitation within the
specific context of criminal behavior, where cooperators are law
abiding citizens and where defectors are criminals that may be
punished by incarceration if apprehended.  We introduce an evolutionary
game-theoretic model to study how player choices change over time not
only due to punishment after an offense, but also due to possible
post--punishment intervention given by third parties as prevention
against future crimes, in the form of housing, job, training or family
assistance.

In our ``carrot vs. stick'' game we start from non--offenders who are
progressively exposed to opportunities for crime and who, on every
occasion, may or may not violate the law. Within our model, the
probability of committing crimes depends on external factors,
such as the surrounding societal fabric or the threat of punishment,
and on internal factors, such as the player's particular criminal
history.  Since we also assume that repeat offenders
are provided with assistance upon release, the probability to commit a
crime also depends on the quality and duration of any previously
assigned post--release assistance.  Finally, to incorporate the fact
that law enforcement agencies have limited resources, we assume that
the combination of punishment and post--release program costs per
incarceration are fixed: the more punishment a player is subject to,
the less post--release intervention assistance he or she will receive.

Players will thus progress in their criminal careers as recidivists,
until they are considered incorrigible, or may choose to shun their
criminal lives and become virtuous citizens. The rules of our game are
chosen so that an initial society will evolve towards a final
configuration comprised of a mixture of incorrigibles and
virtuous citizens. We will analyze the ratio of the two final
populations as a function of relevant parameters and show that under
certain circumstances, post--release intervention programs, if
structured to be long lasting, may have important consequences on the
final societal makeup and be more effective than punishment alone. 
In particular, we will show that the ratio of
incorrigibles to virtuous citizens may be optimized by properly
balancing available resources between punishment and post--release
assistance.  From a mathematical standpoint our evolutionary game
theory will include history dependent strategies so that individuals
placed in the same circumstances may choose different courses of
action depending on their past criminal record.

The paper is organized as follows.  In Section \ref{sec:sociological}
we give a brief sociological introduction to the problem. In Section
\ref{sec:Model} we lay out the rules of the game in more detail,
while in Section \ref{sec:Methods} we describe how the model is
implemented via Monte Carlo methods.  In Section \ref{sec:Results} we
present our numerical results as a function of model
parameters. Finally, we end in Section \ref{sec:Conclusions} with
a discussion of our findings and their sociological implications.


\section{Sociological background}\label{sec:sociological}

Starting from the 1970s, the severity of punishment for criminal
offenses in the United States has been steadily increasing, as
evidenced by growing incarceration rates, swelling prison populations,
longer sentencing and the increasing popularity of mandatory minimum
sentencing policies, such as ``three strikes''
laws\mcite{Tabarrok:2007, Zimring:2001}.  At present, the
country has the highest incarceration rate in the world, with about
one percent of the population imprisoned\cite{USBJ2010}. The cost
incurred by the taxpayer to fund the criminal justice system --
including day to day expenditures, facility maintenance and
construction, court proceedings, health care and welfare programs --
is estimated to be a staggering $\$74$ billion for 2007
alone\cite{USBJ}.  Related social problems include prison overcrowding
and violence, racial inequities, broken families left behind, and
releasing into the community individuals who have not been
rehabilitated during their prison time and are ill--equipped to lead a
crime free life after being released to the larger society.

One of the prevailing schools of thought is that the severity,
unpleasantness and social stigma of life in prison may serve as
deterrents to future criminal behavior, promoting the principle that
``crime does not pay'' \cite{Nagin:1998}.  
Opposing points of view contend that due to
the mostly poor conditions within prisons and lack of opportunities
for change, most inmates will be returned to society hardened and,
having been exposed to an environment dominated by more experienced
criminals, more savvy and likely to offend again. Indeed, several
criminological studies have shown that harsher sentences do not
necessarily act as deterrents and may even slightly increase the
likelihood of offending\mcite{Nagin:2009, Cullen:2011}. 
On the other hand, social
intervention and support as opposed to punishment and coercion alone
have been shown to be effective in preventing
crimes\mcite{Colvin:2002, Donohue:1998}.

Recidivism rates in the United States vary depending on crime.  In the
case of property and drug related offenses, the likelihood of rearrest
within three years after release is about 70 percent\cite{Nagin:2009},
higher than that of most Western countries. Thus, in recent years, due
to mounting incarceration costs and high recidivism rates, law
enforcement and correction agencies have begun turning to novel
approaches, designed to offer rehabilitation programs to prisoners
during incarceration and assistance upon release. Such programs
include counseling to increase self-restraint, drug treatment,
vocational training, educational services, housing and job assistance,
community support, helping rekindle family ties, and even
horticulture\mcite{Cullen:2002, MacKenzie:2002}.  The success of these
programs is dependent upon a variety of factors and there is no
one--size--fits--all mold.  The issue is a multifaceted one and for former
inmates, the question of whether or not to re-offend is a highly
individual one that depends on their personal histories\cite{Maruna:2004}, also known as trajectories\cite{Nagin:2009}, their
experiences while in jail, and the environment they are released
to\cite{Nagin:2009}.  In general, the most successful intervention programs
have been the ones that offered the most post--release
assistance\cite{Hallevy:2013}.

  
\section{The model\label{sec:Model}}

In this section we present the evolutionary game theory model we
developed as inspired by the sociological observations described above.  We
consider a population of $N$ individuals where each player carries his
or her specific history of $k=0,1,\dots $ offenses committed in the
past, whether punished or unpunished.  Thus, at any time we also have
finite subpopulations $N_0, N_1, \cdots, N_k$ of individuals
with a record of $k \geq 0$ past crimes.

We assume that when faced with the opportunity to commit a crime,
players may decide to offend and transition from state $N_k$ to
$N_{k+1}$, or may decline. In the latter case, they may either remain
in state $N_k$ or choose to shun criminal activity altogether, for any
and all future opportunistic criminal events.  We term these players
paladins, those who at any time during the game become virtuous and
choose to never re--offend again, regardless of record and
circumstances. Since paladin behavior is now fixed, we take these
individuals out of the game as active players and place them in the
subpopulation $P$. Note that the difference between paladins $P$
and players in the $N_0$ subpopulation is that a paladin may have
committed crimes in the past, but will not commit any crimes in the
future, whereas an individual belonging to $N_0$ has not committed any
crimes yet, but may in the future, if the occasion presents itself.

Upon committing crimes, players may or may not be arrested and
punished.  We assume that once a player has been arrested $R$ times,
he or she is considered incorrigible and incarcerated until the end of
the game, mimicking mandatory sentencing policies. Thus, after $R$
arrests players are also taken out of the game and placed into the
pool of unreformables $U$.  As a result, while players may transition
between states $N_k$, states $P$ and $U$ act as sinks with paladins
and unreformables not involved in the game as active participants, giving a
possible positive ($P$) or negative ($U$) imprint to society.
Finally, population conservation holds so that, at all times,

\begin{table}
\begin{tabular}{|l|l|}
%\begin{center}
\hline
$P$ & paladins (permanently reformed players) \\ 
$U$ & unreformables (players arrested $R$ times) \\
$N_0$ & neutral citizens with no committed crimes \\
$N_1$ & citizens with a record of one past crime \\
$N_2$ & citizens with a record of two past crimes \\
$N_3$ & citizens with a record of three past crimes \\
$k_u$ & number of unpunished crimes \\
$k_p$ & number of punished crimes \\
\hline
%MODEL PARAMETERS
$h$ & magnitude of post--release resources \\
$\tau$ & duration of intervention \\
$\theta$ & severity of effective punishment, $\gamma - \delta$ \\
$p_0$ & steepness parameter of $p_i$ \\
\hline
$R$ & maximum number of punished crimes \\
\hline
\end{tabular}
\end{table}



\begin{eqnarray}
P + \sum_{k} N_k + U = N.
\label{cons}
\end{eqnarray}

\noindent
Note that players may have committed $k > R$ crimes before
being arrested $R$ times, so that the summation over $N_k$ in Eq.\,\ref{cons}
is in principle unbounded.

For simplicity, we will consider an initial population of players with
no criminal history so that initial conditions are set as $N_0 = N$,
and $N_{k >0} = U = P = 0$.  We follow societal dynamics from the
neutral state $N_0$ towards subsequent states $N_{k>0}, U$ or $P$ by
assuming that when faced with the opportunity to commit a crime,
players will decide to offend or not based on past history,
apprehension likelihood, societal pressure, the threat of punishment
but also, in case of recidivists, on possible forms of rehabilitation
previously offered by society. As we shall later see, by
construction, the game will end when all players are either paladins
or unreformables, so that, eventually, $P+U = N$. A quantity of
interest throughout this work will thus be the $P/U$ ratio, which we
use as the final indicator of whether an ideal society is  
attained,
with $P/U \gg 1$, or whether instead a dysfunctional society emerges,
with $P/U \to 0$.

The game is played out in a succession of rounds $r$. At each of these
rounds, an individual $i$ is selected at random from any of the $N_k$
pools and is assigned a unitary payoff.  We assume the individual in
the group $N_k$ has a history of $k_p$ punished and $k_u$ unpunished
crimes, so that $k = k_p + k_u$.  Committing a crime will augment the
player's payoff by a quantity $\delta$, while in the case of an
arrest, a punishment $\gamma$ will be subtracted from the payoff. For
simplicity we assume that $\gamma > \delta$ and introduce the
effective punishment $\theta = \gamma - \delta > 0$. We also assume
that every time a criminal is apprehended, he or she is not only
punished but also given educational and employment opportunities of
magnitude $h$ and with decay time $\tau$ for rehabilitation purposes.
Since decisions made by an individual depend on past criminal record,
we assign each player a history string containing punishment
status and round of crime occurrence.  We label each punished crime
by $1$ and each unpunished crime by $0$.  For example, if a player is
in pool $N_3$ this implies there have been 3 crimes, committed at
rounds $r_{\ell}$ where $ 1 \leq \ell \leq 3$. If we assume, say, that
the first two crimes were left unpunished while the player was
punished for the last one, the history string associated with
individual $i$ is $(\{r_1, 0\},\{r_2, 0\}, \{r_3,1\})$.  In this
example $k_p=1$ and $k_u=2$.

Individual $i$ is now faced with the choice of whether to
commit a new crime or not. We assume this occurs with probability
$p_{\rm crime}$ given by 

\begin{eqnarray}
\hspace{-0.4cm}
p_{\rm crime} = \frac{(p_{i} + s_i)a_i}{2} = \frac 1 2 \left[\frac{p_0
    + k_u}{k_u + \theta k_p + p_0} + \frac{\sum_{k \neq 0} N_k}{N}\right] \left(1 - h
e^{-(r-r_{k, {\rm last}})/\tau}\right) 
\label{eq:pcrime}.
\end{eqnarray}

\noindent
We choose this form -- given by the sum of two terms, multiplied by an
attenuating factor -- to embody the assumption that individuals commit
crimes depending on their own personal history\cite{Maruna:2004}, represented by
$p_i$, and on the surrounding community imprint\cite{Surette:2002}, represented by
$s_i$, in equal manner. We assume that these two contributions are
independent of each other, yielding $(p_i+s_i)/2$. Given this crime
propensity, we assume that the probability of committing a crime is
finally modulated by the attenuation factor
$a_i$, which includes any resources individual $i$ may have received in the
past.  Note that at the onset of the game, when $N_{k>0} = 0$, $k_u = k_p = 0$
and no resources have yet been assigned ($a_i = 1$), the overall
probability to commit a crime is $1/2$, so that individuals are
equally likely to offend or not.  

We now examine the terms in
Eq.\,\ref{eq:pcrime} in more detail.  The first term $p_i$ is the
contribution to $p_{\rm crime}$ that strictly depends on 
the player's past history\cite{Maruna:2004}, given by

\begin{eqnarray}
p_i = \frac{p_0 + k_u}{k_u + \theta k_p + p_0}.
\end{eqnarray}

\noindent
The form of $p_i$ is chosen such that previous unpunished crimes $k_u$
embolden the criminal, $p_i$ being an increasing function of $k_u$.
Similarly, previous punished crimes will hinder the likelihood of
future offenses, since $p_i$ is decreasing in $\theta k_p$. We
multiply $k_p$ by $\theta$ to represent the fact that the hindering
effect depends on the magnitude of the effective punishment and not only
on how many times the criminal was previously punished. 
If $\theta=0$ and there are no consequences for committing
crimes, $p_i =1$ and players will always inherently want to offend.
Also note that $p_i = 1$ for first--time offenses,
when $k_u=k_p=0$, and similarly if the criminal was never
apprehended and punished, so that $k_p=0$.  It is only when $\theta k_p >
0$ that $p_i < 1$. Finally, the term $p_0 $ represents the
``steepness'' of the $p_i$ curve, so that the intrinsic crime
probability $p_i$ increases for larger values of $p_0$.
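
\noindent
As a brief check of these monotonicity statements, the following sketch treats $k_u$ as a continuous variable, assumes $p_0 > 0$, and introduces the shorthand $D = k_u + \theta k_p + p_0$, which is used only for this illustration. Writing $p_i = 1 - \theta k_p / D$ and differentiating gives

\[
\frac{\partial p_i}{\partial k_u} = \frac{\partial p_i}{\partial p_0}
= \frac{\theta k_p}{D^2} \geq 0,
\qquad
\frac{\partial p_i}{\partial (\theta k_p)} = - \frac{p_0 + k_u}{D^2} < 0.
\]

\noindent
The derivatives with respect to $k_u$ and $p_0$ vanish when $\theta k_p = 0$, consistent with $p_i = 1$ for players who have never been punished.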


\vspace{0.5cm}
\noindent
The next term in Eq.\,\ref{eq:pcrime} is $s_i$, which represents 
a societal pressure term given by

\begin{eqnarray}
s_i = \frac{\sum_{k \neq 0} N_k}{N}.
\end{eqnarray}

\noindent
Including $s_i$ in $p_{\rm crime}$ allows us to incorporate the
assumption that crimes will generate more crimes, either by imitation,
or by observed degradation of the community. It is known 
that seeing or knowing about crimes may increase the 
likelihood of criminal behavior\cite{Surette:2002}. 
On the other hand, if the community is mostly
comprised of virtuous $P$ or neutral citizens $N_0$, the societal
pressure term is very small and so is the probability of committing
crimes. In the limit of $P \to N$, $s_i \to 0$. We include
individual $i$ in the enumeration of the $N_k$ subpopulations.

Finally, the sum $(p_i+s_i)/2$ is attenuated by the factor $a_i$ due
to societal intervention, evaluated relative to the round
$r_{k, {\rm last}}$ at which player $i$ was last punished, so that

\begin{eqnarray}
\label{ai}
a_i = (1 - h e^{-(r-r_{k, \rm{last}}) / \tau})
\end{eqnarray}

\noindent
where $r_{k, {\rm last}}$ denotes the round number at which the last
punished crime occurred. This term represents intervention and help
from third parties, such as helping individual $i$ with employment,
education opportunities, or, in the case of youth, the support of a
mentor. We assume that these assistance programs will last over an
effective time $\tau$ and that the resource magnitude is $h$. Note,
from Eq.\,\ref{ai}, that if $\tau \ll r - r_{k, {\rm last}}$ and
rehabilitation programs are short lived, the exponential tends to zero,
$a_i$ approaches 1, and there is no attenuation effect. On the other
hand, if $\tau \gg r -r_{k, {\rm last}}$, the attenuation is most
effective, with $a_i$ approaching $1
-h$.  We assume $0 \leq h \leq 1$.  In principle, we could also let
both $h$ and $\tau$ depend on crime number $k_p$, but for simplicity
we will keep them constant for the remainder of this work.
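
As an illustration of Eq.\,\ref{eq:pcrime}, consider the player in pool $N_3$ discussed above, with $k_u = 2$ and $k_p = 1$. The numerical values used here are assumptions chosen only for this example and are not simulation results: $\theta = 0.4$, $p_0 = 0.1$, $h = 0.8$, $\tau = 2$, a societal term $s_i = 0.5$, and $r - r_{k, {\rm last}} = 2$ rounds elapsed since the last punished crime. Then

\[
p_i = \frac{0.1 + 2}{2 + 0.4 + 0.1} = 0.84,
\qquad
a_i = 1 - 0.8\, e^{-1} \approx 0.71,
\qquad
p_{\rm crime} = \frac{0.84 + 0.5}{2}\, a_i \approx 0.47.
\]

\noindent
Under the same assumptions but without assistance ($h = 0$, so that $a_i = 1$), the player would offend with probability $0.67$.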

After player $i$ is faced with the opportunity to commit a crime, the
game proceeds depending on the choices made.  If the crime was not
committed, the game proceeds to the strategy change phase; otherwise
an apprehension and punishment phase plays out. We assume that the
apprehension and punishment probability is $\alpha$ and that every
time a criminal is arrested, resources $h,\tau$ will be
given by default, regardless of the criminal's past history. The player's payoff
is now $1+\delta$ if he or she was not apprehended after having
committed the crime, otherwise, in case of an arrest and punishment,
the payoff is $1 + \delta - \gamma = 1 -\theta$.

The final step of the game is for player $i$ in population $N_k$ to
update his or her strategy. We assume that if the player's payoff
remains unitary -- due to no crimes having been committed -- he or she  
will proceed to the paladin pool $P$ with probability

\begin{equation} 
p_{\rm reform} = \frac {\alpha P} {N},
\end{equation}

\noindent
or remain in the current subpopulation $N_k$ with probability $1 -
p_{\rm reform}$.  The underlying idea here is that
player $i$ will commit to turning his or her life around after having been
``tempted'' and not having caved in to crime. We assume this decision
depends on societal imprint expressed by the proportion of virtuous
citizens, $P/N$ and modulated by $\alpha$, the probability of an
arrest.

If the player's payoff is $1 + \delta$ and the player committed a
crime but was not apprehended, player $i$ moves from pool $N_k$ to
pool $N_{k+1}$ with probability 1. In this case, since there were no
consequences for having committed crimes, we assume players likewise
have no incentives not to commit criminal actions in the future.  The
last case is when the player's payoff is $1 - \theta$. Here, a crime
was committed, the criminal was apprehended and resources were
assigned.  In this case, we assume that the criminal decides to turn
into a law-abiding citizen and join the paladin pool $P$ 
with probability

\begin{eqnarray}
\label{preform}
p_{\rm reform}= \frac 1 2 \left[\frac{h \alpha P}{N} + \frac{\theta
    k_p}{\theta k_p+k_u + p_0} \right],
\end{eqnarray}

\noindent
while he or she will join the population $N_{k+1}$ with probability
$(1 - p_{\rm reform})$. In Eq.\,\ref{preform} we assume that the reform
probability depends both on societal imprint and on the player's
punishment history. In particular, if no resources or punishment
are offered and both $h= \theta =0$ there is no incentive for players
to reform. Note that $p_{\rm reform} \leq 1$.
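
As a short numerical sketch of Eq.\,\ref{preform}, take the illustrative values $h = 0.8$, $\alpha = 1/4$, $\theta = 0.4$, $p_0 = 0.1$, $k_p = 1$, $k_u = 2$, together with an assumed paladin fraction $P/N = 0.5$; none of these are simulation results. Then

\[
p_{\rm reform} = \frac 1 2 \left[ 0.8 \times \frac{1}{4} \times 0.5
+ \frac{0.4}{0.4 + 2 + 0.1} \right]
= \frac 1 2 \left[ 0.10 + 0.16 \right] = 0.13.
\]

\noindent
Note that the history term in Eq.\,\ref{preform} equals $1 - p_i$. Moreover, since $P/N \leq 1$ and the history term is smaller than one for $p_0 > 0$, one has $p_{\rm reform} < (h \alpha + 1)/2$, which is at most $5/8$ when $\alpha = 1/4$ and $h \leq 1$.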

Finally, we assume that when players are arrested $R$ times they are
considered incorrigible and are sentenced to lengthy incarceration
periods that effectively take them out of the game and into the
unreformable pool $U$. They act only as bystanders and yield a
negative imprint to society, just as paladins do but in a positive
manner. By construction, our game will end when all players are either
in subpopulation $P$ or $U$. A majority of paladins represents a
desirable, ``utopian'' society and, vice versa, a majority of
unreformables an undesirable, ``dystopian'' one.  

To summarize, the parameter space associated with our model consists of
five parameters $\{h,\tau,\theta, p_0,\alpha\}$. However, consistent with police
estimates\cite{USBJarrest},
%\footnote{Statistic quoted for aggravated assault.}
we set the apprehension and punishment rate $\alpha = 1/4$ 
so we consider only the parameter set $\{h,\tau,\theta,p_0\}$. 
In this work we fix $R=3$ as the maximum number of
punished crimes before players join the pool of unreformables $U$.


\section{Methods\label{sec:Methods}}


While statistical methods have been routinely used 
in the quantitative study of crime\mcite{Farrington:1985, Pratt:2000}, 
game theory approaches are a relatively new contribution. 
On the other hand, there is a quite rich literature on 
Monte Carlo methods for simulating games that involve decision making
and strategy updating\cite{Kalos:2009}.  
In this work, we implement our criminal game as a C++ Monte Carlo simulation
where we track the behavior of each individual over the duration of the
game and where each round  is a discrete time step.  As mentioned in the
previous section, players are associated with a dynamic history string
that summarizes past crime and arrest occurrences and from which
transitions between possible subpopulations $N_k, P,U$ are evaluated
every time a decision process is involved.

At every round we select a random player within any of the $N_k$
subpopulations and present him or her with the opportunity to commit a
crime, evaluating $p_{\rm crime}$ and $p_{\rm reform}$ to inform decisions and
strategy updates.  We repeat this procedure for all $N - U - P$
players and update the resulting $N_k, P,U$ subpopulations only after
the decision process has been carried out for all players, consistent
with parallel--update discrete time Monte Carlo
methods\cite{Kalos:2009}.  We also calculate relevant crime,
punishment and recidivism statistics until the end of the game, when
all players are either in the $U$ or $P$ subpopulations. Finally, we
generate contours of the final ratio $P/U$ which describes how ideal
the outcome society for the chosen parameter set $\{h,\tau, \theta,p_0
\}$ is.

Within our work, the average crime rate is evaluated as the sum of
migrations between subpopulations $N_k \to N_{k+1}$ for $k=0,1,2,R-1$
per round, normalized by the total number of players $N$.  Similarly,
the average punishment rate is defined as the sum over increments of
$k_p$ per round normalized by $N$, while the average recidivist rate
is the sum of migrations between subpopulations $N_k \to N_{k+1}$ for
$k=1,2,R-1$ per round, normalized by the total number of criminals who
have been punished at least once\cite{Nagin:2009}.  In the next
Section, we investigate how all of the above quantities vary with the
model parameters $\{h,\theta, \tau, p_0\}$ for a population of $N=400$
individuals.  To restrict the parameter space of our four--parameter
model we limit $\tau$ and $p_0$ so that $\tau\le 6$ and $p_0 \leq
0.2$.  The other parameters $h, \theta$ are instead restricted to $0 \leq
h,\theta \leq 1$, which are limitations imposed by the model.  In
order to model the fact that law enforcement agencies may have limited
resources to both punish and rehabilitate a criminal, we introduce the
constraint $h \tau + \theta = c$, where $c$ is a constant.  Here $h
\tau \simeq h \int e^{-t/ \tau} dt $ represents the integrated quantity of
resources allocated by third parties over the duration of the
rehabilitation period, after the criminal is released to society, while
$\theta$ is the direct punishment.  We will often invoke this constraint
throughout the rest of this paper when examining the variation of
derived quantities with respect to $h$.  
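
As a worked illustration of this constraint, assume that time is treated as continuous and that the integral extends over an unbounded horizon. In that case the integral evaluates exactly and the constraint can be solved for the punishment level:

\[
h \int_0^{\infty} e^{-t/\tau}\, dt = h \tau,
\qquad
\theta = c - h \tau.
\]

\noindent
For example, with $c = 1.64$ and $\tau = 2$, the values used in panels (c), (e) and (f) of Fig.\,\ref{fig:pop_dynamics}, the choices $h = 0.8$, $0.6$ and $0.4$ correspond to $\theta = 0.04$, $0.44$ and $0.84$, respectively. Since $\theta \geq 0$, the constraint also requires $h \leq c/\tau = 0.82$ for these values.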


\section{Results}\label{sec:Results}

In this Section we show and discuss results from our Monte Carlo
simulations for different parameter choices. As discussed above, in
analyzing our data we will often invoke the resource constraint $h
\tau + \theta = c$.   We discuss the population dynamics in Section \ref{sec:pop_dyn}, correlations between $p_0$ and $h$ in Section \ref{sec:hp0}, and then correlations between $\theta$ and $h$.
  

\subsection{Population Dynamics\label{sec:pop_dyn}}

\begin{figure}[t]
\begin{centering}
\includegraphics[width=\textwidth,trim=0.25in 0.4 0.25in 0.4,clip]{plot_dyn1_asp}

\caption{Evolution of the number of paladins $P$ and unreformables $U$
  with respect to time for $p_0=0.1,\tau=2$ and variable $h,\theta$
  starting from a population of $N=N_0=400$ neutral citizens.  (a) No
  resources are allocated for rehabilitation purposes and punishment
  is low: $h=0, \theta=0.04$. As expected, $U \gg P$ when no resources
  are allocated and punishment is weak.  (b) No resources are allocated for rehabilitation
  purposes and punishment is large: $h=0, \theta=0.8$. In this case, due
  to the high punishment level, a deterrence effect arises and $P
  \simeq U$.  (c) Resources are allocated while keeping punishment
  low, $h=0.8, \theta = 0.04$, yielding the total expenditure per crime
  $h \tau + \theta = 1.64$. In this case, the number of paladins
  increases compared to panel (a) and $P \simeq U$.  (d) Resources are
  allocated and punishment is large, $h=0.8, \theta=0.8$, yielding $P > U$. (e), (f) $P>U$ while $h\tau+\theta = 1.64$ as in panel (c).  }
\label{fig:pop_dynamics}
\end{centering}
\end{figure}

\noindent
Since our game is constructed to evolve towards a final configuration
where all players are either in subpopulation $P$ or $U$, we follow
the time evolution of the number of players in these states.  In
Fig.\,\ref{fig:pop_dynamics} we show the dynamics of $P$ and $U$ as
the game progresses for various choices of $h, \theta$ when $p_0 =0.1$
and $\tau= 2$. All curves are truncated at $r_{\rm last} \sim 80$,
when $P+U =N$ and the game ends.  In Figs.\,\ref{fig:pop_dynamics}(a)
and (b) $h=0$ and no resources are utilized for rehabilitation
programs. The punishment level is set to the low value $\theta = 0.04$
in panel (a), yielding a large number of unreformables, while for the
higher punishment choice $\theta=0.8$ in panel (b) we find that the
number of paladins exceeds that of unreformables $U$, as can be
expected.  In Figs.\,\ref{fig:pop_dynamics}(c) and (d) we keep the
punishment levels equal to those used in panels (a) and (b)
respectively and include the assignment of resources $h=0.8$ over an
effective time $\tau=2$. As can be seen, these resources dramatically
increase the final number of paladins within our
society.  In Figs.\,\ref{fig:pop_dynamics}(e) and (f) we keep the same
total amount of resources as in Fig.\,\ref{fig:pop_dynamics}(c), $h
\tau + \theta = 1.64$, but use a different realization of the
constraint: in panel (e) we allow for fewer resources $h=0.6, \tau = 2$
and more punishment $\theta = 0.44$ while in panel (f) we decrease the
amount of resources even more, with $h=0.4, \tau =2$ and $\theta =
0.84$.  Given the above constraint $h \tau + \theta = 1.64$, a
comparison of panels (c), (e) and (f) shows that the relative number
of paladins with respect to unreformables can be maximized by
optimally modulating the parameter subset $\{h,\theta\}$. In particular, of the three panels
(c), (e), (f) examined, the parameter choice in (e), with the optimal balance of
punishment and rehabilitation efforts, is the most effective in
yielding the largest $P/U$ ratio.  We will later explore parameter
space in more detail and study the $P/U$ ratio over a wider range of
$\{h, \theta \}$ values.
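
As a quick check of the constraint, and assuming that the total
resources are simply the sum $h\tau+\theta$ used above, the parameter
choices of panels (c), (e) and (f) can be verified to lie on the same
locus:
\begin{align*}
\mbox{(c)}:\quad & h\tau+\theta = 0.8\times 2 + 0.04 = 1.64, \\
\mbox{(e)}:\quad & h\tau+\theta = 0.6\times 2 + 0.44 = 1.64, \\
\mbox{(f)}:\quad & h\tau+\theta = 0.4\times 2 + 0.84 = 1.64.
\end{align*}
Moving from (c) to (e) to (f) thus trades $0.4$ units of integrated
rehabilitation resources $h\tau$ for $0.4$ units of punishment $\theta$
at each step.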

Finally, in all panels of Fig.\,\ref{fig:pop_dynamics}, we observe a slight delay in the
increase in $U$ compared to the initial dynamics of $P$. This is
due to the fact that player reform may occur starting from the
beginning of the game, while for an individual to join the $U$
subpopulation he or she must have committed at least $R$ crimes.


\subsection{Correlations between $p_0$ and $h$}\label{sec:hp0}

In this subsection we investigate the role of $p_0$ on the final value
of the ratio $P/U$.  Since $p_0$ appears only in Eq.\,\ref{eq:pcrime},
and $p_{\rm crime}$ is an increasing function of $p_0$, we expect all
results to vary monotonically with this parameter.  In
Fig.\,\ref{fig:P_NR_h_p0}, we plot contours of $P/U$ as a function of
$p_0$ and $h$ for $\tau =2$ and $\theta=0.1$.  As expected, the $P/U$
ratio increases with $h$ and decreases with $p_0$.  In
Fig.\,\ref{fig:P_NR_h_p0} we have also highlighted the
$\{h,p_0\}$ curve where the ratio $P/U = 1$.  Note that for higher
values of $p_0$, where $p_{\rm crime}$ is higher, more $h$ resources
are needed to yield a final society comprised of equal numbers of
paladins and unreformables. In this case, introducing the total
resource constraint $h \tau + \theta = c$ is equivalent to
selecting slices of Fig.\,\ref{fig:P_NR_h_p0} for fixed $h$ since
$\tau=2$ and $\theta=0.1$ are set. The resulting trend is clear: for
fixed $h$ better results are obtained on a low $p_0$ population, where
the intrinsic probability to commit crimes is lower.  All other
quantities of interest yield similar monotonic trends -- namely, the
crime, punishment and recidivism rates decrease with $h$ and increase
with $p_0$ -- and we do not show them here.

\begin{figure}[t]
\begin{centering}
\includegraphics[width=0.85\textwidth,trim=0.5in 0in 0.5in 0in,clip]
%width=0.8\textwidth,trim = 0.25in 0 0in 0,clip]
{plot_contour2_ratio_s5_P_NR_model06_h_p0_colorbar_PU1.pdf}
\caption{Contours of the ratio $P/U$, as a function of $p_0$ and $h$
  for $\theta=0.1$, and $\tau=2$.  The plot is composed of a grid of
  21$\times$21 points each corresponding to 400 individuals.  The
  color scale is logarithmic. Note that $P/U$ increases with $h$ and
  decreases with $p_0$. The solid curve marks the locus $P=U$.}
\label{fig:P_NR_h_p0}
\end{centering}
\end{figure}


\subsection{Correlations between $\theta$ and $h$}\label{sec:hth}

\begin{figure}
\begin{centering}
\includegraphics[width=0.95\textwidth,trim=0.5in 0.5in 1in
  0.5in,clip]{plot_contour_crime_punish_recid_s5_h_tau2a1_2x2.pdf}
\caption{Contours of the derived quantities for (a) the ratio $P/U$,
  (b) the crime rate, (c) the punishment rate and (d) the 
  recidivism rate as a function of $h,\theta$ for $p_0=0.1$ and $\tau=2$.}
\label{fig:stat_h_th}
 \end{centering}
\end{figure}

\begin{figure}
\begin{centering}
\includegraphics[width=0.95 \textwidth,trim=.45in .5in .45in .5in,
  clip]{plot_h_tau_theta3plot1_2x2.pdf}
\caption{The $P/U$ ratio plotted as a function of $h$ 
  under the constraint $h\tau+\theta=c$, where $c$ is a constant, for
  (a) $\tau=1$ (b) $\tau=1.5$ (c) $\tau=2$ and (d) $\tau=2.5$.
  The constant is chosen as $c=0.4,0.6,0.8$ so that 
  three curves are shown for each value of $\tau$.  Each curve
  terminates at $\theta=0$. Panel (c) is projected from 
  Fig.\,\ref{fig:stat_h_th}(a).
}
\label{fig:lin_comb_const}
\end{centering}
\end{figure}


\begin{figure}
\begin{centering}
\includegraphics[width=0.85\textwidth]{P_U_plot_tau1_tau15_tau2a1_tau25_tau4a1.pdf}
\caption{Curves along which $P/U=1$, for different values of $\tau$.
  For $\tau=2$, the curve is projected from
  Fig.\,\ref{fig:stat_h_th}(a).  The curves all intersect at the same
  value of $\theta$ since when $h=0$ and no resources are assigned for
  rehabilitation programs $\tau$ does not play a role in the
  game. Note that the separatrix $P/U=1$ is lowest for $\tau=2$,
  implying that for given $h,\theta$ the best way to populate society
  with equal numbers of paladins and unreformables is by selecting
  an intermediate value for $\tau$. As explained in the text,
  intervention programs that are too brief or too long 
  yield less efficient results.}
\label{fig:P_U_1}
\end{centering}
\end{figure}

\begin{figure}
\begin{centering}
\includegraphics[width=0.95\textwidth,trim=0in 0.5in 0.5in 0.5in,
  clip]{contour_crime_punish_recid_p1_p2_1a.pdf}
\caption{For $\tau=2$ and $p_0=0.1,0.2$: (a) time rate of crime and
  (b) time rate of punishment, both normalized to the total
  number of rounds and the total number of individuals; (c) recidivism
  probability, normalized to the number
  of criminals.  The combination $h\tau+\theta=0.8$ is held fixed.}
\label{fig:stat_fix}
\end{centering}
\end{figure}

In this subsection we study how all quantities of interest vary within
the $\{h, \theta\}$ parameter space for $p_0=0.1$ and $\tau=2$.  In
Fig.\,\ref{fig:stat_h_th}(a) we show that the $P/U$ ratio is
increasing with both $h$ and $\theta$ while the crime, punishment and
recidivism rates in Figs.\,\ref{fig:stat_h_th}(b), (c) and (d), respectively, are
decreasing. These trends can be expected since increases in both
rehabilitation and punishment tend to drive overall crime down.  We
now introduce the constraint $h \tau+ \theta = c$.  In
particular, in Fig.\,\ref{fig:lin_comb_const}(c), we show $P/U$ vs.\ $h$
on the locus $h\tau+\theta=c$ for $\tau=2, p_0=0.1$ to mirror
the parameter choices in Fig.\,\ref{fig:stat_h_th}. The three curves
are for the constant set at $c=0.8, 0.6, 0.4$, with higher constants
yielding higher $P/U$ ratios. The most interesting feature to arise from
these curves is that optimal values of $h$ and $\theta = c - h
\tau$ exist that yield maxima in the $P/U$ ratio. This implies, as
seen before, that if law enforcement agencies have limited resources a
proper balancing of punishment and rehabilitation efforts may yield the
best outcome in crime abatement. Furthermore, note that for low values
of $h$, when $\theta$ is high, increasing the levels of rehabilitation
$h$ is beneficial, but that beyond a certain threshold, when $h$ is
too large and little punishment is assigned to criminals, $P/U$ starts
decreasing.  In Figs.\,\ref{fig:lin_comb_const}(a),(b) and (d) the
same constraint is imposed for $\tau=1, 1.5$ and $2.5$ respectively.
These curves show an initial quasi-plateau regime, where increasing
$h$ -- and decreasing $\theta = c - h \tau$ -- does not
appreciably change the $P/U$ ratio. However, increasing $h$ and
decreasing $\theta$ further leads to decreases in $P/U$: just as in
Fig.\,\ref{fig:stat_h_th}(c) sufficient punishment levels are necessary
to keep $P/U >1$.
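
To make the range of each curve explicit, note that if the punishment
level is assumed to be non-negative, $\theta \geq 0$ (an assumption
consistent with the curves terminating at $\theta=0$), the constraint
restricts $h$ to
\begin{equation*}
\theta = c - h\tau \geq 0 \quad \Longrightarrow \quad 
0 \leq h \leq h_{\rm max} \equiv \frac{c}{\tau},
\end{equation*}
where $h_{\rm max}$ is a shorthand introduced here only for
illustration. For $c=0.8$, for example, this gives $h_{\rm max}$ of
approximately $0.8$, $0.53$, $0.4$ and $0.32$ for $\tau = 1$, $1.5$,
$2$ and $2.5$, respectively, so that longer programs leave a narrower
range of $h$ over which resources and punishment can be traded against
each other.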

Thus, within the context of our model, we find that if rehabilitation
efforts are either too short or too long-lived they may be
ineffective: in the first case because they do not last long enough to
affect the criminal decision process, in the second case because long
intervention programs with finite resources are necessarily spread too
thin to be impactful and yield only incremental
effects. Our findings imply that the best approach to maximize the
$P/U$ ratio is to punish the criminal adequately and then devote enough
resources over a reasonable period of time towards the criminal's
rehabilitation.

This trend is confirmed in Fig.\,\ref{fig:P_U_1}, where we plot
contours corresponding to $P/U=1$, in $\{h,\theta\}$ space for various
values of $\tau$ and for $p_0=0.1$. Note that rehabilitation programs
lasting for intermediate times, $\tau = 2$, yield the lowest lying
curves, indicating that equal numbers of paladins and unreformables
can be attained for lower values of $h,\theta$ if $\tau$ is neither
too large nor too small.

In Fig.\,\ref{fig:stat_fix} we plot the time rate of crime, the time rate
of punishment, and the recidivism probability, while keeping the
combination $h\tau+\theta=0.8$ fixed, for $\tau=2$ and
$p_0=0.1,0.2$.  In Figs.\,\ref{fig:stat_fix}(a) and (b), we observe
that both the crime rate and the punishment rate decrease as $h$
is increased and more resources are provided, but that
there are diminishing returns past $h=0.25$.  For $p_0=0.1$, the
decrease in Figs.\,\ref{fig:stat_fix}(a) and (b) is non-monotonic because
of two opposing effects: increasing $h$ tends to decrease the number of
crimes and of punished crimes, but it also decreases the number of
rounds, which is the normalizing factor of these time rates. The
punishment rate generally
decreases with $h$, but a slight uptick is noted past $h=0.25$, which
is a feature arising from the non-linearity of the model.  In
Fig.\,\ref{fig:stat_fix}(c), we plot the recidivism probability while
again keeping $h\tau+\theta=0.8$.  Although the recidivism probability
decreases with $h$, for $h>0.25$ the further reduction is only
marginal.
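
The competition between the number of crimes and the number of rounds
can be sketched as follows. Writing $C(h)$ for the total number of
crimes committed during a game (a symbol introduced here for
illustration only), treating $h$ as a continuous variable, and assuming
the normalization stated in the caption of Fig.\,\ref{fig:stat_fix},
the time rate of crime and its logarithmic derivative are
\begin{equation*}
\mbox{crime rate}(h) = \frac{C(h)}{N\, r_{\rm last}(h)},
\qquad
\frac{{\rm d}}{{\rm d}h} \ln \left[ \frac{C(h)}{N\, r_{\rm last}(h)} \right]
= \frac{C'(h)}{C(h)} - \frac{r_{\rm last}'(h)}{r_{\rm last}(h)},
\end{equation*}
where $N$ is the total number of players and $r_{\rm last}$ is the
round at which the game ends. If both $C$ and $r_{\rm last}$ decrease
with $h$, the rate decreases only where the relative decline in the
number of crimes exceeds the relative decline in the number of rounds;
the same argument applies to the punishment rate with $C$ replaced by
the number of punished crimes.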
  

\begin{figure}
\begin{centering}
\includegraphics[width=0.95\textwidth,trim=0in 0.5in 0.5in 0.5in,clip]{plot_project_crime_punish_recid_s5_h_tau2a1_3x1.pdf}
\caption{Projected contours of (a) crime rate,  (b) punishment rate, and (c) recidivism rate, while keeping the value of $h\tau+\theta=0.4,0.6,0.8$, for $p_0=0.1$ and $\tau=2$.}
\end{centering}
\end{figure}


\section{Discussion}\label{sec:Conclusions}

We have proposed a model that accounts for the expected behavior of
crime, punishment, and recidivism, within a game-theoretic framework.
Our game accounts for changes of crime strategy and reform strategy,
which evolve with time, individually for each player.  We have also
simulated the model in a Monte Carlo framework, and in
\ref{sec:appendixODE}, we derive ODEs corresponding to the model.
Increasing the magnitude of resources, the duration of their
allocation, and the punishment severity are each individually, as well
as jointly, correlated with achieving a utopian society and lower crime
statistics. Increasing the $p_0$
parameter, which measures the inherent crime rate, is negatively
correlated with achieving a utopian society and lower crime
statistics.  In terms of societal implications, punishment and
resources applied together may have a synergistic effect in bringing
about a model society.  On the other hand, low levels of punishment combined
with low levels of resources lead to a dystopian society where
unreformable criminals outnumber paladins.


For the parameter space that we have investigated, we achieve
realistic values for the recidivism probability. The recidivism
probability that we have studied is related to the rate of reoffending,
$\lambda$, which has been extensively studied in the
literature\cite{silence}. In our model we observe
the deterrence effect of incentives, as embodied by the parameters
$h$ and $\tau$, and of the severity of punishment, $\theta$.

We do not observe relative minima or maxima in the 2-parameter space
contours for $P/U$ or the crime statistics, \emph{except} for the
important case where the sum of integrated resources and punishment,
$h\tau+\theta$, is held fixed, where we do observe a peak in $P/U$ for
$\tau=2$.  This demonstrates that there are clearly optimal choices of
allocation of resources and enforcement of punishment severity, as
well as the duration for which the intervention is applied, which is
sociologically significant from an economic point of view.

While statistical methods have been heavily used in the quantitative
study of recidivism\mcite{Farrington:1985, Pratt:2000}, the game theory
approach is a novel contribution. The classification of different
stages of criminality as marked by the pools $N_0, N_1, \cdots$ may be
viewed as gradations in how hardened/experienced criminals are.

This work may be viewed in the context of mandatory minimum sentencing
policies such as three-strikes.  We demonstrate that the provision of
resources post--punishment may act as a disincentive to future
criminal behavior, in consideration of the goal of offender
reintegration.  Although the model considers criminal history and the
quantity of resources allocated, we do not distinguish other
individual characteristics which may affect criminal behavior, such as
gender or race; that is, we consider a homogeneous population at the
outset of the game.

	
In conclusion, we have demonstrated that both the magnitude of
allocated resources and the duration over which they are provided are
factors that tend to
rehabilitate criminals, as evidenced by the final ratio of model
citizens to unreformable criminals, the frequency of recidivist
incidents, and the overall crime and punishment incidents.  We hope that
this model may convince policy makers that incentives may be as
effective in reducing crime as punishment alone.  Considering the high
rate of incarceration and associated costs, this may be an approach
well worth examining.  For a society with a given crime rate, the
``tragedy of the commons''\cite{Hardin:1968}, culminating in a society
with overwhelming defecting strategy, may be avoided with a judicious
choice of parameters describing punishment, allocation of resources,
and duration of allocation of resources.  By passing
Proposition 36 in 2012, which modified the magnitude of the third strike
offense, voters in California have demonstrated a new
willingness to suspend harsh sentences in favor of reducing
incarceration costs\cite{prop36}.

In investigating this game-theoretic model, we did not consider
changes in the arrest probability $\alpha$ parameter here, but this
would tend to increase the number of $U$ at the end of the game, by
changing the $p_{\rm reform}$ probability.  Also, we did not consider
the effects of what would happen if resources were not always
allocated post-punishment, but if there was a probability for this
intervention to occur.  This would tend to introduce additional
complexity and perhaps the need for additional parameters.  The
game-theoretic model could be modified with a condition, or
probability, on the allocation of resources, which we considered.  A
larger sample of players, and a larger number of runs over which we
average the game could be considered as well.  Additional parameters
could be introduced, such as the economic worth of the individuals, as
studied in the literature\mcite{Arenasa:2011, Helbing:2010, Helbing2}, since
economic considerations are known to be important in crime and
reoffending.  Monte Carlo simulation approaches are invaluable for
solving problems that are analytically intractable, and have been used
extensively for simulating game-theoretic models.  A continuous time
model for the game was not considered, which reflects the nature of
crime and incident and arrest reporting in discrete units of time,
such as weekly or monthly\cite{USBJ}.  On the other hand, continuous-time
Monte Carlo methods, such as kinetic Monte Carlo (KMC), could be
applied, if the discreteness requirement for time distribution of
crimes is relaxed.  In addition, the incarceration period could be
another parameter that we could vary and study in simulation
experiments, and the interplay between jail sentence and resources and
their duration could be examined.  We could also consider an
arbitrarily large number of punished crimes, $R$, to make the model
more broadly applicable.

%\appendix{ODEs Corresponding to the Model}

\appendix
\section{ODEs Corresponding to the Model}\label{sec:appendixODE}
\include{ODE3}

\eject


\section*{Acknowledgment}
This work was supported by the National Science Foundation through
grant DMS-1021850 (MRD), and through the ARO MURI grant
W911NF-11-1-0332 (MRD).

%\section*{References}

\bibliography{paper_ref1}
%\begin{comment}
%\begin{thebibliography}{00}

\end{document}