import React, { Component } from "react";
/* environments */
import Figure from "../../../components/Figure";
import Theorem from "../../../components/Theorem";
import Proof from "../../../components/Proof";

import Paper from "../../../media/paper_markov.pdf";

class Markov extends Component {
  componentDidMount() {
    window.KaTeXRender();
  }

  render() {
    return (
      <div className="postContent leftMargin">
        <p>
          Markov processes are indispensable in the theory and application of
          stochastic processes. They show up in nearly every empirical field,
          such as chemical kinetics, and theoretical ones as well, like
          mathematical finance. The reason for their ubiquity is that they carry
          a very simple assumption: that the way a system will change depends
          only on its state right now. This assumption turns out to be extremely
          accurate in describing many phenomena while being tractable enough to
          lend itself to a rich mathematical theory.
        </p>
        <p>
          We will build up this theory by starting with Markov processes over
          metric spaces, and then naturally deriving the notion of an
          infinitesimal generator, allowing us to describe the process by a
          differential equation. Along the way, we will show this is not merely
          a description, however, but a characterization! Sufficiently nice
          infinitesimal generators yield quite nice Markov processes, known as
          Feller processes. These encapsulate a large number of common Markov
          processes, such as the Wiener process, which we will use as our
          concluding example, showing that the diffusion equation is its
          generator. This post is a rewriting of a{" "}
          <a className="linkPurple" href={Paper}>
            class paper
          </a>{" "}
          I wrote on the topic.
        </p>
        <p>
          Throughout we'll let {"\\((E,d)\\)"} be a metric space. Its standard
          topology is generated by its open balls. We will assume that{" "}
          {"\\(E\\)"} is locally compact. Let {"\\(B(E)\\)"} be the Borel{" "}
          {"\\(\\sigma\\)"}-algebra over{" "}
          {"\\(E\\)"}. Denote by {"\\(\\mathscr{B}(E)\\)"} the set of all{" "}
          <strong>bounded measurable functions</strong>:
          {`\\[
	        \\mathscr{B}(E)=\\{f\\colon E\\to\\R\\textrm{ bounded} : f^{-1}(V)\\in B(E)\\textrm{ for all open }V\\}.
          \\]`}
          Also let {"\\(\\mathscr{C}(E)\\)"} be all bounded continuous functions{" "}
          {"\\(E\\to\\R\\)"}. We will focus, eventually, on the set of functions{" "}
          {"\\(\\mathscr{C}_0(E)\\subseteq\\mathscr{C}(E)\\)"} which{" "}
          <strong>vanish at infinity</strong>, meaning there is some{" "}
          {"\\(x_0\\in E\\)"} such that for any {"\\(\\varepsilon>0\\)"} we can
          find some {"\\(R\\)"} so {"\\(|f(x)|< \\varepsilon\\)"} whenever{" "}
          {"\\(d(x,x_0)> R\\)"}. They have a nice duality which lets us
          associate them with measures, and they attain their supremum norm.
        </p>
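        <p>
          As a small worked example (our own illustration, with{" "}
          {"\\(E=\\R\\)"} and {"\\(x_0=0\\)"} chosen for convenience), consider
          the Gaussian bump {"\\(f(x)=e^{-x^2}\\)"}. For any{" "}
          {"\\(\\varepsilon\\in(0,1)\\)"} we have
          {`\\[
	|f(x)|<\\varepsilon \\iff x^2>\\log(1/\\varepsilon),
\\]`}
          so {"\\(R=\\sqrt{\\log(1/\\varepsilon)}\\)"} works and{" "}
          {"\\(f\\in\\mathscr{C}_0(\\R)\\)"}. Its supremum{" "}
          {"\\(\\sup_{x\\in\\R}|f(x)|=1\\)"} is attained at {"\\(x=0\\)"}. By
          contrast, the constant function {"\\(1\\)"} is bounded and continuous
          but does not vanish at infinity.
        </p>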
        <p>
          We declare that a generic Banach space will be{" "}
          {"\\((X,\\|\\cdot\\|)\\)"}, with {"\\(\\mathbb{C}\\)"} the underlying
          field and identity {"\\(\\mathbb{1}\\)"}. Note that{" "}
          {"\\(\\mathscr{B}(E),\\mathscr{C}(E)\\)"}, and{" "}
          {"\\(\\mathscr{C}_0(E)\\)"} are all Banach when endowed with the
          infinity-norm
          {`\\[
	\\|f\\|_{\\infty}=\\sup_{x\\in E} |f(x)|,
\\]`}
          and we will take this to be the implicit norm for all these spaces.
        </p>
        <Theorem statement="Let \(f\in\mathscr{C}_0(E)\). Then, there exists some \(x\in E\) such that \(|f(x)|=\|f\|\)." />
        <Proof
          proof="Suppose \(\|f\|>0\), otherwise any \(x\in E\) works. By definition of vanishing at infinity, we can find some \(x_0\in E\) and \(R>0\) such that \(|f(x)|< \|f\|/2\) whenever \(d(x,x_0)> R\). Therefore,
\[
	K=\{x\in E : \|f\|/2\leq |f(x)|\leq\|f\|\}\subseteq \{x\in E: d(x,x_0)\leq  R\}.
\]
In particular, \(K\) is closed and bounded, and we find that it is compact. Due to continuity of \(f\), we know \(|f|\) attains its maximum on \(K\), and since \(|f|<\|f\|/2\) outside of \(K\), this maximum is \(\|f\|\)."
        />
        <Theorem
          statement="The dual of \(\mathscr{C}_0(E)\) is all finite (real!) measures on \(B(E)\). Specifically, for such a measure \(\mu\) we have the form
\[
	\langle f,\mu\rangle = \int_{E} f(x)\,\mathrm{d}\mu(x).
\]"
          name="Riesz-Markov-Kakutani"
        />
        <p>
          We will have to integrate many things. A generic measure space
          henceforth will be denoted by the triple {"\\((S,\\Sigma,\\mu)\\)"}{" "}
          (and just {"\\((S,\\Sigma)\\)"} will be our measurable space). When{" "}
          {"\\(S=\\R\\)"}, we will write {"\\(\\mathrm{d}x=\\mathrm{d}\\mu\\)"}{" "}
          for the Lebesgue measure. We'll then say a map{" "}
          {"\\(f\\colon (S,\\Sigma)\\to (S',\\Sigma')\\)"} between two
          measurable spaces is {"\\(\\Sigma\\)"}
          <strong>-measurable</strong> if {"\\(f^{-1}(A)\\in \\Sigma\\)"} for
          all {"\\(A\\in\\Sigma'\\)"}. Writing {"\\(M(S,\\Sigma)\\)"} for the
          set of all measurable functions {"\\(f\\colon S\\to\\R\\)"} on the
          space {"\\((S,\\Sigma)\\)"}, we define the standard spaces
          {`\\[
            L^p(S,\\Sigma,\\mu)=\\left\\{f\\in M(S,\\Sigma):\\int_S |f|^p\\,\\mathrm{d}\\mu<\\infty\\right\\}
          \\]`}
          for {"\\(1\\leq p<\\infty\\)"} and
          {`\\[
            L^\\infty(S,\\Sigma,\\mu)=\\left\\{f\\in M(S,\\Sigma):\\operatorname{ess\\ sup}_{x\\in S} |f(x)|<\\infty\\right\\}.
          \\]`}
        </p>
        <p>
          Above we have the standard Lebesgue integral. However, we will
          occasionally need a slightly more general concept: the{" "}
          <strong>Bochner integral</strong>. For {"\\(i=1,\\dots,n\\)"} let{" "}
          {"\\(A_i\\in\\Sigma\\)"} be pairwise-disjoint sets of finite measure
          and take some {"\\(x_i\\in X\\)"}. Define {"\\(f\\colon S\\to X\\)"}{" "}
          by
          {`\\[
	f(x)=\\sum_{i=1}^n \\chi_{A_i}(x) x_i,
\\]`}
          a simple function. Here, {"\\(\\chi_{A_i}\\)"} are characteristic
          functions of {"\\(A_i\\)"}. Its integral is
          {`\\[
 	\\int_S f\\,\\mathrm{d}\\mu = \\sum_{i=1}^n \\mu(A_i) x_i.
\\]`}
          For a generic {"\\(f\\colon S\\to X\\)"}, if there exists some
          sequence of simple functions {"\\(s_n\\)"} such that
          {`\\[
	\\int_S \\|f-s_n\\|\\,\\mathrm{d}\\mu \\to 0
\\]`}
          then {"\\(f\\)"} is integrable and its integral is
          {`\\[
	 \\int_S f\\,\\mathrm{d}\\mu = \\lim_{n\\to\\infty} \\int_S s_n\\,\\mathrm{d}\\mu.
\\]`}
          The Bochner integral does not differ from the Lebesgue integral in
          any way that will be material to us. In fact, we may even define the{" "}
          {"\\(L^p\\)"} spaces above for functions {"\\(f\\colon S\\to X\\)"} as
          well, provided we are slightly more careful about what it means for a
          function to be measurable with respect to a Banach space. In any case,
          the following theorem does apply with no changes.
        </p>
        <Theorem
          statement="Let \(f\colon \R\to X\) be integrable. Then,
\[
 	f(t)=\lim_{h\to 0} \frac{1}{h}\int_t^{t+h} f(x)\,\mathrm{d}x
\]
holds for almost all \(t\)."
          name="Lebesgue Differentiation"
        />
        <p>
          Let's quickly introduce some functional analysis. We will denote by{" "}
          {"\\(\\mathcal{L}(X)\\)"} bounded linear operators {"\\(X\\to X\\)"}{" "}
          (for we will never need the domain and codomain to vary). On{" "}
          {"\\(\\mathcal{L}(X)\\)"} we have the operator norm which makes it
          Banach:
          {`\\[
 	\\|T\\|_{\\textrm{op}} = \\sup_{0\\neq x\\in X} \\frac{\\|Tx\\|}{\\|x\\|}.
\\]`}
          The name "bounded operator" comes precisely from the operator norm
          being finite. Often, though, we define some norm and then restrict
          our interests to a set for which that norm is finite, excluding
          elements with infinite norms. We may do this, in some sense, for
          operators as well. The non-trivial hurdle is that a linear operator
          with infinite norm is discontinuous, and the examples we care about
          (differentiation, say) cannot be sensibly defined on all of{" "}
          {"\\(X\\)"}. We resolve this by defining an{" "}
          <strong>unbounded operator</strong> on a dense linear subspace{" "}
          {"\\(Y\\subseteq X\\)"} as a linear operator{" "}
          {"\\(T\\colon Y\\to X\\)"} (usually writing{" "}
          {"\\(Y=\\operatorname{dom}T\\)"}). We say {"\\(T\\)"} is{" "}
          <strong>closed</strong> if its graph
          {`\\[
	\\Gamma(T)=\\{(x,Tx):x\\in \\operatorname{dom}T\\}
\\]`}
          is closed. Of course, we could take {"\\(\\operatorname{dom}T=X\\)"}{" "}
          and then have an ordinary bounded operator present itself as a special
          case. Working with unbounded operators is slightly tricky. For
          instance, it is unclear in general if squaring makes sense since the
          operator may map outside of its own domain.
        </p>
        <p>
          For {"\\(T\\in\\mathcal{L}(X)\\)"} we define its{" "}
          <strong>spectrum</strong>
          {`\\[
	          \\sigma(T)=\\{\\lambda\\in\\mathbb{C} : (T-\\lambda\\mathbb{1})^{-1}\\not\\in\\mathcal{L}(X)\\}.
          \\]`}{" "}
          There are multiple reasons {"\\(T-\\lambda\\mathbb{1}\\)"} may fail to
          have a bounded inverse. The simplest one is when{" "}
          {"\\(\\lambda\\)"} is an eigenvalue,
          in which case the set-theoretic inverse simply does not exist. We may
          be tempted to define {"\\(\\sigma(T)\\)"} for an unbounded {"\\(T\\)"}{" "}
          in the same way. However, inversion is more subtle since that would be
          a map {"\\(X\\to\\operatorname{dom}T\\)"} and for that reason we first
          define its <strong>resolvent set</strong>
          {`\\[
	\\rho(T)=\\{\\lambda\\in\\mathbb{C} : R_{\\lambda}(T)=(T-\\lambda\\mathbb{1})^{-1}\\in\\mathcal{L}(X)\\}
\\]`}
          and then define the spectrum of {"\\(T\\)"} by{" "}
          {"\\(\\sigma(T)=\\mathbb{C}\\setminus\\rho(T)\\)"}.
        </p>
        <p>
          We're now ready to hike on the main trail. A probability space is
          simply a measure space{" "}
          {"\\((\\Omega,\\Sigma,\\mathbb{P})\\)"} where{" "}
          {"\\(\\mathbb{P}(\\Omega)=1\\)"}. We will allow for{" "}
          <strong>metric-space-valued random variables</strong>, functions{" "}
          {"\\(Y\\colon\\Omega\\to E\\)"} which are {"\\(\\Sigma\\)"}
          -measurable. For Borel {"\\(A\\in B(E)\\)"} we use the notation
          {`\\[
\\mathbb{P}(Y\\in A)=\\mathbb{P}(Y^{-1}(A))
\\]`}
          and, for real-valued {"\\(Y\\)"}, denote the{" "}
          <strong>expectation</strong> (if it exists) by
          {`\\[
 	\\mathbb{E}(Y)=\\int_{\\Omega} Y\\,\\mathrm{d}\\mathbb{P}.
\\]`}
          Now, given a sub-{"\\(\\sigma\\)"}-algebra{" "}
          {"\\(\\tilde{\\Sigma}\\subseteq\\Sigma\\)"} we define the{" "}
          <strong>conditional expectation</strong> of {"\\(Y\\)"} given{" "}
          {"\\(\\tilde{\\Sigma}\\)"} as the unique {"\\(\\tilde{\\Sigma}\\)"}
          -measurable random variable{" "}
          {"\\(\\mathbb{E}( Y\\,|\\,\\tilde{\\Sigma})\\)"} such that
          {`\\[
	\\int_A Y\\,\\mathrm{d}\\mathbb{P}=\\int_A \\mathbb{E}(Y\\,|\\,\\tilde{\\Sigma})\\,\\mathrm{d}\\mathbb{P}
\\]`}
          for all {"\\(A\\in\\tilde{\\Sigma}\\)"} (and the astute reader will
          notice we should actually state this as a theorem, rather than a
          definition). Later, we will be conditioning on various sub-
          {"\\(\\sigma\\)"}-algebras, which probabilists endow with the fancy
          phraseology of "adapting to filtrations".
        </p>
        <p>
          A key space in the study of Markov processes is that of trajectories.
          The approach may seem slightly backward at first, but here is the
          idea: we can begin with some deterministic path, eating in time and
          regurgitating a location, and then at each point in our space, look at
          the probability that some path has led us here. The space of paths (or{" "}
          <strong>trajectories</strong>) is the <strong>Skorokhod space</strong>
          {`\\[D_E[0,\\infty)=\\{\\gamma\\colon [0,\\infty)\\to E : \\gamma\\textrm{ is càdlàg}\\},\\]`}
          where càdlàg (continue à droite, limite à gauche) means to be
          right-continuous with left limits. We also define the{" "}
          <strong>time-mapping functions</strong>{" "}
          {"\\(\\pi_t(\\gamma)=\\gamma(t)\\)"} on trajectories, with{" "}
          {"\\(t\\geq 0\\)"}. Let us also write {"\\(\\mathscr{F}\\)"} for the
          (smallest!) {"\\(\\sigma\\)"}-algebra on {"\\(D_E[0,\\infty)\\)"} such
          that all{" "}
          {"\\(\\pi_t\\)"} are measurable, and {"\\(\\mathscr{F}_t\\)"} for the
          smallest one such that all{" "}
          {"\\(\\pi_s\\)"} for {"\\(s\\leq t\\)"} are measurable. The
          measurable space {"\\((D_E[0,\\infty),\\mathscr{F})\\)"}{" "}
          is the <strong>space of all paths</strong>.
        </p>
        <p>
          At last we are ready to define a <strong>Markov process</strong>,
          which is a family of probability measures{" "}
          {"\\((\\mathbb{P}_x)_{x\\in E}\\)"} on{" "}
          {"\\((D_E[0,\\infty),\\mathscr{F})\\)"} such that:
        </p>
        <ol>
          <li>
            for all {"\\(x\\in E\\)"} we have{" "}
            {"\\(\\mathbb{P}_x(\\gamma(0)=x)=1\\)"};
          </li>
          <li>
            the map {"\\(x\\mapsto \\mathbb{P}_x(F)\\)"} is measurable for all{" "}
            {"\\(F\\in\\mathscr{F}\\)"}; and
          </li>
          <li>
            for all {"\\(s\\geq 0\\)"},{" "}
            {
              "\\(\\mathbb{P}_x(\\gamma(t+s)\\in A\\,|\\,\\mathscr{F}_t)=\\mathbb{P}_{\\gamma(t)}(\\gamma(s)\\in A)\\)"
            }
            , given any {"\\(t\\geq 0\\)"} and {"\\(A\\in B(E)\\)"}.
          </li>
        </ol>
        <p>
          The last property is the most important, and is called the{" "}
          <strong>Markov property</strong>. Note that the conditional
          probability on the left is a random variable, so the equality is
          interpreted in the sense of {"\\(\\mathbb{P}_x\\)"}-almost sure
          equality of {"\\(\\mathscr{F}_t\\)"}-measurable functions.
        </p>
        <p>
          In the time-based approach we feed in a fixed time {"\\(t\\)"} and
          have probabilities assigned to all points in space {"\\(x\\)"}. In
          this space-based approach, our process looks at a fixed point in space{" "}
          {"\\(x\\)"} and, for each trajectory {"\\(\\gamma\\)"} (a deterministic
          mapping {"\\(t\\mapsto \\gamma(t)\\)"}), returns the probability such a
          trajectory will be realized under {"\\(\\mathbb{P}_x\\)"}. In this
          sense, our index {"\\(x\\)"} behaves like an initial condition.
        </p>
        <p>
          Given some Markov process {"\\((\\mathbb{P}_x)_{x\\in E}\\)"} we now
          define
          {`\\[
	\\mathbb{E}_x(\\Phi)=\\int_{D_E[0,\\infty)} \\Phi\\,\\mathrm{d}\\mathbb{P}_x
\\]`}
          for {"\\(\\Phi\\in L^1(D_E[0,\\infty),\\mathscr{F},\\mathbb{P}_x)\\)"}
          . As a special case, we may fix some {"\\(t\\geq 0\\)"} and take{" "}
          {"\\(f\\in\\mathscr{B}(E)\\)"}. Then, the map
          {`\\[
	\\Phi\\colon D_E[0,\\infty)\\to\\R\\textrm{ by }\\gamma\\mapsto f(\\gamma(t))
\\]`}
          is measurable with respect to {"\\(\\mathscr{F}\\)"} and
          absolutely-integrable with respect to any {"\\(\\mathbb{P}_x\\)"}.
          Indeed, for any {"\\(V\\in B(\\R)\\)"} we have
          {`\\[
	\\Phi^{-1}(V)=\\{\\gamma\\in D_E[0,\\infty): f(\\gamma(t))\\in V\\}=\\{\\gamma\\in D_E[0,\\infty): \\pi_t(\\gamma)\\in f^{-1}(V)\\}.
\\]`}
          Since {"\\(f\\)"} is measurable we know {"\\(f^{-1}(V)\\in B(E)\\)"},
          and {"\\(\\mathscr{F}\\)"} is defined such that {"\\(\\pi_t\\)"} is
          measurable. Integrability follows from the boundedness of {"\\(f\\)"}.
          In fact, such a {"\\(\\Phi\\)"} is not just {"\\(L^1\\)"}, but is even{" "}
          {"\\(L^\\infty\\)"}, which is important in the context of the
          following characterization of the Markov property.
        </p>
        <Theorem
          statement="Let \(s,t\geq 0\). Then, for any \(f\in\mathscr{B}(E)\) we have
\[
	\mathbb{E}_x(f(\gamma(t+s)) \,|\, \mathscr{F}_t)=\mathbb{E}_{\gamma(t)}(f(\gamma(s))).
\]"
        />
        <Proof
          proof="We proceed with the standard machinery. For some \(A\in B(E)\) we consider the characteristic function \(f\equiv\chi_A\). For arbitrary \(F\in\mathscr{F}_t\), by definition of conditional expectation we have
\[
	\int_F\mathbb{E}_x(f(\gamma(t+s))\,|\,\mathscr{F}_t)\,\mathrm{d}\mathbb{P}_x(\gamma)=\int_{F}\chi_A(\gamma(t+s))\,\mathrm{d}\mathbb{P}_x(\gamma).
\]
Since \(F\) was arbitrary, we have concluded
\[
	\mathbb{E}_x(f(\gamma(t+s))\,|\,\mathscr{F}_t)=\mathbb{P}_x(\gamma(t+s)\in A\,|\,\mathscr{F}_t).
\]
Using the Markov property, we get that
\[
	\mathbb{P}_x(\gamma(t+s)\in A\,|\,\mathscr{F}_t)=\mathbb{P}_{\gamma(t)}(\gamma(s)\in A)=\mathbb{E}_{\gamma(t)}(f(\gamma(s))).
\]
For a simple function \(f\equiv \sum_{i=1}^n a_i\chi_{A_i}\), with \(a_i\in\R\) and \(A_i\in B(E)\) pairwise disjoint, we see
\[
\int_F\mathbb{E}_x(f(\gamma(t+s))\,|\,\mathscr{F}_t)\,\mathrm{d}\mathbb{P}_x(\gamma)=\sum_{i=1}^na_i\int_{F}\chi_{A_i}(\gamma(t+s))\,\mathrm{d}\mathbb{P}_x(\gamma).
\]
Once more, since \(F\) was arbitrary, we conclude
\[
	\mathbb{E}_x(f(\gamma(t+s))\,|\,\mathscr{F}_t)=\sum_{i=1}^n a_i\mathbb{P}_x(\gamma(t+s)\in A_i\,|\,\mathscr{F}_t)=\sum_{i=1}^n a_i\mathbb{P}_{\gamma(t)}(\gamma(s)\in A_i)=\sum_{i=1}^n a_i\mathbb{E}_{\gamma(t)}(\chi_{A_i}(\gamma(s))).
\]
Via linearity of integration, we conclude the above is indeed \(\mathbb{E}_{\gamma(t)}(f(\gamma(s)))\). For non-negative \(f\), we approximate via simple functions, and so on (we will see an example of this procedure momentarily)."
        />
        <p>
          In fact, the above may even be taken as an alternative definition of
          the Markov property. It is generally easier to use, since we will now
          define a function central in analysing Markov processes in terms of
          this integral {"\\(\\mathbb{E}_x\\)"}: the{" "}
          <strong>Markov semigroup</strong>. This is the family{" "}
          {"\\((S_t)_{t\\geq 0}\\)"} associated to{" "}
          {"\\((\\mathbb{P}_x)_{x\\in E}\\)"}, where{" "}
          {`\\[
	S_t\\colon\\mathscr{B}(E)\\to\\mathscr{B}(E)\\textrm{ by }S_tf(x)=\\mathbb{E}_x(f(\\gamma(t))).
\\]`}{" "}
          Some parts of this definition may be contentious, such as whether{" "}
          {"\\(S_tf\\)"} really does belong to {"\\(\\mathscr{B}(E)\\)"}, which
          we now resolve.
        </p>
        <Theorem
          statement="Let \((S_t)_{t\geq 0}\) be the Markov semigroup associated to a Markov process \((\mathbb{P}_x)_{x\in E}\). Let \(t,s\geq 0\) and \(f,g\in\mathscr{B}(E)\). Then, \(S_t\) is linear and bounded, in particular with \(\|S_tf\|_{\infty}\leq \|f\|_{\infty}\). We also have almost everywhere that
\(S_0\equiv\mathbb{1}\), \(S_t1\equiv 1\), and \(S_tf\leq S_tg\) whenever \(f\leq g\). Lastly, we have the semigroup property: \(S_{t+s}\equiv S_t\circ S_s\)."
        />
        <Proof
          proof="Let us start by showing \(S_tf\in\mathscr{B}(E)\) for \(f\in\mathscr{B}(E)\). If \(f\equiv\chi_A\) for some \(A\in B(E)\), then
\[
	S_tf(x)=\int_{D_E[0,\infty)} \chi_A(\gamma(t))\,\mathrm{d}\mathbb{P}_x(\gamma)=\mathbb{P}_x(\gamma(t)\in A)=\mathbb{P}_x(\pi_t(\gamma)\in A).
\]
By definition of our \(\sigma\)-algebra \(\mathscr{F}\), we know \(\pi_t^{-1}(A)\in\mathscr{F}\). So, for any Borel \(V\in B(\R)\) we have
\[
	(S_tf)^{-1}(V)=\{x\in E: \mathbb{P}_x(\pi_t(\gamma)\in A)\in V\}=\{x\in E:\mathbb{P}_x(\pi_t^{-1}(A))\in V\},
\]
which is measurable since, by the second property in the definition of a Markov process, the map \(x\mapsto\mathbb{P}_x(F)\) is measurable for every \(F\in\mathscr{F}\), in particular for \(F=\pi_t^{-1}(A)\).

Now, suppose \(f\in\mathscr{B}(E)\) is non-negative. So, it may be approximated by some (pointwise!) convergent monotonic positive sequence \((s_n)\) of simple functions. We have
\[
	S_ts_n(x)=\sum_{i=1}^n a_{i}^{(n)}\mathbb{P}_x\left(\pi_t(\gamma)\in A_{i}^{(n)}\right),
\]
where \(s_n=\sum_{i=1}^n a_{i}^{(n)}\chi_{A_{i}^{(n)}}\). Indeed, these too are measurable, showing that \(S_ts\) is measurable for any simple \(s\). Moving onward, since \(s_n\to f\) we have
\[
	S_t(s_n-f)(x)=\int_{D_E[0,\infty)} (s_n-f)(\gamma(t))\,\mathrm{d}\mathbb{P}_x(\gamma)\to 0
\]
by dominated convergence, as \(|s_n-f|\leq\|f\|_{\infty}\). This tells us \(S_ts_n\to S_tf\) pointwise, and so \(S_tf\) itself must be measurable as a pointwise limit of measurable functions. This immediately extends to the mixed-sign case.

Linearity is clear by the fact that \(S_t\) is an integral. To get our bound, witness that
\[
	|S_tf(x)| = \left|\int_{D_E[0,\infty)} f(\gamma(t))\,\mathrm{d}\mathbb{P}_x(\gamma) \right|\leq \int_{D_E[0,\infty)} |f(\gamma(t))|\,\mathrm{d}\mathbb{P}_x(\gamma)\leq \|f\|_{\infty}.
\]
This holds for all \(x\in E\) (and really, we should sub-index our infinity norm by \(x\), as this defines our \(L^\infty\) space).

The fact \(S_t1 \equiv 1\) follows from the definition of a probability measure. For \(S_0\), we recall that \(\mathbb{P}_x(\gamma(0)=x)=1\) for all points \(x\), and so \(S_0f(x)=\mathbb{E}_x(f(\gamma(0)))=f(x)\). Monotonicity follows from monotonicity of the integral.

We conclude with the semigroup property. For an arbitrary \(f\in\mathscr{B}(E)\) and \(t,s\geq 0\) we have
\[
	S_{t+s}f(x)=\mathbb{E}_x(f(\gamma(t+s)))=\mathbb{E}_x\left[\mathbb{E}_x(f(\gamma(t+s))\,|\,\mathscr{F}_t)\right],
\]
by the law of total expectation. Applying our previous work, we have
\[
\begin{aligned}
	\mathbb{E}_x\left[\mathbb{E}_x(f(\gamma(t+s))\,|\,\mathscr{F}_t)\right] &=\mathbb{E}_x\left[\mathbb{E}_{\gamma(t)}(f(\gamma(s)))\right]\\
	&=\mathbb{E}_x\left[ S_sf(\gamma(t))\right] \\
	&= S_t\left(S_sf\right)(x)\\
	&=(S_t\circ S_s)(f)(x),
\end{aligned}
\]
through two applications of the definition of the semigroup."
        />
        <p>
          So, the semigroup allows you to take a function {"\\(f\\)"} on our
          state space {"\\(E\\)"}, and evolve it forward by time {"\\(t\\)"}{" "}
          given some initial condition {"\\(x\\)"}. This is done in the sense of
          averaging {"\\(f\\)"} over the distribution of possible trajectories{" "}
          {"\\(\\gamma\\)"} over {"\\(E\\)"}.
        </p>
        <p>
          Though bounded measurable functions {"\\(\\mathscr{B}(E)\\)"} are not
          the worst, they certainly are not the best. They are far too general a
          class to work with comfortably, and so we instead wish to return to{" "}
          {"\\(\\mathscr{C}_0(E)\\)"}, those vanishing at infinity. We then call
          a Markov semigroup {"\\((S_t)_{t\\geq 0}\\)"} <strong>Feller</strong>{" "}
          if {"\\(S_tf\\in\\mathscr{C}_0(E)\\)"} whenever{" "}
          {"\\(f\\in\\mathscr{C}_0(E)\\)"}. Moreover, if for any sequence{" "}
          {"\\(\\mathscr{C}_0(E)\\ni f_n\\to 1\\)"} we have{" "}
          {"\\(S_tf_n\\to 1\\)"} for all {"\\(t\\)"}, then we say the semigroup
          is <strong>conservative</strong>.
        </p>
        <Theorem statement="If \((S_t)_{t\geq 0}\) is Feller, then it is strongly continuous on \(\mathscr{C}_0(E)\), in the sense that for any \(f\in\mathscr{C}_0(E)\) the map \(t\mapsto S_tf\) is a continuous map \([0,\infty)\to\mathscr{C}_0(E)\)." />
        <p>
          Vanishing at infinity really is crucial for the above theorem. This is
          because the proof makes use of approximations that are dense only in{" "}
          {"\\(\\mathscr{C}_0(E)\\)"}, and not in the continuous functions in
          general.
        </p>
        <p>
          In the previous section, we showed how we may start with any Markov
          process and define a semigroup, explicitly. Now, although we have
          introduced this Feller property, the burden remains to show it is
          actually realizable. This is not only true, but is so in a miraculous
          way. It turns out that given some Feller semigroup (abstractly, this
          is just some set of time-indexed linear operators{" "}
          {"\\(\\mathscr{C}_0(E)\\to\\mathscr{C}_0(E)\\)"} with some conditions,
          like strong continuity and the semigroup property), we can recover the
          associated Markov process. This is, in fact, the{" "}
          <strong>Chapman-Kolmogorov master equation</strong>.
        </p>
        <Theorem
          statement="Let \((S_t)_{t\geq 0}\) be a conservative Feller semigroup, and let \(s,t\geq 0\) and \(x\in E\) be arbitrary. Then, there exists a probability measure \(\mathbb{P}_t(x,\cdot)\) on \(B(E)\) such that
\[
	S_tf(x)=\int_E f(y)\,\mathbb{P}_t(x,\mathrm{d}y)
\]
for all \(f\in\mathscr{C}_0(E)\). 

Moreover, fix some \(A\in B(E)\). Then, the map \(x\mapsto \mathbb{P}_t(x,A)\) belongs to \(\mathscr{B}(E)\). We also have the Chapman-Kolmogorov equation:
\[
	\mathbb{P}_{s+t}(x,A)=\int_E \mathbb{P}_t(y,A)\,\mathbb{P}_s(x,\mathrm{d}y).
\]"
        />
        <Proof
          proof="Fix \(t\geq 0\) and \(x\in E\). Define the functional \(f\mapsto S_tf(x)\) on \(\mathscr{C}_0(E)\), which is bounded and linear. By the Riesz-Markov-Kakutani theorem, we can find a (real!) measure \(\mathbb{P}_t(x,\cdot)\) such that
\[
	S_tf(x)=\langle f,\mathbb{P}_t(x,\cdot)\rangle=\int_E f(y)\,\mathbb{P}_t(x,\mathrm{d}y).
\]
For any \(f\in\mathscr{C}_0(E)\) with \(f\geq 0\) we have \(S_tf\geq 0\), so the functional is positive, and so we must have that \(\mathbb{P}_t(x,\cdot)\geq 0\). Take now any sequence \(f_n\to 1\), and observe that
\[
	\mathbb{P}_t(x,E)=\int_E \mathbb{P}_t(x,\mathrm{d}y)=\lim_{n\to\infty}\int_E f_n\,\mathbb{P}_t(x,\mathrm{d}y)=\lim_{n\to\infty} S_tf_n(x)=1
\]
by conservativity and dominated convergence. In all, we see that our measure \(\mathbb{P}_t(x,\cdot)\) is in fact a probability measure. 

For the Chapman-Kolmogorov equation, take \(s,t\geq 0\). For all \(f\in\mathscr{C}_0(E)\) witness that
\[
	\int_E f(y)\,\mathbb{P}_{s+t}(x,\mathrm{d}y)=S_{s+t}f(x)=S_s\left(S_tf\right)(x)=\int_E S_tf(y)\,\mathbb{P}_s(x,\mathrm{d}y)
\]
by the semigroup property and the integral representation of \(S_s\). Thus,
\[
	S_{s+t}f(x)=\int_E \left(\int_E f(z)\,\mathbb{P}_t(y,\mathrm{d}z)\right)\,\mathbb{P}_s(x,\mathrm{d}y).
\]
Taking the specific case \(f\equiv\chi_A\) for some \(A\in B(E)\) we have
\[
	\mathbb{P}_{s+t}(x,A)=\int_E \left(\int_E \chi_A(z)\,\mathbb{P}_t(y,\mathrm{d}z)\right)\,\mathbb{P}_s(x,\mathrm{d}y)=\int_E \mathbb{P}_t(y,A)\,\mathbb{P}_s(x,\mathrm{d}y),
\]
concluding the proof."
        />
        <p>
          We should interpret {"\\(\\mathbb{P}_t(x,A)\\)"} as the probability of
          a jump from {"\\(x\\in E\\)"} to somewhere in {"\\(A\\subseteq E\\)"}{" "}
          at time {"\\(t\\)"}, given our process. Then,{" "}
          {"\\(\\mathbb{P}_t(x,\\mathrm{d}y)\\)"} is a jump to an infinitesimal
          distance about {"\\(y\\)"}. For this reason, the measures{" "}
          {"\\(\\mathbb{P}_t(x,\\cdot)\\)"} are commonly referred to as{" "}
          <strong>transition probabilities</strong> (or sometimes,{" "}
          <strong>Markov kernels</strong>). The Chapman-Kolmogorov equation then
          has the operational interpretation that a jump from {"\\(x\\)"} to
          within {"\\(A\\)"} at the time {"\\(s+t\\)"} is the same as looking at
          all possible infinitesimal jumps {"\\(x\\)"} to {"\\(y\\)"} at time{" "}
          {"\\(s\\)"}, and then a jump from {"\\(y\\)"} to within {"\\(A\\)"} at
          time {"\\(t\\)"}.
        </p>
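        <p>
          For a concrete check, consider {"\\(E=\\R\\)"} with the Gaussian
          kernel usually associated with the Wiener process. We quote this
          kernel as an assumption, since it has not been derived here. For{" "}
          {"\\(t>0\\)"} set
          {`\\[
	\\mathbb{P}_t(x,\\mathrm{d}y)=p_t(x,y)\\,\\mathrm{d}y,\\qquad p_t(x,y)=\\frac{1}{\\sqrt{2\\pi t}}\\exp\\left(-\\frac{(y-x)^2}{2t}\\right).
\\]`}
          In terms of these densities, the Chapman-Kolmogorov equation reads
          {`\\[
	p_{s+t}(x,z)=\\int_{\\R} p_s(x,y)\\,p_t(y,z)\\,\\mathrm{d}y,
\\]`}
          which is the familiar fact that convolving centered Gaussians of
          variance {"\\(s\\)"} and {"\\(t\\)"} gives a centered Gaussian of
          variance {"\\(s+t\\)"}.
        </p>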
        <p>
          Now, take a family {"\\((S_t)_{t\\geq 0}\\)"} of continuous linear
          maps on a Banach space {"\\(X\\)"}, with{" "}
          {"\\(S_0\\equiv \\mathbb{1}\\)"}. Suppose they form a{" "}
          <strong>strongly continuous semigroup</strong>, meaning{" "}
          {"\\(S_{s+t}=S_sS_t\\)"} and {"\\(t\\mapsto S_tx\\)"} is a continuous
          map {"\\([0,\\infty)\\to X\\)"} for any {"\\(x\\in X\\)"}. If we also
          have {"\\(\\|S_t\\|\\leq 1\\)"} for all {"\\(t\\geq 0\\)"}, then we
          call the family a <strong>contraction semigroup</strong>.
        </p>
        <p>
          Note that Feller semigroups are an instance of a contraction
          semigroup. Say {"\\(f\\in\\mathscr{C}_0(E)\\)"} is such that{" "}
          {"\\(\\|f\\|_{\\infty}\\leq 1\\)"} (the optimization region relevant
          for the operator norm). This means {"\\(-1\\leq f(x)\\leq 1\\)"} for
          all {"\\(x\\in E\\)"}, or that {"\\(-1\\leq f\\leq 1\\)"} (comparing
          with constant functions). Since any Markov semigroup preserves
          inequalities and fixes constants, it must be that{" "}
          {"\\(-1 \\leq S_tf\\leq 1\\)"} and so{" "}
          {"\\(\\|S_tf\\|_{\\infty}\\leq 1\\)"}.
        </p>
        <p>
          Let {"\\((S_t)_{t\\geq 0}\\)"} be a strongly continuous semigroup. We
          associate to it its <strong>infinitesimal generator</strong>, the
          operator {"\\(A_S\\)"} by
          {`\\[
	A_Sx=\\lim_{t\\to 0^+} \\frac{S_tx-x}{t},
\\]`}
          with {"\\(\\operatorname{dom}A_S\\)"} being the set of{" "}
          {"\\(x\\in X\\)"} for which this limit exists. Note that{" "}
          {"\\(A_S\\)"} in general is unbounded. In fact, it is not obvious a
          priori that it is even densely-defined! However, being a contraction
          semigroup is enough to guarantee not only density, but closure, which
          tames the discontinuity considerably.
        </p>
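        <p>
          As a quick illustration (a standard textbook example, not drawn from
          the discussion above), take {"\\(X=\\mathscr{C}_0(\\R)\\)"} and the
          translation semigroup {"\\(S_tf(x)=f(x+t)\\)"}, which corresponds to
          deterministic motion to the right at unit speed. Whenever{" "}
          {"\\(f\\)"} is differentiable with {"\\(f'\\in\\mathscr{C}_0(\\R)\\)"},
          the generator acts by
          {`\\[
	A_Sf(x)=\\lim_{t\\to 0^+}\\frac{f(x+t)-f(x)}{t}=f'(x),
\\]`}
          so {"\\(A_S\\)"} is differentiation. It is unbounded: the functions{" "}
          {"\\(f_n(x)=\\sin(nx)e^{-x^2}\\)"} satisfy{" "}
          {"\\(\\|f_n\\|_{\\infty}\\leq 1\\)"} while {"\\(f_n'(0)=n\\)"}, so{" "}
          {"\\(\\|A_Sf_n\\|_{\\infty}\\geq n\\)"}.
        </p>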
        <Theorem
          statement="Suppose \(x\in\operatorname{dom}A_S\). Then, the derivative \(\mathrm{d}(S_tx)/\mathrm{d}t\) is well-defined. Moreover,
\[
	\frac{\mathrm{d}}{\mathrm{d}t}S_tx=S_tA_Sx=A_SS_tx.
\]"
        />
        <Proof
          proof="Let \(h>0\). By the semigroup property, we have
\[
	\frac{S_{t+h}x-S_tx}{h}=S_t\left(\frac{S_hx-x}{h}\right).
\]
Taking the limit as \(h\to 0^+\), we see the right-most term is merely \(S_tA_Sx\). Approaching from the left, we similarly start by finding, for \(0<h\leq t\),
\[
	\frac{S_tx-S_{t-h}x}{h}=S_{t-h}\left(\frac{S_hx-x}{h}\right).
\]
Witness now that
\[
	\left\|S_{t-h}\left(\frac{S_hx-x}{h}\right)-S_tA_Sx\right\|\leq \left\|S_{t-h}\left(\frac{S_hx-x}{h}-A_Sx\right)\right\|+\left\|S_{t-h}A_Sx-S_tA_Sx\right\|.
\]
By strong continuity, the right-most term vanishes as \(h\to 0\). Recalling then that \(\|S_{t-h}\|_{\textrm{op}}\leq 1\), we use the standard operator norm inequality to get
\[
	\left\|S_{t-h}\left(\frac{S_hx-x}{h}-A_Sx\right)\right\|\leq \left\|\frac{S_hx-x}{h}-A_Sx\right\|,
\]
which also clearly vanishes in the limit. In all, we have concluded that
\[
	\frac{\mathrm{d}}{\mathrm{d}t}S_tx=S_tA_Sx.
\]

To get the last equality, it stands merely to notice that
\[
	\frac{S_h(S_tx)-S_tx}{h}=S_t\left(\frac{S_hx-x}{h}\right).
\]
Taking \(h\to 0\), we get that \(A_SS_tx=S_tA_Sx\)."
        />
        <p>
          Before we return to the density and closure of {"\\(A_S\\)"}, let us
          take a brief moment to recognize what the above theorem is implying,
          morally. It seems to be saying that, at least formally, we may
          interpret the infinitesimal generator as being an (the?) operator
          for which the semigroup {"\\((S_t)\\)"} satisfies the above
          differential equation (and which, perhaps, even characterizes it!). If{" "}
          {"\\(A_S\\)"} were bounded, then we could trivially solve this
          differential equation and find that {"\\(S_t\\equiv \\exp(t A_S)\\)"}.
        </p>
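        <p>
          To see the bounded case concretely, here is a sketch with a two-point
          state space {"\\(E=\\{1,2\\}\\)"}, where functions on {"\\(E\\)"} are
          vectors in {"\\(\\R^2\\)"} and the generator is a matrix. The rates{" "}
          {"\\(a,b>0\\)"} are hypothetical parameters chosen for illustration.
          Taking
          {`\\[
	A_S=\\begin{pmatrix} -a & a\\\\ b & -b\\end{pmatrix},\\qquad A_S^2=-(a+b)A_S,
\\]`}
          the power series for the exponential collapses to
          {`\\[
	S_t=\\exp(tA_S)=\\mathbb{1}+\\sum_{k=1}^{\\infty}\\frac{t^k(-(a+b))^{k-1}}{k!}A_S=\\mathbb{1}+\\frac{1-e^{-(a+b)t}}{a+b}A_S.
\\]`}
          One can check directly that {"\\(S_0=\\mathbb{1}\\)"}, that the rows
          of {"\\(S_t\\)"} sum to one (so {"\\(S_t1\\equiv 1\\)"}), and that{" "}
          {"\\(\\mathrm{d}S_t/\\mathrm{d}t=e^{-(a+b)t}A_S=A_SS_t\\)"}.
        </p>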
        <Theorem statement="The infinitesimal generator \(A_S\) is densely-defined and closed." />
        <Proof
          proof="Let \(x\in X\), and for \(\varepsilon>0\) define
\[
	x_{\varepsilon}=\frac{1}{\varepsilon}\int_0^{\varepsilon} S_tx\,\mathrm{d}t.
\]
Note this integral is a Bochner integral, and is well-defined due to strong continuity. As \(\varepsilon\to 0\), Lebesgue differentiation implies \(x_\varepsilon\to x\). To show density, it then suffices to show that \(x_\varepsilon\in\operatorname{dom}A_S\) for all \(\varepsilon>0\). To this end, let \(t>0\) and witness
\[
	\frac{1}{t}(S_tx_\varepsilon-x_\varepsilon)=\frac{1}{t\varepsilon}\left( S_t\int_0^{\varepsilon} S_h x\,\mathrm{d}h-\int_0^{\varepsilon} S_hx\,\mathrm{d}h\right).
\]
Linear operators respect integration, and so
\[
	S_t\int_0^{\varepsilon} S_h x\,\mathrm{d}h=\int_t^{\varepsilon+t} S_hx\,\mathrm{d}h
\]
due to the semigroup property. In all, taking the limit \(t\to 0\) we see
\[
	\frac{1}{t}(S_tx_\varepsilon-x_\varepsilon)\to \frac{1}{\varepsilon}(S_{\varepsilon}x-x)=A_Sx_\varepsilon,
\]
once more due to Lebesgue differentiation.

Moving on to closure, take some sequence \(x_n\in \operatorname{dom}A_S\) such that \(x_n\to x\) and \(A_Sx_n\to y\). For closure, we require that \(x\in\operatorname{dom}A_S\) and that 
\[
	y=A_Sx=\lim_{t\to 0^+}\frac{S_tx-x}{t}.
\]
Now, let \(t>0\). By the fundamental theorem of calculus and the fact that \(S_0\equiv\mathbb{1}\), we have
\[
	S_tx_n-x_n=\int_0^t  \frac{\mathrm{d}}{\mathrm{d}h}S_hx_n\,\mathrm{d}h=\int_0^t S_hA_Sx_n\,\mathrm{d}h,
\]
with the last equality following from the preceding lemma. We see that
\[
	\left\|\int_0^t S_hA_Sx_n\,\mathrm{d}h-\int_0^t S_hy\,\mathrm{d}h\right\|\leq \int_0^t \|S_h(A_Sx_n-y)\|\,\mathrm{d}h \leq t\|A_Sx_n-y\|,
\]
due to standard inequalities of the Bochner integral and the operator norm once more. Therefore, taking \(n\to \infty\) tells us
\[
	S_tx-x = \int_0^t S_hy\,\mathrm{d}h.
\]
Appealing again to Lebesgue, we divide through by \(t\) and in the limit get that
\[
	\frac{S_tx-x}{t}=\frac{1}{t} \int_0^t S_hy\,\mathrm{d}h\to y.
\]
Therefore, we have closure."
        />
        <p>
          We have now defined the infinitesimal generator, and shown that
          although it is unbounded, it is about as nice an unbounded operator
          as we could ask for, short of conditions on its adjoint. The lingering
          question remains, however: what to make of that differential equation?{" "}
        </p>
        <p>
          This will be the last step in our abstraction. It does indeed turn out
          that each operator with the properties of an infinitesimal generator
          uniquely determines some contraction semigroup! The definition of an
          exponential via a series is insufficient for even the nicest unbounded
          operators (one need only look at the derivative operator, for
          instance). However, the limit definition
          {`\\[
	\\exp(t A_S)=\\lim_{n\\to\\infty} \\left(\\mathbb{1}-\\frac{tA_S}{n}\\right)^{-n}
\\]`}
          is much better behaved, provided the resolvent of {"\\(A_S\\)"}{" "}
          is suitably controlled, which is indeed the case. To show it, we
          first define the <strong>right half-plane</strong>
          {`\\[
	\\mathcal{R}=\\{\\lambda\\in\\mathbb{C} : \\mathfrak{R}\\lambda>0\\}.
\\]`}
        </p>
        <Theorem
          statement="We have the inclusion \(\mathcal{R}\subseteq \rho(A_S)\). Moreover, for all \(\lambda\in\mathcal{R}\) we have
\[
	R_\lambda x=\int_0^{\infty} \exp(-\lambda t)S_tx\,\mathrm{d}t
\]
and \(\|R_{\lambda}\|\leq 1/\mathfrak{R}\lambda\). Here, \(R_{\lambda}=R_{\lambda}(A_S)\). "
        />
        <Proof
          proof="Fix some \(\lambda\in\mathcal{R}\). We let \(t\geq 0\) and begin with the estimate
\[
	\|\exp(-t\lambda)S_tx\|\leq \exp\left(-t\mathfrak{R}\lambda\right)\|x\|.
\]
In particular, this tells us \(\exp(-t\lambda)S_tx\) is absolutely-integrable over \([0,\infty)\) as a function of \(t\). From here it also follows that
\[
	\|R_{\lambda}x\|\leq \int_0^{\infty} \|x\|\exp\left(-t\mathfrak{R}\lambda\right)\,\mathrm{d}t=\frac{\|x\|}{\mathfrak{R}\lambda},
\]
giving the desired estimate for the operator norm.

It remains to show this integral operator indeed yields the resolvent. That is, defining \(R_{\lambda}\) as this integral operator, we must show that \(R_{\lambda}\equiv (\lambda\mathbb{1}-A_S)^{-1}\). To this end, we begin by showing \(R_{\lambda}x\in\operatorname{dom}A_S\). Let \(x\in X\) be arbitrary, and witness that for \(t>0\) we have
\[
	\frac{1}{t}(S_tR_{\lambda}x-R_{\lambda}x)=\frac{1}{t}\left(\int_{t}^{\infty}\exp(-\lambda(h-t))S_hx\,\mathrm{d}h-\int_0^{\infty}\exp(-\lambda h)S_hx\,\mathrm{d}h\right)
\]
through the semigroup property. We write this first integral as
\[
	\frac{1}{t}\int_{t}^{\infty}\exp(-\lambda(h-t))S_hx\,\mathrm{d}h=\frac{\exp(\lambda t)-1}{t}\int_{t}^{\infty}\exp(-\lambda h)S_hx\,\mathrm{d}h+\frac{1}{t}\int_{t}^{\infty}\exp(-\lambda h)S_hx\,\mathrm{d}h,
\]
where we add and subtract \(1\). In all, we are left to evaluate
\[
\frac{\exp(\lambda t)-1}{t}\int_{t}^{\infty}\exp(-\lambda h)S_hx\,\mathrm{d}h+\frac{1}{t}\left(\int_{t}^{\infty}\exp(-\lambda h)S_hx\,\mathrm{d}h-\int_0^{\infty}\exp(-\lambda h)S_hx\,\mathrm{d}h\right).
\]
Taking the limit, we see the first term approaches
\[
	\lambda\int_0^{\infty}\exp(-\lambda h)S_hx\,\mathrm{d}h=\lambda R_{\lambda}x,
\]
while for the last term we have
\[
	\frac{1}{t}\left(\int_{t}^{\infty}\exp(-\lambda h)S_hx\,\mathrm{d}h-\int_0^{\infty}\exp(-\lambda h)S_hx\,\mathrm{d}h\right)=\frac{-1}{t}\int_0^t\exp(-\lambda h)S_hx\,\mathrm{d}h\to -x.
\]
In all, we have \(A_SR_{\lambda}x=\lambda R_{\lambda}x-x\). This simultaneously shows that \(R_{\lambda} x \in \operatorname{dom}A_S\) and that \((\lambda\mathbb{1}-A_S)R_{\lambda}x=x\). To get the other inverse, we notice that for \(x\in\operatorname{dom}A_S\) we have
\[
	R_{\lambda}(\lambda\mathbb{1}-A_S)x=\lambda\int_0^{\infty}\exp(-\lambda h)S_hx\,\mathrm{d}h-\int_0^{\infty} \exp(-\lambda h) S_hA_Sx\,\mathrm{d}h.
\]
We recognize that
\[
\begin{aligned}
	\int_0^{\infty} \exp(-\lambda h) S_hA_Sx\,\mathrm{d}h &=\int_0^{\infty} \exp(-\lambda h) \frac{\mathrm{d}}{\mathrm{d}h}S_h x\,\mathrm{d}h\\
	&=\exp(-\lambda h)S_hx\Big|_{h=0}^\infty +\lambda\int_0^{\infty} \exp(-\lambda h)S_hx\,\mathrm{d}h\\
	&=-x+\lambda R_{\lambda} x
\end{aligned}
\]
via integration by parts, whence the conclusion follows."
        />
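        <p>
          As a sanity check on this formula, consider a one-dimensional sketch
          (an illustrative assumption, not part of the argument above): take{" "}
          {"\\(X=\\mathbb{C}\\)"} and {"\\(S_tx=e^{-at}x\\)"} for some fixed{" "}
          {"\\(a\\geq 0\\)"}, so that {"\\(A_S=-a\\)"}. For{" "}
          {"\\(\\lambda\\in\\mathcal{R}\\)"} the integral can be computed by
          hand:
          {`\\[
	\\int_0^{\\infty}\\exp(-\\lambda t)e^{-at}x\\,\\mathrm{d}t=\\frac{x}{\\lambda+a}=(\\lambda\\mathbb{1}-A_S)^{-1}x,\\qquad \\left|\\frac{1}{\\lambda+a}\\right|\\leq\\frac{1}{\\mathfrak{R}\\lambda+a}\\leq\\frac{1}{\\mathfrak{R}\\lambda}.
\\]`}
          Both the resolvent identity and the norm bound are visible directly
          in this toy case.
        </p>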
        <p>
          This lets us take our final leap into the abstract, finding that these
          properties of our constructed infinitesimal generator are sufficient
          for us to take them as a starting point, and conversely construct the
          semigroup. This is the content of the following theorem.
        </p>
        <Theorem
          statement="Let \(A\) be a densely-defined closed operator such that \(\mathcal{R}\subseteq\rho(A)\) and
\[
	\|R_{\lambda}(A)\|\leq\frac{1}{\mathfrak{R}\lambda}
\]
for all \(\lambda\in\mathcal{R}\). Define \(A_{n}=n A R_{n}(A)\) for \(n>0\) and for \(t\geq 0\) define
\[
	e^{tA}=\lim_{n\to\infty} \exp\left(t A_{n}\right),
\]
with the limit taken pointwise on \(X\). Then, \((e^{tA})_{t \geq 0}\) is a contraction semigroup, and it is the unique one with \(A\) as its infinitesimal generator."
          name="Hille-Yosida"
        />
        <p>
          Some remarks are in order. First, the notation {"\\(e^{tA}\\)"} is
          purely formal, since again we cannot define the exponential of{" "}
          {"\\(A\\)"} in general. However, {"\\(A_nx\\to Ax\\)"} for every{" "}
          {"\\(x\\in\\operatorname{dom} A\\)"}, and the exponential of{" "}
          {"\\(A_n\\)"} is perfectly well-defined (indeed, {"\\(A_n\\)"} is
          always bounded). In the event that {"\\(A\\)"} is bounded, its
          exponential is of course no longer formal, and the notation is a true
          equality. Lastly, from now on we will write {"\\(e^{tA}=S_t\\)"}{" "}
          and {"\\(A=A_S\\)"} for contraction semigroups, implicitly inserting
          the above remarks.
        </p>
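        <p>
          To see why {"\\(A_n\\)"} is bounded, here is a short sketch using
          only the hypotheses of the theorem. Since{" "}
          {"\\((n\\mathbb{1}-A)R_n(A)=\\mathbb{1}\\)"}, we have{" "}
          {"\\(AR_n(A)=nR_n(A)-\\mathbb{1}\\)"}, and so
          {`\\[
	A_n=nAR_n(A)=n^2R_n(A)-n\\mathbb{1},\\qquad \\|A_n\\|\\leq n^2\\cdot\\frac{1}{n}+n=2n.
\\]`}
          In the scalar case {"\\(A=-a\\)"} with {"\\(a\\geq 0\\)"} (an
          illustrative assumption), this reads {"\\(A_n=-na/(n+a)\\)"}, which
          tends to {"\\(-a\\)"} as {"\\(n\\to\\infty\\)"}, and correspondingly{" "}
          {"\\(\\exp(tA_n)\\to e^{-ta}\\)"}.
        </p>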
        <p>
          There is still a subtle loose end: Feller semigroups. These are
          instances of contraction semigroups, yes, and so Hille-Yosida does
          characterize, almost entirely, their infinitesimal generators.
          However, Feller semigroups must preserve positivity as well. In fact,
          all Markov semigroups must! We need an extra condition on our
          infinitesimal generator to ensure this happens. We say that a
          (perhaps unbounded!) operator {"\\(A\\)"} is{" "}
          <strong>dissipative</strong> if
          {`\\[
	\\|\\lambda x-Ax\\|\\geq \\lambda\\|x\\|
\\]`}
          for all {"\\(\\lambda>0\\)"} and {"\\(x\\in\\operatorname{dom}A\\)"}.
        </p>
        <Theorem statement="Let \((e^{tA})_{t\geq 0}\) be a Feller semigroup on \(\mathscr{C}_0(E)\). Then, \(A\) is dissipative." />
        <Proof
          proof="Let \(f\in\operatorname{dom}A\). Since \(f\) vanishes at infinity, \(|f|\) attains its maximum at some \(x_0\in E\). Replacing \(f\) with \(-f\) if necessary, we may assume \(f(x_0)=\|f\|\geq 0\). We know that
\[
	Af(x_0)=\lim_{t\to 0^+}\frac{e^{tA}f(x_0)-f(x_0)}{t}.
\]
Now, let \(f^{+}=\max\{f,0\}\). Since Feller semigroups preserve positivity and are contractions, for all \(t\geq 0\) we have
\[
	S_tf\leq S_tf^{+} \leq \|S_tf^{+}\|\leq \|f^{+}\|=f(x_0).
\]
Thus,
\[
	Af(x_0)\leq \lim_{t\to 0^+} \frac{f(x_0)-f(x_0)}{t}=0.
\]
Now, let \(\lambda>0\), and by definition of the infinity-norm we have
\[
	\|\lambda f-Af\|\geq |\lambda f(x_0)-Af(x_0)|=\lambda f(x_0)-Af(x_0)\geq \lambda f(x_0)=\lambda\|f\|,
\]
since \(Af(x_0)\leq 0\leq \lambda f(x_0)\)."
        />
        <p>
          So, Feller semigroups have dissipative infinitesimal generators. The
          structure of our narrative has hopefully foreshadowed that the reverse
          holds as well.
        </p>
        <Theorem
          statement="Let \(A\) be a densely-defined dissipative operator such that \(\lambda\mathbb{1}-A\) is surjective for some \(\lambda>0\). Then, \(A\) is the infinitesimal generator of a contraction semigroup."
          name="Lumer-Phillips"
        />
        <p>
          This is, more or less, a restatement of Hille-Yosida. First, the
          dissipativity of {"\\(A\\)"} gives the estimate necessary for the norm
          of the resolvent. Second, a non-empty resolvent set is enough to imply
          the closure of an operator (the resolvent is bounded, hence closed,
          and so is its inverse). With this all in mind, we may now posit the
          following definition: an operator satisfying the hypotheses of the
          Lumer-Phillips theorem is a <strong>Markov generator</strong>. In the
          specific case that {"\\(A\\)"} is an operator over{" "}
          {"\\(\\mathscr{C}_0(\\R)\\)"}, it will be the generator of a Feller
          process.
        </p>
        <p>
          Let us conclude with a simple example. We know what the infinitesimal
          generator of a Wiener process on {"\\(E=\\R\\)"} ought to be:
          {`\\[
	A=\\frac{1}{2}\\frac{\\mathrm{d}^2}{\\mathrm{d}x^2}.
\\]`}
          We consider this to be an unbounded operator on{" "}
          {"\\(\\mathscr{C}_0(\\R)\\)"}. Of course, for such an operator to make
          sense we must (at least!) restrict ourselves to twice
          continuously-differentiable functions. That is,
          {`\\[
	\\operatorname{dom}A=\\mathscr{C}_0(\\R)\\cap\\mathcal{C}^2(\\R).
\\]`}
          Fortunately, it is a standard result that this domain is dense.
        </p>
        <p>
          Getting dissipativity is not hard: by the argument in the proof above,
          it suffices that {"\\(Af(x_0)\\leq 0\\)"} whenever{" "}
          {"\\(f\\in\\operatorname{dom}A\\)"} attains a non-negative maximum at{" "}
          {"\\(x_0\\)"}. However, we know this already due to simple calculus!
          Indeed, if {"\\(f\\)"} has a local maximum at {"\\(x_0\\)"}, then its
          second derivative at {"\\(x_0\\)"} is non-positive.
        </p>
        <p>
          The difficult part is showing it has a non-empty resolvent set. For
          this, let {"\\(\\lambda>0\\)"} and take some{" "}
          {"\\(g\\in\\mathscr{C}_0(\\R)\\)"}. What this amounts to is finding
          a unique {"\\(f\\in\\operatorname{dom}A\\)"} such that
          {`\\[
	g=\\lambda f-Af=\\lambda f-\\frac{1}{2} f''.
\\]`}
          The existence and uniqueness of such an {"\\(f\\)"} proves that{" "}
          {"\\(\\lambda\\mathbb{1}-A\\)"} is invertible. Of course, we are free
          to pick {"\\(\\lambda\\)"} (or rather, fix any positive one), but{" "}
          {"\\(g\\)"} must be a generic function which vanishes at infinity.{" "}
        </p>
        <p>
          It is possible to show that this differential equation is generically
          solvable, with solutions taking the form
          {`\\[
	f(x)=c_1\\exp\\left(\\sqrt{2\\lambda}x\\right)+c_2\\exp\\left(-\\sqrt{2\\lambda}x\\right)-\\frac{2}{\\sqrt{2\\lambda}}\\int_0^x g(y)\\sinh\\left(\\sqrt{2\\lambda}(x-y)\\right)\\,\\mathrm{d}y,
\\]`}
          with constants {"\\(c_1\\)"} and {"\\(c_2\\)"}. Rewriting the
          hyperbolic sine, we find this is equal to
          {`\\[f(x)=\\left[c_1-\\frac{1}{\\sqrt{2\\lambda}}\\int_0^x g(y)\\exp\\left(-\\sqrt{2\\lambda}y\\right)\\,\\mathrm{d}y\\right]\\exp\\left(\\sqrt{2\\lambda}x\\right)
	+\\left[c_2+\\frac{1}{\\sqrt{2\\lambda}}\\int_0^x g(y)\\exp\\left(\\sqrt{2\\lambda}y\\right)\\,\\mathrm{d}y\\right]\\exp\\left(-\\sqrt{2\\lambda}x\\right).\\]`}
          Suppose now that {"\\(f\\in\\mathscr{C}_0(\\R)\\)"}. As{" "}
          {"\\(x\\to\\infty\\)"}, L'Hôpital's rule gives
          {`\\[
\\lim_{x\\to\\infty}\\frac{\\exp\\left(-\\sqrt{2\\lambda}x\\right)}{\\sqrt{2\\lambda}}\\int_0^x g(y)\\exp\\left(\\sqrt{2\\lambda}y\\right)\\,\\mathrm{d}y=\\lim_{x\\to\\infty}\\frac{g(x)}{2\\lambda}=0
\\]`}
          since {"\\(g\\in\\mathscr{C}_0(\\R)\\)"} by assumption. The term{" "}
          {"\\(c_2\\exp(-\\sqrt{2\\lambda}x)\\)"} of course vanishes as well.
          Moving to the remaining term, we examine the bracketed coefficient:
          {`\\[
	\\left|\\,c_1-\\frac{1}{\\sqrt{2\\lambda}}\\int_0^x g(y)\\exp\\left(-\\sqrt{2\\lambda}y\\right)\\,\\mathrm{d}y\\right|\\leq |c_1| + \\frac{\\|g\\|}{\\sqrt{2\\lambda}}\\int_0^x \\exp\\left(-\\sqrt{2\\lambda}y\\right)\\,\\mathrm{d}y.
\\]`}
          Taking {"\\(x\\to\\infty\\)"}, this integral converges, so the
          coefficient has a finite limit. Since it multiplies{" "}
          {"\\(\\exp(\\sqrt{2\\lambda}x)\\)"}, which grows without bound, and{" "}
          {"\\(f\\)"} vanishes at infinity, that limit must be zero. That is,
          {`\\[
 	c_1=\\frac{1}{\\sqrt{2\\lambda}}\\int_0^{\\infty} g(y)\\exp\\left(-\\sqrt{2\\lambda}y\\right)\\,\\mathrm{d}y.
\\]`}
          Likewise, taking {"\\(x\\to -\\infty\\)"} will show that
          {`\\[
 	c_2=\\frac{1}{\\sqrt{2\\lambda}}\\int_{-\\infty}^0 g(y)\\exp\\left(\\sqrt{2\\lambda}y\\right)\\,\\mathrm{d}y.
\\]`}
          Therefore, such an {"\\(f\\)"} is unique. Conversely, one can check
          that the {"\\(f\\)"} given by these constants does lie in{" "}
          {"\\(\\operatorname{dom}A\\)"}, which gives existence. This shows
          that {"\\(A\\)"} indeed satisfies the hypotheses necessary to be a
          Markov generator. As a consequence, if we prove that {"\\(A\\)"} does
          generate a Wiener process, we immediately know too that the Wiener
          process is Feller.
        </p>
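        <p>
          For reference, the solution can also be written in closed form. The
          following is a sketch that can be checked by direct differentiation;
          the shorthand {"\\(\\mu=\\sqrt{2\\lambda}\\)"} is introduced here
          purely for readability. Setting
          {`\\[
	f(x)=\\frac{1}{\\mu}\\int_{-\\infty}^{\\infty} g(y)\\exp\\left(-\\mu|x-y|\\right)\\,\\mathrm{d}y=\\frac{1}{\\mu}\\left[e^{-\\mu x}\\int_{-\\infty}^{x} g(y)e^{\\mu y}\\,\\mathrm{d}y+e^{\\mu x}\\int_{x}^{\\infty} g(y)e^{-\\mu y}\\,\\mathrm{d}y\\right],
\\]`}
          and differentiating twice (the boundary terms cancel in the first
          derivative) gives
          {`\\[
	f'(x)=-e^{-\\mu x}\\int_{-\\infty}^{x} g(y)e^{\\mu y}\\,\\mathrm{d}y+e^{\\mu x}\\int_{x}^{\\infty} g(y)e^{-\\mu y}\\,\\mathrm{d}y,\\qquad f''(x)=\\mu^2f(x)-2g(x),
\\]`}
          so that {"\\(\\lambda f-\\frac{1}{2}f''=g\\)"}, as required. The
          kernel {"\\(\\mu^{-1}\\exp(-\\mu|x-y|)\\)"} is the familiar resolvent
          density of Brownian motion.
        </p>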
      </div>
    );
  }
}

export default Markov;
