The Analysisters

Monday, March 16, 2015

Distributional Calculus Part 4: Properties of Distributions

So, in the previous post, we found that distributions gave an alternate way to characterize functions; that is, by mapping from a set of test functions instead of from a set $X$. Test functions turn out to be completely central to how operations are performed! In fact, I'll spoil the entire content of this post by saying any operation on $f$ can be 'moved' to apply on the set of test functions instead.

But before that, let's list some basic properties which are more evocative of elementary real analysis than anything else. For a distribution $T_f$, paired with a test function $\phi$ as $\langle T_f, \phi\rangle$:

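Linearity, i.e. $\langle T_f, a\phi_1+\phi_2\rangle = a\langle T_f,\phi_1\rangle+\langle T_f,\phi_2\rangle$ for any real constant $a$ and test functions $\phi_1,\ \phi_2$;
There exists a sequence of test functions $\{\phi_n\}$ such that $\phi_n \to f$ in the sense of distributions, i.e. $\langle \phi_n,\phi\rangle \to \langle T_f,\phi\rangle$ for every test function $\phi$.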

Both of these properties are necessary, but we'll be making the most use out of the second one. Recall the super-useful integral characterization of a distribution, $$T_f(\phi)=\int_{\mathbb{R}}f(x)\phi(x)\,dx.$$ That formula only makes sense when $f$ is an honest, locally integrable function with no weird generalized behavior. Yet if we instead consider $f$ as the limit of a sequence of test functions, each $\phi_n$ is a classically defined function, and it is now possible to write $$T_f(\phi)=\lim_{n\to\infty}\int_{\mathbb{R}}\phi_n(x)\phi(x)\;dx$$ for any generalized function $f$.* Great! Now we can look at any and all distributions the easy way.

The real magic starts when we attempt to translate the distribution. Recall that any function can be translated $y$ units by taking $f(x-y)$ instead of $f(x)$; the same thing can be done for generalized functions by considering $\lim_{n\to\infty}\langle \phi_n(x-y),\phi(x)\rangle$. (Let's define the translation operator $\tau_y$ by $\tau_y\phi(x)=\phi(x-y)$.) Using some simple $u$-substitution magic, \begin{align}\langle\tau_yT_f,\phi\rangle &=\lim_{n\to\infty}\langle \phi_n(x-y),\phi(x)\rangle\\&=\lim_{n\to\infty}\int_{\mathbb{R}}\phi_n(x-y)\phi(x)\;dx;\qquad u = x-y\\&=\lim_{n\to\infty}\int_{\mathbb{R}}\phi_n(u)\phi(u+y)\;du\\&=\langle T_f,\tau_{-y}\phi\rangle.\end{align} We have essentially found that any distribution can be translated by applying the opposite translation to every test function in $\mathcal{D}$. To reiterate:$$\langle\tau_yT_f,\phi\rangle =\langle T_f,\tau_{-y}\phi\rangle.$$ Hooray!
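To see the translation identity with actual numbers, here is a small Python sketch. It assumes an arbitrary locally integrable, non-smooth $f$ ($|x|$ plus a unit jump at $0$), uses the bump $\psi$ from Part 2 as the test function, and approximates both pairings with a plain midpoint rule. Because this $f$ is an ordinary function, the limit over $\phi_n$ is skipped and $f$ is integrated directly; the helper names `bump`, `integrate` and `f` are illustrative only.

```python
import math

def bump(x):
    """Bump test function psi(x) = exp(-1/(1 - x^2)) for |x| < 1, and 0 otherwise."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """An arbitrary locally integrable, non-smooth function: |x| plus a unit jump at 0."""
    return abs(x) + (1.0 if x > 0 else 0.0)

y = 0.4  # translation amount

# <tau_y T_f, phi>: integrate f(x - y) phi(x); phi = bump vanishes outside [-1, 1].
lhs = integrate(lambda x: f(x - y) * bump(x), -1.0, 1.0)

# <T_f, tau_{-y} phi>: integrate f(u) phi(u + y); phi(u + y) vanishes outside [-1 - y, 1 - y].
rhs = integrate(lambda u: f(u) * bump(u + y), -1.0 - y, 1.0 - y)

print(lhs, rhs)  # the two pairings agree to many decimal places
```

The midpoint grid for $u$ is just the grid for $x$ shifted by $y$, so the two sums match term by term up to rounding: the $u$-substitution, happening in floating point.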

Differentiating a distribution works in much the same way as translation, in that the operation gets pawned off onto the test function, but with an extra minus sign. However, it does involve an extra technique: integration by parts. I assume everyone reading this is familiar with the technique, but, for the sake of cute mnemonics, a friend of my fiancé's refers to $$\int u\;dv = uv - \int v \;du$$as "sudv uv svidoo."

Let's take a moment to appreciate how adorable that is.

The actual fancy differentiation trick can be proved in essentially one integration-by-parts step:\begin{align}\left\langle \frac{d}{dx}T_f,\phi\right\rangle &= \lim_{n\to\infty}\int_\mathbb{R}\left(\frac{d}{dx}\phi_n(x)\right)\phi(x)\;dx\\&= \lim_{n\to\infty}-\int_\mathbb{R}\phi_n(x)\left(\frac{d}{dx}\phi(x)\right)\;dx \\ &=\left\langle T_f, -\frac{d}{dx}\phi(x)\right\rangle\end{align}(brownie points if you've already figured out what happened to the $uv$ term). This identity is essential for a crazy number of distributional calculus proofs.

For example, we can use the differentiation identity directly to prove, in two seconds, that the Dirac delta function is the distributional derivative of the Heaviside function. Let $\langle T_H,\phi\rangle= \int_0^\infty \phi(x)\,dx$ represent the Heaviside distribution. From the above identity, we conclude $$\left\langle \frac{d}{dx}T_H,\phi\right\rangle=\left\langle T_H,-\frac{d}{dx}\phi\right\rangle=-\int_0^\infty \phi'(x)\,dx=\phi(0)-\lim_{x\to\infty}\phi(x)=\phi(0),$$ because $\phi$ has compact support and therefore vanishes for all sufficiently large $x$. Yet $\phi(0)=\langle \delta, \phi\rangle$ by definition! We're done here.
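Both the differentiation identity and the Heaviside computation can be sanity-checked in floating point. The Python sketch below is an illustration, not a proof: it takes a shifted bump $\phi(x)=\psi(x-0.3)$ (the shift is an arbitrary choice), checks that $-\int_0^\infty\phi'(x)\,dx$ matches $\phi(0)$, and then checks that for $f(x)=|x|$ the pairing $\langle T_f,-\phi'\rangle$ matches $\int \operatorname{sign}(x)\phi(x)\,dx$, the derivative one would naively guess. The helper names are hypothetical, and the integrals are cut off where $\phi$ has already vanished.

```python
import math

def bump(x):
    """Bump test function psi(x) = exp(-1/(1 - x^2)) for |x| < 1, and 0 otherwise."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def bump_prime(x):
    """Classical derivative of the bump: psi'(x) = -2x psi(x) / (1 - x^2)^2 for |x| < 1."""
    if abs(x) >= 1.0:
        return 0.0
    return -2.0 * x * bump(x) / (1.0 - x * x) ** 2

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

c = 0.3                             # arbitrary shift so that phi(0) is not the peak value
phi = lambda x: bump(x - c)         # test function, zero outside [c - 1, c + 1]
dphi = lambda x: bump_prime(x - c)  # its derivative
a, b = c - 1.0, c + 1.0

# Heaviside: <(T_H)', phi> = <T_H, -phi'> = -(integral of phi' over [0, b]); expect phi(0).
heaviside_weak = -integrate(dphi, 0.0, b)
print(heaviside_weak, phi(0.0))

# |x|: <(T_f)', phi> = <T_f, -phi'> with f(x) = |x|, versus pairing phi with sign(x).
abs_weak = integrate(lambda x: abs(x) * -dphi(x), a, b)
sign_pairing = integrate(lambda x: (1.0 if x > 0 else -1.0) * phi(x), a, b)
print(abs_weak, sign_pairing)
```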

As super awesome as that is, there should be some material on how all this pertains to weak solutions of DEs up on Thursday. Woooo! This is basically my definition of a party!

* The MCT happened here. Shhh.

Wednesday, March 11, 2015

Distributional Calculus Part 3: Distributions

Sorry for the delay, guys! I just started a rather demanding full-time job, so it may be a bit hard to keep up the quality of these posts. Let's hope it gets easier...

Today brings us to the most important definition in distributional calculus: the distributions themselves.

Here's the formal definition using the set of test functions $\mathcal{D}(\mathbb{R})$ we defined earlier:

Any continuous linear functional $T: \mathcal{D}(\mathbb{R}) \to \mathbb{R}$ is called a distribution. In addition, for a locally integrable function $f:\mathbb{R}\to\mathbb{R}$, a corresponding distribution can be defined by $$T_f(\phi)=\int_{\mathbb{R}}f(x)\phi(x)\;dx.$$We usually write $\langle T, \phi\rangle$ instead of $T(\phi)$ and call the set of all distributions $\mathcal{D}'(\mathbb{R})$.

There are only two things needed to truly understand this definition: how to take the average of a continuous function, and what test functions are. Check out the integrand. Multiplying the target function $f(x)$ by a test function $\phi(x)$ scales $f(x)$ at every point: the integrand vanishes outside the support of $\phi(x)$, while the remaining points are weighted according to $\phi(x)$. Hence each value $T_f(\phi)$ is a weighted average of $f(x)$ over a compact set. (Strichartz directly compares this to measuring the temperature of a room with a thermometer: it won't display the temperature at a single point, but rather the average temperature over some small region.) Knowing these weighted averages for every test function $\phi(x)$ is what defines the distribution.
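The thermometer picture can be sketched numerically. Below, a hypothetical helper rescales the bump $\psi$ to width $\varepsilon$ around a point $x_0$ and normalizes it to have integral $1$, so that $T_f(\phi)$ is a genuine weighted average. As $\varepsilon$ shrinks, the average of a continuous $f$ (here $\cos$, an arbitrary choice) should approach $f(x_0)$. This is a Python illustration under those assumptions, not part of the formal definition.

```python
import math

def bump(x):
    """Bump test function psi(x) = exp(-1/(1 - x^2)) for |x| < 1, and 0 otherwise."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

BUMP_MASS = integrate(bump, -1.0, 1.0)  # integral of psi over [-1, 1]

def averaging_kernel(eps, center):
    """Hypothetical helper: psi rescaled to width eps around `center`, with integral 1."""
    return lambda x: bump((x - center) / eps) / (eps * BUMP_MASS)

def weighted_average(f, eps, center):
    """T_f(phi) for the kernel above; phi vanishes outside [center - eps, center + eps]."""
    phi = averaging_kernel(eps, center)
    return integrate(lambda x: f(x) * phi(x), center - eps, center + eps)

x0 = 0.5
for eps in (1.0, 0.1, 0.01):
    # The "thermometer reading" gets closer to cos(x0) as the region shrinks.
    print(eps, weighted_average(math.cos, eps, x0), math.cos(x0))
```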

Defining distributions in this way lets us account for objects that we think look like functions, but actually aren't. The Dirac delta function is the perfect example: the infinite value at zero ruins everything, so it isn't really a function*. However, the 'integral' of $\delta(x)$ against any test function is finite, so the 'average' exists for every possible weighting, meaning $\delta(x)$ is a distribution. In particular, $$\langle \delta,\phi\rangle = \phi(0).$$
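One way to make "a distribution is something that eats test functions" concrete is to model distributions as Python functions that take a test function and return a number. This is only a sketch: the hypothetical helper `regular_distribution` approximates $T_f$ by quadrature and assumes every test function it receives vanishes outside a given interval, while `delta` simply evaluates at $0$. The last lines check linearity on one particular combination of test functions.

```python
import math

def bump(x):
    """Bump test function psi(x) = exp(-1/(1 - x^2)) for |x| < 1, and 0 otherwise."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def regular_distribution(f, a=-1.0, b=1.0):
    """Hypothetical helper: T_f, assuming every test function passed in vanishes outside [a, b]."""
    return lambda phi: integrate(lambda x: f(x) * phi(x), a, b)

def delta(phi):
    """The Dirac delta as a functional: <delta, phi> = phi(0)."""
    return phi(0.0)

T_abs = regular_distribution(abs)        # T_f for f(x) = |x|

phi1 = bump                              # supported on [-1, 1]
phi2 = lambda x: bump(2.0 * x - 0.5)     # squeezed and shifted; supported on [-0.25, 0.75]
combo = lambda x: 3.0 * phi1(x) + phi2(x)

# Linearity: <T, 3*phi1 + phi2> = 3 <T, phi1> + <T, phi2> for both kinds of distribution.
print(delta(combo), 3.0 * delta(phi1) + delta(phi2))
print(T_abs(combo), 3.0 * T_abs(phi1) + T_abs(phi2))
```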

It would be useful to go over a couple of basic properties of distributions, starting with the issue of consistency. This was supposed to happen today! Unfortunately, I'm dead tired and need to go lie down forever. Let's leave the important properties for next week.

* The Dirac delta function is to functions what killer whales are to whales... a complete misnomer.

Wednesday, March 4, 2015

Distributional Calculus Part 2: Compact support and test functions

Our goal with this series is to provide a resource for basic distribution theory that includes all of the formal definitions, justifications and theorems with as little hand-waving as possible, while also fully explaining these definitions through appeals to intuition. The following is written assuming an audience who cares or wants to care about mathematical formality but needs some intuitive background in order to learn quickly.

A few minor definitions are needed to understand what distributions represent. We fix a set $X$ and a function $\phi: X\to\mathbb{R}$ for the rest of this post.

The first two definitions are very simple.

Definition: Suppose $X\subseteq\mathbb{R}^n$ is open and $\phi$ is measurable. We say $\phi$ is locally integrable if, for all compact subsets $A$ of $X$,
$$\int_A |\phi(x)|\;dx< \infty.$$The space of all such functions is called $L^1_{loc}(X)$.
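For a concrete feel for the definition, compare two functions on $X=\mathbb{R}$ that both blow up at the origin (an illustrative example of ours, not one from the references). The first is locally integrable, since it is bounded away from $0$ and its integral over $[-1,1]$ is finite; the second is not:
$$\int_{-1}^{1}|x|^{-1/2}\,dx = 2\int_0^1 x^{-1/2}\,dx = 2\left[2x^{1/2}\right]_0^1 = 4<\infty, \qquad \int_{-1}^{1}\left|\frac{1}{x}\right|dx = 2\lim_{\varepsilon\to 0^+}\int_{\varepsilon}^{1}\frac{dx}{x} = 2\lim_{\varepsilon\to 0^+}(-\ln\varepsilon) = \infty.$$
On $X=(0,\infty)$, however, $1/x$ is locally integrable, because every compact subset of $(0,\infty)$ stays a positive distance away from $0$.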

The formal definition of compactness (every open cover has a finite subcover) can be found in any real analysis text. For those who haven't studied real analysis, the Heine–Borel theorem says that a subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded.

Definition: The support of $\phi$, written $\operatorname{supp}(\phi)$, is the closure of the set of points in $X$ where $\phi$ is nonzero. That is,

$$\operatorname{supp}(\phi) = \overline{\{x\in X \,|\, \phi(x)\ne 0\}}.$$(Topologists use a slightly different definition.)

From here, a slightly more specific property can be considered:

Definition: A function $\phi$ is said to have compact support if $\operatorname{supp}(\phi)$ is compact.

It's hard to come up with a compactly supported function without explicitly specifying that it is zero outside some bounded set. As a result, most easily representable compactly supported functions, even the continuous and infinitely differentiable ones, are defined piecewise. We consider a few examples.

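Note that compact support can also be interpreted as the function vanishing outside a compact set. For a continuous function, the set where it is nonzero is always open, so taking the closure in the definition of support is necessary: a nonempty open subset of $\mathbb{R}^n$ is never compact, so without the closure no nonzero continuous function on $\mathbb{R}^n$ would have compact support.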

One of the simplest examples of a compactly supported function is $\chi_A(x)$, where $A$ is a compact set and
$$\chi_A(x)=\left\{\begin{array}{ll}1&x\in A\\0& x \notin A.\end{array}\right.$$This equals $1$ on $A$ and zeros out everything else. In fact, the product of $\chi_A(x)$ with any function of $x$ will have compact support as well. Here are a couple of examples:

(Test yourself! Is $H(x)$ from the previous post compactly supported? Are B-splines?)

This example leads well into the last definition.

Definition: A function $\phi$ is a test function if it has compact support and is infinitely differentiable (i.e., in $C^\infty$). We refer to the space of all test functions on a set $X$ as $\mathcal{D}(X)$.

This is a crucial definition! It's weird for a function to have compact support but to also be infinitely differentiable, so let's generate a couple examples. Consider
$$\psi(x)=\left\{\begin{array}{ll}e^{-\frac{1}{1-x^2}}&|x|<1\\0& |x|\geq 1.\end{array}\right.$$This is a lot smoother than the previous function, and looks like a bump:

A slightly more complicated example would be
$$u_A(x)=\int_{\mathbb{R}^n}\chi_A(x-y)u(y)\;dy$$where $u(x)$ is a test function on $\mathbb{R}^n$, such as the bump $\psi$ above when $n=1$ (the technique used to generate this example is called Sobolev's mollification method). If you're familiar with convolution already, it should not be difficult to prove this function is infinitely differentiable (every derivative can be moved onto $u$) and compactly supported (its support lies in the compact set $A+\operatorname{supp}(u)$). It looks like someone built a sandcastle shaped like a regular $\chi$ function and a wave rolled over it:

Many operations, such as translation and scaling, preserve infinite differentiability and compact support. Linear combinations of test functions and products of test functions are also test functions themselves.
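The two examples above can be poked at numerically. The Python sketch below (helper names are illustrative) first checks that $\psi(0)=e^{-1}$ and that, as $x\to1^-$, $\psi(x)$ shrinks faster than each power $(1-x)^k$ tried; that behavior is what lets every derivative of $\psi$ match the zero function at the edge, and the loop is evidence for it rather than a proof. It then builds the sandcastle $u_A$ in one dimension, assuming $A=[-2,2]$ and taking $u$ to be $\psi$ itself. Since $\psi$ vanishes outside $[-1,1]$, $u_A$ should be flat on $[-1,1]$, equal half that height at $x=2$ by symmetry, and vanish outside $[-3,3]$.

```python
import math

def psi(x):
    """Bump test function: exp(-1/(1 - x^2)) for |x| < 1, and 0 for |x| >= 1."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def chi_A(x, lo=-2.0, hi=2.0):
    """Indicator of the compact set A = [lo, hi]; A = [-2, 2] is an illustrative choice."""
    return 1.0 if lo <= x <= hi else 0.0

def u_A(x):
    """u_A(x) = integral of chi_A(x - y) psi(y) dy; psi vanishes outside [-1, 1]."""
    return integrate(lambda y: chi_A(x - y) * psi(y), -1.0, 1.0)

# psi peaks at e^{-1} and is zero from x = 1 on.
print(psi(0.0), math.exp(-1.0), psi(1.0))

# Near x = 1, psi(1 - d) / d**k keeps shrinking as d -> 0, even for large k.
for k in (1, 5, 10):
    print(k, [psi(1.0 - d) / d**k for d in (1e-1, 1e-2, 1e-3)])

# The sandcastle: flat on [-1, 1], half height at x = 2, zero outside [-3, 3].
for x in (0.0, 0.5, 1.5, 2.0, 2.5, 3.5):
    print(x, u_A(x))
```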

(Test yourself! Can test functions be analytic?)

So, the point: these definitions are necessary in order to understand what distributions are. We'll go into this in detail next week.

Monday, March 2, 2015

Distributional Calculus Pt. 1: What is it?

In high school, despite being told I was "good at math" for being able to perform simple algebra, I was terrified of calculus. It was a scary word---"calculus"---and I didn't want to be outed as an impostor who wasn't ever good at math at all. That's how I ended up enrolled in the easiest calculus course offered at my high school, a place where most people took AP Calc. That's also how I ended up bored with the slow pace and lack of formality of my first calculus course, and transferred to AP Calc halfway through the year. That's also when I developed the unmitigated desire to become a mathematician; the calculus floodgates had been opened, and the only cure was more calculus. Calculus was followed by real analysis. Real analysis was followed by functional analysis.

Which brings us here... to the ultimate form of calculus. But why? Why does such a thing exist?

The catalyst for developing a more general form of calculus came when some people, such as physicists and engineers, decided it was okay to consider derivatives of non-differentiable functions. We consider the Heaviside step function ($H(x)$) as the quintessential example: this function is piecewise constant and hence has a zero derivative everywhere except at the jump discontinuity, where the classical definition of the derivative breaks down. One could reason that, because the derivative at a point is the slope of the tangent line, and the tangent line at the jump is a vertical line with infinite slope, $H'(0)$ is infinity. We therefore understand the derivative of the Heaviside function to be zero everywhere except at the jump, where it's infinite. That's the Dirac delta function ($\delta(x)$)!

Generally (and I apologize for stereotyping here), physicists and engineers are totally okay with this interpretation and accept it as fact, but mathematicians are upset by the hand-waving. It particularly bothered Sergei Sobolev and Laurent Schwartz, whose work led to the first mathematical justification of these ideas. This formalization of the engineers' and physicists' approaches grew to be called distributional calculus.

Distributions (also called generalized functions) define a broad set of function-like objects including, but not limited to, classical functions (hence, generalized functions). Distributional calculus is the study of calculus on this larger class of objects. This certainly allows for a formal reimagining of the Heaviside example given above: the Heaviside function is nondifferentiable at a point, but its distribution is differentiable everywhere! It can also be used to describe "weak" solutions of DEs. So, if you're like me and can't get enough calculus, it's just... more. More calculus.

Distributional calculus is also a great demonstration of the central public-relations conflict of real/functional/complex analysis: it's both the coolest thing anyone has done, ever, but also completely inaccessible to laypeople. In particular, the notation gets very intimidating, very fast. (Converting any idea from functions to distributions requires several million extra symbols.)

Our goal with this series is to provide a resource for basic distribution theory that includes all of the formal definitions, justifications and theorems with as little hand-waving as possible, while also fully explaining these definitions through appeals to intuition. There are already great books that deal with the formal side of distribution theory (Haroske and Triebel, 2008; Friedlander and Joshi, 1998) and great books that eschew formality in order to be accessible to physicists and engineers (Strichartz, 2003). These books are much better than a series of blog posts---that's why the authors of the books get paid. However, we adopt a different approach for our audience: the first set of textbooks caters to analysts, the second to people who don't care for analysis, while we assume the audience cares or wants to care about mathematical formality but needs some intuitive background in order to learn quickly.

Without further exposition, here's the game plan for March:

Week 1 & 2: Basic definitions (compact support, test functions, distributions, distributional derivatives, all that good stuff)
Week 3: The big examples
Week 4: A couple important theorems
Week 5 (March 31st): Recent papers/books for suggested further reading

Lastly, especially if you're a non-mathematician who doesn't care about overt formality, I cannot recommend the Strichartz enough. It's hilarious! I definitely got something out of it despite being peeved at the lack of formal analysis.

[1] Haroske, Dorothee, and Hans Triebel. Distributions, Sobolev Spaces, Elliptic Equations. European Mathematical Society, 2008.
[2] Friedlander, Friedrich Gerard, and Mark Suresh Joshi. Introduction to the Theory of Distributions. Cambridge University Press, 1998.
[3] Strichartz, Robert S. A Guide to Distribution Theory and Fourier Transforms. World Scientific, 2003.

Wednesday, February 25, 2015

The 6 Stages of Math Writing

Here's some news! I've decided to devote all of March to the basics of distributional calculus. In undergrad, I had a professor who taught distributional calculus from a purely theoretical standpoint and refused to match this with intuition, so this will be an adventure in explaining math for me as well.

It goes so well with the blog title: we're the Analysisters! Let's throw some analysis at everyone!

In the meantime, sit tight while I pretend to be Seinfeld.

How would you write out the solution to this problem at each stage of university life?

Let $A$, $B$ be matrices in $\mathbb{R}^{n\times n}$. If $BA = I$, then $B^kA^k=I$ for all $k \in \mathbb{N}$.
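Before six proofs of the same fact, here is a tiny Python spot check (not a proof) for one hand-picked pair, $A=\begin{pmatrix}2&1\\1&1\end{pmatrix}$ and $B=\begin{pmatrix}1&-1\\-1&2\end{pmatrix}$, which satisfy $BA=I$. Integer entries keep the arithmetic exact; the helper names are made up for this sketch.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matpow(X, k):
    """X multiplied by itself k times (the identity when k = 0)."""
    n = len(X)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, X)
    return result

A = [[2, 1], [1, 1]]
B = [[1, -1], [-1, 2]]  # hand-picked so that BA = I
I = [[1, 0], [0, 1]]

print(matmul(B, A) == I)  # the hypothesis BA = I
print(all(matmul(matpow(B, k), matpow(A, k)) == I for k in range(1, 8)))
```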

LEVEL 1: FROSH

\begin{align}
B^kA^k &= B\ldots BBAA\ldots A\\
&= B\ldots BIA \ldots A\\
&= B \ldots BA \ldots A\\
&= B \ldots BIA \ldots A = I
\end{align}

Yes, I know math homework is supposed to be written in complete sentences, but, why bother? I'm the chosen one who will be able to understand what this means 12 years later.

I mean, come on. I understand it right now. It's really easy.

LEVEL 2: SOPHOMORE

$BA = I$. Then $B^{k+1}A^{k+1} = B^kBAA^k=B^kIA^k=B^kA^k=I$.

Oh, you were serious about that sentence thing? And the sentences have to end with periods? Are you sure? Okay.

Hey, are you going to take points off if I don't put it in a sentence? Why are you doing that? I didn't know that was going to happen.

LEVEL 3: JUNIOR/LAZY GRAD STUDENT

We know that $BA=I$. Suppose $B^kA^k=I$. Therefore,
\begin{align}
B^{k+1}A^{k+1}&=B^kBAA^{k}\\
&=B^kIA^k.
\end{align}Therefore, using the properties of the identity, $B^{k+1}A^{k+1}=B^kA^k = I$. Therefore, this proves our statement.

They'll never guess my favorite connecting word.

LEVEL 4: SENIOR/GRAD STUDENT

This can be solved using induction. We are given that $BA = I$, providing the base case, so we suppose that $B^kA^k = I$ to show that $B^{k+1}A^{k+1}= I$. We then find that
\begin{align}
B^{k+1}A^{k+1}&= B^kBAA^k\\
&= B^kA^k = I,
\end{align}as desired.

Wow, can you believe how I wrote as a frosh? Who even thinks that's okay? I guess that it shows that I know how important math writing is. That.

LEVEL 5: PERFECTIONIST GRAD STUDENT

We proceed inductively with the given base case $BA=I$. Suppose $B^kA^k = I$ towards demonstrating $B^{k+1}A^{k+1}$ to be the identity as well. Using the definition of integer exponents and both given/inductive hypotheses, we conclude
$$B^{k+1}A^{k+1}=B^kBAA^k=B^kA^k=I;$$that is, the conditions of induction are satisfied and the original statement follows. This fact can be used to show equivalency of left and right inverses (i.e., $AB = I$ iff $BA = I$ for square $A,\ B$ of concordant dimensions).

Varying sentence structures, excessively clear logic, weird punctuation marks, parenthetical statements. Look! Revel in my competence! Feel the 2 hours I spent formatting the answer until it was textbook perfect!

Do you want to see my personalized LaTeX class with a multi-page macro set designed specifically for this field? I made it while my friends were at the bar.

LEVEL 6: PROFESSOR/VERY CONFIDENT GRAD STUDENT

Given $BA=I$ as the inductive hypothesis, observe that
$$B^{k+1}A^{k+1}=B^kBAA^k=B^kA^k=I$$implies the above.

There's no way I'm spending more than 5 minutes on this trivial problem. Why are you even showing it to me? I have several papers to review and two classes to prepare for. This is pointless.

Stay tuned next week, where we do absolutely nothing funny and go down the rabbit hole of formal math! (I need to update my macro set.)

Monday, February 23, 2015

Rolling Shutter + Moving Things = WICKED

There is a point in every blog's life where the audience and niche become set in stone, a point which this blog seems to be quite far from reaching. Do I go through the proof of Hölder's Inequality with informal language and cute pictures? Or, instead, simple mental math tricks that everyone alive should know? A smattering of recent interdisciplinary papers I have opinions on, or stories of working with high school and middle school tutees? Macros in $\LaTeX$? Householder reflectors? That time I found out biologists use "units" to refer to a different quantity for every substance?

So here we fall back on the old "what is Peter up to" shebang, which is never not funny. I feel truly blessed to have a partner who spends hours looking at fluid dynamics in bubble solution and can spell his initials in a 9x9 puzzle cube. The fields he finds interesting (look at all the things prime numbers can do! pretty pictures!) are also more accessible to laypeople than the fields I find interesting (okay, now memorize definitions for 2 years! in two more years you will be able to appreciate distributional calculus!). Maybe that's why there are so few famous analysts.

The biggest fight we ever had was over his finitism. He tried to convince me it was silly to model reality using irrational numbers that can't be described using a finite amount of information; I sat on the bed sobbing because the axiomatic structure he was proposing didn't have a clear measure, and so how do sets get mass, and HOW DOES INTEGRATION WORK IN YOUR CRAZY WORLD? DON'T YOU CARE ABOUT THEORETICAL JUSTIFICATION? HUH?!

Pictures, right? Everyone likes pictures?

Some background: this particular incident occurred when Peter discovered his cellphone camera took pictures by storing data from the top down, so that the photos were separated into horizontal lines that were actually taken at different times. (Wikipedia assures me this is called rolling shutter.) Usually this doesn't make a difference, unless one takes pictures of something spinning or vibrating really fast.

So of course that's what he did for a whole week.

Here's what his mom's spinning flamingo looks like in real life:

... but with a rolling shutter, it's a curved monstrosity...

An ordinary fan looks like it has vertical blades:

Bouncing balls show deformity:

And, for our personal favorite, filming a cello gives a visualization of the old $u_{tt} = c^2\nabla^2u$:

All things considered, it wasn't a bad way to spend a week.
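For anyone wondering why the fan blades bend, here is a toy Python model. The assumptions are ours, not a description of Peter's phone: a thin straight blade through the origin spins at a constant angular speed, and the sensor reads rows from the top ($y=1$) to the bottom ($y=-1$) at a steady rate. Each row records where the blade crossed it at that row's moment, so the recorded points stay on one straight line only when the blade is standing still. The function names are hypothetical.

```python
import math

def rolling_shutter_trace(omega, theta0=math.pi / 2, readout=1.0, rows=11):
    """Toy model: a straight blade through the origin at angle theta0 + omega * t.
    Rows y from 1 (top) down to -1 (bottom) are read one after another over `readout`
    seconds; each row records where the blade crossed it at that row's moment."""
    points = []
    for i in range(rows):
        y = 1.0 - 2.0 * i / (rows - 1)             # row height; the top row is read first
        t = (1.0 - y) / 2.0 * readout              # time at which this row is read
        theta = theta0 + omega * t                 # blade angle at that moment
        x = y * math.cos(theta) / math.sin(theta)  # where the blade's line meets height y
        points.append((x, y))
    return points

def max_cross(points):
    """Largest |x1*y2 - x2*y1| over pairs; zero when all points lie on one line through the origin."""
    return max(abs(x1 * y2 - x2 * y1) for (x1, y1) in points for (x2, y2) in points)

print(max_cross(rolling_shutter_trace(0.0)))  # still blade: the recorded points are collinear
print(max_cross(rolling_shutter_trace(1.0)))  # spinning blade: the recorded "blade" bends
```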

Got these? Share 'em!

Wednesday, February 18, 2015

Convexity and You: Unpacking the Definition

Real Analysis is notorious for taking easy-to-understand concepts and repackaging them in a thick theoretical barrier. Take the epsilon-delta definition of continuity: it's impossible to prove anything with the information "the function, uh, doesn't have any holes," and it's just as hard to develop a mental picture from the theoretical definition alone. For this reason, one of the biggest barriers to learning any type of analysis is properly connecting the intuitive idea and the theoretical representation.

We'll focus here on one of the less transparent definitions: convex functions. Convex functions can be understood intuitively as "the area above the function is a shape that doesn't go inwards on itself"... and theoretically as

Given convex set $X$, a function $f:X\to\mathbb{R}$ is convex if for all $x_1,\ x_2\in X$ and $t \in [0,1]$, $f(tx_1+(1-t)x_2)\leq tf(x_1)+(1-t)f(x_2)$.

What.

This is the part where, during an analysis course, you are expected to nod your head at the alphabet vomit (at least this time it's the Roman alphabet, not the Greek, that tossed its cookies). Let's make some sense out of what information is being conveyed.

First of all, to understand the definition of convex functions, you must know what convex sets are. A set is convex if any two of its points (call them $x_1$ and $x_2$) can be connected by a straight line segment that is contained in the set. If the set is not convex (i.e. "goes inwards" visually), then there will be at least two points whose connecting segment goes outside the set.

Now the domain of $f$ is a convex set $X$, which should explain what the points $x_1$ and $x_2$ are doing in the definition: they correspond to the two arbitrary points that we want to try and connect with a line. This brings us to the purpose of defining $t \in [0,1]$. Consider the function $y(t)=tx_1+(1-t)x_2$. Since $y(0)=x_2$, $y(1)=x_1$, and $y$ depends linearly (more precisely, affinely) on $t$, this function traces out a straight line segment starting at $x_2$ and ending at $x_1$. Thus the purpose of $t$ is to create the parametrized line segment joining points $x_1$ and $x_2$.

We are given that $X$ is a convex set, so the segment $tx_1+(1-t)x_2$ is certainly contained completely in $X$, the domain of $f$. This makes it completely legit to consider $f(tx_1+(1-t)x_2)$ as the image of this segment. The image of a straight segment in the domain won't necessarily be a straight segment itself, but will instead be a path along the graph of $f$ starting at $f(x_2)$ and ending at $f(x_1)$. Hence the expression $f(tx_1+(1-t)x_2)$ is asking us to consider the section of $f(x)$ that connects* $f(x_1)$ and $f(x_2)$.

This brings us to the last part of the inequality
$$f(tx_1+(1-t)x_2)\leq tf(x_1)+(1-t)f(x_2).$$
Just as before, the second expression $tf(x_1)+(1-t)f(x_2)$ is representing a parametrized line segment, joining the points $f(x_2)$ and $f(x_1)$. We are now comparing two paths between $f(x_1)$ and $f(x_2)$: one is a straight line, and the other a path on the function. The inequality places a lower bound on where the straight line can be. If the straight line is above the path on $f$ everywhere---that is, if it satisfies the above inequality---it is contained in the area above $f(x)$ (the epigraph of $f$).

That's exactly the definition of a convex set, but applied to the space above $f$... cool.
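The inequality also translates directly into a sampling-based test. The Python sketch below, with a hypothetical helper name, walks $t$ across $[0,1]$ and reports how far the path on the function rises above the straight chord. Sampling can only catch violations, never prove convexity, so treat it as an illustration of the inequality rather than a decision procedure.

```python
import math

def convexity_violation(f, x1, x2, steps=100):
    """Hypothetical helper: the largest amount by which the path on the function,
    f(t*x1 + (1 - t)*x2), rises above the chord t*f(x1) + (1 - t)*f(x2), over sampled t
    in [0, 1]. A positive value means the chord dips below the graph somewhere."""
    worst = 0.0
    for i in range(steps + 1):
        t = i / steps
        on_path = f(t * x1 + (1 - t) * x2)      # left side of the inequality
        on_chord = t * f(x1) + (1 - t) * f(x2)  # right side of the inequality
        worst = max(worst, on_path - on_chord)
    return worst

print(convexity_violation(lambda x: x * x, -1.0, 2.0))  # x^2 is convex: no violation
print(convexity_violation(math.sin, 0.0, math.pi))       # sin bulges above its chord on [0, pi]
```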

Here's a picture for $X = \mathbb{R}$:

That's what the definition is communicating. I hope that was insightful for someone!

*(does not refer to connectedness in the mathematical sense)