Differential Calculus

1.2: Differential Calculus #

1.2.1: “Ordinary” Derivatives #

Suppose we have a function of one variable, \( f(x) \). Question: what does the derivative \( \dv{f}{x} \) do for us? Answer: It tells us how rapidly the function \( f(x) \) varies when we change the argument x by a tiny amount, \( \dd{x} \)

\[\dd{f} = \left( \dv{f}{x} \right) \dd{x} \tagl{1.33}\]

In words: If we increment x by an infinitesimal amount \( \dd{x} \), then \( f \) changes by an amount \( \dd{f} \); the derivative is the proportionality factor. Foe example, in Fig. 1.17(a), the function varies slowly with x, and the derivative is correspondingly small. In Fig 1.17(b), f increases rapidly with x, and the derivative is large as you move away from \( x = 0 \). Geometrical interpretation: The derivative \( \dv{f}{x} \) is the slope of the graph of f versus x.

1.2.2: Gradient #

Suppose, now, that we have a function of three variables-say, the temperature T (x, y, z) in this room. (Start out in one comer, and set up a system of axes; then for each point (x, y, z) in the room, T gives the temperature at that spot.) We want to generalize the notion of “derivative” to functions like T, which depend not on one but on three variables.

A derivative is supposed to tell us how fast the function varies, if we move a little distance. But this time the situation is more complicated, because it depends on what direction we move: If we go straight up, then the temperature will prob- ably increase fairly rapidly, but if we move horizontally, it may not change much at all. In fact, the question “How fast does T vary?” has an infinite number of answers, one for each direction we might choose to explore.

Fortunately, the problem is not as bad as it looks. A theorem on partial derivatives states that

\[\dd{T} = \left( \pdv{T}{x} \right)\dd{x} + \left( \pdv{T}{y} \right) \dd{y} + \left( \pdv{T}{z} \right) \dd{z} \tagl{1.34}\]

Figure 1.17

This tells us how T changes when we alter all three variables by the infinitesimal amounts dx, dy, dz. Notice that we do not require an infinite number of derivatives - three will suffice: the partial derivatives along each of the three coordinate directions. \( \eqref{1.34} \) is reminiscent of a dot product:

\[\dd{T} = \left( \pdv{T}{x} \vu{x} + \pdv{T}{y} \vu{y} + \pdv{T}{z} \vu{z} \right)\cdot ( \dd{x} \vu{x} + \dd{y} \vu{y} + \dd{z} \vu{z} \\ = (\grad{T}) \cdot (\dd{\vec{l}}) \tagl{1.35}\]

where

\[\grad{T} \equiv \pdv{T}{x} \vu{x} + \pdv{T}{y} \vu{y} + \pdv{T}{z} \vu{z} \tagl{1.36}\]

is the gradient of T. Note that \( \grad{T} \) is a vector quantity, with three components; it is the generalized derivative we have been looking for. \( \eqref{1.35} \) is the three-dimensional version of \( \eqref{1.33} \). Geometrical interpretation of the Gradient: Like any vector, the gradient has magnitude and direction. To determine its geometrical meaning, let’s rewrite the dot product using Eq. 1.1

\[\dd{T} = \grad{T} \cdot \dd{\vec{l}} = |\grad{T}| |\dd{\vec{l}}| \cos \theta \tagl{1.37}\]

where \( \theta \) is the angle between \( \grad{T} \) and \( \dd{\vec{l}} \). Now if we fix the magnitude \( |\dd{\vec{l}}| \) and search around in various directions, the maximum change in T evidently occurs when \( \theta = 0 \) (for then \( \cos \theta = 1 \)). That is, for a fixed distance, dT is greatest when I move in the same direction as \( \grad{T} \) . Thus:

The gradient \( \grad{T} \) points in the direction of maximum increase of the function T.

Moreover:

The magnitude \( | \grad{T} | \) gives the slope (rate of increase) along this maximal direction

Imagine you are standing on a hillside. Look all around you, and find the direction of steepest ascent. That is the direction of the gradient. Now measure the slope in that direction (rise over run). That is the magnitude of the gradient. (Here the function we’re talking about is the height of the hill, and the coordinates it depends on are positions-latitude and longitude, say. This function depends on only two variables, not three, but the geometrical meaning of the gradient is easier to grasp in two dimensions.) Notice from Eq. 1.37 that the direction of maximum descent is opposite to the direction of maximum ascent, while at right angles \( (\theta = 90^{\circ}) \) the slope is zero (the gradient is perpendicular to the contour lines). You can conceive of surfaces that do not have these properties, but they always have “kinks” in them, and correspond to non-differentiable functions.

What would it mean for the gradient to vanish? If \( \grad{T} = 0 \) at (x, y, z), then \( \dd{T} = 0 \) for small displacements about the point (x, y, z). This is, then, a stationary point of the function T(x, y, z). It could be a maximum (a summit), a minimum (a valley), a saddle point (a pass), or a “shoulder.” This is analogous to the situation for functions of one variable, where a vanishing derivative signals a maximum, a minimum, or an inflection. In particular, if you want to locate the extrema of a function of three variables, set its gradient equal to zero.

Example 1.3 #

Find the gradient of \( r = \sqrt{x^2 + y^2 + z^2} \) (the magnitude of the position vector)
\[\begin{aligned} \grad{r} & = \pdv{r}{x} \vu{x} + \pdv{r}{y} \vu{y} + \pdv{r}{z} \vu{z} \\ & = \frac{1}{2} \frac{2x}{\sqrt{x^2 + y^2 + z^2}}\vu{x} + \frac{1}{2} \frac{2y}{\sqrt{x^2 + y^2 + z^2}}\vu{y} + \frac{1}{2} \frac{2z}{\sqrt{x^2 + y^2 + z^2}}\vu{z} \\ & = \frac{x \vu{x} + y \vu{y} + z \vu{z}}{\sqrt{x^2 + y^2 + z^2}} = \frac{\vec{r}}{r} = \vu{r} \end{aligned}\]

Does this makes sense? Well, it says that the distance from the origin increases most rapidly in the radial direction, and that its rate of increase in that direction is 1… just what you’d expect.

1.2.3: The Del Operator #

The gradient has the formal appearance of a vector, \( \nabla \), “multiplying” a scalar T:

\[\grad{T} = \left( \vu{x} \pdv{}{x} + \vu{y} \pdv{}{y} + \vu{z} \pdv{}{z} \right) T \tagl{1.38}\]

(For once, I write the unit vectors to the left, just so no one will think that this means \( \pdv{\vu{x}}{x} \) and so on, which would be zero since the coordinate directions are constant.) The term in parentheses is called del:

\[\grad{} = \vu{x} \pdv{}{x} + \vu{y} \pdv{}{y} + \vu{z} \pdv{}{z} \tagl{1.39}\]

Of course, del is not a vector, in the usual sense. Indeed, it doesn’t mean much until we provide it with a function to act upon. Furthermore, it does not “multiply” T; rather, it is an instruction to differentiate what follows. To be precise, then, we say that \( \grad{} \) is a vector operator that acts upon T, not a vector that multiplies T.

With this qualification, though, \( \grad{} \) mimics the behavior of an ordinary vector in virtually every way; almost anything that can be done with other vectors can also be done with \( \grad{} \), if we merely translate “multiply” by “act upon.” So by all means take the vector appearance of \( \grad{} \) seriously: it is a marvelous piece of notational simplification, as you will appreciate if you ever consult Maxwell’s original work on electromagnetism, written without the benefit of \( \grad{} \).

Now, an ordinary vector \( \vec{A} \) can multiply in three ways:

  1. By a scalar a: \( \vec{A}a \)
  2. By a vector \( \vec{B} \), via the dot product: \( \vec{A} \cdot \vec{B} \)
  3. By a vector \( \vec{B} \), via the cross product: \( \vec{A} \cross \vec{B} \)

Correspondingly, there are three ways the operator \( \grad{} \) can act:

  1. On a scalar function T: \( \grad{T} \) (the gradient)
  2. On a vector function \( \vec{v} \), via the dot product: \( \div{\vec{v}} \) (the divergence)
  3. On a vector function \( \vec{v} \), via the cross product: \( \curl{\vec{v}} \) (the curl)

We have already discussed the gradient. In the following sections we examine the other two vector derivatives: divergence and curl.

1.2.4: The Divergence #

From the definition of \( \grad{} \) we construct the divergence:

\[\begin{aligned} \div{\vec{v}} & = \left( \vu{x} \pdv{}{x} + \vu{y} \pdv{}{y} + \vu{z} \pdv{}{z} \right) \cdot (v_x \vu{x} + v_y \vu{y} + v_z \vu{z}) \\ & = \pdv{v_x}{x} + \pdv{v_y}{y} + \pdv{v_z}{z} \tagl{1.40} \end{aligned}\]

Observe that the divergence of a vector function is itself a scalar.

Geometrical interpretation: The name divergence is well chosen, for \( \div{\vec{v}} \) is a measure of how much the vector \( \vec{v} \) spreads out (diverges) from the point in question. For example, the vector function in Fig. 1.18a has a large (positive) divergence (if the arrows pointed in, it would be a negative divergence), the function in Fig. 1.18b has zero divergence, and the function in Fig. 1.18c again has a positive divergence. (Please understand that \( \vec{v} \) here is a function - there’s a different vector associated with every point in space. In the diagrams, of course, I can only draw the arrows at a few representative locations.)

Imagine standing at the edge of a pond. Sprinkle some sawdust or pine needles on the surface. If the material spreads out, then you dropped it at a point of positive divergence; if it collects together, you dropped it at a point of negative divergence. (The vector function \( \vec{v} \) in this model is the velocity of the water at the surface - this is a two-dimensional example, but it helps give one a “feel” for what the divergence means. A point of positive divergence is a source, or “faucet”; a point of negative divergence is a sink, or “drain.”)

Figure 1.18

Example 1.4 #

Suppose the functions in Fig 1.18 are \( \vec{v_a} = \vec{r} = x \vu{x} + y \vu{y} + z \vu{z} \), \( \vec{v_b} = \vu{z} \), and \( \vec{v_c} = z\vu{z} \). Calculate their divergences.
\[\div{\vec{v_a}} = \pdv{}{x} (x) + \pdv{}{y} (x) + \pdv{}{z} (z) = 1 + 1 + 1 = 3\]

As anticipated, this function has a positive divergence.

\[\div{\vec{v_b}} = \pdv{}{x} (0) + \pdv{}{0} (x) + \pdv{}{z} (1) = 0 + 0 + 0 = 0\]

as expected.

\[\div{\vec{v_b}} = \pdv{}{x} (0) + \pdv{}{0} (x) + \pdv{}{z} (z) = 0 + 0 + 1 = 1\]

1.2.5: The Curl #

From the definition of \( \grad{} \) we construct the curl:

\[\begin{aligned} \curl{\vec{v}} & = \begin{vmatrix} \vu{x} & \vu{y} & \vu{z} \\ \pdv{}{x} & \pdv{}{y} & \pdv{}{z} \\ v_x & v_y & v_z \end{vmatrix} \\ & + \vu{x} \left( \pdv{v_z}{y} - \pdv{v_y}{z} \right) + \vu{y} \left( \pdv{v_x}{z} - \pdv{v_z}{x} \right) + \vu{z} \left( \pdv{v_y}{x} - \pdv{v_x}{y} \right) \tagl{1.41} \end{aligned}\]

Figure 1.19

Notice that the curl of a vector function is, like any cross product, a vector.

Geometrical Interpretation: The name curl is also well chosen, for \( \curl{\vec{v}} \) is a measure of how much the vector \( \vec{v} \) swirls around the point in question. Thus the three functions in Fig. 1.18 all have zero curl (as you can easily check for yourself), whereas the functions in Fig. 1.19 have a substantial curl, pointing in the z direction, as the natural right-hand rule would suggest. Imagine (again) you are standing at the edge of a pond. Float a small paddlewheel (a cork with toothpicks pointing out radially would do); if it starts to rotate, then you placed it at a point of nonzero curl. A whirlpool would be a region of large curl.

Example 1.5 #

Suppose the function sketched in Fig 1.19a is \( \vec{v_a} = -y \vu{x} + x \vu{y} \), and that in Fig 1.19b is \( \vec{v_b} = x \vu{y} \). Calculate their curls.
\[\curl{\vec{v_a}} = \begin{vmatrix} \vu{x} & \vu{y} & \vu{z} \\ \pdv{}{x} & \pdv{}{y} & \pdv{}{z} \\ -y & x & 0 \end{vmatrix} = 2 \vu{z}\]

and

\[\curl{\vec{v_b}} = \begin{vmatrix} \vu{x} & \vu{y} & \vu{z} \\ \pdv{}{x} & \pdv{}{y} & \pdv{}{z} \\ 0 & x & 0 \end{vmatrix} = \vu{z}\]

As expected, these curls point in the +z direction. (Incidentally, they both have zero divergence, as you might guess from the pictures: nothing is “spreading out”… it just “swirls around.”)

1.2.6: Product Rules #

The calculation of ordinary derivatives is facilitated by a number of rules, such as the sum rule

\[\dv{}{x} (f + g) = \dv{f}{x} + \dv{g}{x}\]

the rule for multiplying a constant:

\[\dv{}{x} (kf) = k \dv{f}{x}\]

the product rule:

\[\dv{}{x}(fg) = f \dv{g}{x} + g \dv{f}{x}\]

and the quotient rule

\[\dv{}{x} \left( \frac{f}{g} \right) = \frac{g \dv{f}{x} - f \dv{g}{x}}{g^2} \]

Similar relations hold for the vector derivatives. Thus,

\[\grad{(f + g)} = \grad{f} + \grad{g}, \qquad \div{(\vec{A} + \vec{B})} = (\div{\vec{A}}) + (\div{\vec{B}})\] \[\curl{(\vec{A} + \vec{B})} = (\curl{\vec{A}}) + (\curl{\vec{B}})\]

and

\[\grad{(kf)} = k \grad f, \quad \div{(k\vec{A})} = k (\div{\vec{A}}), \quad \curl{(k\vec{A})} = k(\curl{\vec{A}})\]

as you can check for yourself. The product rules are not quite so simple. There are two ways to construct a scalar as the product of two functions:

\[fg \quad \text{(product of two scalar functions),} \\ \vec{A} \cdot \vec{B} \quad \text{(dot product of two vector functions),}\]

and two ways to make a vector:

\[f \vec{A} \quad \text{(scalar times vector),} \\ \vec{A} \cross \vec{B} \quad \text{(cross product of two vectors).}\]

Accordingly, there are six product rules, two for gradients:

\[\grad{(fg)} = f \grad{g} + g \grad{f} \tag{i}\] \[\grad( \vec{A} \cdot \vec{B}) = \vec{A} \cross (\curl{\vec{B}}) + \vec{B} \cross (\curl{\vec{A}}) + (\vec{A} \cdot \grad{})\vec{B} + (\vec{B} \cdot \grad{}) \vec{A} \tag{ii}\]

two for divergences:

\[\div{(f\vec{A})} = f(\div{\vec{A}}) + \vec{A} \cdot (\grad{f}) \tag{iii}\] \[\div{(\vec{A} \cross \vec{B})} = \vec{B} \cdot (\curl{\vec{A}}) - \vec{A} \cdot (\curl{\vec{B}}) \tag{iv}\]

and two for curls:

\[\curl{(f\vec{A})} = f(\curl{\vec{A}}) - \vec{A} \cross (\grad{f}) \tag{v}\] \[\curl{(\vec{A} \cross \vec{B})} = (\vec{B} \cdot \grad{})\vec{A} - (\vec{A} \cdot \grad{}) \vec{B} + \vec{A}(\div{\vec{B}}) - \vec{B}(\div{\vec{A}}) \tag{vi}\]

If there’s anything in this chapter that’s worth memorizing, it is this set of identities. The proofs come straight from the product rule for ordinary derivatives. For instance,

\[\begin{aligned} \div{(f\vec{A})} & = \pdv{}{x} (f A_x) + \pdv{}{y} (fA_y) + \pdv{}{z}(f A_z) \\ & = \left( \pdv{f}{x} A_x + f \pdv{A_x}{x} \right) + \left( \pdv{f}{y} A_y + f \pdv{A_y}{y} \right) + \left( \pdv{f}{z}A_z + f \pdv{A_z}{z} \right) \\ & = (\grad{f}) \cdot \vec{A} + f(\div{\vec{A}}) \end{aligned}\]

It is also possible to formulate three quotient rules:

\[\grad \left( \frac{f}{g} \right) = \frac{g \grad{f} - f \grad{g}}{g^2} \] \[\div{\left( \frac{\vec{A}}{g} \right)} = \frac{g(\div{\vec{A}}) - \vec{A} \cdot (\grad{g})}{g^2} \] \[\curl \left( \frac{\vec{A}}{g} \right) = \frac{g(\curl{\vec{A}}) + \vec{A} \cross (\grad{g})}{g^2} \]

However, since these can be obtained quickly from the corresponding product rules, there is no point in listing them separately.

1.2.7: Second Derivatives #

The gradient, the divergence, and the curl are the only first derivatives we can make with \( \grad \); by applying \( \grad \) twice, we can construct five species of second derivatives. The gradient \( \grad{T} \) is a vector, so we can take the divergence and curl of it:

  1. Divergence of gradient: \( \div (\grad{T}) \)
  2. Curl of gradient: \( \curl (\grad{T}) \)

The divergence \( \div{\vec{v}} \) is a scalar - all we can do is take its gradient:

  1. Gradient of divergence: \( \grad (\div{\vec{v}}) \)

The curl \( \curl \vec{v} \) is a vector, so we can take its divergence and curl:

  1. Divergence of curl: \( \div (\curl \vec{v}) \)
  2. Curl of curl: \( \curl (\curl \vec{v}) \)

This exhausts the possibilities, and in fact not all of them give anything new. Let’s consider them one at a time:

\[\begin{aligned} \div (\grad{T}) & = \left( \vu{x} \pdv{}{x} + \vu{y} \pdv{}{y} + \vu{z} \pdv{}{z} \right) \cdot \left( \pdv{T}{x} \vu{x} + \pdv{T}{y} \vu{y} + \pdv{T}{z} \vu{z} \right) \\ & = \frac{\partial ^2 T}{\partial x^2} + \frac{\partial ^2 T}{\partial y^2} + \frac{\partial ^2 T}{\partial z^2} \end{aligned} \tagl{1.42}\]

This object, which we write as \( \laplacian T \) for short, is called the Laplacian of T; we shall be studying it in great detail later on. Notice that the Laplacian of a scalar T is a scalar. Occasionally we shall speak of the laplacian of a vector, \( \laplacian \vec{v} \). By this we mean a vector quantity whose x-component is the Laplacian of \( v_x \), and so in:

\[\laplacian \vec{v} \equiv (\laplacian v_x) \vu{x} + (\laplacian v_y) \vu{y} + (\laplacian v_z) \vu{z} \tagl{1.43}\]

This is nothing more than a convenient extension of the meaning of \( \laplacian \).

The curl of a gradient is always zero:

\[\curl (\grad{T}) = 0 \tagl{1.44}\]

This is an important fact, which we shall use repeatedly; you can easily prove it from the definition of \( \grad \), hinging on the equality of cross-derivatives:

\[\pdv{}{x} \left( \pdv{T}{y} \right) = \pdv{}{y} \left( \pdv{T}{x} \right) \tagl{1.45}\]

\( \grad(\div{\vec{v}}) \) seldom occurs in physical applications, and it has not been given any special name of its own - it’s just the gradient of the divergence. Notice that \( \grad (\div \vec{v}) \) is not the same as the Laplacian of a vector: \( \laplacian \vec{v} = (\div \grad) \vec{v} \neq \grad( \div \vec{v}) \).

The divergence of a curl, like the curl of a gradient, is always zero:

\[\div (\curl \vec{v}) = 0 \tagl{1.46}\]

You can prove this for yourself.

As you can check from the definition of \( \grad \):

\[\curl (\curl \vec{v}) = \grad(\div \vec{v}) - \laplacian \vec{v} \tagl{1.47}\]

So curl-of-curl gives nothing new; the first term is just gradient of divergence, and the second is the Laplacian (of a vector). (In fact, this is often used to define the Laplacian of a vector, in preference to \( \eqref{1.43} \) which makes explicit reference to Cartesian coordinates.)

Really, then, there are just two kinds of second derivatives: the Laplacian (which is of fundamental importance) and the gradient-of-divergence (which we seldom encounter). We could go through a similar ritual to work out third derivatives, but fortunately second derivatives suffice for practically all physical applications.

A final word on vector differential calculus: It all flows from the operator \( \grad \), and from taking seriously its vectorial character. Even if you remembered only the definition of \( \grad \), you could easily reconstruct all the rest.