$$ \newcommand{\bs}[1]{\boldsymbol{#1}} $$

MEK2200 Notes on notation

Mikael Mortensen (mikaem at math.uio.no)

Department of Mathematics, University of Oslo.

Aug 22, 2018


Summary.

The literature in mechanics is full of different and sometimes confusing notation. Once in a while, the notations are mixed and errors may occur if one is not careful. This is particularly true when we enter into the world of second-order tensors (matrices). This note covers some of the basic material on notation in an attempt to bring some clarity.

Fields

In physics we deal with several different fields, usually defined in three-dimensional Euclidean space \( \mathbb{R}^3 \). A scalar field is recognized simply by its magnitude, which may vary throughout space. Examples of scalar fields are temperature and density. A vector is a field, or physical quantity, that has both magnitude and direction. Examples are the fluid velocity (one velocity component for each space dimension), deformation, and electric or magnetic fields. Even though it has direction, a vector is independent of the coordinate system, and many different coordinate systems can be used to describe the same vector. The word tensor is often reserved for fields of higher order than vectors, even though a vector can be considered a tensor of order one, and a scalar a tensor of order zero. In general, a tensor is a field whose components are represented through a number of indices (the number of indices determines its order), and that is invariant in form under a transformation of the coordinate system.

Introducing a coordinate system through unit vectors \( \boldsymbol{i}_1, \boldsymbol{i}_2 \) and \( \boldsymbol{i}_3 \), a vector field may be represented as \begin{equation} \boldsymbol{u} = u_1 \boldsymbol{i}_1 + u_2\boldsymbol{i}_2 + u_3 \boldsymbol{i}_3 = \sum_{i=1}^3 u_i \boldsymbol{i}_i. \label{eq:u_vec2} \end{equation}

A Cartesian coordinate system is often defined with unit vectors \( \bs{i}_1 = (1, 0, 0), \bs{i}_2=(0, 1, 0) \) and \( \bs{i}_3=(0, 0, 1) \), with the origin \( (0,0,0) \) defined as the point where the three vectors meet. If nothing is stated explicitly about the coordinate system, it is safe to assume that this is the coordinate system in use.

A vector is sometimes denoted simply as \begin{equation} \boldsymbol{u} = (u_1, u_2, u_3), \label{eq:u_vec} \end{equation}

where \( (u_1, u_2, u_3) \) are the three components of the vector \( \boldsymbol{u} \) in the coordinate system that is in use. The components can always be found from \begin{equation} u_i = \bs{u} \cdot \bs{i}_i. \label{_auto1} \end{equation}
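As a quick numerical check of this relation, here is a minimal sketch in NumPy (the vector below is an arbitrary example chosen for illustration):

    import numpy as np

    # Cartesian basis vectors i_1, i_2, i_3 stored as rows of the identity matrix
    basis = np.eye(3)

    u = np.array([1.0, -2.0, 3.0])  # an arbitrary example vector

    # u_i = u . i_i recovers each component
    components = np.array([np.dot(u, basis[i]) for i in range(3)])
    assert np.allclose(components, u)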

We note that \( \boldsymbol{u} \), with a boldface font, is not the only way to represent a vector. Equally common are \( \overrightarrow{u}, \underline{u}, \overline{u} \), sometimes also in a boldface font. We will only make use of the bold type.

The notation used in \eqref{eq:u_vec}, boldface without explicitly stating the unit vectors, is usually referred to as vector notation, but sometimes it is also called symbolic, absolute, invariant or direct notation. We will here stick with vector notation.

In the notation on the right hand side of \eqref{eq:u_vec2} the unit (or basis) vectors \( \boldsymbol{i}_1, \boldsymbol{i}_2 \) and \( \boldsymbol{i}_3 \) are explicitly included along with the components, and this notation is often referred to as component, or basis vector, form. Expressions on basis vector form are rather long, though, and this has motivated a simplified version that uses the summation convention \begin{equation} \boldsymbol{u} = u_i \boldsymbol{i}_i. \label{_auto2} \end{equation}

Here the repeated index \( i \) indicates that the expression must be summed over the entire range of the index, and this is often referred to as Einstein's summation convention. The result of summing the index \( i \) from \( 1 \) to \( 3 \) is of course exactly the same as in Eq. \eqref{eq:u_vec2}.

Using the basis vector form without explicitly stating the basis vectors leads to the very popular index notation. With index notation (also called suffix, indicial or subscript notation), expressions are written not for the vector, but for a vector component, and we use, as before, simply \( u_i \) to refer to component \( i \) of the vector \( \boldsymbol{u} \). The index form is used with the summation convention, such that, for example, the dot product becomes \begin{equation*} \bs{u} \cdot \bs{v} = u_i v_i. \end{equation*}
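As an aside, NumPy's einsum mirrors the summation convention directly: repeated indices in its subscript string are summed. A minimal sketch (the arrays are arbitrary examples):

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([4.0, 5.0, 6.0])

    # The repeated index i is summed: alpha = u_i v_i
    alpha = np.einsum('i,i->', u, v)
    assert np.isclose(alpha, np.dot(u, v))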

Papers or books that are primarily using index notation, usually also use the additional vector notation to represent the entire vector, when this is deemed necessary.

Note also that the index notation without explicit use of basis vectors is only valid for a Cartesian coordinate system. The Cartesian unit or basis vectors obviously satisfy \begin{equation} \bs{i}_i \cdot \bs{i}_j = \begin{cases} 1 \quad \text{if } i = j \\ 0 \quad \text{if } i \ne j \end{cases} \label{_auto3} \end{equation}

and are for this reason referred to as an orthonormal basis.

Operations on fields that do not require derivatives

The most common operations on fields that do not require derivatives are the scalar product (also called inner or dot product), outer product and cross product. We will here show how they can all be written with different types of notation.

Inner product

The inner product between two vectors \( \boldsymbol{u} \) and \( \boldsymbol{v} \) equals a scalar (\( \alpha \)) and is given in vector notation as \begin{align} \alpha &= \boldsymbol{u} \cdot \boldsymbol{v}, \label{_auto4}\\ &= (u_1, u_2, u_3) \cdot (v_1, v_2, v_3), \label{_auto5}\\ &= u_1v_1 + u_2 v_2 + u_3v_3. \label{_auto6} \end{align}

With index notation we obtain the same result simply by writing \begin{equation} \alpha = u_i v_i. \label{_auto7} \end{equation}

To show that this agrees with basis vector notation, we get \begin{align} \alpha &= \sum_{i=1}^3 u_i\boldsymbol{i}_i \cdot \sum_{j=1}^3 v_j \boldsymbol{i}_j, \label{_auto8}\\ &= \sum_{i=1}^3\sum_{j=1}^3 u_iv_j \boldsymbol{i}_i \cdot \boldsymbol{i}_j, \label{_auto9}\\ &= u_1v_1 + u_2 v_2 + u_3v_3. \label{_auto10} \end{align}

Here the last equality follows from retaining only the non-zero terms of the double summation. That is, only the terms with \( i=j \) are nonzero because \( \boldsymbol{i}_i \cdot \boldsymbol{i}_j = 1 \) if and only if \( i=j \), whereas it is zero if \( i \ne j \).
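The collapse of the double sum can be verified numerically, since \( \boldsymbol{i}_i \cdot \boldsymbol{i}_j \) for an orthonormal basis is just the Kronecker delta, i.e., the identity matrix. A minimal sketch (arbitrary example vectors):

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([4.0, 5.0, 6.0])

    # i_i . i_j = delta_ij for an orthonormal basis
    delta = np.eye(3)

    # The full double sum over i and j reduces to the single sum u_i v_i
    double_sum = np.einsum('i,j,ij->', u, v, delta)
    assert np.isclose(double_sum, np.dot(u, v))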

Cross product

The cross product between two vectors \( \boldsymbol{u} \) and \( \boldsymbol{v} \) is a new vector \( \boldsymbol{a} \) (it is actually a pseudovector) \begin{equation} \boldsymbol{a} = \boldsymbol{u} \times \boldsymbol{v}. \label{_auto11} \end{equation}

The cross product between \( \boldsymbol{u} \) and \( \boldsymbol{v} \) is a vector with direction normal to the plane spanned by \( \boldsymbol{u} \) and \( \boldsymbol{v} \). However, the actual direction also depends on the chosen handedness of the coordinate system. We use a right-handed coordinate system, meaning that if you point your right thumb in the direction of \( \boldsymbol{u} \) and your right index finger in the direction of \( \boldsymbol{v} \), then your remaining three fingers, when bent at an angle of \( 90^\circ \), will point in the defined normal direction of the plane spanned by \( \boldsymbol{u} \) and \( \boldsymbol{v} \). Right- and left-handed coordinate systems are illustrated in the figure below.

[Figure: right- and left-handed coordinate systems]

The cross product is defined as \begin{equation} \boldsymbol{u} \times \boldsymbol{v} = \lVert \boldsymbol{u}\rVert \, \lVert \boldsymbol{v} \rVert \, \sin (\theta) \, \boldsymbol{n}, \label{_auto12} \end{equation} where \( \lVert \boldsymbol{u} \rVert \) represents the magnitude of \( \boldsymbol{u} \) and \( \theta \) is the angle between the two vectors in the plane they span. The vector \( \boldsymbol{n} \) is the normal vector to the same plane, in the direction determined by the handedness.

The cross product is also often computed as the formal determinant \begin{equation} \boldsymbol{u} \times \boldsymbol{v} = \begin{vmatrix} \boldsymbol{i}_1 & \boldsymbol{i}_2 & \boldsymbol{i}_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}. \label{_auto13} \end{equation}
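Expanding this determinant along the first row gives the familiar component formulas. A minimal sketch of the expansion, checked against numpy.cross (arbitrary example vectors):

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([4.0, 5.0, 6.0])

    # Cofactor expansion of the formal determinant along the first row
    a = np.array([u[1]*v[2] - u[2]*v[1],
                  u[2]*v[0] - u[0]*v[2],
                  u[0]*v[1] - u[1]*v[0]])
    assert np.allclose(a, np.cross(u, v))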

With index or basis vector notation the cross product may be defined using the Levi-Civita symbol \begin{equation} \varepsilon_{ijk} = \begin{cases} +1 &\text{if} \,\, ijk \,\, \text{is a cyclic permutation of } 123 \text{ (i.e., } 123, 231 \text{ or } 312), \\ -1 &\text{if} \,\, ijk \,\, \text{is a cyclic permutation of } 321 \text{ (i.e., } 321, 213 \text{ or } 132), \\ 0 &\text{otherwise (some index is repeated).} \end{cases} \label{_auto14} \end{equation}

The Levi-Civita symbol is a third-order tensor and also an isotropic tensor, meaning that its components are the same in any Cartesian coordinate system. Another well-known isotropic tensor is the second-order Kronecker delta \begin{equation} \delta_{ij} = \begin{cases} 1 \quad &\text{if } i=j, \\ 0 \quad &\text{otherwise}. \end{cases} \label{_auto15} \end{equation}

Using the Levi-Civita symbol the cross product may be written with basis vector notation as \begin{equation} \bs{a} = \varepsilon_{ijk} u_j v_k \bs{i}_i. \label{_auto16} \end{equation}

With index notation the final \( \bs{i}_i \) is dropped, and the cross product is written simply as \begin{equation} a_i = \varepsilon_{ijk} u_j v_k, \label{_auto17} \end{equation}

where \( i \) is the only free index.
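The index formula can be evaluated directly by storing the Levi-Civita symbol as a \( 3\times 3\times 3 \) array. A minimal sketch (the array eps is built by hand; it is not a NumPy built-in):

    import numpy as np

    # Levi-Civita symbol eps[i, j, k]
    eps = np.zeros((3, 3, 3))
    eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0   # cyclic permutations of 123
    eps[2, 1, 0] = eps[1, 0, 2] = eps[0, 2, 1] = -1.0  # cyclic permutations of 321

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([4.0, 5.0, 6.0])

    # a_i = eps_ijk u_j v_k, with j and k summed
    a = np.einsum('ijk,j,k->i', eps, u, v)
    assert np.allclose(a, np.cross(u, v))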

The outer product

The outer product (also called dyadic or tensor product) between vectors \( \bs{u} \) and \( \bs{v} \) creates a second-order tensor (or dyad) \( \bs{P} \), and is denoted using a wide range of different notations \begin{align} \bs{P} &= \bs{u} \otimes \bs{v}, \label{eq:outer1}\\ \bs{P} &= \bs{u} \bs{v}, \label{eq:outer2}\\ \bs{P} &= u_iv_j\bs{i}_i\bs{i}_j, \label{eq:outer3}\\ \bs{P} &= u_iv_j\bs{i}_i \otimes \bs{i}_j, \label{eq:outer4} \\ P_{ij} &= u_iv_j. \label{_auto18} \end{align}

Evidently, the simplest approach is to take the lack of a dot as representing an outer product, as seen in \eqref{eq:outer2}. The indices and unit vectors indicate the component's location in a matrix. The first and second index, i.e., \( i \) and \( j \), represent the matrix's rows and columns, respectively: \begin{equation} P_{ij} = \begin{pmatrix} P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \\ P_{31} & P_{32} & P_{33} \end{pmatrix} \label{_auto20} \end{equation}

But note that for the basis vector notation \eqref{eq:outer4}, what determines the location is the order of the basis vectors in the outer product. Since \( \bs{i}_i \) comes first, index \( i \) describes the rows, and since \( \bs{i}_j \) comes second, index \( j \) describes the columns.
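A minimal numerical sketch of the outer product and the index placement (arbitrary example vectors):

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([4.0, 5.0, 6.0])

    # P_ij = u_i v_j: the first index (rows) comes from u, the second (columns) from v
    P = np.einsum('i,j->ij', u, v)
    assert np.allclose(P, np.outer(u, v))
    assert np.isclose(P[0, 2], u[0] * v[2])  # row 1, column 3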

Operations that require derivatives

The gradient of a scalar field

The gradient is the only one of the differential operations considered here that raises the order of the tensor it is applied to. When applied to a scalar (a zeroth-order tensor), the result is a first-order tensor, i.e., a vector. We define the gradient by considering infinitesimal changes in the field \( f \) along a direction vector \( \boldsymbol{dx} \) \begin{equation} df = \text{grad} \, f \cdot \boldsymbol{dx}. \label{eq:gradf} \end{equation}

In a Cartesian coordinate system the gradient can be written as \begin{align} \text{grad}\, f &= \nabla f , \label{_auto21}\\ \text{grad}\, f &= \bs{i}_i \frac{\partial f}{\partial x_i}, \label{eq:gradf_2}\\ (\text{grad}\, f)_i &= \frac{\partial f}{\partial x_i}. \label{eq:gradf_3} \end{align} Note that of these three ways of writing the gradient of a scalar, only the first, \( \text{grad} \, f = \nabla f \), is valid in a general coordinate system. The other two are valid only in a Cartesian coordinate system. Equation \eqref{eq:gradf_2} requires additional scaling factors to be valid for curvilinear coordinate systems, whereas the index notation in \eqref{eq:gradf_3} can never be used for any coordinate system other than a Cartesian one.

With these notations for the gradient we have for Eq. \eqref{eq:gradf} \begin{align} df &= \nabla f \cdot \bs{dx}, \label{_auto23}\\ df &= \frac{\partial f}{\partial x_i} dx_i, \label{_auto24}\\ df &= \bs{i}_i \frac{\partial f}{\partial x_i} \cdot \bs{i}_j dx_j = \frac{\partial f}{\partial x_i}dx_j \bs{i}_i \cdot \bs{i}_j = \frac{\partial f}{\partial x_i} dx_i. \label{_auto25} \end{align}
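As a sanity check of \eqref{eq:gradf}, here is a minimal sketch with an arbitrarily chosen field \( f = x_1^2 x_2 + x_3 \) and its analytic gradient:

    import numpy as np

    f = lambda x: x[0]**2 * x[1] + x[2]
    grad_f = lambda x: np.array([2*x[0]*x[1], x[0]**2, 1.0])

    x = np.array([1.0, 2.0, 3.0])
    dx = 1e-6 * np.array([1.0, -1.0, 0.5])  # a small displacement

    # df = grad f . dx holds to first order in dx
    df = f(x + dx) - f(x)
    assert np.isclose(df, grad_f(x) @ dx, rtol=1e-4)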

The \( \nabla \) operator

The nabla operator \( \nabla \) plays a central role in the mathematical description of fundamental theorems, like the principles for conservation of mass, and the balance of momentum.

There are basically two directions taken for the mathematical nabla operator. One uses it straightforwardly as a vector, and the other uses it as an operator. This will become evident in Secs. The gradient of a vector and The divergence of a tensor, where nabla is used to describe the gradient of a vector and the divergence of a tensor. For the gradient of a scalar or the divergence of a vector, the two approaches give the same result.

When the nabla operator is interpreted as a vector, we can write it as \begin{equation} \nabla = \bs{i}_i\frac{\partial }{\partial x_i}, \label{eq:nabla_1} \end{equation} but note that this is only valid for Cartesian coordinate systems.

By defining nabla as an operator, we apply it as \eqref{eq:nabla_1} to scalars, but as \begin{equation} \nabla = \frac{\partial (\cdot)}{\partial x_i} \otimes \bs{i}_i, \label{eq:nabla_grad} \end{equation} when applied to higher-order tensors \( (\cdot) \). With this latter approach, the spatial derivative becomes the last index of the resulting tensor, whereas with the \eqref{eq:nabla_1} approach, the spatial derivative becomes the first index of the resulting tensor. For the gradient of a scalar the result has only one index, so there is no difference. The section The gradient of a vector shows \eqref{eq:nabla_grad} applied to a vector. Note that it does not matter that the index \( i \) is used in \eqref{eq:nabla_grad}; it is still the last index of the resulting tensor due to the outer product. That is, for the tensor \( \partial u_j /\partial x_i \, \bs{i}_{j} \otimes \bs{i}_{i} \), index \( j \) represents the row and index \( i \) the column due to the placement in the outer product \( \bs{i}_j \otimes \bs{i}_i \).

The divergence of a vector field

The divergence of a vector field \( \bs{u} \) is defined as the surface integral of \( \bs{u} \cdot \bs{n} \) over a surface surrounding a point P \begin{equation} \text{div}\, \bs{u} = \lim_{\delta V \rightarrow 0} \frac{1}{\delta V}\oint_{\delta S} \bs{u} \cdot \bs{n} ds, \label{eq:divu} \end{equation} where \( \delta V \) is a small volume enclosing P, with surface \( \delta S \). In Cartesian coordinates this corresponds to the dot product between the vector nabla and the vector \( \bs{u} \), which can be written as either one of \begin{equation} \nabla \cdot \bs{u}, \quad \frac{\partial u_i}{\partial x_i}, \quad \bs{i}_i \frac{\partial }{\partial x_i} \cdot u_j \bs{i}_j. \label{_auto26} \end{equation}
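A numerical illustration: for the example field \( \bs{u} = (x_1, x_2^2, 0) \) the divergence is \( 1 + 2x_2 \), which numpy.gradient reproduces on a grid (the grid and field are chosen for this sketch):

    import numpy as np

    # Uniform grid on [0, 1]^2; the third direction is dropped since u_3 = 0
    x = np.linspace(0, 1, 51)
    X1, X2 = np.meshgrid(x, x, indexing='ij')

    u1 = X1
    u2 = X2**2

    # div u = du_1/dx_1 + du_2/dx_2
    du1_dx1 = np.gradient(u1, x, axis=0, edge_order=2)
    du2_dx2 = np.gradient(u2, x, axis=1, edge_order=2)
    div_u = du1_dx1 + du2_dx2

    assert np.allclose(div_u, 1 + 2*X2)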

The gradient of a vector

The gradient of a vector is a second-order tensor. This is where much confusion arises in the literature, because a second-order tensor \( \bs{P} \) can be transposed, and in general \begin{equation} P_{ij} \ne P_{ji} \quad \text{or} \quad \bs{P} \ne \bs{P}^T, \label{_auto27} \end{equation} for a non-symmetric tensor. Perhaps more evidently, \begin{equation} \frac{\partial u_i}{\partial x_j} \ne \frac{\partial u_j}{\partial x_i}. \label{_auto28} \end{equation}

If we define the gradient of a vector as right multiplication \begin{equation} \label{eq:gradu} d\bs{u} = \text{grad} \, \bs{u} \cdot \boldsymbol{dx}, \end{equation}

then the spatial derivative must be represented in the second index (the columns) of the tensor, and we get after applying \eqref{eq:nabla_grad} \begin{align} \text{grad}\, \bs{u} &= \frac{\partial u_i \bs{i}_i}{\partial x_j} \otimes \bs{i}_j, \label{_auto29}\\ &= \frac{\partial u_i}{\partial x_j} \bs{i}_i \otimes \bs{i}_j, \label{eq:gradu_2} \end{align}

or, similarly, with index notation \begin{equation} \text{grad}\, \bs{u} = \frac{\partial u_i}{\partial x_j}. \label{_auto30} \end{equation}

Now, some authors like to identify \( \nabla \) with this gradient operator, such that \begin{equation} \text{grad} \, \bs{u} = \nabla \bs{u} = \frac{\partial u_i}{\partial x_j}. \label{_auto31} \end{equation}

However, some authors do not identify the gradient to be the definition in \eqref{eq:gradu}, and simply work with the nabla operator as a regular vector. In this case the outer product of the nabla operator \( \nabla \) and the vector \( \bs{u} \) is \begin{align} \nabla \otimes \bs{u} &= \bs{i}_i \frac{\partial }{\partial x_i} \otimes u_j \bs{i}_j \label{_auto32}\\ & = \frac{\partial u_j}{\partial x_i} \bs{i}_i \otimes \bs{i}_j, \label{eq:graduT} \end{align} which is the transpose of Eq. \eqref{eq:gradu_2}. Note that Eq. \eqref{eq:graduT} may also be written as \begin{equation} \nabla \bs{u} \quad \text{or} \quad \frac{\partial u_j}{\partial x_i}, \label{_auto33} \end{equation}

and there is no way of knowing whether \( \nabla \bs{u} \) represents \( \partial u_i /\partial x_j \) or \( \partial u_j /\partial x_i \) unless the author has clarified which notation is in use. Note that either style of writing is correct in itself; as long as it is made clear which definition is being used, there can be no confusion.
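To make the transpose relation concrete, here is a small sketch that approximates the Jacobian of an arbitrary example field \( \bs{u} = (x_1 x_2, x_2^2, x_1 + x_3) \) by forward differences:

    import numpy as np

    # An example vector field, chosen only for this sketch
    u = lambda x: np.array([x[0]*x[1], x[1]**2, x[0] + x[2]])

    x0 = np.array([1.0, 2.0, 3.0])
    h = 1e-6

    # J[i, j] = du_i/dx_j by forward differences
    J = np.zeros((3, 3))
    for j in range(3):
        dx = np.zeros(3)
        dx[j] = h
        J[:, j] = (u(x0 + dx) - u(x0)) / h

    # grad u (right multiplication convention) is J itself,
    # while the nabla-as-vector convention gives its transpose J.T
    exact = np.array([[2.0, 1.0, 0.0],
                      [0.0, 4.0, 0.0],
                      [1.0, 0.0, 1.0]])
    assert np.allclose(J, exact, atol=1e-4)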

One term that is often written incorrectly in academic papers is the nonlinear convection vector \begin{equation} u_j \frac{\partial u_i}{\partial x_j}. \label{_auto34} \end{equation}

Here, if we accept the definition with \( \nabla \bs{u} = \partial u_j / \partial x_i \), then we can write this as \begin{equation} \bs{u} \cdot \nabla \bs{u}. \label{_auto35} \end{equation}

However, if we choose to use the definition \( \nabla \bs{u} = \partial u_i / \partial x_j \), then one must write \begin{equation} \nabla \bs{u} \cdot \bs{u}, \label{_auto36} \end{equation}

which is rarely seen in academic papers or books.

One common way to avoid any confusion is to define the convection operator \( \bs{u} \cdot \nabla \) \begin{equation} \bs{u} \cdot \nabla = u_i \frac{\partial }{\partial x_i}, \label{_auto37} \end{equation}

and then write the nonlinear convection term as \begin{equation} (\bs{u} \cdot \nabla) \,\bs{u}. \label{_auto38} \end{equation}

This is the preferred notation used by the current author.
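A minimal sketch of the convection term at a single point, with a velocity vector and Jacobian \( J_{ij} = \partial u_i/\partial x_j \) chosen by hand for illustration:

    import numpy as np

    # Velocity and its Jacobian J[i, j] = du_i/dx_j at some point (example values)
    u = np.array([1.0, 2.0, 3.0])
    J = np.array([[2.0, 1.0, 0.0],
                  [0.0, 4.0, 0.0],
                  [1.0, 0.0, 1.0]])

    # (u . nabla) u has components u_j du_i/dx_j: contract u with the
    # derivative (second) index of J
    conv = np.einsum('j,ij->i', u, J)

    # In matrix form this is J @ u, i.e., grad u . u with (grad u)_ij = du_i/dx_j
    assert np.allclose(conv, J @ u)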

The divergence of a tensor

Similarly to the gradient, there may be confusion when taking the divergence of a tensor. According to the definition in Eq. \eqref{eq:divu}, we have \begin{equation} \text{div}\, \bs{P} = \frac{\partial P_{ij}}{\partial x_j}. \label{eq:divP} \end{equation}

However, if we identify the nabla operator as a vector and not an operator, then the dot product of the nabla vector and \( \bs{P} \) is \begin{equation} \nabla \cdot \bs{P} = \frac{\partial P_{ij}}{\partial x_i}, \label{_auto39} \end{equation}

and this is different from Eq. \eqref{eq:divP}, since the summation is over the first index of \( \bs{P} \). On the other hand, if \( \nabla \cdot \) is seen as an operator acting on \( \bs{P} \), one gets \begin{align} \nabla \cdot \bs{P} &= \frac{\partial P_{ij} \bs{i}_i \otimes \bs{i}_j}{\partial x_k} \cdot \bs{i}_k, \label{_auto40}\\ &= \frac{\partial P_{ij} }{\partial x_j} \bs{i}_i. \label{_auto41} \end{align}

Both definitions are equally valid, as long as it is made perfectly clear which one is being used. However, only for the latter definition does \( \nabla \cdot \) agree with the divergence operator \eqref{eq:divu}.
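A minimal sketch of the two contractions, for a linear tensor field with a constant, hand-picked (non-symmetric) gradient \( G_{ijk} = \partial P_{ij}/\partial x_k \):

    import numpy as np

    # Constant gradient G[i, j, k] = dP_ij/dx_k of a linear tensor field
    # (random, hence non-symmetric, example values)
    rng = np.random.default_rng(0)
    G = rng.standard_normal((3, 3, 3))

    # Contraction over the second (column) index: dP_ij/dx_j, free index i
    div_last = np.einsum('ijj->i', G)
    # Contraction over the first (row) index: dP_ij/dx_i, free index j
    div_first = np.einsum('iji->j', G)

    # For a non-symmetric P the two results differ
    assert not np.allclose(div_last, div_first)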