Category Archives: Math

Midpoints, midpoints, everywhere!

I didn’t encounter the Quadrilateral Midpoint Theorem (QMT) until I had been teaching a few years.  Following is a minor variation on my approach to the QMT this year plus a fun way I leveraged the result to introduce similarity.

In case you haven’t heard of it, the surprisingly lovely QMT says that if you connect, in order, the midpoints of the four sides of any quadrilateral–even one that is concave or whose sides cross–the resulting figure will always be a parallelogram.




This is a cool and easy property to explore on any dynamic geometry software package (GeoGebra, TI-Nspire, Cabri, …).

SKETCH OF THE TRADITIONAL PROOF:  The proof is often established through triangle similarity:  Whenever you connect the midpoints of two sides of a triangle, the resulting segment will be parallel to and half the length of the triangle’s third side.  Draw either diagonal in the quadrilateral to create two triangles.  Connecting the midpoints of the other two sides of each triangle creates two congruent parallel sides, so the quadrilateral connecting all four midpoints must be a parallelogram.

NEW APPROACH THIS YEAR:  I hadn’t yet led my class into similarity, but having just introduced coordinate proofs, I tried an approach I’d never used before.  I assigned a coordinate proof of the QMT.  I knew the traditional approach existed, but I wanted them to practice their new technique.  From a lab in December, they already knew the result of the QMT, but they hadn’t proved it.

PART I:  Let quadrilateral ABCD be defined by the points A=(a,b), B=(c,d), C=(e,f),  and D=(g,h).  There are several ways to prove that the midpoints of the sides of ABCD are the vertices of a parallelogram.  Provide one such coordinate proof.

All groups quickly established the midpoints of the four sides:  AB_{mid}=\left( \frac{a+c}{2},\frac{b+d}{2} \right), BC_{mid}=\left( \frac{c+e}{2},\frac{d+f}{2} \right), CD_{mid}=\left( \frac{e+g}{2},\frac{f+h}{2} \right), and DA_{mid}=\left( \frac{g+a}{2},\frac{h+b}{2} \right).  From there, my students took three approaches to the final proof, each relying on a different sufficiency condition for parallelograms.

The most common was to show that opposite sides were parallel.  \displaystyle slope \left( AB_{mid} \text{ to } BC_{mid} \right) = \frac{\frac{b-f}{2}}{\frac{a-e}{2}}=\frac{b-f}{a-e} and \displaystyle slope \left( CD_{mid} \text{ to } DA_{mid} \right) =\frac{b-f}{a-e}, making those two midpoint segments parallel.  Likewise, \displaystyle slope \left( BC_{mid} \text{ to } CD_{mid} \right) = \displaystyle slope \left( DA_{mid} \text{ to } AB_{mid} \right) = \frac{d-h}{c-g}, proving the other pair of opposite sides was also parallel.  With both pairs of opposite sides parallel, the midpoint quadrilateral was necessarily a parallelogram.

I had two groups leverage the fact that the diagonals of a parallelogram mutually bisect each other:  \displaystyle midpoint \left( AB_{mid} \text{ to } CD_{mid} \right) = \left( \frac{a+c+e+g}{4},\frac{b+d+f+h}{4}\right) = midpoint \left( BC_{mid} \text{ to } DA_{mid} \right).  QED.

One student even proved that opposite sides were congruent.
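None of my students wrote code for this, but the diagonal-bisection argument is easy to stress-test numerically.  Here’s a minimal Python sketch (the function names are my own, not part of the assignment) that checks random quadrilaterals, including concave and self-intersecting ones:

```python
import random

def midpoint(P, Q):
    """Midpoint of segment PQ."""
    return ((P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2)

def is_midpoint_parallelogram(A, B, C, D, tol=1e-9):
    """Check the QMT via mutually bisecting diagonals: the quadrilateral
    of side midpoints is a parallelogram iff its diagonals share a midpoint."""
    m_ab, m_bc = midpoint(A, B), midpoint(B, C)
    m_cd, m_da = midpoint(C, D), midpoint(D, A)
    d1 = midpoint(m_ab, m_cd)  # midpoint of one diagonal
    d2 = midpoint(m_bc, m_da)  # midpoint of the other
    return abs(d1[0] - d2[0]) < tol and abs(d1[1] - d2[1]) < tol

# Any four random points -- convex, concave, or crossed -- satisfy the QMT.
for _ in range(1000):
    pts = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(4)]
    assert is_midpoint_parallelogram(*pts)
```

Because the check uses the diagonal property rather than slopes, it never stumbles over vertical sides.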

While it was not readily available for my students this year, I can imagine allowing CAS for these manipulations if I use this activity in the future.

EXTENDING THE QMT TO SIMILARITY:  For the next stage, I asked my students to explain what happens when the QMT is applied to degenerate quadrilaterals.

PART II:  You could think of triangles as degenerate quadrilaterals in which two quadrilateral vertices coincide, making one side of the quadrilateral have length 0.  Apply this to the generic quadrilateral ABCD from above, where points A and D coincide to create triangle BCD.  Use this to explain how the segment connecting the midpoints of any two sides of a triangle is related to the third side of the triangle.

I encourage you to construct this using a dynamic geometry package, but here’s the result.



Here’s a brief video showing the quadrilateral going degenerate.

Notice the parallelogram still exists and forms two midpoint segments on the triangle (degenerate quadrilateral).  By parallelogram properties, each of these segments is parallel and congruent to the opposite side of the parallelogram, making them parallel to and half the length of the opposite side of the triangle.
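The degenerate case can be verified the same way.  A quick numeric sketch (the names are mine, not from the original lab) collapses D onto A and compares the midpoint segment with the triangle’s third side:

```python
def midpoint(P, Q):
    return ((P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2)

def vec(P, Q):
    """Vector from P to Q."""
    return (Q[0] - P[0], Q[1] - P[1])

# Triangle with vertices A(=D), B, C -- a degenerate quadrilateral ABCD.
A, B, C = (1.0, 2.0), (7.0, 3.0), (4.0, 9.0)
D = A

m_ab, m_bc = midpoint(A, B), midpoint(B, C)  # midpoints of two triangle sides
seg = vec(m_ab, m_bc)     # the midpoint segment ("midsegment") as a vector
third = vec(A, C)         # the triangle's third side

# The midsegment is parallel to, and half the length of, the third side.
assert seg[0] == third[0] / 2 and seg[1] == third[1] / 2
```

The vector comparison shows both claims at once: equal direction (parallel) and exactly half the components (half the length).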

CONCLUSION:  I think it is critical to teach in a way that draws connections between ideas and units. This exercise made a lovely transition from quadrilaterals through coordinate proofs to the triangle midpoint theorem.


Base-x Numbers and Infinite Series

In my previous post, I explored what happened when you converted a polynomial from its variable form into a base-x numerical form.  That is, what are the computational implications when polynomial 3x^3-11x^2+2 is represented by the base-x number 3(-11)02_x, where the parentheses are used to hold the base-x digit, -11, for the second power of x?  

So far, I’ve explored only the Natural number equivalents of base-x numbers.  In this post, I explore what happens when you allow division to extend base-x numbers into their Rational number counterparts.

Level 5–Infinite Series: 

Numbers can have decimals, so what’s the equivalent for base-x numbers?  For starters, I considered trying to get a “decimal” form of \displaystyle \frac{1}{x+2}.  It was “obvious” to me that 12_x won’t divide into 1_x.  There are too few “places”, so some form of decimal is required.  Employing division as described in my previous post, much as you would to determine the repeating decimal form of \frac{1}{12}, gives


Remember, the places are powers of x, so the decimal portion of \displaystyle \frac{1}{x+2} is 0.1(-2)4(-8)..._x, and it is equivalent to

\displaystyle 1x^{-1}-2x^{-2}+4x^{-3}-8x^{-4}+...=\frac{1}{x}-\frac{2}{x^2}+\frac{4}{x^3}-\frac{8}{x^4}+....

This can be seen as a geometric series with first term \displaystyle \frac{1}{x} and ratio \displaystyle r=\frac{-2}{x}.  Its infinite sum is therefore \displaystyle \frac{\frac{1}{x}}{1-\frac{-2}{x}}, which is equivalent to \displaystyle \frac{1}{x+2}, confirming the division computation.  Of course, as a geometric series, this is true only so long as \displaystyle |r|=\left | \frac{-2}{x} \right |<1, or 2<|x|.

I thought this was pretty cool, and it led to lots of other cool series.  For example, if x=8, you get \frac{1}{10}=\frac{1}{8}-\frac{2}{64}+\frac{4}{512}-....

Likewise, x=3 gives \frac{1}{5}=\frac{1}{3}-\frac{2}{9}+\frac{4}{27}-\frac{8}{81}+....
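A few lines of Python (my own sketch, not part of the original exploration) confirm these sums by accumulating partial sums of the base-x “decimal”:

```python
def base_x_decimal_sum(x, n_terms=200):
    """Partial sum of 1/x - 2/x^2 + 4/x^3 - ..., i.e. the geometric
    series with first term 1/x and ratio -2/x."""
    r = -2 / x
    return sum((1 / x) * r**k for k in range(n_terms))

# For |x| > 2 the series converges to 1/(x+2).
assert abs(base_x_decimal_sum(8) - 1/10) < 1e-12
assert abs(base_x_decimal_sum(3) - 1/5) < 1e-12
```

With 200 terms the remaining tail is far below double-precision noise for any |x| comfortably bigger than 2.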

I found it quite interesting to have a “polynomial” defined with a rational expression.

Boundary Convergence:

As shown above, \displaystyle \frac{1}{x+2}=\frac{1}{x}-\frac{2}{x^2}+\frac{4}{x^3}-\frac{8}{x^4}+... only for |x|>2.  

At x=2, the series is obviously divergent, \displaystyle \frac{1}{4} \ne \frac{1}{2}-\frac{2}{4}+\frac{4}{8}-\frac{8}{16}+....

For x=-2, I got \displaystyle \frac{1}{0} = \frac{1}{-2}-\frac{2}{4}+\frac{4}{-8}-\frac{8}{16}+...=-\frac{1}{2}-\frac{1}{2}-\frac{1}{2}-\frac{1}{2}-... which is properly equivalent to -\infty as x \rightarrow -2 as defined by the convergence domain and the graphical behavior of \displaystyle y=\frac{1}{x+2} just to the left of x=-2.  Nice.


I did find it curious, though, that \displaystyle \frac{1}{x}-\frac{2}{x^2}+\frac{4}{x^3}-\frac{8}{x^4}+... is a solid approximation for \displaystyle \frac{1}{x+2} to the left of its vertical asymptote, but not for its rotationally symmetric right side.  I also thought it philosophically strange (even though I understand mathematically why it must be) that this series could approximate function behavior near a vertical asymptote, but not near the graph’s stable and flat portion near x=0.  What a curious, asymmetrical approximator.  

Maclaurin Series:

Some quick calculus gives the Maclaurin series for \displaystyle \frac{1}{x+2} :  \displaystyle \frac{1}{2}-\frac{x}{4}+\frac{x^2}{8}-\frac{x^3}{16}+..., a geometric series with first term \frac{1}{2} and ratio \frac{-x}{2}.  Interestingly, the ratio emerging from the Maclaurin series is the reciprocal of the ratio from the “rational polynomial” resulting from the base-x division above.  

As a geometric series, the interval of convergence is  \displaystyle |r|=\left | \frac{-x}{2} \right |<1, or |x|<2.  Excluding endpoint results, the Maclaurin interval is the complete Real number complement of the base-x series’ interval.  For the endpoint x=-2, the Maclaurin series diverges to + \infty, matching the function’s behavior just to the right of its vertical asymptote, just as the base-x series captured the -\infty behavior on the left side.  Again, x=2 is divergent.

It’s lovely how these two series so completely complement each other to create clean approximations of \displaystyle \frac{1}{x+2} for all x except x= \pm 2.
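The complementary convergence is easy to see numerically.  This sketch (mine) evaluates both partial sums and shows each succeeds on its own side of |x|=2:

```python
def base_x_sum(x, n=300):
    """1/x - 2/x^2 + 4/x^3 - ... : converges for |x| > 2."""
    return sum((1 / x) * (-2 / x)**k for k in range(n))

def maclaurin_sum(x, n=300):
    """1/2 - x/4 + x^2/8 - ... : converges for |x| < 2."""
    return sum((1 / 2) * (-x / 2)**k for k in range(n))

f = lambda x: 1 / (x + 2)

assert abs(base_x_sum(5) - f(5)) < 1e-12       # outside |x| = 2
assert abs(maclaurin_sum(1) - f(1)) < 1e-12    # inside |x| = 2
```

Swapping the test points across the |x|=2 boundary makes either partial sum blow up, which is a nice in-class demonstration of the two intervals of convergence.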

Other base-x “rational numbers”

Because any polynomial divided by another is absolutely equivalent to a base-x rational number and thereby a base-x decimal number, it will always be possible to create a “rational polynomial” using powers of \displaystyle \frac{1}{x} for non-zero denominators.  But, the decimal patterns of rational base-x numbers don’t apply in the same way as for Natural number bases.  Where \displaystyle \frac{1}{12} is guaranteed to have a repeating decimal pattern, the decimal form of \displaystyle \frac{1}{x+2}=\frac{1_x}{12_x}=0.1(-2)4(-8)..._x clearly will not repeat.  I’ve not explored the full potential of this, but it seems like another interesting field.  


Once number bases are understood, I’d argue that using base-x multiplication might be, and base-x division definitely is, a cleaner way to compute products and quotients, respectively, for polynomials.  

The base-x division algorithm clearly is accessible to Algebra II students, and even opens the doors to studying series approximations to functions long before calculus.

Is there a convenient way to use base-x numbers to represent horizontal translations as cleanly as polynomials?  How difficult would it be to work with a base-(x-h) number for a polynomial translated h units horizontally?

As a calculus extension, what would happen if you tried employing division of non-polynomials by replacing them with their Taylor series equivalents?  I’ve played a little with proving some trig identities using base-x polynomials from the Maclaurin series for sine and cosine.

What would happen if you tried to compute repeated fractions in base-x?  

It’s an open question from my perspective when decimal patterns might terminate or repeat when evaluating base-x rational numbers.  

I’d love to see someone out there give some of these questions a run!

Number Bases and Polynomials

About a month ago, I was working with our 5th grade math teacher to develop some extension activities for some students in an unleveled class.  The class was exploring place value, and I suggested that some might be ready to explore what happens when you allow the number base to be something other than 10.  A few students had some fun learning to use their four basic algorithms in other number bases, but I made an even deeper connection.

When writing something like 512 in expanded form (5\cdot 10^2+1\cdot 10^1+2\cdot 10^0), I realized that if the 10 was an x, I’d have a polynomial.  I’d recognized this before, but this time I wondered what would happen if I applied basic math algorithms to polynomials if I wrote them in a condensed numerical form, not their standard expanded form.  That is, could I do basic algebra on 5x^2+x+2 if I thought of it as 512_x–a base-x “number”?  (To avoid other confusion later, I read this as “five one two base-x“.)

Following are some examples I played with to convince myself how my new notation would work.  I’m not convinced that this will ever lead to anything, but following my “what ifs” all the way to infinite series was a blast.  Read on!

Level 1–Basic Addition:

If I wanted to add (3x+5) and (2x^2+4x+1), I could think of it as 35_x+241_x and add the numbers “normally” to get 276_x or 2x^2+7x+6.  Notice that each power of x identifies a “place value” for its characteristic coefficient.

If I wanted to add 3x-7 to itself, I had to adapt my notation a touch.  The “units digit” is a negative number, but since the number base, x, is unknown (or variable), I ended up saying 3x-7=3(-7)_x.  The parentheses are used to contain multiple characters into a single place value.  Then, (3x-7)+(3x-7) becomes 3(-7)_x+3(-7)_x=6(-14)_x or 6x-14.  Notice the expanding parentheses containing the base-x units digit.

Level 2–Advanced Addition:

The last example also showed me that simple multiplication would work.  Adding 3x-7 to itself is equivalent to multiplying 2\cdot (3x-7).  In base-x, that is 2\cdot 3(-7)_x.  That’s easy!  Arguably, this might be even easier than doubling a number when the number base is known.  Without interactions between the coefficients of different place values, just double each digit to get 6(-14)_x=6x-14, as before.

What about (x^2+7)+(8x-9)?  That’s equivalent to 107_x+8(-9)_x.  While simple, I’ll solve this one by stacking.


and this is x^2+8x-2.  As with base-10 numbers, the use of 0 is needed to hold place values exactly as I needed a 0 to hold the x^1 place for x^2+7. Again, this could easily be accomplished without the number base conversion, but how much more can we push these boundaries?

Level 3–Multiplication & Powers:

Compute (8x-3)^2.  Stacking again and using a modification of the multiply-and-carry algorithm I learned in grade school, I got

and this is equivalent to 64x^2-48x+9.

All other forms of polynomial multiplication work just fine, too.
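In code, a base-x number is just its list of digits, and since no carrying is possible, addition is digit-wise and multiplication is the grade-school shift-and-add algorithm without carries.  Here is a rough Python sketch of that idea (the representation and names are my own):

```python
def add(p, q):
    """Add two base-x numbers given as digit lists, highest place first."""
    n = max(len(p), len(q))
    p = [0] * (n - len(p)) + list(p)   # pad with leading 0 place-holders
    q = [0] * (n - len(q)) + list(q)
    return [a + b for a, b in zip(p, q)]

def mul(p, q):
    """Multiply base-x numbers: shift-and-add with no carrying."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# 35_x + 241_x = 276_x, i.e. (3x+5) + (2x^2+4x+1) = 2x^2+7x+6
assert add([3, 5], [2, 4, 1]) == [2, 7, 6]

# (8(-3)_x)^2 = 64(-48)9_x, i.e. (8x-3)^2 = 64x^2 - 48x + 9
assert mul([8, -3], [8, -3]) == [64, -48, 9]
```

The absence of carrying is exactly what makes the variable base work: each “digit” is free to be any number, positive or negative.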

From one perspective, all of this shifting to a variable number base could be seen as completely unnecessary.  We already have acceptably working algorithms for addition, subtraction, and multiplication.  But then, I really like how this approach completes the connection between numerical and polynomial arithmetic.  The rules of math don’t change just because you introduce variables.  For some, I’m convinced this might make a big difference in understanding.

I also like how easily this extends polynomial by polynomial multiplication far beyond the bland monomial and binomial products that proliferate in virtually all modern textbooks.  Also banished here is any need at all for banal FOIL techniques.

Level 4–Division:

What about x^2+x-6 divided by x+3? In base-x, that’s 11(-6)_x \div 13_x. Remembering that there is no place value carrying possible, I had to be a little careful when setting up my computation. Focusing only on the lead digits, 1 “goes into” 1 one time.  Multiplying the partial quotient by the divisor, writing the result below and subtracting gives


Then, 1 “goes into” -2 negative two times.  Multiplying and subtracting gives a remainder of 0.


thereby confirming that x+3 is a factor of x^2+x-6, and the other factor is the quotient, x-2.

Perhaps this could be used as an alternative to other polynomial division algorithms.  It is somewhat similar to the synthetic division technique, but without that method’s significant limitations:  it is not restricted to linear divisors with lead coefficients of one.

For (4x^3-5x^2+7) \div (2x^2-1), think 4(-5)07_x \div 20(-1)_x.  Stacking and dividing gives


So \displaystyle \frac{4x^3-5x^2+7}{2x^2-1}=2x-2.5+\frac{2x+4.5}{2x^2-1}.
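This base-x long division translates almost line-for-line into code.  The sketch below (mine) repeatedly divides lead digits, multiplies back, and subtracts, exactly as in the stacked work above:

```python
def base_x_divmod(num, den):
    """Divide base-x numbers (digit lists, highest place first).
    Returns (quotient digits, remainder digits)."""
    work = [float(d) for d in num]
    q = []
    for i in range(len(num) - len(den) + 1):
        digit = work[i] / den[0]          # lead digit "goes into" lead digit
        q.append(digit)
        for j, d in enumerate(den):       # multiply back and subtract
            work[i + j] -= digit * d
    return q, work[len(q):]

# 11(-6)_x / 13_x = 1(-2)_x remainder 0:  (x^2+x-6)/(x+3) = x-2
assert base_x_divmod([1, 1, -6], [1, 3]) == ([1.0, -2.0], [0.0])

# 4(-5)07_x / 20(-1)_x: quotient 2(-2.5)_x, remainder 2(4.5)_x
assert base_x_divmod([4, -5, 0, 7], [2, 0, -1]) == ([2.0, -2.5], [2.0, 4.5])
```

Both examples from this post come out exactly, including the fractional quotient digit -2.5.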


From all I’ve been able to tell, converting polynomials to their base-x number equivalents enables you to perform all of the same arithmetic computations.  For division in particular, it seems this method might even be a bit easier.

In my next post, I push the exploration of these base-x numbers into infinite series.

Invariable Calculus Project

Here’s one of my favorite calculus projects.  I initially discovered it over 20 years ago in Cohen et al.’s superb Student Research Projects in Calculus.

For x>0, what is true about every triangle formed by the x- and y-axes and any tangent line to \displaystyle y=\frac{1}{x} ?  Prove thy claim.

I’d love to say nothing more than that, but I usually don’t.  The problem sounds vague in its statement, but is pretty simple to solve.  The hidden property is a delightful surprise.  I encourage you to try it out for yourself before reading further.

I just assigned the problem to one of my classes of seniors.  The class is a one-semester introduction to calculus, primarily for students who’ve never been in honors and largely aren’t enamored of mathematics.  Most take the class to get an introduction to statistics (fall) and calculus (spring) before likely taking a course in one of these two in college and–for most–never taking another math course.  With that background in mind, I’ve probably scaffolded this iteration of the problem more than I should.  Here’s the assignment I gave them this week.

WARNING!  Partial Solution Alert!  Don’t read further if you want to solve the problem for yourself.

I typically use this project early in my introduction to derivatives and walk students through a little review and data gathering to help them discover the surprising hidden property.  While I don’t expect my students to do this, my default approach to geometric-type problems is to use a dynamic geometry package.  The animation below shows what happens when I varied the point of tangency while tracking the base, height, and area of the resulting triangle.

Well, I hope that animation screams something.  The x– and y-intercepts are the base and height, respectively, of a right triangle.  While those intercepts obviously vary as the point of tangency changes, the area of the triangle always seems to be 2.  It never changes!  If you’ve any geometry sense, something like that just shouldn’t happen.  So, is this a universal property, or is my animation misleading or limited in some way?  That’s a good question, and it requires proof.  Can you prove this apparent property about tangent lines to \displaystyle y=\frac{1}{x}?

FINAL SOLUTION ALERT!  Don’t read further if you want to prove this property for yourself.

For \displaystyle f(x)=\frac{1}{x}, \displaystyle\frac{d}{dx}\left(f(x)\right)=\frac{-1}{x^2}, so an equation for the tangent line to f at any point x=a is

\displaystyle \left(y-\frac{1}{a}\right)=\frac{-1}{a^2}\left(x-a\right).

The x-intercept of this generic line is \left(2a,0\right), and its y-intercept is \displaystyle \left(0,\frac{2}{a}\right).  Therefore, the area of the triangle formed by the x-and y-axes and the tangent line to f at any point x=a is

\displaystyle Area=\frac{1}{2}\cdot base\cdot height = \frac{1}{2}\cdot 2a\cdot\frac{2}{a}=2.

Cool!  The triangle’s area is always 2, completely independent of the point of tangency!
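If you want a quick numeric double-check before (or after) the calculus, this sketch (mine) builds each tangent line directly from the derivative and measures the triangle:

```python
def tangent_triangle_area(a):
    """Area of the triangle cut off by the axes and the tangent
    to y = 1/x at x = a > 0."""
    m = -1 / a**2            # slope of the tangent at (a, 1/a)
    y0 = 1 / a
    x_int = a - y0 / m       # x-intercept of the tangent line (= 2a)
    y_int = y0 - m * a       # y-intercept of the tangent line (= 2/a)
    return 0.5 * x_int * y_int

# The area is 2 no matter where the tangent touches the curve.
for a in [0.1, 0.5, 1.0, 3.0, 100.0]:
    assert abs(tangent_triangle_area(a) - 2) < 1e-9
```

Nothing here replaces the proof, but it mirrors the dynamic-geometry exploration in a handful of lines.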


Are there any other functions that have a similar property, or is \displaystyle y=\frac{1}{x} alone in the mathematical universe for having constant area triangles?  Well, that’s a problem for another post.

Exponential Derivatives and Statistics

This post gives a different way I developed years ago to determine the form of the derivative of exponential functions, y=b^x.  At the end, I provide a copy of the document I use for this activity in my calculus classes just in case that’s helpful.  But before showing that, I walk you through my set-up and solution of the problem of finding exponential derivatives.


I use this lesson after my students have explored the definition of the derivative and have computed the algebraic derivatives of polynomial and power functions. They also have access to TI-Nspire CAS calculators.

The definition of the derivative is pretty simple for polynomials, but unfortunately, the definition of the derivative is not so simple to resolve for exponential functions.  I do not pretend to teach an analysis class, so I see my task as providing strong evidence–but not necessarily a watertight mathematical proof–for each derivative rule.  This post definitely is not a proof, but its results have been pretty compelling for my students over the years.

Sketching Derivatives of Exponentials:

At this point, my students also have experience sketching graphs of derivatives from given graphs of functions.  They know there are two basic graphical forms of exponential functions, and conclude that there must be two forms of their derivatives as suggested below.

When they sketch their first derivative of an exponential growth function, many begin to suspect that an exponential growth function might just be its own derivative.  Likewise, the derivative of an exponential decay function might be the opposite of the parent function.  The lack of scales on the graphs obviously keep these from being definitive conclusions, but the hypotheses are great first ideas.  We clearly need to firm things up quite a bit.

Numerically Computing Exponential Derivatives:

Starting with y=10^x, the students used their CASs to find numerical derivatives at 5 different x-values.  The x-values really don’t matter, and neither does the fact that there are five of them.  The calculators quickly compute the slopes at the selected x-values.

Each point on f(x)=10^x has a unique tangent line and therefore a unique derivative.  From their sketches above, my students are soundly convinced that all ordered pairs \left( x,f'(x) \right) form an exponential function.  They’re just not sure precisely which one. To get more specific, graph the points and compute an exponential regression.

So, the derivatives of f(x)=10^x are modeled by f'(x)\approx 2.3026\cdot 10^x.  Notice that the base of the derivative function is the same as its parent exponential, but the coefficient is different.  So the common student hypothesis is partially correct.

Now, repeat the process for several other exponential functions and be sure to include at least 1 or 2 exponential decay curves.  I’ll show images from two more below, but ultimately will include data from all exponential curves mentioned in my Scribd document at the end of the post.

The following shows that g(x)=5^x has derivative g'(x)\approx 1.6094\cdot 5^x.  Notice that the base again remains the same with a different coefficient.

OK, the derivative of h(x)=\left( \frac{1}{2} \right)^x causes a bit of a hiccup.  Why should I make this too easy?  <grin>

As all of its h'(x) values are negative, the semi-log regression at the core of an exponential regression is impossible.  But, I also teach my students regularly that if you don’t like the way a problem appears, CHANGE IT!  Reflecting these data over the x-axis creates a standard exponential decay which can be regressed.

From this, they can conclude that  h'(x)\approx -0.69315\cdot \left( \frac{1}{2} \right)^x.

So, every derivative of an exponential function appears to be another exponential function whose base is the same as its parent function with a unique coefficient.  Obviously, the value of the coefficient depends on the base of the corresponding parent function.  Therefore, each derivative’s coefficient is a function of the base of its parent function.  The next two shots show the values of all of the coefficients and a plot of the (base,coefficient) ordered pairs.

OK, if you recognize the patterns of your families of functions, that data pattern ought to look familiar–a logarithmic function.  Applying a logarithmic regression gives

For y=a+b\cdot ln(x), a\approx -0.0000067\approx 0 and b\approx 1, giving coefficient(base) \approx ln(base).

Therefore, \frac{d}{dx} \left( b^x \right) = ln(b)\cdot b^x.

Again, this is not a formal mathematical proof, but the problem-solving approach typically keeps my students engaged until the end, and asking my students to  discover the derivative rule for exponential functions typically results in very few future errors when computing exponential derivatives.
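Outside of class, the same discovery can be automated: numerically differentiate b^x at a few points, divide by b^x, and the constant that appears is ln(b).  A sketch in Python (mine, replacing the CAS with a central difference):

```python
import math

def num_deriv(f, x, h=1e-6):
    """Symmetric-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for b in [10, 5, 0.5]:
    f = lambda x, b=b: b**x
    # The ratio f'(x)/f(x) is the derivative's coefficient; it should
    # be (nearly) constant in x and equal to ln(b).
    ratios = [num_deriv(f, x) / f(x) for x in (-1.0, 0.0, 1.0, 2.0)]
    assert all(abs(r - math.log(b)) < 1e-4 for r in ratios)
```

Because the ratio is checked at several x-values, the constancy of the coefficient and its value both get tested at once.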

Feedback on the approach is welcome.

Classroom Handout:

Here’s a link to a Scribd document written for my students who use TI-Nspire CAS calculators.  There are a few additional questions at the end.  Hopefully this post and the document make it easy enough for you to adapt this to the technology needs of your classroom.  Enjoy.

Transformations II and a Pythagorean Surprise

In my last post, I showed how to determine an unknown matrix for most transformations in the xy-plane and suggested that they held even more information.

Start with a pre-image set of points that can be connected to enclose one or more areas with either clockwise or counterclockwise orientation.  If a transformation T represented by matrix [T]= \left[ \begin{array}{cc} A & C \\ B & D \end{array}\right] is applied to the pre-image points, then the determinant of [T], det[T]=AD-BC, tells you two things about the image points.

  1. The area enclosed by similarly connecting the image points is \left| det[T] \right| times the area enclosed by the pre-image points, and
  2. The orientation of the image points is identical to that of the pre-image if det[T]>0, but is reversed if det[T]<0.  If det[T]=0, then the image area is 0 by the first property, and any question about orientation is moot.

In other words, det[T] is the area scaling factor from the pre-image to the image (addressing the second half of CCSSM Standard N-VM 12 on page 61 here), and the sign of det[T] indicates whether the pre-image and image have the same or opposite orientation, a property beyond the stated scope of the CCSSM.

Example 1: Interpret det[T] for the matrix representing a reflection over the x-axis, [T]=\left[ r_{x-axis} \right] =\left[ \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right].

From here, det[T]=-1.  The magnitude of this is 1, indicating that the area of an image of an object reflected over the x-axis is 1 times the area of the pre-image—an obviously true fact because reflections preserve area.

Also, det \left[ r_{x-axis} \right]<0 indicating that the orientation of the reflection image is reversed from that of its pre-image.  This, too, must be true because reflections reverse orientation.

Example 2: Interpret det[T] for the matrix representing a scale change that doubles x-coordinates and triples y-coordinates, [T]=\left[ S_{2,3} \right] =\left[ \begin{array}{cc} 2 & 0 \\ 0 & 3 \end{array} \right].

For this matrix, det[T]=+6, indicating that the image’s area is 6 times that of its pre-image area, while both the image and pre-image have the same orientation.  Both of these facts seem reasonable if you imagine a rectangle as a pre-image.  Doubling the base and tripling the height create a new rectangle whose area is six times larger.  As no flipping is done, orientation should remain the same.
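Both determinant properties are easy to test with a signed-area (shoelace) computation.  This Python sketch (names mine) runs the two examples so far against an arbitrary triangle:

```python
def signed_area(pts):
    """Shoelace formula: positive for counterclockwise orientation."""
    n = len(pts)
    return 0.5 * sum(pts[i][0] * pts[(i + 1) % n][1]
                     - pts[(i + 1) % n][0] * pts[i][1] for i in range(n))

def transform(T, pts):
    """Apply 2x2 matrix [[A, C], [B, D]] to each point (x, y)."""
    (A, C), (B, D) = T
    return [(A * x + C * y, B * x + D * y) for x, y in pts]

tri = [(2, 0), (3, -4), (9, -7)]

for T in [[[1, 0], [0, -1]],   # x-axis reflection, det = -1
          [[2, 0], [0, 3]]]:   # scale change S_{2,3}, det = +6
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    # Signed area scales by det: |det| is the area factor, and the
    # sign records whether the orientation flipped.
    assert abs(signed_area(transform(T, tri)) - det * signed_area(tri)) < 1e-9
```

The signed area captures both claims simultaneously: its magnitude is the enclosed area, and its sign is the orientation.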

Example 3 & a Pythagorean Surprise:  What should be true about det[T] for the transformation matrix representing a generic rotation through an angle \theta around the origin,  [T]=\left[ R_\theta \right] = \left[ \begin{array}{cc} cos( \theta ) & -sin( \theta ) \\ sin( \theta ) & cos( \theta ) \end{array} \right] ?

Rotations preserve area without reversing orientation, so det\left[ R_\theta \right] should be +1.  Using this fact and computing the determinant gives

det \left[ R_\theta \right] = cos^2(\theta ) + sin^2(\theta )=+1 .

In a generic right triangle with hypotenuse C, leg A adjacent to acute angle \theta , and another leg B, this equation is equivalent to \left( \frac{A}{C} \right) ^2 + \left( \frac{B}{C} \right) ^2 = 1 , or A^2+B^2=C^2, the Pythagorean Theorem.  There are literally hundreds of proofs of this theorem, and I suspect this proof has been given sometime before, but I think this is a lovely derivation of that mathematical hallmark.

Conclusion:  While it seems that these two properties about the determinants of transformation matrices are indeed true for the examples shown, mathematicians hold out for a higher standard.   I’ll offer a proof of both properties in my next post.

Numerical Transformations, I

It’s been over a decade since I’ve taught a class where I’ve felt the freedom to really explore transformations with a strong matrix thread.  Whether due to curricular pressures, lack of time, or some other reason, I realized I had drifted away from some nice connections when I recently read Jonathan Dick’s and Maria Childrey’s Enhancing Understanding of Transformation Matrices in the April, 2012 Mathematics Teacher (abstract and complete article here).

Their approach was okay, but I was struck by the absence of a beautiful idea I believe I learned at a UCSMP conference in the early 1990s.  Further, today’s Common Core State Standards for Mathematics explicitly call for students to “Work with 2×2 matrices as transformations of the plane, and interpret the absolute value of the determinant in terms of area” (see Standard N-VM 12 on page 61 of the CCSSM here).  I’m going to take a couple of posts to unpack this standard and describe the pretty connection I’ve unfortunately let slip out of my teaching.

What they almost said

At the end of the MT article, the authors performed a double transformation equivalent to reflecting the points (2,0), (3,-4), and (9,-7) over the line y=x via matrices using \left[ \begin{array}{cc} 0&1 \\ 1&0 \end{array} \right] \cdot  \left[ \begin{array}{ccc} 2 & 3 & 9 \\ 0 & -4 & -7 \end{array} \right] = \left[ \begin{array}{ccc} 0 & -4 & -7 \\ 2 & 3 & 9 \end{array} \right] giving image points (0,2), (-4,3), and (-7,9).  That this matrix multiplication swapped the coordinates of every point is compelling evidence that \left[ \begin{array}{cc} 0 & 1 \\ 1 & 0\end{array} \right] might be a y=x reflection matrix.

Going much deeper

Here’s how this works.  Assume a set of pre-image points, P, undergoes some transformation T to become image points, P’.  For this procedure, T can be almost any transformation except a translation–reflections, dilations, scale changes, rotations, etc.  Translations can be handled using augmentations of these transformation matrices, but that is another story.  Assuming P is a set of n two-dimensional points, then it can be written as a 2×n pre-image matrix, [P], with all of the x-coordinates in the top row and the corresponding y-coordinates in the second row.  Likewise, [P’] is a 2×n matrix of the image points, while [T] is a 2×2 matrix unique to the transformation. In matrix form, this relationship is written [T] \cdot [P] = [P'].

So what would \left[ \begin{array}{cc} 0 & -1 \\ 1 & 0\end{array} \right] do as a transformation matrix?  To see, transform (2,0), (3,-4), and (9,-7) using this new [T].

\left[ \begin{array}{cc} 0&-1 \\ 1&0 \end{array} \right] \cdot  \left[ \begin{array}{ccc} 2 & 3 & 9 \\ 0 & -4 & -7 \end{array} \right] = \left[ \begin{array}{ccc} 0 & 4 & 7 \\ 2 & 3 & 9 \end{array} \right]

The result might be more easily seen graphically with the points connected to form pre-image and image triangles.

After studying the graphic, hopefully you can see that \left[ \begin{array}{cc} 0 & -1 \\ 1 & 0\end{array} \right] rotated the pre-image points 90 degrees counterclockwise around the origin.


Now you know the effects of two different transformation matrices, but what if you wanted to perform a specific transformation and didn’t know the matrix to use?  If you’re new to transformations via matrices, you may be hoping for something much easier than the experimental approach used thus far.  If you can generalize for a moment, the result will be a stunningly simple way to determine the matrix for any transformation quickly and easily.

Assume you need to find a transformation matrix, [T]= \left[ \begin{array}{cc} a & c \\ b & d \end{array}\right] .  Pick (1,0) and (0,1) as your pre-image points.

\left[ \begin{array}{cc} a&c \\ b&d \end{array} \right] \cdot  \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] = \left[ \begin{array}{cc} a & c \\ b & d \end{array} \right]

On the surface, this says the image of (1,0) is (a,b) and the image of (0,1) is (c,d), but there is so much more here!

Because the pre-image matrix for (1,0) and (0,1) is the 2×2 identity matrix, [T]= \left[ \begin{array}{cc} a & c \\ b & d \end{array}\right] will always be BOTH the transformation matrix AND (much more importantly) the image matrix.  This is a major find.  It means that if you know the images of (1,0) and (0,1) under some transformation T, then you automatically know the components of [T]!

For example, when reflecting over the x-axis, (1,0) is unchanged and (0,1) becomes (0,-1), making [T]= \left[ r_{x-axis} \right] = \left[ \begin{array}{cc} 1 & 0 \\ 0 & -1\end{array} \right] .  Remember, coordinates of points are always listed vertically.

Similarly, a scale change that doubles x-coordinates and triples the ys transforms (1,0) to (2,0) and (0,1) to (0,3), making [T]= \left[ S_{2,3} \right] = \left[ \begin{array}{cc} 2 & 0 \\ 0 & 3\end{array} \right] .

In a generic rotation of \theta around the origin, (1,0) becomes (cos(\theta ),sin(\theta )) and (0,1) becomes (-sin(\theta ),cos(\theta )).

Therefore, [T]= \left[ R_\theta \right] = \left[ \begin{array}{cc} cos(\theta ) & -sin(\theta ) \\ sin(\theta ) & cos(\theta ) \end{array} \right] .  Substituting \theta = 90^\circ into this [T] confirms the \left[ R_{90^\circ} \right] = \left[ \begin{array}{cc} 0 & -1 \\ 1 & 0\end{array} \right] matrix from earlier.
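That recipe–the images of (1,0) and (0,1) are the columns of [T]–is one line of code.  A small sketch (mine) rebuilds the rotation matrix this way and confirms the 90-degree case:

```python
import math

def matrix_from_basis_images(img_e1, img_e2):
    """The images of (1,0) and (0,1) are the columns of [T]."""
    (a, b), (c, d) = img_e1, img_e2
    return [[a, c], [b, d]]

def apply(T, p):
    """Apply a 2x2 matrix to the point p = (x, y)."""
    return (T[0][0] * p[0] + T[0][1] * p[1],
            T[1][0] * p[0] + T[1][1] * p[1])

theta = math.pi / 2  # 90 degrees
R = matrix_from_basis_images((math.cos(theta), math.sin(theta)),
                             (-math.sin(theta), math.cos(theta)))

# Rotating (2, 0) a quarter turn about the origin gives (0, 2).
x, y = apply(R, (2, 0))
assert abs(x - 0) < 1e-12 and abs(y - 2) < 1e-12
```

Any other transformation works the same way: feed in where the two unit points land, and the matrix falls out.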

As nice as this is, there is even more beautiful meaning hidden within transformation matrices.  I’ll tackle some of that in my next post.