Jiro's pick this week is Reference Creator by Valerio Biscione.

Even though I have been away from the world of academia for quite some time, I can remember and appreciate the effort that people put into writing papers. And in that process, one of the more mundane tasks is putting together the references. Not only do I need to figure out the exact article and journal names, but I also need to put the information in the correct format. It may not be super difficult, but it's definitely time-consuming.

Valerio's Reference Creator makes this process extremely easy. It accesses Mendeley, an online research paper database, to gather the bibliographical data based on some search criteria, such as author name and year. You can use his tool either through the command line interface or the UI. I gave it a try using a paragraph from one of my papers.

It even gave me a warning when it found multiple matches with the same author(s) and year.

Thanks Valerio for this useful tool! I'm sure a lot of researchers out there will appreciate this.

*Warning: This entry uses some undocumented/unsupported capabilities, namely the displaying of colored and bold text in the edit boxes. Use this entry with caution, as undocumented features can change in behavior or stop working at any time.*

**Comments**

Give this a try and let us know what you think here or leave a comment for Valerio.


Since the video at the center of this post is about a rap battle between MATLAB and Simulink, I thought it would be interesting to bring that to a higher level and see how the video processing could have been implemented in Simulink instead of MATLAB.

Seriously... who wants to write code when you can simply connect blocks?

**Overview**

Since I have absolutely zero knowledge in image processing, I simply re-implemented the algorithm suggested by Matt in Simulink. The resulting model looks like:

Let's look at the details.

**Importing the video**

To begin, I used the From Multimedia File block from the DSP System Toolbox to read the AVI file with our rappers in front of a green background directly. Then I used the Resize block from the Computer Vision System Toolbox, which lets me work on images of a relatively small size. Finally, I normalized the data, as suggested by Matt.

**The Greenness factor**

I really like the simple equation that Matt figured out to calculate the "greenness" of each pixel in the image. In MATLAB, it looks like this:

```
% Greenness = G*(G-R)*(G-B)
greenness = yd(:,:,2).*(yd(:,:,2)-yd(:,:,1)).*(yd(:,:,2)-yd(:,:,3));
```

**Thresholding**

The next piece of code is something that many users struggle to figure out how to implement in Simulink:

`thresh = 0.3*mean(greenness(greenness>0));`

To implement logical indexing (e.g. `A(A>0)`) in Simulink, we need to combine the Find block with a Selector block. This results in a variable-size signal, as sketched below.
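For readers more comfortable reading MATLAB than block diagrams, here is a rough MATLAB analogue of what the Find + Selector combination computes. This is just a sketch for intuition, reusing the `greenness` matrix from above; it is not code generated from the model.

```
% MATLAB analogue of the Find + Selector blocks (sketch only, not generated code)
idx = find(greenness > 0);       % Find block: linear indices of the positive elements
positives = greenness(idx);      % Selector block: a variable-size vector of those elements
thresh = 0.3*mean(positives);    % same threshold as the original one-liner
```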

**Combining the images**

The last line we need to implement replaces the pixels identified as green in the foreground image with their values from the background image. In MATLAB, this is accomplished using this line:

`rgb2(isgreen) = rgb1(isgreen);`

In Simulink, we need to extend the logic seen at the previous step. We first select the desired pixels from the background image, and then assign them to the foreground image:

Note that the Computer Vision System Toolbox has a Compositing block implementing this functionality.

Here is what the final result looks like:

**Now it's your turn**

What do you think? Is it more convenient to implement this algorithm in MATLAB or Simulink? How do you typically decide between a *code-based* or a *block-based* implementation? Let us know by leaving a comment here.

At the party, we learned that Cleve is the winner of the 2014 IEEE John von Neumann Medal, which is awarded for outstanding achievements in computer-related science and technology.

Happy Birthday and Congratulations, Cleve! I've learned more just by working in the same hallway as you than in much of my formal education. Thanks for being so generous in sharing your knowledge and experience with so many of us for so many years.

Readers, be sure to check out Cleve's blog. Or listen to Cleve describe the three decades of mathematicians, computers, and technology (and a LINPACK license plate!) that led to the creation of MATLAB.

For several years we thought Hadamard matrices showed maximum element growth for Gaussian elimination with complete pivoting. We were wrong.

I want to continue the discussion from my previous blog post of element growth during Gaussian elimination. Suppose we are solving a system of linear equations involving a matrix $A$ of order $n$. Let $a_{i,j}^{(k)}$ denote the elements in the matrix after the $k$ th step of the elimination. Recall that the *growth factor* for the pivoting process we are using is the quantity

$$ \rho_n = { \max_{i,j,k} |a_{i,j}^{(k)}| \over \max_{i,j} |a_{i,j}| } $$

Complete pivoting is intended to control the growth factor by using both row and column interchanges at each step in the reduction to bring the largest element in the entire unreduced matrix into the pivot position. In his 1962 analysis of roundoff error in Gaussian elimination, J. H. Wilkinson proved that the growth factor for complete pivoting satisfies

$$ \rho_n \le g(n) $$

where the *growth function* is

$$ g(n) = \left( n \cdot 2 \cdot 3^{1/2} \cdot 4^{1/3} \cdots n^{1/(n-1)} \right)^{1/2} \approx 1.8 \, n^{1/2} \, n^{\frac{1}{4}\log n} $$

Wilkinson's analysis makes it clear that the growth can never actually be anywhere near this large.

We saw that the growth factor for partial pivoting is $2^{n-1}$. For complete pivoting, it's much, much less. To get a feeling for how much, express $g(n)$ as a MATLAB one-liner:

g = @(n) sqrt(n*prod((2:n).^(1./(1:(n-1)))))

g = @(n)sqrt(n*prod((2:n).^(1./(1:(n-1)))))

Check out $n = 100$

g(100)

ans = 3.5703e+03

So here $g(n) \approx 35n$.

Jacques Hadamard was a French mathematician who lived from 1865 until 1963. He made contributions in many fields, ranging from number theory to partial differential equations. The matrices named after him have entries +1 and -1 and mutually orthogonal rows and columns. The $n$ -dimensional parallelotope spanned by the rows of a Hadamard matrix has the largest possible volume among all parallelotopes spanned by matrices with entries bounded by one. We are interested in Hadamard matrices in the MATLAB world because they are the basis for Hadamard transforms, which are closely related to Fourier transforms. They are also applicable to error correcting codes and in statistics to estimation of variance.

So MATLAB already has a `hadamard` function. Here is the 8-by-8 Hadamard matrix.

H = hadamard(8)

```
H =

     1     1     1     1     1     1     1     1
     1    -1     1    -1     1    -1     1    -1
     1     1    -1    -1     1     1    -1    -1
     1    -1    -1     1     1    -1    -1     1
     1     1     1     1    -1    -1    -1    -1
     1    -1     1    -1    -1     1    -1     1
     1     1    -1    -1    -1    -1     1     1
     1    -1    -1     1    -1     1     1    -1
```

Check that the rows are perpendicular to each other.

H'*H

```
ans =

     8     0     0     0     0     0     0     0
     0     8     0     0     0     0     0     0
     0     0     8     0     0     0     0     0
     0     0     0     8     0     0     0     0
     0     0     0     0     8     0     0     0
     0     0     0     0     0     8     0     0
     0     0     0     0     0     0     8     0
     0     0     0     0     0     0     0     8
```
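As an aside, the maximal-volume property mentioned above is easy to check numerically: for an $n$-by-$n$ Hadamard matrix, $|\det(H)| = n^{n/2}$. Here is a quick sanity check of my own, not part of the original derivation.

```
% |det(H)| = n^(n/2) for an n-by-n Hadamard matrix; here 8^4 = 4096
abs(det(hadamard(8)))
```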

The orthogonality of the rows leads to element growth during Gaussian elimination. If we call for the LU factorization of `H`, no pivoting actually takes place, but the same result would be produced by complete pivoting that settles ties in favor of the incumbent.

[L,U] = lu(H)

```
L =

     1     0     0     0     0     0     0     0
     1     1     0     0     0     0     0     0
     1     0     1     0     0     0     0     0
     1     1     1     1     0     0     0     0
     1     0     0     0     1     0     0     0
     1     1     0     0     1     1     0     0
     1     0     1     0     1     0     1     0
     1     1     1     1     1     1     1     1

U =

     1     1     1     1     1     1     1     1
     0    -2     0    -2     0    -2     0    -2
     0     0    -2    -2     0     0    -2    -2
     0     0     0     4     0     0     0     4
     0     0     0     0    -2    -2    -2    -2
     0     0     0     0     0     4     0     4
     0     0     0     0     0     0     4     4
     0     0     0     0     0     0     0    -8
```

This, then, is one matrix at $n = 8$ where

$$ \rho_n = n $$

For almost 30 years everybody believed that Hadamard matrices were the extreme cases and that the growth factor $\rho_n$ for complete pivoting with real matrices was equal to $n$.

At various times in the 1960s and '70s I personally did numerical experiments looking for matrices with large pivot growth. The computers I had at the time limited the order of my matrices to a few dozen. I never found any cases of $\rho_n > n$.

One of the people who worked on the problem was Leonard Tornheim at Chevron Research in California. He also did a lot of numerical experiments. He found a complex 3-by-3 matrix with $\rho_3 > 3$. He also proved that for real matrices, $\rho_3 = 2\frac{1}{4}$.

In 1968, Colin Cryer made an official conjecture that $\rho_n$ was equal to $n$ for real matrices.

A 1988 paper by Jane Day and Brian Peterson provides a good survey of what was known up to that time about element growth with complete pivoting.

Then, in 1991, Nick Gould, from Oxford, surprised us all by announcing the discovery of real matrices for which the complete pivoting $\rho_n$ is greater than $n$. For an $n$ -by- $n$ matrix, with $n$ between 13 and 16, Nick set up a large, sparse, nonlinear optimization problem involving roughly $n^3/3$ variables, and tackled it with the LANCELOT package that he and his colleagues Andy Conn and Philippe Toint had developed. Most of his paper is about a 13-by-13 matrix with

$$ \rho_{13} = 13.0205 $$

This just barely exceeds the $\rho_n = n$ conjecture. But he also found some slightly larger matrices that do a better job.

$$ \rho_{14} = 14.5949 $$

$$ \rho_{15} = 16.1078 $$

$$ \rho_{16} = 18.0596 $$

The 16-by-16 is particularly dramatic because it is not a Hadamard matrix.

So, as far as I know, this is where the matter still stands today. Nobody is interested in running larger experiments. I have no idea what $\rho_{100}$ or $\rho_{1000}$ might be. Of course, we don't really need to know because we don't use complete pivoting in practice.

This reminds me of a story that I like to tell. I'm not sure it's actually true.

Peter Businger was a grad student at Stanford in Math and Computer Science at the same time I was in the early '60s. He worked with Gene Golub to write the first published Algol procedure for the QR matrix factorization using Householder reflections. Peter left Stanford to join Joe Traub's group at Bell Labs that was producing what they called the Port library of mathematical software. They were importing programs from everybody else, running the routines on Bell Labs' machines, thoroughly testing the software, deciding which codes were best, and producing a consolidated library.

Peter was in charge of matrix computation. He checked out all the imported linear equation solvers, including mine. He decided none of them was satisfactory, because of this pivoting growth question. So he wrote a program of his own that did partial pivoting and monitored the growth. If the growth was too large, the program switched to complete pivoting.

At the time, Bell Labs was trying to patent software, so Peter named his new subroutine "P^8". This stood for "Peter's Pseudo Perfect Partial Pivot Picker, Patent Pending". In my opinion, that is one of the great all-time subroutine names.

The experience inspired Peter to go to law school at night and change jobs. He switched to the Bell Labs patent department.

I've lost contact with Peter, so I can't ask him if he really did name the routine P^8. If he didn't, he should have.

Les Foster, whom I mentioned in the last post for finding examples of exponential growth with partial pivoting, promotes what he calls "Rook Pivoting". Search down the column for the largest element, as in partial pivoting, but then also search along that row for the largest element. This is intermediate between partial and complete pivoting. Les is able to prove some results about pivot growth. But the algorithm does not generalize well to a block form.

I was present at a milestone in the history of Hadamard matrices. The orthogonality conditions readily imply that the order $n$ of a Hadamard matrix must be 1, 2, or a multiple of 4. But the question remains, does a Hadamard matrix of order $n = 4k$ exist for every $k$? This is an open question in number theory.

It is fairly easy to create Hadamard matrices of some sizes. A nifty way to generate a Hadamard matrix of order a power of two is the recursion.

```
H = 1;
for n = 2, 4, 8, ...
    H = [H H; H -H];
end
```

If `A` and `B` are Hadamard matrices, then so is

kron(A,B)
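For example, here is a quick check of the Kronecker construction using the built-in `hadamard` function (my own illustration, not part of the original post):

```
% kron of two Hadamard matrices is again a Hadamard matrix
A = hadamard(2);
B = hadamard(8);
H16 = kron(A,B);                  % 16-by-16, entries +1 and -1
isequal(H16'*H16, 16*eye(16))     % orthogonal rows: returns logical 1
```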

By 1961, with these and other similar constructions, it turned out that it was known how to construct Hadamard matrices for all orders $n \le 100$ that are multiples of four except $n = 92$. In 1961 I had a summer job at JPL, Caltech's Jet Propulsion Lab. My office was in a temporary trailer and my trailer mate was Len Baumert. Len proudly showed me a color graphic that he had just made and that he was proposing for the cover of *Scientific American*. It was a 92-by-92 matrix made up from 23-by-23 blocks of alternating light and dark cells representing +1 and -1. The graphic didn't make the cover of *Scientific American*, but I kept my copy for a long time.

Len had done a computation on JPL's machines that filled in the last value of $n$ less than 100. His paper with his colleagues Sol Golomb, a professor at USC, and Marshall Hall, Jr., a professor at Caltech, was published in the prestigious Bulletin of the AMS. It turns out that I had taken a course on difference sets, the mathematics involved in generating the matrix, from Hall at Caltech.

Here is a MATLAB picture of `Baumert92`. You can get the function from this link.

H = Baumert92; pcolor92(H);

Let's check that we get the expected pivot growth.

[L,U] = lu(H); unn = U(end,end)

unn = -92

This reference to the book by Trefethen and Bau should have been included in my previous post about partial pivoting. Lecture 22 has an explanation of the stability of partial pivoting in practice.

Lloyd N. Trefethen and David Bau, III, *Numerical Linear Algebra*, <http://bookstore.siam.org/ot50>, SIAM, 1997, 362pp.

Peter Businger, "Monitoring the numerical stability of Gaussian elimination", <http://link.springer.com/article/10.1007/BF02165006>, Numerische Mathematik 16, 4, 1971, 360-361.

Jane Day and Brian Peterson, "Growth in Gaussian elimination", <http://www.jstor.org/stable/2322755>, American Mathematical Monthly, 95, 6, 1988, 489-513.

Nick Gould, "On growth in Gaussian elimination with complete pivoting", <http://www.numerical.rl.ac.uk/people/nimg/pubs/Goul91_simax.pdf> SIAM Journal on Matrix Analysis and Applications 12, 1991, 351-364.

Leslie Foster, "The growth factor and efficiency of Gaussian elimination with rook pivoting", <http://www.math.sjsu.edu/~foster/gerpwithcor.pdf>, Journal of Computational and Applied Mathematics, 86, 1997, 177-194.

Leslie Foster, "LURP: Gaussian elimination with rook pivoting", matlabcentral/fileexchange/37721-rook-pivoting, MATLAB Central File Exchange, 2012.

Leonard Baumert, S. W. Golomb, and Marshall Hall, Jr, "Discovery of an Hadamard Matrix of Order 92", <http://dx.doi.org/10.1090%2FS0002-9904-1962-10761-7>, Bulletin of the American Mathematical Society 68 (1962), 237-238.


If you want the distinguished honor of a Will Pick of the Week, here are some easy guidelines to follow:

- Choose an interesting or fun topic.
- Package the submission so that it's easy to figure out how to use.
- Ensure that it actually works for standard user inputs.

I'll probably want to get a second opinion, but PianoTuner tells me that my keys are all falling flat. As to how it figures this out, the algorithm is about 80 lines of code embedded into the UI callback functions. It looks like there's a fast Fourier transform and a polynomial curve fit in there.
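For readers curious what an FFT-based pitch estimate can look like, here is a rough sketch under my own assumptions -- a hypothetical single-note recording and the simplest possible peak-picking -- and emphatically not John's PianoTuner code itself:

```
% Hypothetical sketch of FFT-based pitch estimation (not the PianoTuner algorithm)
[y,Fs] = audioread('note.wav');     % assumed recording of one sustained note
y = y(:,1);                         % first channel only
Y = abs(fft(y));
f = (0:numel(Y)-1)*Fs/numel(Y);     % frequency axis in Hz
[~,k] = max(Y(2:floor(end/2)));     % dominant bin, skipping DC
estimatedPitch = f(k+1)             % compare with the note's nominal frequency
```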

Incidentally, you can run the tool against non-piano instruments as well. I played a random song on my computer speakers and found that musicians with professional instruments do much better.

Let us know what you think here or leave a comment for John.


*I'd like to welcome guest blogger and ace MATLAB training content developer Matt Tearle for today's post. Thanks, Matt!*

What do you do if you don't have an image processing background and your boss asks you to produce a music video for a MATLAB-vs-Simulink rap battle? It's probably not a question you've given much thought to. I admit that I hadn't either. But that's the awkward situation I found myself in a little while back.

A MATLAB-vs-Simulink rap battle -- obviously I **had** to be a part of this project! There was just one significant problem: I'm not a video producer. Yes, I have a nice digital camera. I have some very basic editing software. And I certainly have some... "creative" ideas. But I don't have any software designed to do serious video editing.

In particular, as soon as the concept was explained to me, I decided that we needed to do a "green screen" video:

- We'd record our rappers busting out their rhymes in front of a green backdrop
- I'd make an entertaining background video
- Then we'd put the two together, using the background video in place of the green screen:

[This background may not seem particularly "entertaining", but if I showed the real one we used, lawyers might get involved... And nobody wants that.]

With no purpose-made software to achieve this, and only a few days before it was due (thanks, boss!), I turned to my default standby: MATLAB!

A video is just a sequence of images, so all I needed to do was to write MATLAB code to "green-screen" two images, then read in two videos and apply my algorithm frame-by-frame. A quick hunt around in the documentation, and I had the framework code ready to go:

```
% Open the input video files
v1 = VideoReader('background.mp4');
v2 = VideoReader('foreground.mp4');
% Determine the number of frames for the final video
nFrames = min(v1.NumberOfFrames,v2.NumberOfFrames);
% Open a video file to write the result to
vOut = VideoWriter('mixedvideo.mp4','MPEG-4');
vOut.FrameRate = 24;
open(vOut);
% Loop over all the frames
for k = 1:nFrames
    % Get the kth frame of both inputs
    x = read(v1,k);
    y = read(v2,k);
    % Mix them together
    z = magicHappensHere(x,y);  % TODO: Write this...
    % Write the result to file
    writeVideo(vOut,z)
end
% And we're done!
close(vOut);
```

Now all I needed to do was write the magic function that would do the greenscreening...

Actually, I soon discovered that I needed to do a bit more preprocessing first. I had created the background video on my computer, but filmed the foreground on my digital camera. Of course the dimensions didn't match... No worries! That's what Image Processing Toolbox is for. I added a line of code to define the desired dimensions of the output video. Then `imresize` took care of the rest:

```
% Loop over all the frames
for k = 1:nFrames
    % Get the kth frame of both inputs
    x = imresize(read(v1,k),outDims);
    y = imresize(read(v2,k),outDims);
    % Mix them together
    z = magicHappensHere(x,y);
    % Write the result to file
    writeVideo(vOut,z)
end
```

Wow, this image processing stuff is **easy**! What's next, then?

The first thing to try, of course, when developing an algorithm is to see if one already exists. Sadly, searching the Image Processing Toolbox documentation didn't turn up a `greenscreen` function or anything I could see that would do the job. So it looks like I have to do this from scratch.

If I can identify the green pixels, then I can just replace those pixels in the foreground image with the corresponding pixels in the background image. So all of this really just boils down to: how do I find the green pixels? I know that a true-color image in MATLAB is an m-by-n-by-3 array, where each of the three m-by-n "planes" represents red, green, and blue intensity, respectively:
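For example, assuming `y` holds one uint8 truecolor frame from the foreground video, the three planes are just indexed slices:

```
% Pull out the three color planes of a truecolor frame y (assumed uint8, m-by-n-by-3)
R = y(:,:,1);   % red plane
G = y(:,:,2);   % green plane
B = y(:,:,3);   % blue plane
```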

"Green" would, ideally, be [0, 255, 0] (in `uint8`). So a simple way to find the green pixels would be to do some kind of thresholding:

isgreen = (y(:,:,1) < 175) & (y(:,:,2) > 200) & (y(:,:,3) < 175);

This didn't work too badly, but neither was it great. One of the problems is the use of magic threshold values -- three of them, no less! Playing around with three magic parameters to try to find the right combination doesn't sound like fun to me. And besides, using absolute values like this might not be the best way to judge "greenness". If you look at the example image above, you can see that there's a fair amount of variation in the green background. (This is mainly because we were using an unevenly lit conference room that happened to have a green wall -- this was clearly a very professional production!)

At this point, I had to ask myself "what does it really mean to be 'green'?". As a human, I can easily distinguish between the various shades of green on the wall, despite the difference in brightness and shading. At first, this led me to thinking about color spaces and how a color is a point in some 3-D space (RGB, CMY, HSV, etc.). Perhaps I could define "green" as "sufficiently close to [0 1 0] in RGB space". And so followed a few experiments with rounding, distance calculations, even playing with `rgb2ind`. But to no great success:

```
yd = round(double(y)/255);
isgreen = (yd(:,:,1)==0) & (yd(:,:,2)==1) & (yd(:,:,3)==0);
```

As this image shows, the problem with our makeshift green screen was that it wasn't particularly green -- more a yellow-green, with significant variations in brightness.

Just as I was starting to get worried that this wasn't going to work at all, I realized that one obvious characteristic of the green pixels is that the green value is significantly **greater** than the blue and red values. This is, of course, embedded in my simple thresholding code above, but I can do away with at least some of the magic constants by looking at **relative** values. So how about calculating a "greenness" value from the three color channels:

```
yd = double(y)/255;
% Greenness = G*(G-R)*(G-B)
greenness = yd(:,:,2).*(yd(:,:,2)-yd(:,:,1)).*(yd(:,:,2)-yd(:,:,3));
```

Now **that** looks very promising. If I threshold that, I should get what I'm looking for. But rather than use an absolute threshold value maybe I can use something based on the distribution of the greenness values. In particular, I can exploit one nice feature of the images: they have a lot of green pixels. The histogram of greenness values looks like this:

I tried a simple thresholding based on the average greenness value, ignoring the negative values:

```
thresh = 0.3*mean(greenness(greenness>0));
isgreen = greenness > thresh;
```

Success! OK, so there's one magic constant left in there, but it was pretty quick and easy to tune, and seemed to be quite robust to minor changes. I'm calling that a good result. Now that I know where the green pixels are, I just have to replace them with the corresponding pixels from the background.

```
% Start with the foreground (essentially preallocation)
z = y;
% Find the green pixels
yd = double(y)/255;
% Greenness = G*(G-R)*(G-B)
greenness = yd(:,:,2).*(yd(:,:,2)-yd(:,:,1)).*(yd(:,:,2)-yd(:,:,3));
thresh = 0.3*mean(greenness(greenness>0));
isgreen = greenness > thresh;
% Blend the images
% Loop over the 3 color planes (RGB)
for j = 1:3
    rgb1 = x(:,:,j);   % Extract the jth plane of the background
    rgb2 = y(:,:,j);   % Extract the jth plane of the foreground
    % Replace the green pixels of the foreground with the background
    rgb2(isgreen) = rgb1(isgreen);
    % Put the combined image into the output
    z(:,:,j) = rgb2;
end
```

I love using logical indexing -- it's probably my favorite construct in the MATLAB language. Here I'm using it to replace the green pixels of the foreground (`rgb2(isgreen)`) with the corresponding background pixels (`rgb1(isgreen)`).

At this point I had a video that would have sufficed, but there was a noticeable green outline around my rappers. (It's not necessarily apparent in a single still frame.) It was only Sunday morning by this point, which meant that I still had a few hours left to tinker. Could I find a way to "shave" a pixel or two from the edge of the black silhouette in the above image of `isgreen`? To do that, I'd first need to find that edge. I don't know much about image processing, but I do know that "edge detection" is a thing that image people do, so it's time for me to search the doc... Sure enough, there's an `edge` function:

```
outline = edge(isgreen,'roberts');
```

Easy! Now if I can thicken that edge and combine it with the original `isgreen`, I'll have the slightly larger greenscreen mask that I need. Two more Image Processing Toolbox functions help me thicken the edge:

```
se = strel('disk',1);
outline = imdilate(outline,se);
```

The edge variable `outline` is a logical array, so I can easily combine it with `isgreen`, to get my final result:

isgreen = isgreen | outline;

Put it all together and I have a pretty simple script:

```
% Open the input video files
v1 = VideoReader('background.mp4');
v2 = VideoReader('foreground.mp4');
% Determine the number of frames for the final video
nFrames = min(v1.NumberOfFrames,v2.NumberOfFrames);
% Set the output dimensions
outDims = [400 640];
% Open a video file to write the result to
vOut = VideoWriter('mixedvideo.mp4','MPEG-4');
vOut.FrameRate = 24;
open(vOut);
% Loop over all the frames
for k = 1:nFrames
    % Get the kth frame of both inputs
    x = imresize(read(v1,k),outDims);
    y = imresize(read(v2,k),outDims);
    % Mix them together
    z = y;   % Preallocate space for the result
    % Find the green pixels in the foreground (y)
    yd = double(y)/255;
    % Greenness = G*(G-R)*(G-B)
    greenness = yd(:,:,2).*(yd(:,:,2)-yd(:,:,1)).*(yd(:,:,2)-yd(:,:,3));
    % Threshold the greenness value
    thresh = 0.3*mean(greenness(greenness>0));
    isgreen = greenness > thresh;
    % Thicken the outline to expand the greenscreen mask a little
    outline = edge(isgreen,'roberts');
    se = strel('disk',1);
    outline = imdilate(outline,se);
    isgreen = isgreen | outline;
    % Blend the images
    % Loop over the 3 color planes (RGB)
    for j = 1:3
        rgb1 = x(:,:,j);   % Extract the jth plane of the background
        rgb2 = y(:,:,j);   % Extract the jth plane of the foreground
        % Replace the green pixels of the foreground with the background
        rgb2(isgreen) = rgb1(isgreen);
        % Put the combined image into the output
        z(:,:,j) = rgb2;
    end
    % Write the result to file
    writeVideo(vOut,z)
end
% And we're done!
close(vOut);
```

This is not the best greenscreen code ever written (although it **is** the best greenscreen code **I've** ever written). But the real point is that it shouldn't even exist -- I had no prior image processing knowledge, I was working with videos of different dimensions, the greenscreen was unevenly lit (and not even particularly green), and I had a weekend to make it happen. The fact that I was able to do this is one of the main reasons I love MATLAB (and, of course, the associated toolboxes): I was able to spend my time on the heart of **my algorithm**, not all the coding details. Different-sized images? Fixed with one function call (`imresize`). Need to find edges of a region? One function call (`edge`). Want to thicken that edge? Two function calls (`strel` and `imdilate`). The final script is less than 50 lines.

My time was spent wrestling with the core of the problem: how I was going to figure out what "green" looked like in my images. Because I could spend my time there, I was able to tinker with different ideas. This, to me, is what MATLAB is all about: rapid prototyping -- ideas becoming working code.

Having made the video and feeling pretty good about what I managed to pull together in a few hours of MATLAB, I later realized that I got lucky. The way I calculated greenness (`G*(G-R)*(G-B)`) was intended to give negative results for anything that didn't have more green than red or blue. But I simply overlooked the possibility of getting a high positive value if **both** `G-R` and `G-B` were negative. According to my formula, a light magenta -- e.g. `[1 0.6 1]` -- would be very green! Luckily for me, my source videos didn't include anything like that. A more robust approach would have been something like this:

```
yd = double(y)/255;
% Greenness = G*(G-R)*(G-B)
greenness = yd(:,:,2).*max(yd(:,:,2)-yd(:,:,1),0).*max(yd(:,:,2)-yd(:,:,3),0);
```

Now any pixel with more red or more blue than green will have a greenness of 0. Everything else works the same.
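A quick check of the light magenta case mentioned above (the pixel value is the one quoted in the text, already scaled to [0,1]):

```
% One "light magenta" pixel: the original formula calls it green, the clamped one does not
yd = reshape([1 0.6 1],1,1,3);
g_old = yd(:,:,2).*(yd(:,:,2)-yd(:,:,1)).*(yd(:,:,2)-yd(:,:,3))              % 0.096, wrongly positive
g_new = yd(:,:,2).*max(yd(:,:,2)-yd(:,:,1),0).*max(yd(:,:,2)-yd(:,:,3),0)   % 0, correctly not green
```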

I think there's a nice moral in this postscript, as well. I love using MATLAB to solve a problem -- a real, immediate problem that I need to solve right now. I don't necessarily need the best solution, or the solution to the most general problem. The greenscreen algorithm I hacked together wasn't the best, but it worked for my application. If my image had light magenta in it, I would have discovered this bug and fixed it, but that didn't happen and didn't matter.

If I had approached this assignment like a professional software developer, I'd still be working on it. But as a MATLAB hacker, I was done in a weekend!


**A Look into the Future?**

We’ve been publicly developing a Simulink model of the Hyperloop for some time now. We recently looked into using MATLAB interfaces to visualize that journey in Google Earth. To create a virtual cockpit, we constructed a MATLAB figure combining Google Earth windows and several other native graphics capabilities. We think the resulting video is pretty cool.

To understand how we arrived at this vision of the future, here’s a look back at our past Hyperloop posts.

**Getting Started**

We started out with a conversational post about Elon Musk’s concept, outlining some of the interesting technical points. This post was criticized for not bringing any value to the discussion. I guess we were simulating trolling.

To get things moving, we introduced different approaches to architecting a simulation model for such a system. Guy suggested splitting the model into a plant (physical elements) and a controller (functional software) as in conventional model-based design. I proposed a more distributed architecture with several functional components, each containing its own plant and controller (image below). Guy’s proposal included subsystem variants while mine leveraged model reference.

**Initial dynamic analysis**

Last November, we published a two-dimensional analysis of the Hyperloop’s proposed route. We used the original proposal’s design constraints for curve radius to see how the track could be laid out over the California terrain.

The results highlighted the difficulties in maintaining tolerable g-levels for passengers while traveling at such high speeds along existing infrastructure. However, the analysis did not include two important factors: elevation and banking. Elevation promised to make route planning more difficult, while the effects of banking the vehicle in the tube would help reduce the g-forces and potentially improve the route.

*Anticipated hyperloop route (yellow) to avoid excessive g-forces (Image created using Google Earth)*

**Promoting Collaboration**

For Valentine’s Day, we used Simulink Projects to package a Simulink model on the MATLAB File Exchange that contained elements of each of our architectural approaches. My partitioning remained but now utilized Guy’s idea of using subsystem variants.

In April, R2014a was released with Simulink Projects supporting Git as a version control system. So, we quickly jumped on the bandwagon and placed an updated Hyperloop model on GitHub.

**Improved dynamics**

In May, we investigated the elevation profile of the route. We used optimization techniques to find the best combination of pillars and tunnels to meet cost and comfort demands. There were some great improvement suggestions in the comments of this post that we’ve yet to implement.

Last month, we enhanced the model to include the effects of banking within the tube. We blogged about a simple approach in SimMechanics to model those dynamics.

**Roll your own!**

In keeping with our collaborative mindset, the necessary models and m-scripts to generate this video are now available on the GitHub submission.

The project has even been set up to support new routes. Load your own KML file and see how the Hyperloop looks in your neighborhood.

Have you ever needed to move a figure? Perhaps because it is offscreen? Or at least the menus or close box are not visible? This happens to me from time to time when I get code from someone else who is programming specifically to the size of their display. Working on a laptop, I often have fewer available pixels. What to do?

I used the Function Browser to search for "move figure" and found `movegui`. It might be just the ticket for you.

As you can see from the documentation, `movegui` gives you a fair amount of latitude for moving your figure around. The two choices I find most useful and easiest to remember for my purposes: `onscreen` and `center`. I think you can guess what they each do :-)
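Here is a minimal sketch of how I use it. The Position values are made up just to simulate a figure that has wandered off the display:

```
% Simulate an offscreen figure (hypothetical coordinates), then bring it back
f = figure('Position',[3000 2000 400 300]);
movegui(f,'onscreen')   % nudge the figure fully back onto the screen
movegui(f,'center')     % or center it on the display
```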

Do you use the Function Browser to help you? Do you ever need to reposition "lost" figure windows, like I have? Let me know here.


Sean's pick this week is ThingSpeak Support for MATLAB by MathWorks' Internet of Things Team.

Up until recently, I'd heard the phrase the "internet of things" thrown around only a few times and didn't really know what it meant. So I spoke with our Marketing Manager for the Maker Movement, Paul Kassebaum, to get a better understanding. Paul describes it briefly as "The confluence of technologies in communication (the internet as we know it), the energy internet (or energy from sensors, etc.) and the logistics internet (the ability to use this information for logistics)."

This week's pick uses all three. It allows you to pull data directly into MATLAB from ThingSpeak, a platform and cloud service where data can be uploaded from sensors and low cost hardware. After pulling this data, you can use it for whatever research, design or logistics you want.

I saw this blog post on our Maker Zone blog that described measuring soda consumption here at MathWorks. They posted the file on the FEX as well.

This gave me a challenge: could I gather a few people together, drink as much Powerade as possible at 4pm Eastern Time on August 5th, and create an outlier data point?

We had a few detractors, like Loren, who did not properly appreciate the health benefits of this type of social gathering.

To passersby, it probably looked like we were part of some obscure MATLAB cult. Let's see how we did:

```
% Grab the data for July 6th through August 6th.
[drinkData, timeStamps] = thingSpeakFetch(7040,'DateRange',{'06-July-2014','06-August-2014'});
Powerade = drinkData(:,3);   % Third channel is Powerade
% Index into the party hour
idxParty = hour(timeStamps)==16 & day(timeStamps)==5 & month(timeStamps)==8;
```

```
scatter(timeStamps(idxParty),Powerade(idxParty),400,'MarkerFaceColor',[0.03 0.6 0.8],'MarkerEdgeColor',[0.03 0.6 0.8]);
hold on
scatter(timeStamps,Powerade,15,'MarkerFaceColor',[0.5 0 0],'MarkerEdgeColor',[0.5 0 0]);
datetick('x','mmmdd','keepticks','keeplimits');
legend('Powerade Party','Powerade Consumption','location','northwest')
```

So it does look like we were able to create a Powerade Consumption outlier. Go Team!

Give it a try and let us know what you think here or leave a comment for our Internet of Things team.


Our latest update has two features that should be of interest to blog readers:

- Searching with Categories: Blog authors can tag their posts with Categories: keywords or phrases that organize or describe a particular post. It’s a helpful tool for finding related content. Like a particular post? Just click the Category link at the bottom to find similar posts.
- Print a Post: Several blog readers requested this feature. Now you can easily print out posts by your favorite bloggers and read them when you’re off the grid. After spending all day staring at a monitor, there’s nothing better than being outside, reading a nicely formatted post on paper. Bliss.

We’re always looking to improve the blogs. Tell us what you think about these new features or let us know what we can do to make the blogs section better for you. While you’re thinking about all this, drop us a line in the Comments section below.

Thanks!

Of course, my answer was that there are many ways to hold a value in Simulink. I put together a few examples for him that I am sharing in today's post.

**The Desired Behavior**

I want to create a subsystem with two inputs, `u` and `hold`, and one output `y`. When `hold` is true, the output `y` is equal to the input `u`. When `hold` is zero, the output `y` remains constant, holding the last output value.

Here are example input and output signals:

Let's see a few ways to obtain this behavior.

**Method 1: Switch and Delay**

The most common way to hold a value that I observe in customers' models is to use a Switch and a Unit Delay (or Memory) block.

Nice, clean and simple!
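As a point of reference, here is a MATLAB sketch of what this Switch + Unit Delay arrangement computes at each time step. The signal names are mine, and `y_prev` plays the role of the Unit Delay state, assumed to start at 0:

```
% MATLAB sketch of the hold behavior, not generated code
y_prev = 0;                   % Unit Delay initial condition (assumed)
y = zeros(size(u));
for k = 1:numel(u)
    if hold_sig(k)            % hold_sig stands in for the 'hold' input signal
        y(k) = u(k);          % pass the input through
    else
        y(k) = y_prev;        % hold the last output value
    end
    y_prev = y(k);
end
```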

**Method 2: Enabled Subsystem**

Personally, my favorite way to hold the value of a signal is using an Enabled Subsystem, with the Outport property `Output when disabled` set to **held**.

I like this method because it takes virtually no blocks.

**Method 3: If Action Subsystems and Delay**

This version is similar to the first method, but we replace the Switch by an arrangement of If Action Subsystems and a Merge block.

Personally, I don't use this syntax very often, mostly because it requires more blocks.

**Method 4: Stateflow**

Another easy way to obtain the behavior we are looking for is to use Stateflow.

**What about the generated code?**

When choosing an implementation, many factors need to be taken into account. One of my favorite guidelines was suggested by a reader of this blog: **Clearly show the intended functionality**. I will let you judge for yourself which implementation most clearly shows the intention of holding a signal.

Another concern that many users have is: What does the code look like?

In this case, all the methods generate similar code. To make the code easier to read for this post, I set the storage class of the three signals of interest (`hold`, `u`, `y`) to ExportedGlobal.

Switch and Delay:

*Note: To obtain the above code, I enabled inline parameters from the model configuration. Otherwise the threshold of the Switch is a tunable parameter.*

Enabled Subsystem:

If Action Subsystem:

*Note: To obtain the above code, I manually set the Function Packaging option of the If Action Subsystem to Inline. Otherwise, since the two subsystems are identical, the coder generates a reusable function.*

Stateflow:

**Now it's your turn**

What is your preferred method to hold the value of a signal? Are there other functionalities for which you consider multiple implementations? Let us know by leaving a comment here.

In rare cases, Gaussian elimination with partial pivoting is unstable. But the situations are so unlikely that we continue to use the algorithm as the foundation for our matrix computations.

I almost hesitate to bring this up. Gaussian elimination with partial pivoting is potentially unstable. We know of a particular test matrix, and have known about it for years, where the solution to simultaneous linear equations computed by our iconic backslash operator is less accurate than we typically expect. The solution is contaminated by unacceptably large roundoff errors associated with large elements encountered during the elimination process. The offending matrix is generated by the following function, which is a special case of the `gfpp` function in Nick Higham's collection of MATLAB test matrices.

```
type gfpp
```

```
function A = gfpp(n)
% A = gfpp(n)  Pivot growth of partial pivoting.
A = eye(n,n) - tril(ones(n,n),-1);
A(:,n) = 1;
```

Here is the 8-by-8.

```
n = 8
A = gfpp(n)
```

```
n =
     8

A =
     1     0     0     0     0     0     0     1
    -1     1     0     0     0     0     0     1
    -1    -1     1     0     0     0     0     1
    -1    -1    -1     1     0     0     0     1
    -1    -1    -1    -1     1     0     0     1
    -1    -1    -1    -1    -1     1     0     1
    -1    -1    -1    -1    -1    -1     1     1
    -1    -1    -1    -1    -1    -1    -1     1
```

Gaussian elimination with partial pivoting does not actually do any pivoting with this particular matrix. The first row is added to each of the other rows to introduce zeroes in the first column. This produces twos in the last column. As similar steps are repeated to create an upper triangular `U`, elements in the last column double with each step. The element growth in the final triangular factor `U` is `2^(n-1)`.

[L,U] = lu(A); U

```
U =
     1     0     0     0     0     0     0     1
     0     1     0     0     0     0     0     2
     0     0     1     0     0     0     0     4
     0     0     0     1     0     0     0     8
     0     0     0     0     1     0     0    16
     0     0     0     0     0     1     0    32
     0     0     0     0     0     0     1    64
     0     0     0     0     0     0     0   128
```

So if the matrix order `n` is set to the number of bits in the floating point word (plus one), the last diagonal element of `U` will grow to be `1/eps`.

```
n = 53
A = gfpp(n);
[L,U] = lu(A);
unn = U(n,n)
```

```
n =
    53

unn =
   4.5036e+15
```

If we try to use backslash to solve a linear system involving this matrix and a random right hand side, the back substitution will begin with `U(n,n)`. The roundoff errors can be expected to be as large as `eps(U(n,n))` and swamp the solution itself.

b = randn(n,1); x = A\b;

Ordinarily, we expect the solution computed by Gaussian elimination to have a residual on the order of roundoff error. But in this case, the residual is many orders of magnitude larger. Backslash has failed.

r = b - A*x; relative_residual = norm(r)/(norm(A)*norm(x))

relative_residual = 3.7579e-05

Anything we do to this matrix to provoke pivoting during the elimination will prevent the growth of the elements in `U`. One possibility is to swap the first two rows.

A = gfpp(n); A([1 2],:) = A([2 1],:); [L,U] = lu(A); unn = U(n,n)

unn = 2

That's much better. Now use backslash and look at the residual.

x = A\b; r = b - A*x; relative_residual = norm(r)/(norm(A)*norm(x))

relative_residual = 1.7761e-17

So the residual is now on the order of roundoff relative to `A` and `x`. Backslash works as expected.

Another way to provoke pivoting is to perturb the matrix with a little white noise.

A = gfpp(n); A = A + randn(n,n); [L,U] = lu(A); unn = U(n,n)

unn = -3.8699

So `U` is well behaved. Let's see about backslash.

x = A\b; r = b - A*x; relative_residual = norm(r)/(norm(A)*norm(x))

relative_residual = 1.2870e-16

Again the residual is on the order of roundoff and backslash works as expected.

Suppose we are using Gaussian elimination with any pivoting process to solve a system of linear equations involving a matrix $A$. Let $a_{i,j}^{(k)}$ denote the elements in the matrix after the $k$ th step of the elimination. The *growth factor* for that pivoting process is the quantity

$$ \rho = { \max_{i,j,k} |a_{i,j}^{(k)}| \over \max_{i,j} |a_{i,j}| } $$

This growth factor is a crucial quantity in Wilkinson's roundoff error analysis. If $\rho$ is not too large, then the elimination algorithm is stable. Unfortunately, the matrix we have just seen, `gfpp`, shows that the growth factor for partial pivoting is at least $2^{n-1}$.

At the time he was presenting his analysis in early 1970s, Wilkinson knew about this matrix. But he said it was a very special case. He reported:

It is our experience that any substantial increase in size of elements of successive $A_n$ is extremely uncommon even with partial pivoting. ... No example which has arisen naturally has in my experience given an increase as large as 16.

During the development of LINPACK in the late 1970s, a colleague at Argonne, J. T. Goodman, and I did some experiments with random matrices to check for element growth with partial pivoting. We used several different element distributions and what were then matrices of modest order, namely values of $n$ between 10 and 50. The largest growth factor we found was for a matrix of 0's and 1's of order 40. The value was $\rho$ = 23. We never saw growth as large as linear, $\rho = n$, let alone exponential, $\rho = 2^{n-1}$.

Nick Trefethen and Rob Schreiber published an important paper, "Average-case Stability of Gaussian Elimination", in 1990. They present both theoretical and probabilistic models, and the results of extensive experiments. They show that for many different types of matrices, the typical growth factor for partial pivoting does not increase exponentially with matrix order $n$. It does not even grow linearly. It actually grows like $n^{2/3}$. Their final section, "Conclusions", begins:

Is Gaussian elimination with partial pivoting stable on average? Everything we know on the subject indicates that the answer is emphatically yes, and that one needs no hypotheses beyond statistical properties to account for the success of this algorithm during nearly half a century of digital computation.

The Higham Brothers, Nick and Des Higham, published a paper in 1989 about growth factors in Gaussian elimination. Most of their paper applies to any kind of pivoting and involves growth that is $O(n)$. But they do have one theorem that applies only to partial pivoting and that characterizes all matrices that have $\rho = 2^{n-1}$. Essentially these matrices have to have the behavior we have seen in `gfpp`, namely no actual pivoting and each step adds a row to all the later rows, doubling the values in the last column in the process.

In two different papers, Stephen Wright in 1993 and Les Foster in 1994, the authors present classes of matrices encountered in actual computational practice for which Gaussian elimination with partial pivoting blows up with exponential growth in the elements in the last few columns of `U`. Wright's application is a two-point boundary value problem for an ordinary differential equation. The discrete approximation produces a band matrix. Foster's application is a Volterra integral equation model for population dynamics. The discrete approximation produces a lower trapezoidal matrix. In both cases the matrices are augmented by several final columns. Certain choices of parameters cause the elimination to proceed with no pivoting and result in exponential growth in the final columns. The behavior is similar to `gfpp` and the worst-case growth in the Highams' theorem.

Complete pivoting requires $O(n^3)$ comparisons instead of the $O(n^2)$ required by partial pivoting. More importantly, the repeated access to the entire matrix completely changes the memory access patterns. With modern computer storage hierarchy, including cache and virtual memory, this is an all-important consideration.

The growth factor $\rho$ with complete pivoting is interesting and I plan to investigate it in my next blog.

`lugui` is one of the demonstration programs included with Numerical Computing with MATLAB. I will also describe it in my next blog. For now, I will just use it to generate a graphic for this blog. Here is the result of a factorization, the lower triangular factor `L` in green and the upper triangular factor `U` in blue.

```
A = gfpp(6);
lugui(A,'partial')
```

Nicholas J. Higham and Desmond J. Higham, "Large Growth Factors in Gaussian Elimination with Pivoting", <http://www.maths.manchester.ac.uk/~higham/papers/hihi89.pdf>, SIAM J. Matrix Anal. Appl. 10, 155-164, 1989.

Lloyd N. Trefethen and Robert S. Schreiber, "Average-case Stability of Gaussian Elimination", <https://courses.engr.illinois.edu/cs591mh/trefethen/trefethen_schreiber.pdf>, SIAM J. Matrix Anal. Appl. 11, 335-360, 1990.

Stephen J. Wright, "A Collection of Problems for Which Gaussian Elimination with Partial Pivoting is Unstable", <ftp://info.mcs.anl.gov/pub/tech_reports/reports/P266.ps.Z>, SIAM J. Sci. Statist. Comput. 14, 231-238, 1993.

Leslie Foster, "Gaussian Elimination Can Fail In Practice", <http://www.math.sjsu.edu/~foster/geppfail.pdf>, SIAM J. Matrix Anal. Appl. 15, 1354-1362, 1994.


Sean's pick this week is Par Tic Toc by Sarah Wait Zaranek.

I run a fair number of computationally expensive operations and often use parallel for (parfor) loops to speed them up. This utility interested me mainly because it shows how work is distributed across the workers and provides worker utilization rates.

First, I will make sure a parallel pool is open using `gcp()` or "Get Current Pool". This opens the headless MATLAB processes that a parfor loop distributes iterations to.

pool = gcp

```
pool =

  Pool with properties:

        Connected: true
       NumWorkers: 2
          Cluster: local
    AttachedFiles: {}
      IdleTimeout: 90 minute(s) (89 minutes remaining)
      SpmdEnabled: true
```

We'll do a simple example with a loop that has a random pause. The pause is a proxy for an operation that takes a varying and unpredictable amount of time.

```
n = 50;
p = Par(n);           % Build the Par timer
parfor ii = 1:n
    Par.tic;          % start timer
    pause(randi(5))   % Pause for up to five seconds
    p(ii) = Par.toc;  % measure elapsed time
end
stop(p);              % all done
plot(p);
```

The parfor loop has done a good job distributing the calculations between the two workers. The more highly utilized worker has only one more iteration than the other.

There are a few other neat tricks in here as well that I found from looking through the code.

- If you right-click on the axes with each iteration's time, it gives you a context menu to select the scale.
- Since the Par class inherits from handle, you would ordinarily see all of the inherited handle methods. Sarah hides those from `methods()` by making them hidden.
- Sarah has also provided a full set of documentation on the capabilities of this utility, as well as examples.

Give it a try and let us know what you think here or leave a comment for Sarah.


Today's blog post comes from the stream of image processing questions over at MATLAB Answers.

User arjun wanted to know how to remove circles from an image by specifying their center and radius.

Famous MATLAB Answer man Image Analyst quickly gave a helpful reply.

I thought it might be worth a little discussion here (plus, I wanted to point out the value of MATLAB Answers.)

As is typical, the process of answering the question started out with trying to understand it better. "What does *remove* mean to you?" asked Image Analyst. The answer came back, "I want to make the pixels in the circle be black."

OK, let's give that a try.

```
I = imread('coins.png');
imshow(I)
```

(Hmm. Guess I didn't have any quarters in my pocket when I took this picture.)

Suppose we want to set to black all the pixels centered at (X = 175, Y = 120) and with radius 33.

I like Image Analyst's recommended procedure:

- Compute a *mask image*. Generally, the term *mask* or *mask image* means a binary image whose foreground pixels correspond to a set of pixels of interest.
- Use logical indexing with the mask image to modify the desired pixels.

Start with creating a mask image.

```
xc = 175;
yc = 120;
r = 33;
x = 1:size(I,2);
y = 1:size(I,1);
[xx,yy] = meshgrid(x,y);
mask = hypot(xx - xc, yy - yc) <= r;
imshow(mask)
title('Mask image')
```

Next, use logical indexing to modify the pixels of `I` corresponding to the foreground pixels of `mask`.

```
I2 = I;
I2(mask) = 0;
imshow(I2)
title('Image with some pixels set to black')
```

And we just lost 5 cents.

Image Analyst pointed out that you can use this technique to set pixel values in the circle to any value you want. How about white:

```
I2(mask) = 255;
imshow(I2)
title('Image with some pixels set to white')
```

For fun, how about setting the pixels to the background color. How do we determine that? I might start by thresholding the original image.

```
bw = im2bw(I,graythresh(I));
imshow(bw)
title('Thresholded image')
```

Now fill in the holes.

```
bw = imfill(bw,'holes');
imshow(bw)
title('Thresholded image with holes filled')
```

Compute the mean value of the background pixels (using logical indexing again!).

bg_mean = mean(I(~bw))

bg_mean = 66.9959

Replace the pixels in the circle with the computed average background value.

```
I2(mask) = bg_mean;
imshow(I2)
title('Circle pixels filled with average background')
```

Let me point out two related Image Processing Toolbox functions. The `roifill` function can fill a region based on a mask; it fills pixels "smoothly" from the adjacent background pixels.

```
I3 = roifill(I,mask);
imshow(I3)
title('Circle pixels replaced using roifill')
```

I also want to mention `imellipse`. You can use this function to draw circles interactively using the mouse. Then you can use the `createMask` function to get the mask image corresponding to the circle you drew.
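Here is a short interactive sketch of that workflow (the ellipse position is whatever you drag it to):

```
% Draw an ellipse with the mouse, then turn it into a mask
figure, imshow(I)
e = imellipse(gca);        % drag/resize the ellipse interactively
mask2 = createMask(e);     % logical mask of the pixels inside the ellipse
I4 = I;
I4(mask2) = 0;             % same logical-indexing trick as before
imshow(I4)
```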

Happy circle filling!


Here is the result:

**Why Simulink Code Inspector?**

You are designing a high-integrity software application using Simulink (good idea!). Once you have an excellent design that has been tested and meets the design requirements, you use Embedded Coder to generate C code that you will then compile, link, and load to your embedded hardware target.

You know your Simulink design does exactly what you want it to do - nothing more, nothing less. But you are not going to be running Simulink on the embedded hardware - you will be running an executable built from the automatically generated C code.

So what about that C code - does it still represent your exact design? How can you tell? You could use Processor-in-the-Loop (PIL) and re-run your requirements-based tests on the target processor to verify that the outputs match your simulation outputs. That's a good start. However, this is not necessarily going to prove the absence of unintended code. So, what can?

**So... what is Simulink Code Inspector?**

Simulink Code Inspector automatically compares generated code with its source model. It examines the generated code and the model to determine if they are structurally equivalent. After inspection is complete, it produces detailed reports with model-to-code and code-to-model traceability analysis.

Because Simulink Code Inspector has been implemented completely independently of Embedded Coder, it can be used as a means of compliance for DO-178C/DO-331 certification objectives. These structural equivalence and traceability reports can be submitted to certification authorities as evidence of code reviews for high-integrity standards such as DO-178C. Simulink Code Inspector is supported by our DO-178C/DO-331 Qualification Kit, allowing you to obtain certification credit when using it for DO-178C/DO-331 applications.

**How does it do that?**

Ok, magic and abstract syntax trees.

**Working with Simulink Code Inspector**

To highlight what Code Inspector can do for you, let's take a very simple model to be deployed to an embedded target.

Not all Simulink/Stateflow features are supported by Code Inspector. Before running an inspection, you need to check that your model is compatible with Code Inspector. A set of Model Advisor checks, included with Code Inspector, can be run to verify compatibility.

I run these checks on the model and review the Model Advisor report.

No compatibility issues, so let's go ahead with inspection.

Open the Simulink Code Inspector window from the 'Code' menu...

And get this window:

Once code generation and inspection are complete, a report with the results will open.

**Code Inspection Report**

Here's the report from the inspection - we passed!

The report is extremely detailed - let's look at some of the information it includes.

**Model-To-Code Traceability**

Model-to-code traceability - with hyperlinks for navigating to the exact model object for convenience.

**Code-To-Model Traceability**

Depending on your goal, it might be more convenient to know which block corresponds to a specific line of code. The report also includes a code-to-model traceability section.

**I guess it's your turn now...**

We've introduced some of the basic features of Simulink Code Inspector. Obviously there's quite a bit more to this tool, but I hope that this overview will inspire you to think about your processes for verifying autogenerated code for high-integrity applications.

Does your organization's development process for high-integrity embedded software include manual code reviews or manual tracing from model to code? Are you interested in seeing more posts about high-integrity software development with Simulink? Let us know what you think by leaving a comment below.

One more thing... as mentioned in a previous post, if you need help with any part of your Model-Based Design process, you can contact our Consulting Services.


Photo by Sara Kerens

Nick is what you might call a “heroic user” of MATLAB. He was there at the first class in which Cleve introduced the original FORTRAN MATLAB as a teaching tool. Near the lobby of the main MathWorks building here in Natick, we have a framed copy of the very first purchase order for MATLAB. Who bought that first copy of MATLAB? None other than Nick Trefethen. And it’s not merely a question of precedence or longevity. Across his long career, Nick has been a contributor of the highest order to the ecosystem of mathematical code written in MATLAB. He is the author of many books, an influential educator, and most recently he has been leading an innovative open source software project, a project that grew out of work he initiated back in 2002: Chebfun.

I can personally attest that the examples associated with Chebfun are among the most beautiful and thoughtful you will find documenting any mathematical project on the web. We recently conducted a virtual interview with Nick to learn about the latest release of Chebfun. Here’s what we learned.

Q: When did you first start using MATLAB?

A: I took a course from Cleve Moler when I was a graduate student at Stanford, so I must have typed x = A\b for the first time around 1978. When MathWorks started selling MATLAB, I bought the very first copy! That was in February, 1985. It was about ten years later when MathWorks told me I had been the first.

Q: What is Chebfun?

A: Chebfun is a system written in MATLAB that looks and feels like MATLAB, except that vectors and matrices become functions and operators. Whenever MATLAB has a discrete command, Chebfun aims to have the same command overloaded to execute a continuous analogue. For example, SUM(f) gives you the integral of a chebfun f over its interval of definition, MAX(f) gives you its maximum, ROOTS computes the roots, and so on.
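For readers who want a feel for what that looks like, here is a minimal sketch (assuming Chebfun is installed and on your MATLAB path; the function itself is just an arbitrary example):

```
f = chebfun(@(x) exp(-x).*sin(10*x), [0 3]);  % a chebfun on [0,3]
sum(f)             % the integral of f over [0,3]
[m, xm] = max(f)   % the global maximum and where it occurs
r = roots(f)       % all the zeros of f in [0,3]
```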

Q: Why is it called Chebfun?

A: Well, Chebfun stands for Chebyshev-functions. Our algorithms are based on representations of functions by piecewise Chebyshev polynomial interpolants. The details are given in my book Approximation Theory and Approximation Practice.

Q: This Chebyshev fellow must have been an interesting character. Can you tell us a little more about how his name came to be associated with your favorite functions?

A: Pafnuty Lvovich Chebyshev was the preeminent Russian mathematician of the 19th century, born near Moscow and a professor in St. Petersburg. He was remarkable for contributing in a big way to both pure and applied mathematics. The crucial thing for Chebfun was that he laid the foundations of how one can approximate nonperiodic functions by polynomials. What Fourier is to periodic functions, Chebyshev is to nonperiodic functions.

Q: A new version of Chebfun has recently been released. What is different in this release?

A: The big thing is that the code has been completely rewritten in a more modular structure and released on GitHub; we are a fully public open-source project. There’s also a beautiful new web site at www.chebfun.org, with a user’s guide and hundreds of examples showing how to use the code. Besides that there are many speedups and a number of new features, notably the capability of taking advantage of periodicity of functions (i.e., Fourier as opposed to Chebyshev methods). And the two-dimensional side of Chebfun, for representing functions f(x,y), keeps advancing.

Q: Why have you released Chebfun on GitHub?

A: We’d like to encourage new users and new developers. Broadening the Chebfun community seems the best way to make the software develop as strongly as possible and ensure its long-term future. (We have thousands of users already, but we don’t know as much about them as we would like.)

Q: What benefit do you see in the MATLAB File Exchange / GitHub integration?

A: So far as we are aware, Chebfun is the most advanced project written in MATLAB. It’s completely natural that it should be easily available from the File Exchange. We are very pleased at this new development.

Q: How might MATLAB users apply Chebfun?

A: We think every MATLAB user is potentially a Chebfun user, for every MATLAB user works with functions. It’s so much simpler and more natural to call SUM, MAX, ROOTS and so on than to worry about the calling sequences of QUAD, FMINBND, FZERO. In 2D, the same commands find integrals and extrema and roots of functions f(x,y). Chebfun is also the easiest way to solve ODE boundary value problems. We’ve overloaded x=A\b to u=N\f, where N is a differential operator with boundary conditions and f is a right-hand side.

The big advantage in all these computations is that we’re always dealing with functions (chebfuns), they all take the same form, and the user doesn’t have to know anything about the underlying grids. Input to SUM or MAX or ROOTS — just a chebfun. Output from \ for ODEs — just another chebfun. This is very different from the traditional style of computing in MATLAB and other languages with codes like BVP4C/BVP5C where for each code you need to learn a calling sequence that may involve specification of an initial grid and the code’s special structures like “DEVAL” for representing output functions.
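As a rough illustration of the u = N\f syntax, here is a sketch only, assuming Chebfun is installed; the equation and boundary values are arbitrary:

```
N = chebop(0, 10);              % a differential operator on [0,10]
N.op = @(x,u) diff(u,2) + u;    % u'' + u
N.lbc = 0;                      % u(0)  = 0
N.rbc = 1;                      % u(10) = 1
u = N \ 1;                      % solve u'' + u = 1 with those boundary conditions
plot(u)
```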

Q: How can MATLAB users access Chebfun?

A: Just go to the File Exchange, or google Chebfun. Be sure to check out the People page with its photo-bios of Chebfun developers including Nick Hale, Toby Driscoll, Asgeir Birkisson, Alex Townsend, Anthony Austin, Grady Wright, and Hrothgar.

Guest blogger Peter Webb returns with another in an occasional series of postings about application deployment.

I've written a MATLAB application that reports on the presence of contaminants in surface water, using a publicly available database of water quality observations maintained by the United States Geological Survey and the Environmental Protection Agency. Many groups of people might be interested in this kind of information. MATLAB made it easy to write the application, and some tools we've added recently make sharing the application easy as well.

I can share the application with other MATLAB users as a MATLAB App. With MATLAB Compiler I can create a standalone program that will run on a Windows, Linux or Mac desktop. And Builder JA lets me turn my application into a Java class for use by other developers.

To follow the rest of this article, please download the example code from MATLAB Central.

You could share your code with other MATLAB users by emailing them your function and data files. But then you'd have to explain how to install, update and uninstall your application -- and the process would be very vulnerable to human error. Or you could create a MATLAB App. Introduced in R2012b, MATLAB Apps are self-contained programs that automate a technical computing task, typically via a graphical user interface. MATLAB displays the installed apps in the App Gallery, which you access from the **Apps** tab in the MATLAB toolstrip. Packaging your code as a MATLAB App creates an automated installer (a `.mlappinstall` file) that integrates your application into the App Gallery. The MATLAB App installer includes only your code; if the App uses any toolboxes, the recipient of the App must have licensed copies of these toolboxes in order to use the App. Click on the **Package App** button on the **Apps** tab to begin packaging your program as an app.

The packaging interface consists of three columns, each representing a step in the packaging process:

- Pick main file
- Describe your app
- Package into installation file

All Apps must have a main file -- this is the MATLAB file that starts the graphical user interface or performs the task. Click on the `Add main file` link:

From the downloaded example code, select `swq.m`. The **Files included through analysis** section fills in with the required function files. Since the automated analysis cannot discover data files, you must add them manually. Select the `Add files/folders` in the **Shared resources and helper files** section and add `Contaminants.xml`, `ContaminantLimits.xml` and `swqLogo.png`.

All Apps must also have a name. Type `SurfaceWaterQuality` into the `App Name` field at the top of the **Describe your app** area. You can add your contact info and a description too, if you like, but that's optional. If you want to add an icon and a screenshot, I've provided appropriate images: `sqwIcon.png` for the icon and `swqApp.png` for the screenshot. Click on the screenshot and icon to browse for these images.

Once you're satisfied with the description, press the **Package** button to create the installer: `SurfaceWaterQuality.mlappinstall`. On any machine with MATLAB installed, double-clicking `SurfaceWaterQuality.mlappinstall` will install the `SurfaceWaterQuality` app into the MATLAB App Gallery. Click once on the app to start it:

With these inputs, the app produces a report of contaminants recorded within five miles of the MathWorks buildings in Natick, Massachusetts.
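If you package the same App repeatedly, note that the workflow can also be scripted. Here is a hedged sketch using the `matlab.apputil` functions, assuming you have saved the packaging project from the UI as `SurfaceWaterQuality.prj`:

```
matlab.apputil.package('SurfaceWaterQuality.prj');                  % build the .mlappinstall file
info = matlab.apputil.install('SurfaceWaterQuality.mlappinstall');  % add it to the App Gallery
matlab.apputil.run(info.id);                                        % launch the installed app
```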

MATLAB Apps allow you to share programs with other MATLAB users. But what if your target audience doesn't have MATLAB? MATLAB Compiler packages MATLAB programs to run against the MATLAB Compiler Runtime, which does not require MATLAB. The process of creating a standalone executable with MATLAB Compiler begins on the **Apps** tab of the MATLAB toolstrip. Click the **Application Compiler** button:

Like an App, a MATLAB Compiler-generated standalone executable requires a main file. Click on the `+` next to the `Add main file` field and select `waterQualityReport.m` from the downloaded example files:

`applicationCompiler` derives the application name from the name of the main file and immediately begins determining the files required to run the application. The **Files required for your application to run** section fills with MATLAB function dependencies. You'll need to add data file dependencies manually. Click on the `+` button in that section and add `Contaminants.xml` and `ContaminantLimits.xml`:

Like a MATLAB App, a standalone application may contain contact and description information to be displayed in the installer. You may also add a splash screen, for example `waterQuality.jpg` from the downloaded example files. `applicationCompiler` automatically includes the splash screen image in the installer, so there's no need to add it manually. Once you're done entering this optional information, change the name of the installer to `WQRInstaller_web`.

Press the **Package** button to build the installer for your standalone application.

Packaging creates a platform-specific installer: `waterQualityReport\for_redistribution\WQRInstaller_web.exe` on Windows and `waterQualityReport/for_redistribution/WQRInstaller_web` on Linux and Mac. Copy this file to a machine of the appropriate type, run the installer, and you'll be able to install and run the water quality application without installing MATLAB. Run the installer like this:

`WQRInstaller_web`

Look for the executable in the `application` folder, located in the installation folder. And then run the application like this:

`waterQualityReport 42.2973025 -71.3525913 5 1/1/2004`

On Linux and Mac platforms, you may find it easier to run the application using the generated `run_waterQualityReport.sh` shell script. This command assumes you've installed the MCR in the `/tmp/WQTest/MATLAB_Compiler_Runtime/v83` folder:

```
./run_waterQualityReport.sh \
    /tmp/WQTest/MATLAB_Compiler_Runtime/v83/ \
    42.2973025 -71.3525913 5 1/1/2004
```

See the MATLAB Compiler documentation for more information about how to run a MATLAB Compiler-generated application.
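If you prefer scripted builds over the Application Compiler app, roughly the same executable can be produced with `mcc` at the MATLAB prompt. A sketch (flags may need adjusting for your setup):

```
% -m builds a standalone application; -a adds the data files that
% dependency analysis cannot discover on its own.
mcc -m waterQualityReport.m -a Contaminants.xml -a ContaminantLimits.xml
```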

Builder JA creates a Java component that you integrate into a larger host application. Before using Builder JA, you must configure your environment to ensure Builder JA uses a supported version of the Java compiler.

When you have set up your environment, type `libraryCompiler` at the MATLAB prompt and choose **Java Package** as the application type to begin:

Instead of running a main file, a Java component exports one or more functions into a public API. Use the `+` button to add `detectContamination.m` to the **Exported Functions** list. Then, in **Files required for your application to run** section, add the data files `Contaminants.xml`, `ContaminantLimits.xml` and `swqLogo.png`.

A Java API requires a class with at least one method. Placing the class in a namespace helps prevent class name conflicts. `libraryCompiler` uses generic defaults: no namespace, a class named `Class1` and method names corresponding to the MATLAB functions you chose to export. Change the class name to `SurfaceWaterQuality` and the namespace to `WaterQuality`.

Change the name of the generated installer (as illustrated previously, in **Deploying to the Desktop**) to `DCInstaller_web`. Press the **Package** button to create `WaterQuality.jar`.

Unlike the desktop application, the Java component in `WaterQuality.jar` won't run by itself. It needs to be invoked from a host Java application. The downloaded code contains an example main program, `WaterQuality.java`.

Much like the MATLAB App and the desktop application, this Java host program collects a search area and start date from the user and invokes the web service. In addition to creating a user interface via AWT, the main program must perform three tasks to interact with the Builder JA-generated Java component:

- Import Builder JA generated and utility classes.
- Create a `WaterQuality.SurfaceWaterQuality` object.
- Invoke the `detectContamination` method.

The `WaterQuality` namespace contains the generated `SurfaceWaterQuality` class and the `com.mathworks.toolbox.javabuilder` namespace contains Builder JA runtime management and error handling classes such as `MWException`. The Java program imports these namespaces:

```
import WaterQuality.*;
import com.mathworks.toolbox.javabuilder.*;
```

Builder JA creates object, rather than static, methods, so you must create an instance of the generated class to invoke the methods in your exported API. The constructor might throw an `MWException`, so your Java code must either `catch` `MWException` or declare it in a `throws` clause. The unsophisticated code in the example catches (and ignores) errors that might occur during object creation:

```
try {
    swq = new SurfaceWaterQuality();
} catch (MWException me) {}
```

Pushing the **Report** button invokes the web service via the `detectContamination` method. The first input is the number of expected outputs. Methods in the exported API return an array of Java `Object` instances. The host program converts each result to the appropriate native type, a `String` in this case, before using them. Builder JA includes a robust set of type-conversion routines to make this task easy. `detectContamination` preformats the text for tabular display with a monospaced font such as `Courier`. In the Java code below, `reportData` is a Java `TextArea` object capable of scrolling to display the list of reported contaminants.

```
Object[] lhs;
lhs = swq.detectContamination(1, latitude, longitude, radius, startDate);
if (lhs != null) {
    String result = lhs[0].toString();
    reportData.setText(result);
}
```

The download contains scripts to help you build and run the sample application. See the `Readme.txt` for platform-specific instructions. Note that the application does not do much validation of its inputs, and is somewhat fragile as a result.

Managing the interaction with the web service is the most complex part of these three water quality report applications. And in each of them, that part is exactly the same. Only the target-specific wrapper differs, adapting the core functionality to the interface norms of the target environment: MATLAB, the standalone desktop and a Java host program.

MATLAB Apps, `applicationCompiler` and `libraryCompiler` use the same visual conventions and very similar three-step workflows: select functions to deploy, add data files and descriptive data, create an installer. This shared language and process makes it simpler to deploy the same code to multiple target environments.

Do you deploy code to more than one environment? What challenges do you face? Let us know here.

Get
the MATLAB code

Published with MATLAB® R2014a

Jiro's pick this week is `legappend` by Chad Greene.

Chad is no stranger to MATLAB Central. He has over 50 File Exchange entries, and two of his entries have been highlighted (unit converters and ccc) in Pick of the Week. His entries are well-written, and like this one, many of his entries have published example files.

Many of you may know that the command `legend` creates one legend per axes.

```
t = 0:.1:10;
x1 = sin(t);
x2 = cos(t);
plot(t,x1,t,x2)
legend('Sine','Cosine')
```

Let's say that you wanted to add another line to this plot.

```
x3 = sin(t).*cos(t);
hold on
plot(t,x3,'--r')
```

To add this entry to the legend, you would re-run the `legend` command with the three entries.

legend('Sine','Cosine','Sine*Cosine')

Chad's `legappend` allows you to append new entries to an existing legend. This means that you can simply call it along with the new lines you
create.

```
% New figure
figure;
plot(t,x1,t,x2)
legend('Sine','Cosine')

% Add another line
hold on
plot(t,x3,'--r')

% Append a new legend item
legappend('Sine*Cosine')
```

Great! You can also delete the last legend entry with the following command.

`legappend('')`

If you want to see more examples, check out his published example.

Chad explains that `legappend` simply deletes the current legend and recreates the updated legend. This is a good example of how you can take a normal mode
of operation and tweak it in order to adapt it to your needs.
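A minimal sketch of that delete-and-recreate idea (not Chad's actual code) might look like this:

```
% Grab the strings of the existing legend, delete it, and rebuild it
% with the new entry appended.
[leg,~,~,strings] = legend;
delete(leg);
legend([strings, {'Sine*Cosine'}]);
```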

Give it a try and let us know what you think here or leave a comment for Chad.

Get
the MATLAB code

Published with MATLAB® R2014a

*Today's post is part of an ongoing (but long delayed) tutorial series on digital image processing using MATLAB. I'm covering topics in roughly the order used in the book Digital Image Processing Using MATLAB.*

In the previous post in this series, I discussed the different numeric data types that commonly come into play when doing image processing in MATLAB. Today I want to talk about the four main *image types*:

- Gray-scale images
- Binary images
- Indexed images
- RGB images

A *gray-scale image* is a matrix whose values represent shades of gray. When the matrix is of type `uint8`, the integer-valued elements are in the range [0,255]. By convention, the value 0 is displayed as black, and the value 255 is displayed as white. Values in-between are displayed as intermediate shades of gray.

When the matrix is of type `uint16`, then 0 is displayed as black and 65535 is displayed as white.

For a floating-point matrix, either of type `double` or `single`, the value 1.0 is displayed as white.

Originally, the Image Processing Toolbox documentation called these *intensity images*, and you might still find this term used in some places. Some of our color scientist users complained, though, that the term *intensity image* meant something slightly different in their field, so we (mostly) changed our terminology.

```
I = imread('rice.png');
whos I
```

```
  Name      Size             Bytes  Class    Attributes

  I         256x256          65536  uint8
```

imshow(I)

In image processing, the term *binary image* refers to a two-valued image whose pixels are either black or white. (Or, a bit more generally, the pixels are either background or foreground.) In MATLAB and the Image Processing Toolbox, we have adopted the convention that binary images are represented as *logical matrices*. Here's an example of constructing a matrix whose type is logical and then displaying the result as a binary (black-and-white) image.

```
[xx,yy] = meshgrid(-100:100);
bw = hypot(xx,yy) <= 50;
whos bw
```

```
  Name      Size             Bytes  Class      Attributes

  bw        201x201          40401  logical
```

```
imshow(bw)
title('Binary image')
```

Note that a common mistake is to create a `uint8` matrix and fill it with 0s and 1s, thinking that it will display as black and white.

```
bw = zeros(30,30,'uint8');
bw(10:20,10:20) = 1;
imshow(bw,'InitialMagnification','fit')
title('Help, my image is all black!')
```

When a matrix is of type `uint8`, then the value 1 is not white; it's almost completely black!
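If you run into this, two quick fixes (a sketch; there are other options) are to convert the matrix to `logical`, or to give `imshow` an explicit display range:

```
imshow(logical(bw), 'InitialMagnification', 'fit')    % treat 0/1 as a binary image
imshow(bw, [0 1], 'InitialMagnification', 'fit')      % or scale the uint8 display range
```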

An *indexed image* has two components: an *index matrix* of integers, commonly called `X`, and a *color map matrix*, commonly called `map`. The matrix `map` is an M-by-3 matrix of floating-point values (either `double` or `single`) in the range [0.0,1.0]. Each row of `map` specifies the red, green, and blue components of a single color. An indexed image is displayed by mapping values in the index matrix to colors in the color map. A quirk of MATLAB is that this mapping is data-type specific. If the index matrix is floating-point, then the value 1.0 corresponds to the first color in the color map. But if the index matrix is `uint8` or `uint16`, then the value 0 corresponds to the first color. (Maybe I'll explain the reasons for that on another day.)

You display an indexed image by passing both the index matrix and the color map matrix to `imshow`, like this:

```
[X,map] = imread('trees.tif');
whos X map
```

```
  Name      Size             Bytes  Class     Attributes

  X         258x350          90300  uint8
  map       256x3             6144  double
```

```
imshow(X,map)
title('Indexed image')
```

An RGB image is an M-by-N-by-3 array. For a particular pixel at row `r` and column `c`, the three values `RGB(r,c,1)`, `RGB(r,c,2)`, and `RGB(r,c,3)` specify the red, green, and blue color components of that pixel. A pixel whose color components are [0,0,0] is displayed as black. For a floating-point array, a pixel whose color components are [1.0,1.0,1.0] is displayed as white. For a `uint8` array, [255,255,255] is displayed as white; for a `uint16` array, [65535,65535,65535] is displayed as white.

```
RGB = imread('peppers.png');
whos RGB
```

```
  Name      Size                Bytes  Class    Attributes

  RGB       384x512x3          589824  uint8
```

imshow(RGB)

You can think of an RGB image as a "stack" of three gray-scale images. These gray-scale images are commonly called the *component images*.

```
imshow(RGB(:,:,1))
title('Red component image')
```

```
imshow(RGB(:,:,2))
title('Green component image')
```

```
imshow(RGB(:,:,3))
title('Blue component image')
```

In my first year of blogging (2006), I wrote a series of blog posts explaining in great detail how the MATLAB image display model works for various image and data types. The series is still up-to-date and worth reading today. Or, to see it all in one place, take a look at my MATLAB Digest article, "How MATLAB Represents Pixel Colors."

For more information, see Sections 2.7 and 7.1 of *Digital Image Processing Using MATLAB*.

Get
the MATLAB code

Published with MATLAB® R2014a

**Active State Output**

Did you know that with Stateflow, it is possible to monitor which state is currently active, and use that information in the rest of the Simulink model? This feature is named Active State Output. The screenshot below shows how to do this for one of the charts in the example Temporal Logic Using the AT Function.

The checkbox creates a new output port on your chart block, whose value (in simulation) is of an enumerated type with values that match the names of the child states.

When doing this for a chart with parallel decomposition, Active State Output needs to be enabled for each parallel state individually, and each parallel state has a separate corresponding output port. *(The sf_car demonstration uses this to output the current gear selected by the shift logic to its transmission model)*.

We can also take advantage of this feature in order to accomplish something special in the code generated from Stateflow charts.

**Stateflow States in Generated Code (Default Behavior)**

When using Embedded Coder to generate code from Stateflow charts, by default the states of the chart are created as macros using the `#define` directive. For example, here is a snippet from the generated code from the chart shown above.

```
/* Named constants for Chart: '<Root>/Temporal Logic' */
#define sf_tlat_IN_NO_ACTIVE_CHILD ((uint8_T)0U)
#define sf_tlat_IN_State ((uint8_T)1U)
/* Named constants for Chart: '<Root>/Without Temporal Logic' */
#define sf_tlat_IN_A ((uint8_T)1U)
#define sf_tlat_IN_B ((uint8_T)2U)
#define sf_tlat_IN_C ((uint8_T)3U)
```

The macros are grouped together by chart under an identifying comment line, but there is no intrinsic distinction as to which chart they belong to.

**Stateflow States as Enums**

So what if you would like to have a more organized grouping? Specifically, what if you would like to have your states be members of an enumeration?

As you might have guessed, we need to enable Active State Output. But we need to do two more things in order to get this enumeration in a nice, clean, standalone form. First, open the signal properties for the signal coming from the Active State Output port and give the signal a name. Second, set the signal’s storage class (under the Code Generation tab) to ExportedGlobal.

Now, in the `*_types.h` generated code file, there will be an enumeration definition, like so:

```
typedef enum {
listofStates_None = 0, /* Default value */
listofStates_A,
listofStates_B,
listofStates_C
} listofStates;
```

And in the generated C code for the model, macros for the chart are no longer defined; rather, a global (with the name of the signal) is declared of type `listofStates`:

```
/* Exported block signals */
listofStates currentState; /* '<Root>/Without Temporal Logic' */
```

Further down in that C code file, you will find a switch-case statement that implements the logic of the Without Temporal Logic chart, like so:

```
switch (currentState) {
case listofStates_A:
...
break;
case listofStates_B:
...
break;
default:
...
break;
}
```

**One Last Note**

When you enable Active State Output, you can also check the “Define enumerated type manually” box and click on the helpful hyperlink to automatically create a template m-file that defines the enumerated type.

At the top of this file you will find an enumeration declaration with a list of all the state names. Tempting as it may be, you cannot change these names here unless you also change them in the chart itself. However, you can use this file to change some other behaviors using the methods that follow (but that’s a blog post for another day).

**Now it's your turn**

Are you already using the active state output? Give this a try and let us know what you think by leaving a comment here.

Denormal floating point numbers and gradual underflow are an underappreciated feature of the IEEE floating point standard. Double precision denormals are so tiny that they are rarely numerically significant, but single precision denormals can be in the range where they affect some otherwise unremarkable computations. Historically, gradual underflow proved to be very controversial during the committee deliberations that developed the standard.

My previous post was mostly about normalized floating point numbers. Recall that normalized numbers can be expressed as

$$ x = \pm (1 + f) \cdot 2^e $$

The *fraction* or *mantissa* $f$ satisfies

$$ 0 \leq f < 1 $$

$f$ must be representable in binary using at most 52 bits for double precision and 23 bits for single precision.

The exponent $e$ is an integer in the interval

$$ -e_{max} < e \leq e_{max} $$

where $e_{max} = 1023$ for double precision and $e_{max} = 127$ for single precision.

The finiteness of $f$ is a limitation on *precision*. The finiteness of $e$ is a limitation on *range*. Any numbers that don't meet these limitations must be approximated by ones that do.

Double precision floating point numbers are stored in a 64-bit word, with 52 bits for $f$, 11 bits for $e$, and 1 bit for the sign of the number. The sign of $e$ is accommodated by storing $e+e_{max}$, which is between $1$ and $2^{11}-2$.

Single precision floating point numbers are stored in a 32-bit word, with 23 bits for $f$, 8 bits for $e$, and 1 bit for the sign of the number. The sign of $e$ is accommodated by storing $e+e_{max}$, which is between $1$ and $2^{8}-2$.

The two extreme values of the exponent field, all zeroes and all ones, are special cases. All zeroes signifies a denormal floating point number, the subject of today's post. All ones, together with a zero fraction, denotes infinity, or `Inf`. And all ones, together with a nonzero fraction, denotes Not-A-Number, or `NaN`.

My program `floatgui`, available here, shows the distribution of the positive numbers in a model floating point system with variable parameters. The parameter $t$ specifies the number of bits used to store $f$, so $2^t f$ is an integer. The parameters $e_{min}$ and $e_{max}$ specify the range of the exponent.

If you look carefully at the output from `floatgui` shown in the previous post you will see a gap around zero. This is especially apparent in the logarithmic plot, because the logarithmic distribution can never reach zero.

Here is output for slightly different parameters, $t = 3$, $e_{min} = -5$, and $e_{max} = 2$. However, the gap around zero has been filled in with a spot of green. Those are the denormals.

Zoom in on the portion of these toy floating point numbers less than one-half. Now you can see the individual green denormals -- there are eight of them in this case.

Denormal floating point numbers are essentially roundoff errors in normalized numbers near the underflow limit, `realmin`, which is $2^{-e_{max}+1}$. They are equally spaced, with a spacing of `eps*realmin`. Zero is naturally included as the smallest denormal.

Suppose that `x` and `y` are two distinct floating point numbers near to, but larger than, `realmin`. It may well be that their difference, `x - y`, is smaller than `realmin`. For example, in the small `floatgui` system pictured above, `eps = 1/8` and `realmin = 1/32`. The quantities `x = 6/128` and `y = 5/128` are between `1/32` and `1/16`, so they are both above underflow. But `x - y = 1/128` underflows to produce one of the green denormals.

Before the IEEE standard, and even on today's systems that do not comply with the standard, underflows would simply be set to zero. So it would be possible to have the MATLAB expression

x == y

be false, while the expression

x - y == 0

be true.
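Here is a quick single precision illustration (the particular values are just ones I picked slightly above `realmin('single')`; the results assume IEEE-compliant gradual underflow):

```
x = single(1.5e-38);   % a bit larger than realmin('single'), about 1.1755e-38
y = single(1.2e-38);
x == y                 % false: x and y are distinct normalized numbers
x - y                  % about 3e-39, below realmin('single'), so a denormal
(x - y) == 0           % false with gradual underflow; true if underflow flushed to zero
```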

On machines where underflow flushes to zero and division by zero is fatal, this code fragment can produce a division by zero and crash.

```
if x ~= y
    z = 1/(x-y);
end
```

Of course, denormals can also be produced by multiplications and divisions that produce a result between `eps*realmin` and `realmin`. In decimal these ranges are

```
format compact
format short e
[eps*realmin realmin]
[eps('single')*realmin('single') realmin('single')]
```

```
ans =
  4.9407e-324  2.2251e-308
ans =
  1.4013e-45   1.1755e-38
```

Denormal floating point numbers are stored without the implicit leading one bit,

$$ x = \pm f \cdot 2^{-e_{max}+1} $$

The *fraction* $f$ satisfies

$$ 0 \leq f < 1 $$

And $f$ is represented in binary using 52 bits for double precision and 23 bits for single precision. Note that zero naturally occurs as a denormal.

When you look at a double precision denormal with `format hex` the situation is fairly clear. The rightmost 13 hex characters are the 52 bits of the fraction. The leading bit is the sign. The other 12 bits of the first three hex characters are all zero because they represent the biased exponent, which is zero because $e_{max}$ and the exponent bias were chosen to complement each other. Here are the two largest and two smallest nonzero double precision denormals.

```
format hex
[(1-eps); (1-2*eps); 2*eps; eps; 0]*realmin
format long e
ans
```

```
ans =
   000fffffffffffff
   000ffffffffffffe
   0000000000000002
   0000000000000001
   0000000000000000
ans =
     2.225073858507201e-308
     2.225073858507200e-308
     9.881312916824931e-324
     4.940656458412465e-324
                         0
```

The situation is slightly more complicated with single precision because 23 is not a multiple of four. The fraction and exponent fields of a single precision floating point number -- normal or denormal -- share the bits in the third character of the hex display; the biased exponent gets one bit and the first three bits of the 23 bit fraction get the other three. Here are the two largest and two smallest nonzero single precision denormals.

```
format hex
e = eps('single');
r = realmin('single');
[(1-e); (1-2*e); 2*e; e; 0]*r
format short e
ans
```

```
ans =
   007fffff
   007ffffe
   00000002
   00000001
   00000000
ans =
   1.1755e-38
   1.1755e-38
   2.8026e-45
   1.4013e-45
            0
```

Think of the situation this way. The normalized floating point numbers immediately to the right of `realmin`, between `realmin` and `2*realmin`, are equally spaced with a spacing of `eps*realmin`. If just as many numbers, with just the same spacing, are placed to the left of `realmin`, they fill in the gap between there and zero. These are the denormals. They require a slightly different format to represent and slightly different hardware to process.

The IEEE Floating Point Committee was formed in 1977 in Silicon Valley. The participants included representatives of semiconductor manufacturers who were developing the chips that were to become the basis for the personal computers that are so familiar today. As I said in my previous post, the committee was a remarkable case of cooperation among competitors.

Velvel Kahan was the most prominent figure in the committee meetings. He was not only a professor of math and computer science from UC Berkeley, he was also a consultant to Intel and involved in the design of their math coprocessor, the 8087. Some of Velvel's students were involved as well, not only those still on campus, but also graduates who were by then working for some of the participating companies.

A proposed standard, written by Kahan, one of his students at Berkeley, Jerome Coonen, and a visiting professor at Berkeley, Harold Stone, known as the KCS draft, reflected the Intel design and was the basis for much of the committee's work.

The committee met frequently, usually in the evening in conference rooms of companies on the San Francisco peninsula. There were also meetings in Austin, Texas, and on the East Coast. The meetings often lasted until well after midnight.

Membership was based on regular attendance. I was personally involved only when I was visiting Stanford, so I was not an official member. But I do remember telling a colleague from Sweden who was coming to the western United States for the first time that there were three sites that he had to be sure to see: the Grand Canyon, Las Vegas, and the IEEE Floating Point Committee.

The denormals in the KCS draft were something new to most of the committee. Velvel said he had experimented with them at the University of Toronto, but that was all. Standards efforts are intended to regularize existing practice, not introduce new designs. Besides, implementing denormals would require additional hardware, and additional transistors were a valuable resource in emerging designs. Some experts claimed that including denormals would slow down all floating point arithmetic.

A mathematician from DEC, Mary Payne, led the opposition to KCS. DEC wanted a looser standard that would embrace the floating point that was already available on their VAX. The VAX format was similar to, but not the same as, the KCS proposal. And it did not include denormals.

Discussion went on for a couple of years. Letters of support for KCS from Don Knuth and Jim Wilkinson did not settle the matter. Finally, DEC engaged G. W. (Pete) Stewart, from the University of Maryland. In what must have been a surprise to DEC, Pete also said he thought that the KCS proposal was a good idea. Eventually the entire committee voted to accept a revised version.

Denormal floating point numbers are still the unwanted step children in today's floating point family. I think it is fair to say that the numerical analysis community has failed to make a strong argument for their importance. It is true that they do make some floating point error analyses more elegant. But with magnitudes around $10^{-308}$ the double precision denormals are hardly ever numerically significant in actual computation. Only the single precision denormals around $10^{-38}$ are potentially important.

Outside of MATLAB itself, we encounter processors that have IEEE format for the floating point numbers, but do not conform to the 754 standard when it comes to processing. These processors usually flush underflow to zero and so we can expect different numerical results for any calculations that might ordinarily produce denormals.

We still see processors today that handle denormals with microcode or software. Execution time of MATLAB programs that encounter denormals can degrade significantly on such processors.

The Wikipedia page on denormals has the macros for setting trap handlers to flush underflows to zero in C or Java programs. I hate to think what might happen to MATLAB Mex files with such macros. Kids, don't try this at home.

William Kahan, Home Page.

Charles Severance, An Interview with the Old Man of Floating-Point, Reminiscences for *IEEE Computer*, February 20, 1998.

Wikipedia page, Denormal number.

Get
the MATLAB code

Published with MATLAB® R2014a

Today I looked around the File Exchange for recent image processing submissions or updates. (You can do this yourself by searching using the tag image processing.) Here are a few things that caught my eye.

getTimeStamp, by Christopher MacMinn, is an example of someone using the recently introduced linking between the File Exchange and GitHub. This is a straightforward function that illustrates how to use `imfinfo` to read useful metadata from a file. In particular, it shows how to extract the information that digital cameras stick into image files these days.

The critical part of the code looks something like this (simplified a bit):

```
info = imfinfo(filename);
if isfield(info,'DigitalCamera')
    % There is digital camera information in the file.
    % Get the time the picture was taken (as a string).
    str = info.DigitalCamera.DateTimeOriginal;
end
```

If you have data that originally came from ImageJ, you might find Dylan Muir's ReadImageJROI function to be useful. Dylan originally submitted this in 2011, and he recently updated it to fix a couple of issues. The file is regularly downloaded and has received several good ratings.

PIVlab is an impressive-looking application for particle image velocimetry created by William Thielicke. This contribution gets a LOT of traffic and has very high ratings. William has updated it regularly, and he has been responsive in the comment thread.

Sometimes the author of a research paper will submit the MATLAB implementation of the paper's method to the File Exchange. That appears to be the case with An improved computer vision method for white blood cells detection (using differential evolution) by Erik. I'm guessing that Erik is Erik Cuevas, the lead author of the paper quoted in the submission's description.

I would be very interested to hear your opinions about the most useful image processing submissions on the File Exchange. Please post your comments here.

Get
the MATLAB code

Published with MATLAB® R2014a

Greg's pick this week is module - encapsulate MATLAB package into a name space module by Daniel Dolan.

The concept of namespaces has been available in MATLAB for a number of years, and the technology is quite mature. However, a common feature of namespaces was not included in the MATLAB implementation: you cannot call one function in a namespace from another function in the same namespace without including the namespace qualifier somewhere. Most namespaces in other languages permit functions to call each other within the namespace without needing to identify the namespace.

Daniel’s “MODULE” function provides an interesting means to work around that constraint using a method that demonstrates the powerful concept of function handles in MATLAB.

The nature of Daniel’s solution also provides a means to create namespace specifiers, without needing to continually reference the full namespace.

Finally Daniel makes it possible to have what one might call “dynamic namespaces”.

I selected this submission because it demonstrates the powerful feature of function handles, advertises the use of namespaces in MATLAB, and uses the “DBSTACK” function in a way that surprised me.

Yes, it happens. There’s what appears to be a bug in this File Exchange submission. Change line *110* from

`root=sprintf('%s.',root{:});`

to

`root=sprintf('%s.',root{end:-1:1});`

The order in which the package namespace is being constructed is backward. Changing this line reverses the construction and resolves the issue.

If you’re not familiar with namespaces, don’t fret. It is an advanced software engineering concept. I work with a lot of software engineers all over the world, and many of them are not familiar with namespaces because they work in coding languages or environments that don’t have them (such as C).

A namespace provides a means to specifically call unique functions that share the same name. I like to think of a namespace as a context or a classifier for a function.

Let’s say you have two functions, each named “myDisp”, but they have different behavior.

`>> myDisp('Hello')`

```
My Display 1
Hello
```

Versus

`>> myDisp('Hello')`

```
My Display 2
-------------
Hello
```

How do you invoke the appropriate *myDisp* at the correct time?

Perhaps you could give the functions different names: *myDisp* and *myDisp2*.

But if you’re concerned about modularity, and these functions are used independently in other projects and code files, changing the name of the functions will be problematic.

What if instead you could include an identifier that would make the call to the *myDisp* function more specific?

```
>> myFirstNamespace.myDisp('Hello')
>> myOtherNamespace.myDisp('Hello')
```

Now both functions can be available, without introducing a name conflict, or needing to change the name of the function.

To accomplish this in MATLAB, you need to use package folders (folders whose names start with a `+` character).
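A sketch of how that looks on disk, using the hypothetical names from above:

```
+myFirstNamespace/myDisp.m     % invoked as myFirstNamespace.myDisp('Hello')
+myOtherNamespace/myDisp.m     % invoked as myOtherNamespace.myDisp('Hello')
```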

While namespaces provide a means to be more specific about function calls, they can be inconvenient when it comes to actually writing them out, and reading them in the code file. Especially as the namespaces get longer.

`>> myMainNamespace.myOtherSubNamespace.myOtherNamespace.myDisp('Hello')`

This seems hard to read, and a pain to write.

We can use the IMPORT function to handle this:

```
>> import myMainNamespace.mySubNamespace.myFirstNamespace.*
>> myDisp('hello')
```

```
My Display 1
Hello
```

Now the contents of myFirstNamespace can be referenced without using the full namespace notation. But if you import two functions that have the same name but live in different namespaces, you are back to the same problem that namespaces try to solve! You can partially import namespaces, but this gets tedious.

The MODULE function makes it possible to refer to functions in namespace by essentially providing an alias for the namespace. This means you can refer to functions deep within a namespace without needing to write out the full namespace, or have the potential conflict issues using the IMPORT function.

```
>> myLocalSpace = module('myMainNamespace.mySubNamespace.myFirstNamespace');
>> myLocalSpace.myDisp('Hello')
```

```
>> myOtherLocalSpace = module('myMainNamespace.myOtherSubNamespace.myOtherNamespace');
>> myOtherLocalSpace.myDisp('Hello')
```

Here the calls to the functions in the namespace are concise and specific to the desired function.

This also allows you to do fun things, like making the namespace dynamic:

```
if (a < b)
    myLocalSpace = module('myMainNamespace.mySubNamespace.myFirstNamespace');
else
    myLocalSpace = module('myMainNamespace.myOtherSubNamespace.myOtherNamespace');
end
myLocalSpace.myDisp('Hello')
```

I cannot say at this point whether or not having dynamic namespaces is a good idea, but it is certainly an interesting concept.

In MATLAB, if a function in a namespace calls another function in the same namespace, it must refer to the full namespace of the function being called.

```
function myDisp( displayInput )
    displayHeader = myMainNamespace.myOtherSubNamespace.myOtherNamespace.otherFunction;
    disp(displayHeader);
    disp(displayInput);
end
```

This is fine, until later when you wrap the existing namespace in another parent namespace, and you must go into all of the functions and append the new parent namespace. If you have hundreds of functions, this is going to get old fast.

Daniel provides a really neat mechanism to get around this constraint. Calling MODULE with no inputs assumes the namespace you want is relative to the current function.

```
function myDisp( displayInput )
    localNamespace = module;
    displayHeader = localNamespace.otherFunction();
    disp(displayHeader);
    disp(displayInput);
end
```

Now, even if the structure of the namespace changes, you can still call the functions local to the namespace, because the
local namespace is automatically updated when the *myDisp* function is called.

This is the part I think is really elegant.

In order to make this work, Daniel takes advantage of function handles. Function handles are pointers or aliases to functions. This means you can use a variable to store a reference to a function.

Daniel creates a collection of these function handles by creating function handles for each of the functions in a specific package folder. Then he stores these function handles in a MATLAB struct. Subreferencing the struct using the dot notation (struct.field) will now act like referencing a function in a namespace. How cool is that!
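A minimal sketch of that idea (hypothetical code, not Daniel's actual implementation):

```
% Build a struct of handles to every function in a package folder.
files = dir(fullfile('+myFirstNamespace','*.m'));
localSpace = struct();
for k = 1:numel(files)
    [~,name] = fileparts(files(k).name);
    localSpace.(name) = str2func(['myFirstNamespace.' name]);
end
localSpace.myDisp('Hello')   % behaves like myFirstNamespace.myDisp('Hello')
```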

Finally, to get around the constraint MATLAB imposes of functions in a namespace referencing other functions in the same namespace, Daniel uses the DBSTACK function.

I’m pretty sure he’s using DBSTACK in a way it was never intended, but I think it’s absolutely brilliant!

Daniel uses the DBSTACK function to determine what function called MODULE, so he can then figure out what namespace the function belongs to.
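A rough sketch of the trick (hypothetical code, not Daniel's):

```
% Inside module.m: ask who called us, then recover the namespace from
% the caller's location on disk.
st = dbstack('-completenames');
if numel(st) > 1
    callerFile = st(2).file;   % full path of the function that called module
    % parse the '+package' folder names out of callerFile to rebuild the namespace
end
```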

While there may be some performance implications from using DBSTACK, it is an excellent demonstration of the flexibility of the MATLAB environment. You can ask a function “Who called you?” I’ve been working with MATLAB for over 15 years now, and I didn’t realize you could do that!

If you would like to leave any comments regarding this post, please click here.

Get
the MATLAB code

Published with MATLAB® R2014a

Today I set out to write a post called "Radon transform - under the hood."

Things didn't turn out the way I expected.

When I considered how to write the post I originally intended, I started thinking about how it would be nice to have a good way to visualize (and therefore help understand) the Radon transform itself. So I started working on that instead.

Of course, I didn't have a lot of time today, so I didn't get anywhere near finished with a working app. But I thought I would post anyway!

I'm inviting you, dear reader, to weigh in about what you think would make a good Radon transform visualizer.

To get you started, here's my sketch:

On the left, you can see a simple image (light square on dark background), and you can see the Radon projection along one particular angle (45 degrees).

On the right, you can see the entire Radon transform, displayed as an image. You can also see a vertical dashed line at 45 degrees, which corresponds to the projection angle shown on the left.

I imagine you could interactively drag either the projection axis on the left or the vertical line on the right to change the projection angle and watch the projection change.

So, what do you think? Good idea, bad idea? Any suggestions?

Get
the MATLAB code

Published with MATLAB® R2014a

Collecting and tracking health and fitness data with wearable devices is about to go mainstream as the smartphone giants like Apple, Google and Samsung jump into the fray. But if you collect data, what's the point if you don't analyze it?

Today's guest blogger, Toshi Takeuchi, would like to share an analysis of a weight lifting dataset he found in a public repository.

- Motivation, dataset, and prediction accuracy
- Data preprocessing and exploratory analysis
- Predictive Modeling with Random Forest
- Plot misclassification errors by number of trees
- Variable Importance
- Evaluate trade-off with ROC plot
- The reduced model with 12 features
- Conclusion and the next steps - integrate your code into your app

The Human Activity Recognition (HAR) Weight Lifting Exercise Dataset provides measurements to determine "how well an activity was performed". Six subjects performed 1 set of 10 repetitions of the Unilateral Dumbbell Biceps Curl in 5 different ways.

When I came across this dataset, I immediately thought of building a mobile app to advise end users whether they are performing the exercise correctly, and if not, which common mistakes they are making. I used the powerful 'Random Forest' algorithm to see if I could build a successful predictive model to enable such an app. I was able to achieve **99% prediction accuracy** with this dataset and I would like to share my results with you.

The dataset provides 39,242 samples with 159 variables labeled with 5 types of activity to detect - 1 correct method and 4 common mistakes:

- exactly according to the specification (Class A)
- throwing the elbows to the front (Class B)
- lifting the dumbbell only halfway (Class C)
- lowering the dumbbell only halfway (Class D)
- throwing the hips to the front (Class E)

Sensors were placed on the subjects' belts, armbands, gloves, and dumbbells, as described below:

**Citation** *Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of 4th International Conference in Cooperation with SIGCHI (Augmented Human '13) . Stuttgart, Germany: ACM SIGCHI, 2013.* Read more: http://groupware.les.inf.puc-rio.br/har#ixzz34dpS6oks

Usually you cannot use raw data directly. Preprocessing is an important part of your analysis workflow that has significant downstream impact.

- Load the dataset and inspect data for missing values
- Partition the dataset for cross validation
- Clean and normalize variables
- Select predictor variables (features)

Among those steps, cross validation is a key step specific to predictive modeling. Roughly speaking, you hold out part of available data for testing later, and build models using the remaining dataset. The held out set is called the 'test set' and the set we use for modeling is called the 'training set'. This makes it more difficult to overfit your model, because you can test your model against the data you didn't use in the modeling process, giving you a realistic idea how the model would actually perform with unknown data.
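The post's `preprocess.m` takes care of the actual split, but a generic holdout partition might look roughly like this (a sketch with hypothetical variable names):

```
% Hold out 10% of the rows for testing; train on the rest.
cv = cvpartition(size(X,1), 'HoldOut', 0.1);
Xtrain = X(training(cv),:);   Ytrain = Y(training(cv));
Xtest  = X(test(cv),:);       Ytest  = Y(test(cv));
```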

Exploratory analysis usually begins with visualizing data to get oriented with its nuances. Plots of the first four variables below show that:

- data is organized by class - like 'AAAAABBBBBCCCCC'. This can be an artifact of the way the data was collected and real life data may not be structured like this, so we want to use more realistic data to build our model. We can fix this issue by randomly reshuffling the data.
- data points cluster around a few different mean values - indicating that measurements were taken by devices calibrated in a few different ways
- those variables exhibit a distinct pattern for Class E (colored in magenta) - those variables will be useful to isolate it

Review `preprocess.m` if you are curious about the details.

preprocess

The dataset has some issues with calibration. We could further preprocess the data in order to remove calibration gaps. This time, however, I would like to use a flexible predictive algorithm called Random Forest. In MATLAB, this algorithm is implemented in the TreeBagger class available in Statistics Toolbox.

Random Forest became popular particularly after it was used by a number of winners in Kaggle competitions. It uses a large ensemble of decision trees (thus 'forest') trained on random subsets of data and uses the majority votes of those trees to predict the result. It tends to produce a highly accurate result, but the complexity of the algorithm makes it slow and difficult to interpret.

To accelerate the computation, I will enable the parallel option supported by Parallel Computing Toolbox. You can comment out unnecessary code if you don't use it.

Once the model is built, you will see the confusion matrix that compares the actual class labels to predicted class labels. If everything lines up on a diagonal line, then you achieved 100% accuracy. Off-diagonal numbers are misclassification errors.

The model has a very high prediction accuracy even though we saw earlier that our dataset had some calibration issues.

Initialize parallel option - comment out if you don't use parallel

```
poolobj = gcp('nocreate'); % don't create a new pool even if no pool exists
if isempty(poolobj)
    parpool('local',2);
end
opts = statset('UseParallel',true);
```

Starting parallel pool (parpool) using the 'local' profile ... connected to 2 workers.

Create a Random Forest model with 100 trees, parallel enabled...

```
rfmodel = TreeBagger(100,table2array(Xtrain),Ytrain,...
    'Method','classification','oobvarimp','on','options',opts);
```

Here's the non-parallel version of the same model:

```
rfmodel = TreeBagger(100,table2array(Xtrain),Ytrain,...
    'Method','classification','oobvarimp','on');
```

Predict the class labels for the test set.

[Ypred,Yscore]= predict(rfmodel,table2array(Xtest));

Compute the confusion matrix and prediction accuracy.

```
C = confusionmat(Ytest,categorical(Ypred));
disp(array2table(C,'VariableNames',rfmodel.ClassNames,'RowNames',rfmodel.ClassNames))
fprintf('Prediction accuracy on test set: %f\n\n', sum(C(logical(eye(5))))/sum(sum(C)))
```

```
            A      B      C      D      E
          ____   ____   ____   ____   ____

    A     1133      1      0      0      0
    B        4    728      1      0      0
    C        0      3    645      3      0
    D        0      0      8    651      0
    E        0      0      0      6    741

Prediction accuracy on test set: 0.993374
```

I happened to pick 100 trees in the model, but you can check the misclassification errors relative to the number of trees used in the prediction. The plot shows that 100 is an overkill - we could use fewer trees and that will make it go faster.

```
figure
plot(oobError(rfmodel));
xlabel('Number of Grown Trees');
ylabel('Out-of-Bag Classification Error');
```

One major criticism of Random Forest is that it is a black box algorithm and it is not easy to understand what it is doing. However, Random Forest can provide the variable importance measure, which corresponds to the change in prediction error with and without the presence of a given variable in the model.

For our hypothetical weight lifting trainer mobile app, Random Forest would be too cumbersome and slow to implement, so you want to use a simpler prediction model with fewer predictor variables. Random Forest can help you with selecting which predictors you can drop without sacrificing the prediction accuracy too much.

Let's see how you can do this with `TreeBagger`.

Get the names of variables

vars = Xtrain.Properties.VariableNames;

Get and sort the variable importance scores. Because we turned `'oobvarimp'` to `'on'`, the model contains `OOBPermutedVarDeltaError` which acts as the variable importance measure.

```
varimp = rfmodel.OOBPermutedVarDeltaError';
[~,idxvarimp] = sort(varimp);
labels = vars(idxvarimp);
```

Plot the sorted scores.

```
figure
barh(varimp(idxvarimp),1);
ylim([1 52]);
set(gca, 'YTickLabel',labels, 'YTick',1:numel(labels))
title('Variable Importance');
xlabel('score')
```

Now let's do the trade-off between the number of predictor variables and prediction accuracy. The Receiver operating characteristic (ROC) plot provides a convenient way to visualize and compare performance of binary classifiers. You plot the false positive rate against the true positive rate at various prediction thresholds to produce the curves. If you get a completely random result, the curve should follow a diagonal line. If you get a 100% accuracy, then the curve should hug the upper left corner. This means you can use the area under the curve (AUC) to evaluate how well each model performs.

Let's plot ROC curves with different sets of predictor variables, using the "C" class as the positive class, since we can only do this one class at a time, and the previous confusion matrix shows more misclassification errors for this class than others. You can use `perfcurve` to compute ROC curves.
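Before looking at the full comparison, a single ROC curve could be sketched directly from the Random Forest scores like this (a rough sketch; the columns of `Yscore` follow `rfmodel.ClassNames`):

```
scoreC = Yscore(:, strcmp(rfmodel.ClassNames,'C'));   % scores for the positive class 'C'
[fpr, tpr, ~, auc] = perfcurve(Ytest, scoreC, 'C');
plot(fpr, tpr)
xlabel('False positive rate'); ylabel('True positive rate');
title(sprintf('ROC for class C (AUC = %.3f)', auc))
```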

Check out `myROCplot.m` to see the details.

```
nFeatures = [3,5,10,15,20,25,52];
myROCplot(Xtrain,Ytrain,Xtest,Ytest,'C',nFeatures,varimp)
```

Based on the previous analysis, it looks like you can achieve a high accuracy rate even if you use as few as 10 features. Let's say we settled for 12 features. We now know you don't have to use the data from the glove for prediction, so that's one less sensor our hypothetical end users would have to buy. Given this result, I may even consider dropping the arm band, and just stick with the belt and dumbbell sensors.

Show which features were included.

```
disp(table(varimp(idxvarimp(1:12)),'RowNames',vars(idxvarimp(1:12)),...
    'VariableNames',{'Importance'}));
```

```
                        Importance
                        __________

    accel_belt_y          0.69746
    pitch_dumbbell        0.77255
    accel_arm_z           0.78941
    accel_belt_x          0.81696
    magnet_arm_y          0.81946
    accel_arm_x           0.87168
    magnet_arm_x          0.87897
    accel_dumbbell_x      0.92222
    magnet_forearm_x       1.0172
    total_accel_belt       1.0461
    gyros_arm_z            1.1077
    gyros_belt_x           1.1235
```

Shut down the parallel pool.

delete(poolobj);

Despite my initial misgivings about the data, we were able to maintain high prediction accuracy with a Random Forest model with just 12 features. However, Random Forest is probably not an ideal model to implement on a mobile app given its memory foot print and slow response time.

The next step is to find a simpler model, such as logistic regression, that can perform decently. You may need to do more preprocessing of the data to make it work.

Finally, I have never tried this before, but you could generate C code out of MATLAB to incorporate it into a mobile app. Watch this webinar, MATLAB to iPhone Made Easy, for more details.

Do you track your activities with wearable devices? Have you tried analyzing the data? Tell us about your experience here!

Get
the MATLAB code

Published with MATLAB® R2014a

This summer my mother-in-law is renting a house on a lake in New Hampshire. Looking at the calendar, my wife said: "The ten-day forecast makes it look like it's going to be pretty hot up at the lake next week." This led to a more general discussion of the merits of ten-day forecasts.

It's funny how we can make decisions based on long-term predictions of weather even though we rarely go back and verify that the forecast was any good. Somehow the fact that the forecast exists at all gives it value. I'm left pondering this question: how much should we trust a ten-day prediction? As it happens, I have some data that can be useful here. For some time, I have been collecting some relevant data on Trendy: the ten day forecast for Natick, Massachusetts (hometown for MathWorks). So let's run some numbers.

Here's the trend: Ten Day Forecast Highs for Natick, MA.

Once a day this trend collects ten data points: today's high temperature and the predicted high temperature for the next nine days. In MATLAB, we'll be working with a matrix with one row for each day and ten columns.

Let's get the data into MATLAB so we can play around with it. I can retrieve (and so can you) the data from Trendy as a JSON object using the following call:

http://www.mathworks.com/matlabcentral/trendy/trends/1655/trend_data.json

In order to read this into MATLAB, I'm going to use Joe Hicklin's JSON parser.

```
url = 'http://www.mathworks.com/matlabcentral/trendy/trends/1655/trend_data.json';
json = urlread(url);
raw = JSON.parse(json);
```

t = zeros(length(raw),1);
d = zeros(length(raw),10);
for i = 1:length(raw)
    t(i) = raw{i}{1};
    predictions = raw{i}{2};
    for j = 1:10
        d(i,j) = str2num(predictions{j});
    end
end
firstTenRows = d(1:10,:)

firstTenRows =
    44 51 59 66 46 48 53 50 51 54
    50 58 63 49 48 52 48 46 57 54
    59 61 49 47 53 49 48 43 41 48
    62 49 48 54 49 39 39 44 47 46
    49 48 54 50 40 38 39 47 51 54
    47 55 50 39 40 48 52 53 53 53
    54 50 40 39 48 53 54 52 52 50
    49 40 38 50 55 54 56 56 53 49
    40 39 50 56 54 52 56 54 47 43
    39 50 55 55 55 59 58 40 41 46

Now I have a temperature prediction matrix that's structured like this.

I want to re-order this matrix so that each line shows the prediction trajectory for a single day in time. That means picking off the diagonals highlighted in the diagram above. So let's write some code that does this shift. I'm going to end up with two new matrices, d1 and d2.

rowIndex = 1:10;
colIndex = 10:-1:1;
sz = size(d);
len = (size(d,1)-10);
d1 = zeros(len,10);
d2 = zeros(len,10);
t1 = zeros(len,1);
for i = 1:len
    ind = sub2ind(sz,rowIndex+i-1,colIndex);
    trend = d(ind);
    d1(i,:) = trend;
    d2(i,:) = trend-trend(end);
    t1(i) = t(i+9);
end
firstTenRows = d1(1:10,:)

firstTenRows =
    54 57 43 39 38 40 39 38 39 39
    54 41 44 39 48 48 50 50 50 49
    48 47 47 52 53 55 56 55 54 57
    46 51 53 54 54 54 55 55 57 58
    54 53 52 56 52 55 57 57 60 58
    53 52 56 56 59 59 61 63 63 64
    50 53 54 58 45 42 45 44 44 43
    49 47 40 39 37 42 43 43 43 43
    43 41 41 42 44 48 47 49 48 47
    46 44 48 49 46 52 51 50 49 48

In d1, each row is the evolving temperature prediction for each day. So when we plot the first row of d1, we're getting the predictive arc for November 13th of last year.

i = 1;
plot(-9:0,d1(i,:))
title(sprintf('Predicted Temperature for %s',datestr(t1(i),1)))
xlabel('Time of Prediction (Offset in Days)')
ylabel('Predicted Temperature (Deg. F)')

In d2, we just subtract from each row the last value in each row. Since this last value is the final (and presumably correct) temperature, this difference gives us the predictive error across the ten days. Here's the error for the November 13th prediction.

i = 1;
plot(-9:0,d2(i,:))
title(sprintf('Error in Predicted Temperature for %s',datestr(t1(i),1)))
xlabel('Time of Prediction (Offset in Days)')
ylabel('Prediction Error (Deg. F)')

Notice that it shrinks to zero over time. That's good! Our predictions get more accurate as we approach the actual day in question. But the early predictions were off by as much as 18 degrees. Is that good or bad? You tell me.

Now let's look at all the days.

plot(-9:0,d2','Color',[0.5 0.5 1])
title('Error in Predicted Temperature')
xlabel('Time of Prediction (Offset in Days)')
ylabel('Prediction Error (Deg. F)')

It's hard to get a sense of the error distribution. So let's finish with a histogram of the absolute value of the error. Out of 240 measurements in this data set, the median error for a ten-day prediction is six degrees.

hist(abs(d2(:,1)),1:25)
title('Histogram of Error in the Ten-Day Forecast')
xlabel('Error (deg. F)')
ylabel('Number of Samples')
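If you prefer the number to the picture, the median quoted above is a one-liner (shown here just for completeness):

```
% Median absolute error of the earliest (ten-day-out) prediction
median(abs(d2(:,1)))
```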

That seems pretty good. Most of the time that error is going to be less than seven or so degrees Fahrenheit (or four degrees Celsius). I probably don't need to pack a sweater for the weekend at the lake.

Get
the MATLAB code

Published with MATLAB® R2014a

I'm going to play a small trick on you today. Try reading in this JPEG file using `imread`:

```
url = 'http://blogs.mathworks.com/images/steve/2014/peppers.jpg';
rgb = imread(url);
imshow(rgb)
```

So what's the trick? Well, look more closely at this file using `imfinfo`:

info = imfinfo(url)

info = 

                  Filename: 'http://blogs.mathworks.com/images/steve/2014/...'
               FileModDate: '11-Jul-2014 14:46:33'
                  FileSize: 287677
                    Format: 'png'
             FormatVersion: []
                     Width: 512
                    Height: 384
                  BitDepth: 24
                 ColorType: 'truecolor'
           FormatSignature: [137 80 78 71 13 10 26 10]
                  Colormap: []
                 Histogram: []
             InterlaceType: 'none'
              Transparency: 'none'
    SimpleTransparencyData: []
           BackgroundColor: []
           RenderingIntent: []
            Chromaticities: []
                     Gamma: []
               XResolution: []
               YResolution: []
            ResolutionUnit: []
                   XOffset: []
                   YOffset: []
                OffsetUnit: []
           SignificantBits: []
              ImageModTime: '16 Jul 2002 16:46:41 +0000'
                     Title: []
                    Author: []
               Description: 'Zesty peppers'
                 Copyright: 'Copyright The MathWorks, Inc.'
              CreationTime: []
                  Software: []
                Disclaimer: []
                   Warning: []
                    Source: []
                   Comment: []
                 OtherText: []

See it yet? No?

Look at the Format field:

info.Format

ans = png

The function `imfinfo` is claiming that this JPEG file is really a PNG file, which is a completely different image file format!

So what's going on here? Is this a JPEG file or not?

This trick question is really just an excuse to peek under the hood of `imread` to see how an interesting piece of it works. (Well, it's interesting to me, at least.)

Before opening the hood, though, let's try reading one more file. And notice that this filename has an extension that has nothing to do with any particular image format.

```
url2 = 'http://blogs.mathworks.com/images/steve/2014/peppers.fruit_study_2014_Jul_11';
rgb2 = imread(url2);
imshow(rgb2)
```

OK, so `imread` can successfully read this image file even without an extension indicating its format.

If you're curious, a lot of the `imread` code that makes all this work is available for you to look at in your installation of MATLAB. (If, on the other hand, you're not curious, then this would be a good time to go over and read Cleve's blog instead.) For example, you can view the source code for `imread` in the MATLAB Editor by typing `edit imread`. (Please don't modify the code, though!)

Here's a partial fragment of code near the top:

if (isempty(fmt_s)) % The format was not specified explicitly.

... snip ...

% Try to determine the file type.
[format, fmt_s] = imftype(filename);

Hmm, what's that function `imftype`?

```
which imftype
```

'imftype' not found.

It doesn't appear to exist!

It does exist, but it happens to be a private function. The function `which` will find it if you tell `which` to look a little harder.

which -all imftype

/Applications/MATLAB_R2014a.app/toolbox/matlab/imagesci/private/imftype.m % Private to imagesci

Even though you can't directly call this function (that's what *private* means here), you can still look at it in the MATLAB Editor by typing `edit private/imftype`.

Here's some code from near the beginning of `imftype`.

idx = find(filename == '.');
if (~isempty(idx))
    extension = lower(filename(idx(end)+1:end));
else
    extension = '';
end

% Try to get useful imformation from the extension.
if (~isempty(extension))

    % Look up the extension in the file format registry.
    fmt_s = imformats(extension);

    if (~isempty(fmt_s))

        if (~isempty(fmt_s.isa))

            % Call the ISA function for this format.
            tf = feval(fmt_s.isa, filename);

            if (tf)

                % The file is of that format. Return the ext field.
                format = fmt_s.ext{1};
                return;

            end
        end
    end
end

In English: If the filename has an extension on it, use the `imformats` function to get a function that can test to see whether the file really has that format.

So what's that new function `imformats` in the middle there? Well, you can call this one directly. Try it.

s = imformats

s = 

1x19 struct array with fields:
    ext
    isa
    info
    read
    write
    alpha
    description

That output is not very readable. If we were designing this today, we'd probably make `imformats` return a table. Fortunately, we've got an easy way to convert a struct array into a table!

t = struct2table(s)

t = 

       ext          isa          info          read        write      alpha    description
    __________    _______    ___________    _________    _________    _____    __________________________________
    {1x1 cell}    @isbmp     @imbmpinfo     @readbmp     @writebmp      0      'Windows Bitmap'
    {1x1 cell}    @iscur     @imcurinfo     @readcur     ''             1      'Windows Cursor resources'
    {1x2 cell}    @isfits    @imfitsinfo    @readfits    ''             0      'Flexible Image Transport System'
    {1x1 cell}    @isgif     @imgifinfo     @readgif     @writegif      0      'Graphics Interchange Format'
    {1x1 cell}    @ishdf     @imhdfinfo     @readhdf     @writehdf      0      'Hierarchical Data Format'
    {1x1 cell}    @isico     @imicoinfo     @readico     ''             1      'Windows Icon resources'
    {1x2 cell}    @isjp2     @imjp2info     @readjp2     @writej2c      0      'JPEG 2000 (raw codestream)'
    {1x1 cell}    @isjp2     @imjp2info     @readjp2     @writejp2      0      'JPEG 2000 (Part 1)'
    {1x2 cell}    @isjp2     @imjp2info     @readjp2     ''             0      'JPEG 2000 (Part 2)'
    {1x2 cell}    @isjpg     @imjpginfo     @readjpg     @writejpg      0      'Joint Photographic Experts Group'
    {1x1 cell}    @ispbm     @impnminfo     @readpnm     @writepnm      0      'Portable Bitmap'
    {1x1 cell}    @ispcx     @impcxinfo     @readpcx     @writepcx      0      'Windows Paintbrush'
    {1x1 cell}    @ispgm     @impnminfo     @readpnm     @writepnm      0      'Portable Graymap'
    {1x1 cell}    @ispng     @impnginfo     @readpng     @writepng      1      'Portable Network Graphics'
    {1x1 cell}    @ispnm     @impnminfo     @readpnm     @writepnm      0      'Portable Any Map'
    {1x1 cell}    @isppm     @impnminfo     @readpnm     @writepnm      0      'Portable Pixmap'
    {1x1 cell}    @isras     @imrasinfo     @readras     @writeras      1      'Sun Raster'
    {1x2 cell}    @istif     @imtifinfo     @readtif     @writetif      0      'Tagged Image File Format'
    {1x1 cell}    @isxwd     @imxwdinfo     @readxwd     @writexwd      0      'X Window Dump'

Now we've gotten to some interesting stuff! This table represents the guts of how `imread`, `imfinfo`, and `imwrite` know how to deal with the many different image file formats supported.

If you pass an extension to `imformats`, it looks through file formats it knows about to see if it matches a standard one.

```
imformats('jpg')
```

ans = 

            ext: {'jpg'  'jpeg'}
            isa: @isjpg
           info: @imjpginfo
           read: @readjpg
          write: @writejpg
          alpha: 0
    description: 'Joint Photographic Experts Group'

Let's go back to the original file peppers.jpg and consider what happens in the code we've seen so far.

1. We did not specify the format explicitly (with a 2nd argument) when we called `imread`, so `imread` called `imftype` to determine the format type.

2. The function `imftype` found an extension ('.jpg') at the end of the filename, so it asked the `imformats` function about the extension, and `imformats` returned a set of function handles useful for doing things with JPEG files. One of the function handles, `@isjpg`, tests to see whether a file is a JPEG file or not.

To be completely truthful, `@isjpg` just does a quick check based only on the first few bytes of the file. Look at the code by typing `edit private/isjpg`. Here are the key lines.

fid = fopen(filename, 'r', 'ieee-le');
assert(fid ~= -1, message('MATLAB:imagesci:validate:fileOpen', filename));
sig = fread(fid, 2, 'uint8');
fclose(fid);
tf = isequal(sig, [255; 216]);

*OK, I have now taught you enough so that you can thoroughly confuse imread if you really want to. But don't you already have enough hobbies?*

3. In this case, the file wasn't actually a JPEG file, so the function handle `@isjpg` returned 0 (false).

That brings us to the rest of the excitement in `imftype`.

% Get all formats from the registry.
fmt_s = imformats;

% Look through each of the possible formats.
for p = 1:length(fmt_s)
    % Call each ISA function until the format is found.
    if (~isempty(fmt_s(p).isa))
        tf = feval(fmt_s(p).isa, filename);
        if (tf)
            % The file is of that format. Return the ext field.
            format = fmt_s(p).ext{1};
            fmt_s = fmt_s(p);
            return
        end
    else
        warning(message('MATLAB:imagesci:imftype:missingIsaFunction'));
    end
end

In English: For every image file format we know about, run the corresponding `isa` function handle on the file. If one of the `isa` functions returns true, then return the corresponding set of information from `imformats`.

Back to the story for our misnamed image file peppers.jpg. The `@isjpg` function handle returned false for it. So `imftype` then tried the `isa` function handles for every image file format. One of them, `@ispng`, returned 1 (true). That information was passed back up to `imread`, which then read the file successfully as a PNG, which was the file's true format type.
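If you want to convince yourself of the file's true identity, you can do roughly what `@ispng` does and compare the first eight bytes against the PNG signature that `imfinfo` reported above. This is just a sketch; the real private isa functions are a bit more careful:

```
% Download a local copy and check the PNG signature (sketch only).
fname = urlwrite(url,'peppers_check.jpg');
fid = fopen(fname,'r');
sig = fread(fid,8,'uint8');
fclose(fid);
isequal(sig,[137 80 78 71 13 10 26 10]')   % true -- it really is a PNG
```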

Finally, here's what happened for the image file peppers.2014_Jul_11. When `imftype` passed the extension `'2014_Jul_11'` to `imformats`, no such image format extension was found, so `imformats` returned empty. That caused `imftype` to go into the code that simply tried every image format it knew about, which again worked when it got to PNG.

Phew! That's the story of the effort `imread` makes to read your images in correctly.

For the three of you that are still reading along, I'll send a t-shirt to the first one to post a convincingly complete explanation of this line of code from above:

extension = lower(filename(idx(end)+1:end));

Get
the MATLAB code

Published with MATLAB® R2014a

Sean's pick this week is "C-MEX Programming Tutorial Examples" by Ilias Konsoulas.

I spent the last few days at the 2014 SIAM conference in Chicago manning the MathWorks booth with a few of our developers. We were asked many questions about the products, new features, life at MathWorks, and of course for t-shirts and rubik's cubes.

One of the questions I was asked was along the lines of: "How do I get started with MEX?". MEX, which is short for "MATLAB Executable", is compiled C/C++ or Fortran code with the appropriate MATLAB wrappers on it so that it can be called at the command line like any other MATLAB function. Personally, I haven't written any productive C/C++ code since high school about ten years ago. When I need C or C++ code, I develop the algorithm in MATLAB and then use MATLAB Coder to automagically generate equivalent C code.

I had found Ilias' submission a while ago and added it to my "to experiment with" list. Now, it allowed me to answer this question while sounding knowledgeable and getting the user started on his journey.

The submission comes with nine heavily commented C files that do simple matrix manipulations like summing and reshaping as well as a readme file that documents the package. The C files show the MEX syntax, required headers and some good programming practices like error checking.

Curious whether I could actually write my own hand-written MEX file, I challenged myself to write one that calculates the sum of the diagonal of a matrix. I started by editing Ilias' `transposeM.c`, which seemed like a good blueprint.

After some time, having it fail to compile 20-ish times, and getting stuck in an infinite loop only once (I forgot to increment *i* in the `for`-loop), it works!

Here's a snapshot of the file:

That was an exercise in frustration. How about we just use MATLAB Coder to do this? First, I wrote a MATLAB function to
calculate the sum of the diagonal; there happens to be a function called `trace` that does this:
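The original listing isn't reproduced here, but a minimal sketch of such a function (my guess at it, not necessarily Sean's exact file) could be as simple as:

```
function t = mytrace(A) %#codegen
% MYTRACE Sum of the diagonal elements of A (same result as trace(A)).
t = sum(diag(A));
end
```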

Now to generate C code, you can use the MATLAB Coder app to generate code interactively, or do it programmatically with `codegen`. I'll take the latter approach since it's easier to automate.

% Input must be a double precision matrix of size "up-to-inf" by "up-to-inf"
input_specs = coder.typeof(zeros,inf,inf);
codegen mytrace -args {input_specs}

This generates `mytrace_mex` which I can call like any other function.

rng(5)                      % reproducible
X = gallery('randhess',7);  % a matrix
t = mytrace_mex(X);         % use mex file
disp(t)

-0.1573

Now let's see if the two C implementations and the MATLAB implementation agree using the MATLAB unit testing framework.

% Build interactive test object
Tester = matlab.unittest.TestCase.forInteractiveUse;
% Assert equality
assertEqual(Tester,trace(X),traceM(X))        % My C implementation
assertEqual(Tester,trace(X),mytrace_mex(X))   % MATLAB Coder's implementation

Interactive assertion passed.
Interactive assertion passed.

Give it a try and let us know what you think here or leave a comment for Ilias.

Get
the MATLAB code

Published with MATLAB® R2014a

**The SolidWorks Assembly**

Let's start by creating an assembly of a double pendulum in SolidWorks. It consists of 2 identical links and 2 identical pins. These components are assembled by defining mates between them. For example, to create a revolute joint between two parts in SolidWorks, we can constrain their axes and make them coincident:

I recommend looking at the Mates and Joints documentation page for more details on how SolidWorks mates are translated into SimMechanics Joints. For our example, here is how the final assembly looks:

**Exporting the SolidWorks assembly to SimMechanics**

This process requires the use of SimMechanics Link. Once SimMechanics Link is downloaded successfully, you need to Install and Register it with SolidWorks.

*Note: SimMechanics Link can also be used to interface with PTC Creo and Autodesk Inventor.*

After registering SimMechanics Link, a new menu is available in SolidWorks. We can now export the assembly:

This generates an XML file for the assembly, as well as STL files for the geometry. The XML file contains the assembly structure and part parameters required to generate an equivalent SimMechanics model, such as reference frames, mass and inertia, color, and the location of the part STL files.

**Create SimMechanics model**

Once the XML file is exported from SolidWorks, we can import it in Simulink using `smimport`:

`smimport('DoublePendulum.xml')`

and we have a model:

**Do what you came here for!**

At this point, you can combine any feature available in Simulink with your mechanism. For example, we can design a controller for the base joint using the automated PID tuning capability of Simulink Control Design.

I modify the base joint to measure its position and apply a torque, and connect the PID block:

After going through the auto tuning procedure, we can see the resulting motion:

**What if you’re not using SolidWorks?**

As mentioned previously, SimMechanics Link can also be used to interface with PTC Creo and Autodesk Inventor. If you are designing mechanical systems using a different CAD tool, I recommend looking at the MATLAB Central submission CAD to MATLAB to SimMechanics by my colleague Steve Miller. Using the technique in this submission, it might be possible to write your own adapter to import assemblies from other CAD software without too much work.

**Now it's your turn**

Here is one more animation from the Stewart Platform example, just to highlight that pretty complex assemblies can be imported.

Are you already importing CAD assemblies to SimMechanics? If yes, let us know what kind of analysis or design task you do with the assembly once in Simulink by leaving a comment here.

This is the first part of a two-part series about the single- and double precision floating point numbers that MATLAB uses for almost all of its arithmetic operations. (This post is adapted from section 1.7 of my book Numerical Computing with MATLAB, published by MathWorks and SIAM.)

If you look carefully at the definitions of fundamental arithmetic operations like addition and multiplication, you soon encounter the mathematical abstraction known as real numbers. But actual computation with real numbers is not very practical because it involves notions like limits and infinities. Instead, MATLAB and most other technical computing environments use floating point arithmetic, which involves a finite set of numbers with finite precision. This leads to the phenomena of *roundoff*, *underflow*, and *overflow*. Most of the time, it is possible to use MATLAB effectively without worrying about these details, but, every once in a while, it pays to know something about the properties and limitations of floating point numbers.

Thirty years ago, the situation was far more complicated than it is today. Each computer had its own floating point number system. Some were binary; some were decimal. There was even a Russian computer that used trinary arithmetic. Among the binary computers, some used 2 as the base; others used 8 or 16. And everybody had a different precision. In 1985, the IEEE Standards Board and the American National Standards Institute adopted the ANSI/IEEE Standard 754-1985 for Binary Floating-Point Arithmetic. This was the culmination of almost a decade of work by a 92-person working group of mathematicians, computer scientists, and engineers from universities, computer manufacturers, and microprocessor companies.

A revision of IEEE 754-1985, known as IEEE 754-2008, was published in 2008. The revision combines the binary standards in 754 and the decimal standards in another document, 854. But as far as I can tell, the new features introduced in the revision haven't had much impact yet.

William M. Kahan was the primary architect of IEEE floating point arithmetic. I've always known him by his nickname "Velvel". I met him in the early 1960's when I was a grad student at Stanford and he was a young faculty member at the University of Toronto. He moved from Toronto to UC Berkeley in the late 1960's and spent the rest of his career at Berkeley. He is now an emeritus professor of mathematics and of electrical engineering and computer science.

Velvel is an eminent numerical analyst. Among many other accomplishments, he is the developer along with Gene Golub of the first practical algorithm for computing the singular value decomposition.

In the late 1970's microprocessor manufacturers including Intel, Motorola, and Texas Instruments were developing the chips that were to become the basis for the personal computer and eventually today's electronics. In a remarkable case of cooperation among competitors, they established a committee to develop a common standard for floating point arithmetic.

Velvel was a consultant to Intel, working with John Palmer, a Stanford grad who was the principal architect of the 8087 math coprocessor. The 8087 accompanied the 8086, which was to become the CPU of the IBM PC. Kahan and Palmer convinced Intel to allow them to release the specs for their math coprocessor to the IEEE committee. These plans became the basis for the standard.

Velvel received the ACM Turing Award in 1989 for "his fundamental contributions to numerical analysis". He was named an ACM Fellow in 1994, and was inducted into the National Academy of Engineering in 2005.

All computers designed since 1985 use IEEE floating point arithmetic. This doesn't mean that they all get exactly the same results, because there is some flexibility within the standard. But it does mean that we now have a machine-independent model of how floating point arithmetic behaves.

MATLAB has traditionally used the IEEE double precision format. In recent years, the single precision format has also been available. This saves space and can sometimes be faster. There is also an extended precision format, which is not precisely specified by the standard and which may be used for intermediate results by the floating point hardware. This is one source of the lack of uniformity among different machines.

Most nonzero floating point numbers are normalized. This means they can be expressed as

$$ x = \pm (1 + f) \cdot 2^e $$

The quantity $f$ is the *fraction* or *mantissa* and $e$ is the *exponent*. The fraction satisfies

$$ 0 \leq f < 1 $$

The fraction $f$ must be representable in binary using at most 52 bits for double precision and 23 bits for single precision. In other words, for double, $2^{52} f$ is an integer in the interval

$$ 0 \leq 2^{52} f < 2^{52} $$

For single, $2^{23} f$ is an integer in the interval

$$ 0 \leq 2^{23} f < 2^{23} $$

For double precision, the exponent $e$ is an integer in the interval

$$ -1022 \leq e \leq 1023 $$

For single, the exponent $e$ is in the interval

$$ -126 \leq e \leq 127 $$

The finiteness of $f$ is a limitation on *precision*. The finiteness of $e$ is a limitation on *range*. Any numbers that don't meet these limitations must be approximated by ones that do.

Double precision floating point numbers are stored in a 64-bit word, with 52 bits for $f$, 11 bits for $e$, and 1 bit for the sign of the number. The sign of $e$ is accommodated by storing $e+1023$, which is between $1$ and $2^{11}-2$. The 2 extreme values for the exponent field, $0$ and $2^{11}-1$, are reserved for exceptional floating point numbers that I will describe later.

Single precision floating point numbers are stored in a 32-bit word, with 23 bits for $f$, 8 bits for $e$, and 1 bit for the sign of the number. The sign of $e$ is accommodated by storing $e+127$, which is between $1$ and $2^{8}-2$. Again, the 2 extreme values for the exponent field are reserved for exceptional floating point numbers.

The entire fractional part of a floating point number is not $f$, but $1+f$. However, the leading $1$ doesn't need to be stored. In effect, the IEEE formats pack $33$ or $65$ bits of information into a $32$- or $64$-bit word.
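One way to see these fields for yourself is to pull the raw bit pattern apart with `typecast` and a few bit operations. This is a small illustrative sketch, not part of the original text:

```
% Unpack the sign, exponent, and fraction fields of a double (sketch).
x = 0.1;
bits = typecast(x,'uint64');                      % the raw 64-bit pattern
signbit   = bitshift(bits,-63);                   % 1 bit: 0 means positive
expfield  = bitand(bitshift(bits,-52),2^11-1);    % 11 bits, stored as e+1023
fracfield = bitand(bits,2^52-1);                  % 52 bits, equal to 2^52 * f
e = double(expfield) - 1023                       % -4 for x = 0.1
f = double(fracfield)/2^52                        % about 0.6, so (1+f)*2^e = 0.1
```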

My program `floatgui`, available here, shows the distribution of the positive numbers in a model floating point system with variable parameters. The parameter $t$ specifies the number of bits used to store $f$. In other words, $2^t f$ is an integer. The parameters $e_{min}$ and $e_{max}$ specify the range of the exponent, so $e_{min} \leq e \leq e_{max}$. Initially, `floatgui` sets $t = 3$, $e_{min} = -4$, and $e_{max} = 2$ and produces the following distribution.

Within each binary interval $2^e \leq x \leq 2^{e+1}$, the numbers are equally spaced with an increment of $2^{e-t}$. If $e = 0$ and $t = 3$, for example, the spacing of the numbers between $1$ and $2$ is $1/8$. As $e$ increases, the spacing increases.

It is also instructive to display the floating point numbers with a logarithmic scale. Here is the output from `floatgui` with `logscale` checked and $t = 5$, $e_{min} = -4$, and $e_{max} = 3$. With this logarithmic scale, it is more apparent that the distribution in each binary interval is the same.

A very important quantity associated with floating point arithmetic is highlighted in red by `floatgui`. MATLAB calls this quantity `eps`, which is short for *machine epsilon*.

`eps` is the distance from $1$ to the next larger floating point number.

For the `floatgui` model floating point system, `eps` = $2^{-t}$.

Before the IEEE standard, different machines had different values of `eps`. Now, for IEEE double precision the approximate decimal value of `eps` is

format short
format compact
eps_d = eps

eps_d = 2.2204e-16

And for single precision,

```
eps_s = eps('single')
```

eps_s = 1.1921e-07

Either `eps/2` or `eps` can be called the roundoff level. The maximum relative error incurred when the result of an arithmetic operation is rounded to the nearest floating point number is `eps/2`. The maximum relative spacing between numbers is `eps`. In either case, you can say that the roundoff level is about 16 decimal digits for double and about 7 decimal digits for single.

Actually `eps` is a function. `eps(x)` is the distance from `abs(x)` to the next larger in magnitude floating point number of the same precision as `x`.
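For example (a quick illustration, not from the original text):

```
eps(1)           % same as eps, 2^(-52)
eps(2)           % spacing doubles in the next binary interval, 2^(-51)
eps(single(1))   % single precision spacing, 2^(-23)
```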

A frequently occurring example is provided by the simple MATLAB statement

t = 0.1

The mathematical value $t$ stored in `t` is not exactly $0.1$ because expressing the decimal fraction $1/10$ in binary requires an infinite series. In fact,

$$ \frac{1}{10} = \frac{1}{2^4} + \frac{1}{2^5} + \frac{0}{2^6} + \frac{0}{2^7} + \frac{1}{2^8} + \frac{1}{2^9} + \frac{0}{2^{10}} + \frac{0}{2^{11}} + \frac{1}{2^{12}} + \cdots $$

After the first term, the sequence of coefficients $1, 0, 0, 1$ is repeated infinitely often. Grouping the resulting terms together four at a time expresses $1/10$ in a base 16, or *hexadecimal*, series.

$$ \frac{1}{10} = 2^{-4} \cdot \left(1 + \frac{9}{16} + \frac{9}{16^2} + \frac{9}{16^3} + \frac{9}{16^{4}} + \cdots\right) $$

Floating-point numbers on either side of $1/10$ are obtained by terminating the fractional part of this series after $52$ binary terms, or 13 hexadecimal terms, and rounding the last term up or down. Thus

$$ t_1 < 1/10 < t_2 $$

where

$$ t_1 = 2^{-4} \cdot \left(1 + \frac{9}{16} + \frac{9}{16^2} + \frac{9}{16^3} + \cdots + \frac{9}{16^{12}} + \frac{9}{16^{13}}\right) $$

$$ t_2 = 2^{-4} \cdot \left(1 + \frac{9}{16} + \frac{9}{16^2} + \frac{9}{16^3} + \cdots + \frac{9}{16^{12}} + \frac{10}{16^{13}}\right) $$

It turns out that $1/10$ is closer to $t_2$ than to $t_1$, so $t$ is equal to $t_2$. In other words,

$$ t = (1 + f) \cdot 2^e $$

where

$$ f = \frac{9}{16} + \frac{9}{16^2} + \frac{9}{16^3} + \cdots + \frac{9}{16^{12}} + \frac{10}{16^{13}} $$

$$ e = -4 $$

The MATLAB command

```
format hex
```

causes `t` to be displayed as

3fb999999999999a

The characters `a` through `f` represent the hexadecimal "digits" 10 through 15. The first three characters, `3fb`, give the hexadecimal representation of decimal $1019$, which is the value of the biased exponent $e+1023$ if $e$ is $-4$. The other 13 characters are the hexadecimal representation of the fraction $f$.

In summary, the value stored in `t` is very close to, but not exactly equal to, $0.1$. The distinction is occasionally important. For example, the quantity

0.3/0.1

is not exactly equal to $3$ because the actual numerator is a little less than $0.3$ and the actual denominator is a little greater than $0.1$.
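You can see this distinction directly in MATLAB (a small check, added here for illustration):

```
0.3/0.1 == 3    % false
0.3/0.1 - 3     % a few eps away from zero
```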

Ten steps of length `t` are not precisely the same as one step of length 1. MATLAB is careful to arrange that the last element of the vector

0:0.1:1

is exactly equal to $1$, but if you form this vector yourself by repeated additions of $0.1$, you will miss hitting the final $1$ exactly.
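A small experiment along these lines (added here as a sketch) shows the repeated additions falling short:

```
s = 0;
for k = 1:10
    s = s + 0.1;
end
s == 1          % false
1 - s           % about 1.1e-16, roughly eps/2
```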

What does the floating point approximation to the golden ratio, $\phi$, look like?

```
format hex
phi = (1 + sqrt(5))/2
```

phi = 3ff9e3779b97f4a8

The first hex digit, `3`, is `0011` in binary. The first bit is the sign of the floating point number; `0` is positive, `1` is negative. So `phi` is positive. The remaining bits of the first three hex digits contain $e+1023$. In this example, `3ff` in base 16 is $3 \cdot 16^2 + 15 \cdot 16 + 15 = 1023$ in decimal. So

$$ e = 0 $$

In fact, any floating point number between $1.0$ and $2.0$ has $e = 0$, so its `hex` output begins with `3ff`. The other 13 hex digits contain $f$. In this example,

$$ f = \frac{9}{16} + \frac{14}{16^2} + \frac{3}{16^3} + \cdots + \frac{10}{16^{12}} + \frac{8}{16^{13}} $$

With these values of $f$ and $e$,

$$ (1 + f) 2^e \approx \phi $$

Another example is provided by the following code segment.

```
format long
a = 4/3
b = a - 1
c = 3*b
e = 1 - c
```

a = 1.333333333333333
b = 0.333333333333333
c = 1.000000000000000
e = 2.220446049250313e-16

With exact computation, `e` would be zero. But with floating point, we obtain double precision `eps`. Why?

It turns out that the only roundoff occurs in the division in the first statement. The quotient cannot be exactly $4/3$, except on that Russian trinary computer. Consequently the value stored in `a` is close to, but not exactly equal to, $4/3$. The subtraction `b = a - 1` produces a quantity `b` whose last bit is 0. This means that the multiplication `3*b` can be done without any roundoff. The value stored in `c` is not exactly equal to $1$, and so the value stored in `e` is not 0. Before the IEEE standard, this code was used as a quick way to estimate the roundoff level on various computers.

The roundoff level `eps` is sometimes called "floating point zero," but that's a misnomer. There are many floating point numbers much smaller than `eps`. The smallest positive normalized floating point number has $f = 0$ and $e = -1022$. The largest floating point number has $f$ a little less than $1$ and $e = 1023$. MATLAB calls these numbers `realmin` and `realmax`. Together with `eps`, they characterize the standard system.

Double Precision

              Binary             Decimal
  eps         2^(-52)            2.2204e-16
  realmin     2^(-1022)          2.2251e-308
  realmax     (2-eps)*2^1023     1.7977e+308

Single Precision

              Binary             Decimal
  eps         2^(-23)            1.1921e-07
  realmin     2^(-126)           1.1755e-38
  realmax     (2-eps)*2^127      3.4028e+38

If any computation tries to produce a value larger than `realmax`, it is said to *overflow*. The result is an exceptional floating point value called *infinity* or `Inf`. It is represented by taking $e = 1024$ and $f = 0$ and satisfies relations like `1/Inf = 0` and `Inf+Inf = Inf`.

If any computation tries to produce a value that is undefined even in the real number system, the result is an exceptional value known as Not-a-Number, or `NaN`. Examples include `0/0` and `Inf-Inf`. `NaN` is represented by taking $e = 1024$ and $f$ nonzero.
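A few quick examples of these exceptional values (added here for illustration):

```
1/0          % Inf
1/Inf        % 0
Inf + Inf    % Inf
0/0          % NaN
Inf - Inf    % NaN
```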

If any computation tries to produce a value smaller than `realmin`, it is said to *underflow*. This involves one of the most controversial aspects of the IEEE standard. It produces exceptional *denormal* or *subnormal* floating point numbers in the interval between `realmin` and `eps*realmin`. The smallest positive subnormal numbers in double and single precision are

format short
smallest_d = eps*realmin
smallest_s = eps('single')*realmin('single')

smallest_d = 4.9407e-324
smallest_s = 1.4013e-45

Any results smaller than this are set to 0.

The controversy surrounding underflow and denormals will be the subject of my next blog post.

Get
the MATLAB code

Published with MATLAB® R2014a

Jiro's pick this week is Useful Figure Management Utilities by Michelle Hirsch.

One of the reasons I really enjoy programming in MATLAB is the interactive development environment (IDE). The various components allow me to be productive when I'm developing an algorithm or analyzing some data. But what makes the environment even more powerful is the ability to programmatically customize some of the workflows. The 3 useful functions provided by Michelle help with managing your figure windows. The function names are self-explanatory, but here they are.

- `figshift` - This is useful when you create a couple of figures and quickly want to be able to click back and forth between the two. Calling this function will shift the current figure window slightly from the previous window, allowing you to easily click between the two.
- `fillscreen` - As the name suggests, this "maximizes" the figure window to fill the whole screen.
- `showmethefigs` - This is my favorite. By calling this after you create all the figures, you can cycle through all the figure windows by hitting a key on your keyboard.

While you can use these functions directly from the Command Window, I find them most useful by sprinkling these within my script. This way, my figures will be placed appropriately at the end of the script.

surf(peaks)
fillscreen
figure
plot(rand(10,3))
figure
spy(bucky)
figshift

**Comments**

This entry was so useful that it inspired Mirko to create "Figure Management Utilities" with additional functionality. Do you have any other favorite utilities that allow you to be more productive with the MATLAB IDE? Tell us about it here or leave a comment for Michelle.

Get
the MATLAB code

Published with MATLAB® R2014a

I teach the MATLAB 101 class to every new hire at MathWorks. Inevitably, someone will ask me how to make a global variable. I then Google up: “How to make MATLAB Doug Cry” and find the above post. I think this is a post I should bring up every now and then because these things should be avoided today just like they should have been when I first wrote it.

I will be back from vacation soon with fresh content.

Thanks,
Doug

Please leave a comment below if you have similar needs and use cases.

**My Need**

In various applications I work on, I often end up needing to reset the states of multiple blocks at the same time. Some blocks, like Integrator and Delay, have reset ports that I can use for this purpose. However, other blocks, like State-Space, Transfer Function, or Unit Delay, do not.

**Blocks with Reset Ports**

If blocks have reset ports, I can arrange my diagram to reset all those blocks at the same time:

This gives me the behavior I need, but if I have many blocks that need to be reset, this is a lot of routing. Also, there are multiple blocks that have states, but do not have an optional reset port.

**Enabled Subsystem**

What I am really dreaming of is a subsystem with a control port like the Enabled Subsystem, without the need to disable for one step to reset the states. With an Enabled Subsystem the control signal resets the states when it is re-enabled.

When simulating the model, we can see that this is not ideal because the system is disabled during one step:

**Function-Call subsystem and Stateflow**

Using Stateflow it is possible to reset the states in a subsystem without disabling it for one step.

To make that happen, we need to place the blocks inside a Function-Call Subsystem. We can then drive the subsystem with a Stateflow Chart. In the chart, bind an output event to a state and exit this state to trigger the reset of the states in the Function-Call Subsystem:

Note that this solution will not work for blocks with continuous states like Transfer Function, State-Space, etc.

**Now it's your turn**

Do you see the need in your applications for a subsystem where the states of all the blocks inside would reset from a rising edge on a control port? Let us know by leaving a comment here.

I have been posting blogs about MATLAB with code examples for many years. Steve Eddins, my fellow blogger of Steve on Image Processing fame, developed and maintains an internal tool that automates a lot of tasks and I rely on it to publish my blog posts.

One of my guest bloggers, Toshi Takeuchi, showed me a new tool he found, and he would like to do a quick introduction.

Hi, it's Toshi again. If you use MATLAB for fun like me, you would probably like to publish your work to share with other people. I do web analysis in marketing and I can tell you there are many people like you out there, based on the increasing search volume for personal use of MATLAB.

I have been writing to our internal WordPress blog service, and most of my posts involve MATLAB code examples. I would first do my analysis in MATLAB, then add commentary to explain what I did, and then publish it to HTML. After that, I had to manually copy and paste the published text and code to WordPress – and the process is tolerable but isn’t particularly easy.

I recently started contributing, as a guest blogger, some of my personal posts for Loren who maintains her public-facing blog. The other day I had a chance to witness how she posts her blogs. While her process was much more efficient than mine, I started wondering if we could do even better.

I suspect that this may be a widely shared pain. So I dug around a bit, and I found this wonderful MATLAB module called matlab-wordpress by John Kitchin on GitHub.

Installation was very straightforward. I just downloaded the zip file, added the extracted folder to the MATLAB search path, and added the jar files to `javaclasspath`, as described in `README.md`.
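For reference, the setup amounts to something like the following, where the folder and jar file names are placeholders for whatever the zip file actually contains (check `README.md` for the real names):

```
% Placeholder paths -- adjust to match the extracted matlab-wordpress folder.
addpath('matlab-wordpress');
javaaddpath(fullfile('matlab-wordpress','xmlrpc-client.jar'));   % jar names are hypothetical
javaaddpath(fullfile('matlab-wordpress','xmlrpc-common.jar'));
```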

Next, enable XML-RPC (`Enable the WordPress, Movable Type, MetaWeblog and Blogger XML-RPC publishing protocols`) in your admin console. This is under:

Settings -> Writing -> Remote Publishing

You can avoid entering your username and password for your inside WP account if you set up `blogCredentials.m` as described in `README.md`.

user = 'your blog username';
password = 'your blog password';
server = 'http://your-blog/xmlrpc.php';

If you include an image or generate a plot in your MATLAB code, the image will be automatically uploaded to the Media Library. Sweet!

```
figure
hist(randn(1000,1),20)
title('Normal Distribution with \mu=0, \sigma=1')
```

Let's say you want to link to a `.mat` file used in your post, and the file path should be defined relative to the location of the MATLAB file you are publishing. If it is in the same location as your MATLAB file, you can link to it as follows:

"You will need to :download:`sample.mat` for this example."

This then translates to:

"You will need to download sample.mat for this example."

The file will be automatically uploaded to the same directory that holds all your images and the URL will be updated to reflect the new file location. This alone will save tons of hassle. Unfortunately this may not work if your WordPress server is not configured to accept nonmedia files like `.mat` files.

You can add a link using MATLAB Markup syntax, but `matlab-wordpress` also gives you an ability to link to another post on your blog. Since Loren doesn't have this MATLAB module, you won't see the effects of these next sections in this post.

"In :postid:`920` we discussed how to analyze Twitter with MATLAB..."

You can add a tooltip like this.

:tooltip:`<this is the tooltip.> Hover on this`

You can also add color to text to draw attention to a note or warning.

:note:`This is a note in light blue.` :warning:`This is a warning in pink.`

You can add categories and tags to your post if you include the following lines in your code. Currently it appears you can only add one of each, for some reason.

% categories: Blog
% tags: MATLAB

Now it's time to publish your MATLAB file to WordPress. If your MATLAB file is `myFile.m`, then you enter the following command in the command window.

blogpost myFile.m

Perhaps you want to publish locally first as a dryrun. If so, instead do:

blogpost myFile.m true   % set the dryrun parameter to true

When you publish your post, your post ID and permalink is automatically added to the MATLAB file. If you update your file, you can update your blog post by simply typing `blogpost myFile.m` again. If you remove your post ID from your MATLAB file, that will result in a new post.

Because I happen to use WP-Syntax plugin on my blog for syntax highlighting, I have to remove the CSS tags generated by `publish` from MATLAB and make other adjustments related to that. If you can use the MATLAB generated CSS as is, then my guess is that you don't have to do any additional tweaks.

Even though I still had to do some cleaning, the overall process took a lot less time than the manual process I use. Mileage may vary depending on your settings, but I think this is going to be useful for many people. Let us know how it works for you here.

Get
the MATLAB code

Published with MATLAB® R2014a

Idin's pick for this week is Configurable Simulink Model for DC-DC Converters with PWM PI Control by Yi Cao.

Dr. Yi Cao has contributed a wealth of useful tools to the File Exchange (I saw 66 at the time of this writing). This week's Pick came in handy for me recently as I was trying to investigate and demonstrate the value of Simulink for modeling DC-DC converters.

Power management ICs (PMICs) are critical pieces of most electronic systems, especially battery-operated devices such as mobile phones and tablets. As it was put to me recently, “PMICs are equivalent to the devices that turn the lights on at power-up, and turn them off when powering down.” The PMIC provides required amounts of power to different parts of the system. An important part of a PMIC is typically one or more DC-DC converters that provide regulated voltages to different parts of the system.

In this submission Yi Cao provides a nice example of how the behavior of DC-DC converters can be modeled and simulated in Simulink. These converters fall under three general classes: buck, boost, and buck-boost (buck steps down the input voltage, boost steps it up, and buck-boost inverts the polarity). Yi’s example covers all three classes (using their classic topologies). The core of this submission is a Simulink model that shows how the whole system works.

This model is simulating a full closed-loop system. The converter needs to hold its output voltage constant as the amount of current being drawn varies. The control loop (and the PWM PI controller) work to correct fluctuations in output voltage (usually due to fluctuations in output current). He's done some good work to create one customizable block that can be configured to simulate any desired topology (buck, boost, or buck-boost).

A really nice feature of this submission is the included tutorial (PDF) on deriving the mathematical equations used in the model from circuit diagrams. I highly recommend reading through this document. And if there was any doubt that the derivations were correct, my colleague Dick Benson created a circuit-level model of the same system to check the results.
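To give a flavor of the kind of equations derived in that tutorial, here is a minimal, open-loop sketch of the standard averaged model of a buck converter in MATLAB. The parameter values are made up for illustration, and this is not code from Yi's submission:

```
% Averaged (ideal, continuous-conduction) buck converter, open loop -- a sketch.
Vin = 12; L = 100e-6; C = 100e-6; R = 10; d = 0.5;   % assumed values
dxdt = @(t,x) [ (d*Vin - x(2))/L                     % inductor current:  L di/dt = d*Vin - v
                (x(1) - x(2)/R)/C ];                 % output voltage:    C dv/dt = i - v/R
[t,x] = ode45(dxdt,[0 5e-3],[0;0]);
plot(t,x(:,2)), xlabel('Time (s)'), ylabel('Output voltage (V)')
```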

The results from the circuit simulation were nearly identical to those from Yi’s model. The figure below shows two plots: the output voltage and inductor current. In each plot the result of the circuit simulation is overlaid on top of the output from Yi’s model. The two traces can hardly be distinguished.

If you would like to see some examples of these systems simulated using real circuit elements, download the Mixed-Signal Library from the bottom of this page, and go to the “Switching Power” examples. Note that the examples using circuit elements require SimPowerSystems. As always, your thoughts and comments here are greatly appreciated.

Get
the MATLAB code

Published with MATLAB® R2014a

The book *Digital Image Processing Using MATLAB* (DIPUM) contains 144 examples. I know this because I watch them all run in MATLAB at least a couple of times a year. I find watching all the images come and go during this process to be oddly entrancing.

I have used the MATLAB unit test framework to write automated tests that run every example script from the book to see whether each runs without error and without generating unexpected warning messages. I do this roughly twice a year to test the DIPUM examples against the next MATLAB and Image Processing Toolbox release that we are getting ready to ship. A couple of weeks ago I ran this using the latest development build of the upcoming R2014b release.

Note for those interested in advanced unit testing techniques: I have recently modified these tests to use the parameterized test feature introduced in R2014a. Parameterized tests make it easy to create a large set of mostly repetitive test cases that vary over a certain parameter space. Using this feature, I've written code that turns every example script in a specified folder into a separate test case that automatically executes the script and checks for errors and warnings. If you are interested in this unit testing capability, you might start with the documentation section called "Create Basic Parameterized Test."
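To make that concrete, a stripped-down parameterized test class might look like the sketch below. The folder and script names are placeholders, and this is not Steve's actual test code:

```
classdef ExampleScriptTest < matlab.unittest.TestCase
    properties (TestParameter)
        % One parameter value per example script (names are placeholders)
        script = {'dipum_example_01','dipum_example_02','dipum_example_03'};
    end
    methods (Test)
        function scriptRunsCleanly(testCase,script)
            % Fail if the script errors or issues any warning
            testCase.verifyWarningFree(@() run(script));
        end
    end
end
```

Running `runtests('ExampleScriptTest')` then produces one pass/fail result per script.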

I said above that I like to watch the images come and go. So I made a video of it for you. (I didn't want the video file to be huge, so I only captured a small portion of the screen.)

The video is just under 3 minutes long, and it's a complete waste of time. Enjoy.

http://blogs.mathworks.com/images/steve/2014/dipum_example_testing.mp4


The nineteenth Householder Symposium, Householder XIX, was held June 8-13 at Sol Cress, a conference center near Spa, Belgium. If you have been following either the web or the newsletter edition of Cleve's Corner you know that the Gatlinburg/Householder series of conferences have played an important role in both my professional life and the history of MATLAB. I attended what turned out to be the third conference in the series, in Gatlinburg, Tennessee, when I was a graduate student in 1964. I have been to all 17 of the conferences that have been held since 1964. Here is a link to my News and Notes article about the Gatlinburg/Householder conferences.

The Householder Symposium has been organized over the years by an evolving volunteer committee. The symposium is not affiliated with any professional society. The organizing committee for Householder XIX was:

- Ilse Ipsen (chair), North Carolina State University, USA
- Jim Demmel, University of California at Berkeley, USA
- Alan Edelman, Massachusetts Institute of Technology, USA
- Heike Fassbender, Technical University of Braunschweig, Germany
- Volker Mehrmann, Technical University of Berlin
- Jim Nagy, Emory University, USA
- Yousef Saad, University of Minnesota, USA
- Valeria Simoncini, University of Bologna, Italy
- Zdenek Strakos, Charles University in Prague, Czech Republic
- Andy Wathen, Oxford University, UK

The local organizers in Belgium were:

- Paul Van Dooren (chair), Universite catholique de Louvain
- Pierre-Antoine Absil, Universite catholique de Louvain
- Karl Meerbergen, Katholieke Universiteit Leuven
- Annick Sartenaer, Universite de Namur
- Marc Van Barel, Katholieke Universiteit Leuven
- Sabine Van Huffel, Katholieke Universiteit Leuven

In order to facilitate interaction, attendance is limited. The organizing committee accepts applications and evaluates resumes. This time only about half of the 300 applicants were selected. This has been a controversial aspect of the conference, but I am in favor of limiting the number of participants.

A little over 100 of the attendees made it to the group photo. A few are wearing neckties and their best dresses in the photo because the conference banquet was scheduled shortly thereafter.

A list of the attendees is available on the symposium web site.

Everybody who attended the conference had an opportunity to give a talk or present a poster during the five day conference. There were a total of 26 half-hour plenary talks. Each day there were two sets of three parallel sessions of less formal 20 minute talks. A couple of days had late afternoon or evening poster sessions.

The complete program is available here. The overall theme is numerical linear algebra. Applications include partial differential equations, image processing, control theory, graph theory, model reduction, and gyroscopes.

The Householder Prize is given for the best PhD dissertation written during the three-year period preceding the meeting. A list of the winners of the prize in previous years is available here. The committee judging the dissertations for the prize this time was:

- Volker Mehrmann, Technical University of Berlin, Germany (chair)
- Michele Benzi, Emory University, USA
- Inderjit Dhillon, UT Austin, USA
- Howard Elman, University of Maryland, USA
- Francoise Tisseur, University of Manchester, UK
- Stephen Vavasis, University of Waterloo, Canada

Volker reported that 17 theses from 9 countries were submitted. From these, a short list of six finalists was selected:

- Grey Ballard, UC Berkeley
- Saifon Chaturantabut, Rice University
- Cedric Effenberger, EPF Lausanne
- Nicolas Gillis, UC Louvain
- Agnieszka Miedlar, TU Berlin
- Yuji Nakatsukasa, UC Davis

From this short list, the two winners of the 2014 Householder Prize were announced:

- Nicolas Gillis, UC Louvain
- Yuji Nakatsukasa, UC Davis

Nicolas is now an Assistant Professor at Universite de Mons in Belgium and Yuji is an Assistant Professor at the University of Tokyo.

The two winners gave talks about their thesis work in the last days of the meeting. They shared prize money of a little over $2000 obtained by passing a hat at the banquet dinner during Householder XVIII three years ago in Lake Tahoe. A hat was passed again at this meeting to fund the next Householder Prize. The location for the next Householder meeting has not yet been decided.

Traditionally the Wednesday afternoon of the Householder meeting is devoted to an excursion of some kind. The XIX excursion was to "Notre-Dame du Val-Dieu", an abbey founded in 1216 by Cistercian monks. There are no longer any monks living at the abbey -- it is now best known as a brewery. Their beer is Val-Dieu. We toured the abbey and the tiny brewery, which is operated by just five people. We then had the opportunity to taste three different beers and their wonderful cheese. Belgian abbey beers have nine percent alcohol. It was a very pleasant afternoon.

Get
the MATLAB code

Published with MATLAB® R2014a

Sean's pick this week is `mcolon` by Bruno Luong.

% Starting index
startidx = [1 4 pi]
% Stride between each start and ending index
stride = [1 -1 pi]
% Ending index
endidx = [6 0 pi^2]

startidx =
    1.0000    4.0000    3.1416
stride =
    1.0000   -1.0000    3.1416
endidx =
    6.0000         0    9.8696

```
% Desired result
disp(v)
```

1.0000 2.0000 3.0000 4.0000 5.0000 6.0000 4.0000 3.0000 2.0000 1.0000 0 3.1416 6.2832 9.4248

This can be accomplished with a single call to `mcolon`:

v = mcolon(startidx,stride,endidx).';
disp(v)

1.0000 2.0000 3.0000 4.0000 5.0000 6.0000 4.0000 3.0000 2.0000 1.0000 0 3.1416 6.2832 9.4248
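For comparison, here is the same result built the long way with an explicit loop over colon expressions (a sketch, just to show what `mcolon` is vectorizing for you):

```
v2 = [];
for k = 1:numel(startidx)
    v2 = [v2, startidx(k):stride(k):endidx(k)]; %#ok<AGROW>
end
max(abs(v2(:) - v))   % essentially zero
```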

**Not all models linearize easily**

Most users who have tried linearizing complex Simulink models know that it is often not an easy task. For a Simulink model to be realistic, it usually must include discontinuities like saturations, quantizations, ON/OFF controllers, etc. All those elements are known to linearize to zero, and consequently make it impossible to apply classical control design techniques.

When linearization fails, system identification can help.

**The Challenge**

To highlight this functionality, I picked a demo model that I thought would be challenging for the PID Tuner: sf_electrohydraulics, a PWM-driven hydraulic servomechanism.

This model already contains a PI controller doing a decent job, so I was curious to see if the PID Tuner would be able to do better.

As expected, I opened the dialog of the PID block, clicked the "Tune" button, and the PID Tuner app opened with a badge saying: *Plant cannot be linearized*.

**Identifying a plant from a simulation**

In the PID tuner, I clicked on the link to identify a plant

and I chose to obtain my data by simulating the model

Under the hood, the app will open the loop and replace the PID block by a source signal to excite the system. Since I know that the output of the controller is a duty cycle percentage, I specified that this input should be a step from 0% to 90%.

I clicked the Simulate button, and obtained the following data. As I would expect based on my knowledge of the system, a 90% duty cycle results in a motion of ~18mm.

I clicked the Apply and Close buttons to go back to the Plant Identification tab. Here I can try different structures for the plant model, and use the adjustors to manually tweak the plant model:

**Designing the Controller**

I saved the plant and went back to the PID Tuner tab. I cranked up the Response Time and Transient Behavior sliders to get something aggressive, and I clicked the Update Block button to apply the gain values to the block.

When I simulate the model with this tuned controller, I can see that the controller designed using the PID Tuner app tracks the set point at least as well as the original controller.

**Now it's your turn**

I have to admit, I am really impressed by how easy it was to obtain a plant model and a controller for this model. Given all the complex discontinuities in this model, I was certain it would require more work to design a controller for it.

Give this new functionality a try, and let us know what you think by leaving a comment here.

Today I’d like to introduce a fairly frequent guest blogger, Sarah Wait Zaranek, who works for the MATLAB Marketing team here at The MathWorks. She and I will be writing about the new capabilities for the webcam in R2014a.

- Webcam is Available
- Installing the Support Package
- Listing Webcams and Previewing
- Taking a Single Image
- Taking Images Within a Loop
- Taking Images Within a Loop and Saving to an AVI-file
- Set up video writer
- Grab and process frames
- Taking Images Within a Loop and Creating an Animated GIF
- Here's the movie we just made
- Tidy up
- Additional Camera Support
- Do You Have a Project that Uses a Webcam?

In R2014a, you can bring live images from webcams into MATLAB.

Webcam support is available through a hardware support package. Hardware support packages have existed in previous releases for Simulink, but now they are available for MATLAB, too.

You can find the support package installer in the Resources section of the Home tab of the Toolstrip. From there, choose Install from Internet, select USB Webcams,

and install your webcam support package.

Now that the webcam support is installed, let's get started using our webcam.

You can see a list of available webcams

webcamlist

ans = 'Microsoft LifeCam Cinema' 'Integrated Camera'

You can see that Loren has two different webcams. We use the `webcam` function to select which camera to use, by specifying either the camera name or the index in the `webcamlist` output corresponding to the camera.

```
mycam = webcam('Microsoft LifeCam Cinema')
```

mycam =

  webcam with properties:

                     Name: 'Microsoft LifeCam Cinema'
               Resolution: '640x480'
     AvailableResolutions: {1x12 cell}
               Brightness: 133
    BacklightCompensation: 0
         WhiteBalanceMode: 'auto'
               Saturation: 83
                     Zoom: 0
                      Pan: 0
                FocusMode: 'auto'
                Sharpness: 25
             WhiteBalance: 4500
             ExposureMode: 'auto'
                     Tilt: 0
                    Focus: 0
                 Contrast: 5
                 Exposure: -6

If you only have a single webcam available, that webcam will be used by default. You can use `preview` to check on the webcam view.

preview(mycam)

pause(10)
snapnow
closePreview(mycam)

You can experiment and set any properties that you may want to change. For example, you might want to change the resolution or the brightness.

mycam.Brightness = 200;

You can acquire a single live image from your webcam.

```
img = snapshot(mycam);
imagesc(img)
% You can see me and Loren hanging out in her office!
```

You can set up a loop to acquire many images – and can process each frame within the loop. For example, we can reduce the number of distinct colors used in the image.

Grab and process frames

```
frames = 50;
for i = 1:frames
    % Acquire frame for processing
    img = snapshot(mycam);
    % Quantize image by thresholding
    idx = img > 60 & img < 170;
    img(idx) = 255;
    img(~idx) = 0;
    % Display frame
    imagesc(img);
    axis image; axis off;
end
```

Additionally, you can create a video file of the processed frames by using `VideoWriter`.

```
mywriter = VideoWriter('mymovie.avi');
open(mywriter);
```

```
frames = 50;
for ii = 1:frames
    % Acquire frame for processing
    img = snapshot(mycam);
    % Quantize image by thresholding
    idx = img > 60 & img < 170;
    img(idx) = 255;
    img(~idx) = 0;
    % Display frame
    imagesc(img);
    axis image; axis off;
    % Write frame to video
    writeVideo(mywriter,img);
end
close(mywriter)
```

We're going to do ALMOST the same thing we've just done, but output an animated GIF file instead.

Grab and process frames

```
frames = 50;
filename = 'mymovie.gif';
for i = 1:frames
    % Acquire frame for processing
    img = snapshot(mycam);
    % Quantize image by thresholding
    idx = img > 60 & img < 170;
    img(idx) = 255;
    img(~idx) = 0;
    % Display frame
    imagesc(img)
    axis image; axis off;
    [img, cm] = rgb2ind(img, 256);
    if i == 1
        imwrite(img,cm,filename,'gif','Loopcount',inf);
    else
        imwrite(img,cm,filename,'gif','WriteMode','append');
    end
end
```

delete(mycam)

Support for high-end scientific and industrial cameras, and for more advanced features such as triggering and data logging, can be found in the Image Acquisition Toolbox.

Do you have a project, for school, work, or fun, where you want to use a webcam with MATLAB? Tell us about it here.

Get
the MATLAB code

Published with MATLAB® R2014a

**No, this hasn’t become a financial blog**

As a quick disclaimer: this post does not cover the financial challenges of introducing a new form of transportation. Instead, we’re focusing on modeling the dynamic effects of a banked turn. As commenters on previous posts have pointed out, the g-forces felt by passengers of the Hyperloop would be reduced by the effects of the vehicle banking within the tube.

I’d like to share how I modeled banking and what effect it had on previously shared results.

**Planar Dynamics**

I started with the assumption that there is always sufficient force from the air suspension to keep the craft floating above the tube floor. I split the air suspension forces into an uplift force directed toward the center of the tube and a stabilization force influencing the craft’s rotation.

To this system, I can add centripetal forces calculated from the curved path of the tube to obtain a good idea of the accelerations experienced by the passengers.

**SimMechanics Implementation**

One simple way to model this system is using a Revolute Joint in SimMechanics.

In the figure below, you can see a schematic representation of how the Revolute Joint is used. This includes:

- The centripetal acceleration calculated from the curved path is applied to the tube reference frame using the Time-Varying Gravity option of the Mechanism Configuration block.
- A torque is applied at the Revolute Joint to control the banking and keep the vehicle at an optimal angle.
- Constraint Forces are output from the Revolute Joint. Measuring these forces in the frame of the vehicle and dividing out the vehicle mass gives the horizontal and vertical g-forces felt by the passengers.

In SimMechanics, the implementation looks like this (note that the colors match the schematic above):

**The Results**

The Mechanics Explorer gives a visualization of the results. Here’s how the first minute looks:

I know my CAD renderings are embarrassingly basic. But, we’re working on something better for next time.

As expected, banking of the vehicle greatly reduces the lateral accelerations from the route curves. The plot below shows lateral (top) accelerations from static calculations (red) and dynamic results (blue).

The results indicate that it should be possible to tighten up some of the curves along the route. This should enable the Hyperloop to more closely follow the highway.

**Now it's your turn**

Retrieve Matt's Hyperloop repository on GitHub and let us know what you think by leaving a comment here.

]]>Stiffness is a subtle concept that plays an important role in assessing the effectiveness of numerical methods for ordinary differential equations. (This article is adapted from section 7.9, "Stiffness", in *Numerical Computing with MATLAB*.)

Stiffness is a subtle, difficult, and important concept in the numerical solution of ordinary differential equations. It depends on the differential equation, the initial conditions, and the numerical method. Dictionary definitions of the word "stiff" involve terms like "not easily bent," "rigid," and "stubborn." We are concerned with a computational version of these properties.

A problem is stiff if the solution being sought varies slowly, but there are nearby solutions that vary rapidly, so the numerical method must take small steps to obtain satisfactory results.

Stiffness is an efficiency issue. If we weren't concerned with how much time a computation takes, we wouldn't be concerned about stiffness. Nonstiff methods can solve stiff problems; they just take a long time to do it.

Stiff ode solvers do more work per step, but take bigger steps. And we're not talking about modest differences here. For truly stiff problems, a stiff solver can be orders of magnitude more efficient, while still achieving a given accuracy.

A model of flame propagation provides an example. I learned about this example from Larry Shampine, one of the authors of the MATLAB ordinary differential equation suite. If you light a match, the ball of flame grows rapidly until it reaches a critical size. Then it remains at that size because the amount of oxygen being consumed by the combustion in the interior of the ball balances the amount available through the surface. The simple model is

$$ \dot{y} = y^2 - y^3 $$

$$ y(0) = \delta $$

$$ 0 \leq t \leq {2 \over \delta} $$.

The scalar variable $y(t)$ represents the radius of the ball. The $y^2$ and $y^3$ terms come from the surface area and the volume. The critical parameter is the initial radius, $\delta$, which is "small." We seek the solution over a length of time that is inversely proportional to $\delta$. You can see a dramatization by downloading `flame.m` from <http://www.mathworks.com/moler/ncmfilelist.html>.

At this point, I suggest that you start up MATLAB and actually run our examples. It is worthwhile to see them in action. I will start with `ode45`, the workhorse of the MATLAB ode suite. If $\delta$ is not very small, the problem is not very stiff. Try $\delta$ = 0.02 and request a relative error of $10^{-4}$.

```
delta = 0.02;
F = @(t,y) y^2 - y^3;
opts = odeset('RelTol',1.e-4);
ode45(F,[0 2/delta],delta,opts);
```

With no output arguments, `ode45` automatically plots the solution as it is computed. You should get a plot of a solution that starts at $y$ = 0.02, grows at a modestly increasing rate until $t$ approaches 50, which is $1/\delta$, then grows rapidly until it reaches a value close to 1, where it remains.

Now let's see stiffness in action. Decrease $\delta$ by more than two orders of magnitude. (If you run only one example, run this one.)

```
delta = 0.0001;
ode45(F,[0 2/delta],delta,opts);
```

You should see something like this plot, although it will take several seconds to complete the plot. If you get tired of watching the agonizing progress, click the stop button in the lower left corner of the window. Turn on zoom, and use the mouse to explore the solution near where it first approaches steady state. You should see something like the detail obtained with this `axis` command.

axis([.995e4 1.03e4 0.9998 1.0002])
last_plot = getframe(gcf);

Notice that `ode45` is doing its job. It's keeping the solution within $10^{-4}$ of its nearly constant steady state value. But it certainly has to work hard to do it. If you want an even more dramatic demonstration of stiffness, decrease $\delta$ or decrease the tolerance to $10^{-5}$ or $10^{-6}$.

This problem is not stiff initially. It only becomes stiff as the solution approaches steady state. This is because the steady state solution is so "rigid." Any solution near $y(t) = 1$ increases or decreases rapidly toward that solution. (I should point out that "rapidly" here is with respect to an unusually long time scale.)

What can be done about stiff problems? You don't want to change the differential equation or the initial conditions, so you have to change the numerical method. Methods intended to solve stiff problems efficiently do more work per step, but can take much bigger steps. Stiff methods are *implicit*. At each step they use MATLAB matrix operations to solve a system of simultaneous linear equations that helps predict the evolution of the solution. For our flame example, the matrix is only 1 by 1, but even here, stiff methods do more work per step than nonstiff methods.

Let's compute the solution to our flame example again, this time with `ode23s`. The " *s* " in the name is for "stiff."

ode23s(F,[0 2/delta],delta,opts);

Here is the zoom detail.

axis([.995e4 1.03e4 0.9998 1.0002])

You can see that *ode23s* takes many fewer steps than *ode45*. This is actually an easy problem for a stiff solver. In fact, *ode23s* takes only 99 steps and uses just 412 function evaluations, while *ode45* takes 3040 steps and uses 20179 function evaluations. Stiffness even affects graphical output. The print files for the *ode45* figures are much larger than those for the *ode23s* figures.

Imagine you are returning from a hike in the mountains. You are in a narrow canyon with steep slopes on either side. An explicit algorithm would sample the local gradient to find the descent direction. But following the gradient on either side of the trail will send you bouncing back and forth across the canyon, as with `ode45`. You will eventually get home, but it will be long after dark before you arrive. An implicit algorithm would have you keep your eyes on the trail and anticipate where each step is taking you. It is well worth the extra concentration.

All numerical methods for stiff odes are *implicit*. The simplest example is the *backward Euler* method, which involves finding the unknown $v$ in

$$ v = y_n + h f(t_{n+1},v) $$

and then setting $y_{n+1}$ equal to that $v$. This is usually a nonlinear system of equations whose solution requires at least an approximation to the Jacobian, the matrix of partial derivatives

$$ J = {\partial F \over \partial y} $$

By default the partial derivatives in the Jacobian are computed by finite differences. This can be quite costly in terms of function evaluations. If a procedure for computing the Jacobian is available, it can be provided. Or, if the sparsity pattern is known, it can be specified. The blocks in a Simulink diagram, for example, are only sparsely connected to each other. Specifying a sparse Jacobian initiates sparse linear equation solving.
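To make that default concrete, here is a minimal finite-difference Jacobian sketch, saved in its own file. It is illustrative only; the perturbation sizes and column grouping used by the solvers' own finite differencing (via `numjac`) are more careful than this.

```
% A rough finite-difference approximation to J = dF/dy, column by column.
% Illustrative only -- not how the ODE suite computes its Jacobians.
function J = fdjacobian(F,t,y)
    f0 = F(t,y);
    n = numel(y);
    J = zeros(numel(f0),n);
    for j = 1:n
        h = sqrt(eps)*max(abs(y(j)),1);   % simple perturbation size
        yp = y;
        yp(j) = yp(j) + h;
        J(:,j) = (F(t,yp) - f0)/h;        % forward difference for column j
    end
end
```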

Newton's method for computing the $v$ in the backward Euler method is an iteration. Start perhaps with $v^0 = y_n$. Then, for $k = 0,1,...$, solve the linear system of equations

$$ (I - hJ) u^k = -\left( v^k - y_n - h f(t_{n+1},v^k) \right) $$

for the correction $u^k$ . Set

$$ v^{k+1} = v^k + u^k $$

When you are satisfied that the $v^k$ have converged, let $y_{n+1}$ be the limit.
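Putting the pieces together, here is a minimal sketch of one backward Euler step driven by that Newton iteration. `F` is the function defining the ode and `dFdy` returns the Jacobian; both are supplied by the caller. This is illustrative only, not how `ode15s` or `ode23s` are organized. For the flame example, `F = @(t,y) y^2 - y^3` and `dFdy = @(t,y) 2*y - 3*y^2`.

```
% One backward Euler step, v = y_n + h*F(t_{n+1},v), solved by Newton's method.
% Illustrative sketch only.
function ynp1 = backward_euler_step(F,dFdy,tn,yn,h)
    tnp1 = tn + h;
    v = yn;                                 % start the iteration with v0 = y_n
    I = eye(numel(yn));
    for k = 1:10                            % a handful of Newton iterations
        r = v - yn - h*F(tnp1,v);           % residual of the implicit equation
        u = (I - h*dFdy(tnp1,v)) \ (-r);    % solve (I - hJ)u = -r for the correction
        v = v + u;
        if norm(u) <= 1e-10*(1 + norm(v))   % crude convergence test
            break
        end
    end
    ynp1 = v;                               % y_{n+1} is the converged v
end
```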

Stiff ODE solvers may not use Newton's method itself, but whatever method is used to find the solution, $y_{n+1}$, at the forward time step can ultimately be traced back to Newton's method.

`ode15s` employs two variants of a method that is quite different from the single step methods that I've described so far in this series on ode solvers. Linear multistep methods save solution values from several time steps and use all of them to advance to the next step.

Actually, `ode15s` can be compared to the other multistep method in the suite, `ode113`. One saves values of the solution, $y_n$, while the other saves values of the function, $F(t_n,y_n)$. One includes the unknown value at the new time step $y_{n+1}$ in the formulation, thereby making it implicit, while the other does not. Both methods can vary the order as well as the step size. As their names indicate, `ode15s` allows the order to vary between 1 and 5, while `ode113` allows the order to vary between 1 and 13.

A property specified via `odeset` switches `ode15s` between two variants of a linear multistep method, BDF, Backward Differentiation Formulas, and NDF, Numerical Differentiation Formulas. BDFs were introduced for stiff odes in 1971 by Bill Gear. Gear's student, Linda Petzold, extended the ideas to DAEs, differential-algebraic equations, and produced DASSL, software whose successors are still in widespread use today. NDFs, which are the default for `ode15s`, include an additional term in the memory and consequently can take larger steps with the same accuracy, especially at lower order.

`ode23s` is a single step, implicit method described in the paper by Shampine and Reichelt referenced below. It uses a second order formula to advance the step and a third order formula to estimate the error. It recomputes the Jacobian with each step, thereby making it quite expensive in terms of function evaluations. If you can supply an analytic Jacobian then `ode23s` is a competitive choice.

`ode23t` and `ode23tb` are implicit methods based on the trapezoid rule and the second order BDF. The origins of the methods go back to the 1985 paper referenced below by a group at the old Bell Labs working on electronic device and circuit simulation. Mike Hosea and Larry Shampine made extensive modifications and improvements described in their 1996 paper when they implemented the methods in MATLAB.

Stiff ODE solvers are not actually using MATLAB's iconic backslash operator on a full system of linear equations, but they are using its component parts, LU decomposition and solution of the resulting triangular systems.
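In code, that pattern looks roughly like this generic sketch, where `A` plays the role of $I - hJ$ and the factors are reused for several right-hand sides:

```
[L,U,p] = lu(A,'vector');   % factor A once, with a row permutation vector p
x1 = U\(L\b1(p));           % each solve is just two triangular substitutions
x2 = U\(L\b2(p));
```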

Let's look at the statistics generated by `ode23s` when it solves the flame problem. We'll run it again, avoiding the plot by asking for output, but then ignoring the output, and just looking at the stats.

opts = odeset('stats','on','reltol',1.e-4);
[~,~] = ode23s(F,[0 2/delta],delta,opts);

99 successful steps
7 failed attempts
412 function evaluations
99 partial derivatives
106 LU decompositions
318 solutions of linear systems

We see that at every step `ode23s` is computing a Jacobian, finding the LU decomposition of a matrix involving that Jacobian, and then using L and U to solve three linear systems.

Now how about the primary stiff solver, `ode15s`.

[~,~] = ode15s(F,[0 2/delta],delta,opts);

140 successful steps
39 failed attempts
347 function evaluations
2 partial derivatives
53 LU decompositions
342 solutions of linear systems

We see that `ode15s` takes more steps than `ode23s`, but requires only two Jacobians. It does only half as many LU decompositions, but then uses each LU decomposition for twice as many linear equation solutions.

We certainly can't draw any conclusions about the relative merits of these two solvers from this one example, especially since the Jacobian in this case is only a 1-by-1 matrix.

Cleve Moler, *Numerical Computing with MATLAB*, Electronic Edition, MathWorks, <http://www.mathworks.com/moler/index_ncm.html>,

Print Edition, SIAM Revised Reprint, SIAM, 2008, 334 pp., <http://www.ec-securehost.com/SIAM/ot87.html>.

Lawrence F. Shampine and Mark W. Reichelt, "The MATLAB ODE Suite", SIAM Journal on Scientific Computing, 18 (1997), pp.1-22, <http://www.mathworks.com/help/pdf_doc/otherdocs/ode_suite.pdf>

Lawrence F. Shampine, Mark W. Reichelt, and Jacek A. Kierzenka, "Solving Index-1 DAEs in MATLAB and Simulink", SIAM Review, 41 (1999), pp. 538-552. <http://epubs.siam.org/doi/abs/10.1137/S003614459933425X>

M. E. Hosea and L. F. Shampine, "Analysis and Implementation of TR-BDF2", Applied Numerical Mathematics, 20 (1996), pp. 21-37, <http://www.sciencedirect.com/science/article/pii/0168927495001158>

R. E. Bank, W. M. Coughran, Jr., W. Fichtner, E. H. Grosse, D. J. Rose, and R. K. Smith, "Transient simulation of silicon devices and circuits", IEEE Transactions on Computer-Aided Design, CAD-4 (1985), 4, pp. 436-451.

I want to repeat this plot because it represents the post on the lead-in page at MATLAB Central.

imshow(last_plot.cdata,'border','tight')

Get
the MATLAB code

Published with MATLAB® R2014a

*by Gareth Thomas*

Today I want to talk about sharing tables in reports (a table is a relatively new data type in MATLAB suitable for column-oriented or tabular data). In the current release of MATLAB (R2014a) it can be a little tricky to represent a table variable in an HTML document. So I’m going to show you how to take the reporting capabilities that MATLAB currently offers, combine them with some HTML 5 and JavaScript, and end up with a beautiful report that functions nicely as a standalone document.

Today there are basically two different ways to create HTML reports from MATLAB.

- MATLAB’s publish command
- The MATLAB Report Generator

The publish command ships with MATLAB, while the Report Generator is a product dedicated to the creation of sophisticated reports from MATLAB. Beyond this, the key difference between the two methods really comes down to when you want to deploy your algorithm. That is, do you want the report to update itself in real time? If you want to create a standalone executable, you can really only use the Report Generator.

I’m going to focus on MATLAB’s publish command. Suppose you want to pass a lot of MATLAB-generated HTML into your published document. It turns out that you can use the disp command to add that HTML code into the generated document. However, it is not really easy to pass arguments, as this HTML is hard coded at publish time. To get around this I created a function (makeHTMLTableFilter) that creates HTML code that, when published, is also dynamic and responsive. It achieves this by adding JavaScript code. This allows you to do things like filter the table interactively.

See the example below:

This is a very nice way of sharing your results in the form of a table while allowing others to filter the fields that they wish.
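If you are curious about the basic mechanism, here is a minimal sketch of turning a table into a plain HTML string that `disp` can emit from a published script. This is a hypothetical helper for illustration only; Gareth's makeHTMLTableFilter goes well beyond it and adds the JavaScript filtering.

```
% Minimal, hypothetical helper: render a MATLAB table as a plain HTML table.
% (No JavaScript filtering here -- that is what makeHTMLTableFilter adds.)
function html = simpleHTMLTable(t)
    header = strjoin(strcat('<th>',t.Properties.VariableNames,'</th>'),'');
    rows = '';
    for r = 1:height(t)
        cells = '';
        for c = 1:width(t)
            val = t{r,c};
            if isnumeric(val)
                s = num2str(val);
            else
                s = char(val);                  % char or cellstr content
            end
            cells = [cells '<td>' s '</td>'];   %#ok<AGROW>
        end
        rows = [rows '<tr>' cells '</tr>'];     %#ok<AGROW>
    end
    html = ['<table border="1"><tr>' header '</tr>' rows '</table>'];
end
```

Calling `disp(simpleHTMLTable(mytable))` inside a cell of the script then injects the markup into the published HTML.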

One question that I get is, “Okay, but how would I deploy this?” As it happens, this idea of using

`disp(makeHTMLTableFilter(mytable))`

is also supported in the MATLAB Report Generator.

So download makeHTMLTableFilter and give it a try. You’ll soon be making interactive reports of your own.

]]>Whatever your opinion of social media these days, there is no denying it is now an integral part of our digital life. So much so, that social media metrics are now considered part of altmetrics, an alternative to the established metrics such as citations to measure the impact of scientific papers.

Today's guest blogger, Toshi, will show you how to access the Twitter API and analyze tweets with MATLAB.

- Why Twitter
- Sentiment Analysis
- Tweet Content Visualization
- Who Tweeted the News?
- Does Follower Count Really Matter? Going Viral on Twitter
- Visualizing the Retweet Social Graph
- Getting Started with Twitter using Twitty
- Processing Tweets and Scoring Sentiments
- Processing Tweets for Content Visualization
- Get the Profile of Top 5 Users
- Streaming API for High Volume Real Time Tweets
- Save an Edge List for Social Graph Visualization
- Closing

Twitter is a good starting point for social media analysis because people openly share their opinions with the general public. This is very different from Facebook, where social interactions are often private. In this post, I would like to share simple examples of sentiment analysis and social graph visualization using Twitter's Search and Streaming APIs.

The first part of this post discusses analysis with Twitter, and the latter part shows the code that computes and creates plots, like those shown earlier.

One of the very common analyses you can perform on a large number of tweets is sentiment analysis. Sentiment is scored based on the words contained in a tweet. If you manage a brand or political campaign, for example, it may be important to keep track of your popularity, and sentiment analysis provides a convenient way to take the pulse of the tweeting public. Here is an example of sentiment analysis between Amazon and Hachette as of this writing, based on 100 tweets collected via the Twitter Search API.

The sentiment distributions are nearly identical between the two brands, but you can see that tweets mentioning both are clearly skewed to the negative, since the news is about a war between Amazon and a publisher over ebook profit margin. Is there a single metric we can use to make this comparison easier? That's where *Net Sentiment Rate (NSR)* comes in.

NSR = (Positive Tweets-Negative Tweets)/Total
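To make the formula concrete, here is a tiny illustration on a made-up vector of per-tweet sentiment scores (the real computation on the Twitter data appears in the code later in this post; like that code, it counts a score of zero as positive):

```
scores = [2 0 -1 3 -2 1 0 -4 1 1];   % hypothetical per-tweet sentiment scores
nsr = (sum(scores >= 0) - sum(scores < 0))/numel(scores)   % 0.40 for this example
```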

Here is the result. You could keep taking this measurement periodically for ongoing sentiment monitoring, if interested. Perhaps you may discover that NSR is correlated to their stock prices!

Amazon NSR  : 0.84
Hachette NSR: 0.58
Both NSR    : -0.30

And last, but not least, did the sentiment scoring actually work? Check out the top 5 positive and negative tweets for Hachette for your own assessment.

Top 5 positive tweets ___________________________________________________________________________

'@deckchairs @OccupyMyCat @aworkinglibrary but I think Hachette artists...'
'@emzleb Hachette has Rowling so they hold a lot of cards (A LOT of car...'
'Amazon Confirms Hachette Spat Is To "Get a Better Deal" http://t.co/Ka...'
'@shaunduke @DarkMatterzine Yeah, Gollancz is owned by Orion Publishing...'
'MUST READ Book publisher Hachette says working to resolve Amazon dispu...'

Top 5 negative tweets ___________________________________________________________________________

'Reading into the Amazon vs. Hachette battle - May 28 - The war between...'
'#Vtech Reading into the Amazon vs. Hachette battle - May 28 - The war ...'
'#Vbnss Reading into the Amazon vs. Hachette battle - May 28 - The war ...'
'RT @text_publishing: Amazon war with Hachette over ebook profit margin...'
'RT @text_publishing: Amazon war with Hachette over ebook profit margin...'

What were the main themes they tweeted about when those users mentioned both Amazon and Hachette? The word count plot shows that mostly those tweets repeated the news headlines like “Amazon admits dispute (with) Hachette”, perhaps with some commentary - showing that Twitter was being used for news amplification.

The 100 tweets collected came from 86 users. So on average each user only tweeted 1.16 times. Instead of frequency, let's find out who has a large number of followers (an indicator that they may be influential) and check their profile. It appears that 2 or 3 out of the 5 top users (based on follower count) are writers, and others are news syndication services.

    Name                Followers    Description
    ________________    _________    ____________________________________________________

    'Daton L Fluker'    73578        '#Horror #Novelist of Death Keeper's Biological Wast...'
    'WellbeingVigor'    22224        'Writer - 10 years .here, Incurable music enthusiast #'
    'E-Book Update'     10870        ''
    'Michael Rosa'      10297        ''
    'Net Tech News'      7487        'Latest internet and technology news headlines from ...'

In the previous section, we checked out the top 5 users based on their follower count. The assumption was that, if you have a large number of followers, you are considered more influential because more people may see your tweets.

Now let's test this assumption. For that I need more than 100 tweets. So I collected a new batch of data - 1000 tweets from 4 trending topics from the UK, and plotted the users based on their follower counts vs. how often their tweets got retweeted. The size (and the color) of the bubbles show how often those users tweeted.

It looks like you do need some base number of followers to make it to the national level, but the correlation between follower count and the frequency of getting retweeted looks weak. Those charts look like different stages of viral diffusion - the top two charts clearly show one user broke away from the rest of the crowd, and in that process they may have also gained more followers. The bottom two charts show a number of users competing for attention but no one has a clear breakout yet. If this were an animation, it might look like boiling water. Is anyone interested in analyzing whether this is indeed how a tweet goes viral?

Retweeting of one user's tweet by others creates a network of relationships that can be represented as a social graph. We can visualize such relationships with Gephi, a popular social network analysis tool.

"I Can't Sing" Social Graph Larger

"#InABlackHousehold" Social Graph Larger

You can see that, in the first case, two users formed large clusters of people retweeting their tweets, and everyone else was dwarfed. In the second case, we also see two dominant users, but they have not yet formed a large scale cluster.

Now that you have seen a simple analysis I did with Twitter, it is time to share how I did it in MATLAB. To get started with Twitter, you need to get your developer credentials. You also need Twitty by Vladimir Bondarenko. It is simple to use and comes with excellent documentation.

- Create a Twitter account if you do not already have one
- Create a Twitter app to obtain developer credentials
- Download and install Twitty from the FileExchange, along with the JSON Parser and optionally JSONLab
- Create a structure array to store your credentials for Twitty

Let's search for tweets that mention `'amazon'` and `'hachette'`.

```
% a sample structure array to store the credentials
creds = struct('ConsumerKey','your-consumer-key-here',...
    'ConsumerSecret','your-consumer-secret-here',...
    'AccessToken','your-token-here',...
    'AccessTokenSecret','your-token-secret-here');

% set up a Twitty object
addpath twitty_1.1.1;       % Twitty
addpath parse_json;         % Twitty's default json parser
addpath jsonlab;            % I prefer JSONlab, however.
load('creds.mat')           % load my real credentials
tw = twitty(creds);         % instantiate a Twitty object
tw.jsonParser = @loadjson;  % specify JSONlab as json parser

% search for English tweets that mention 'amazon' and 'hachette'
amazon = tw.search('amazon','count',100,'include_entities','true','lang','en');
hachette = tw.search('hachette','count',100,'include_entities','true','lang','en');
both = tw.search('amazon hachette','count',100,'include_entities','true','lang','en');
```

Twitty stores tweets in a structure array created from the API response in JSON format. I prefer using a table when it comes to working with heterogeneous data containing a mix of numbers and text. I wrote some code, `processTweets`, to convert structure arrays into tables and compute sentiment scores. You can find the Amazon-Hachette data file here.

For sentiment analysis, I used AFINN, along with a list of English stop words so that we don't count frequent common words like "a" or "the".
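If you are curious what AFINN-based scoring looks like at its simplest, here is a rough sketch for a single made-up piece of text. It assumes AFINN-111.txt is the usual tab-separated list of word/score pairs; the real parsing and per-tweet bookkeeping live in `processTweets.scoreSentiment`.

```
% Rough sketch of AFINN-style scoring for one piece of text (illustration only).
fid = fopen('AFINN/AFINN-111.txt');
afinn = textscan(fid,'%s %d','Delimiter','\t');   % {words, scores}
fclose(fid);
tokens = lower(regexp('Amazon admits dispute, what a mess','\w+','match'));
[tf,loc] = ismember(tokens,afinn{1});             % look up each token in the list
score = sum(afinn{2}(loc(tf)))                    % sum the scores of matched words
```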

```
% load supporting data for text processing
scoreFile = 'AFINN/AFINN-111.txt';
stopwordsURL = 'http://www.textfixer.com/resources/common-english-words.txt';

% load previously saved data
load amazonHachette.mat

% process the structure array with a utility method |extract|
[amazonUsers,amazonTweets] = processTweets.extract(amazon);

% compute the sentiment scores with |scoreSentiment|
amazonTweets.Sentiment = processTweets.scoreSentiment(amazonTweets, ...
    scoreFile,stopwordsURL);

% repeat the process for hachette
[hachetteUsers,hachetteTweets] = processTweets.extract(hachette);
hachetteTweets.Sentiment = processTweets.scoreSentiment(hachetteTweets, ...
    scoreFile,stopwordsURL);

% repeat the process for tweets containing both
[bothUsers,bothTweets] = processTweets.extract(both);
bothTweets.Sentiment = processTweets.scoreSentiment(bothTweets, ...
    scoreFile,stopwordsURL);

% calculate and print NSRs
amazonNSR = (sum(amazonTweets.Sentiment>=0) ...
    -sum(amazonTweets.Sentiment<0)) ...
    /height(amazonTweets);
hachetteNSR = (sum(hachetteTweets.Sentiment>=0) ...
    -sum(hachetteTweets.Sentiment<0)) ...
    /height(hachetteTweets);
bothNSR = (sum(bothTweets.Sentiment>=0) ...
    -sum(bothTweets.Sentiment<0)) ...
    /height(bothTweets);
fprintf('Amazon NSR  : %.2f\n',amazonNSR)
fprintf('Hachette NSR: %.2f\n',hachetteNSR)
fprintf('Both NSR    : %.2f\n\n',bothNSR)

% plot the sentiment histogram of two brands
binranges = min([amazonTweets.Sentiment; ...
    hachetteTweets.Sentiment; ...
    bothTweets.Sentiment]): ...
    max([amazonTweets.Sentiment; ...
    hachetteTweets.Sentiment; ...
    bothTweets.Sentiment]);
bincounts = [histc(amazonTweets.Sentiment,binranges)...
    histc(hachetteTweets.Sentiment,binranges)...
    histc(bothTweets.Sentiment,binranges)];
figure
bar(binranges,bincounts,'hist')
legend('Amazon','Hachette','Both','Location','Best')
title('Sentiment Distribution of 100 Tweets')
xlabel('Sentiment Score')
ylabel('# Tweets')
```

Amazon NSR  : 0.84
Hachette NSR: 0.58
Both NSR    : -0.30

`processTweets` also has a function `tokenize` that parses tweets to calculate the word count.

```
% tokenize tweets with |tokenize| method of |processTweets|
[words, dict] = processTweets.tokenize(bothTweets,stopwordsURL);

% create a dictionary of unique words
dict = unique(dict);

% create a word count matrix
[~,tdf] = processTweets.getTFIDF(words,dict);

% plot the word count
figure
plot(1:length(dict),sum(tdf),'b.')
xlabel('Word Indices')
ylabel('Word Count')
title('Words contained in the tweets')

% annotate high frequency words
annotated = find(sum(tdf) >= 10);
jitter = 6*rand(1,length(annotated))-3;
for i = 1:length(annotated)
    text(annotated(i)+3, ...
        sum(tdf(:,annotated(i)))+jitter(i),dict{annotated(i)})
end
```

Twitty also supports the 'users/show' API to retrieve user profile information. Let's get the profile of the top 5 users based on the follower count.

```
% sort the user table by follower count in descending order
[~,order] = sortrows(bothUsers,'Followers','descend');

% select top 5 users
top5users = bothUsers(order(1:5),[3,1,5]);

% add a column to store the profile
top5users.Description = repmat({''},height(top5users),1);

% retrieve user profile for each user
for i = 1:5
    userInfo = tw.usersShow('user_id', top5users.Id(i));
    if ~isempty(userInfo{1}.description)
        top5users.Description{i} = userInfo{1}.description;
    end
end

% print the result
disp(top5users(:,2:end))
```

    Name                Followers
    ________________    _________

    'Daton L Fluker'    73578
    'WellbeingVigor'    22224
    'E-Book Update'     10870
    'Michael Rosa'      10297
    'Net Tech News'      7487

    Description
    ___________________________________________________________________________

    '#Horror #Novelist of Death Keeper's Biological Wasteland, Finished Cri...'
    'Writer - 10 years .here, Incurable music enthusiast #'
    ''
    ''
    'Latest internet and technology news headlines from news sources around...'

If you need more than 100 tweets to work with, then your only option is to use the Streaming API which provides access to the sampled Twitter fire hose in real time. That also means you need to access the tweets that are currently active. You typically start with a trending topic from a specific location.

You get local trends by specifying the geography with WOEID (Where On Earth ID), available at WOEID Lookup.

```
uk_woeid = '23424975'; % UK
uk_trends = tw.trendsPlace(uk_woeid);
uk_trends = cellfun(@(x) x.name, uk_trends{1}.trends, 'UniformOutput',false)';
```

Once you have the current trends (or download them from here), you can use the Streaming API to retrieve the tweets that mention the trending topic. When you specify an output function with Twitty, the data is stored within Twitty. Twitty will process incoming tweets up to the sample size specified, and process the data in batches of the specified size.

```
tw.outFcn = @saveTweets;  % output function
tw.sampleSize = 1000;     % default 1000
tw.batchSize = 1;         % default 20
tic;
tw.filterStatuses('track',uk_trends{1}); % Streaming API call
toc
uk_trend_data = tw.data;  % save the data
```

```
% reload the previously saved search result for 4 trending topics in the UK
load('uk_data.mat')

% plot
figure
for i = 1:4
    % process tweets
    [users,tweets] = processTweets.extract(uk_data(i).statuses);

    % get who are mentioned in retweets
    retweeted = tweets.Mentions(tweets.isRT);
    retweeted = retweeted(~cellfun('isempty',retweeted));
    [screen_names,~,idx] = unique(retweeted);
    count = accumarray(idx,1);
    retweeted = table(screen_names,count,'VariableNames',{'Screen_Name','Count'});

    % get the users who were mentioned in retweets
    match = ismember(users.Screen_Name,retweeted.Screen_Name);
    retweetedUsers = sortrows(users(match,:),'Screen_Name');
    match = ismember(retweeted.Screen_Name,retweetedUsers.Screen_Name);
    retweetedUsers.Retweeted_Count = retweeted.Count(match);
    [~,order] = sortrows(retweetedUsers,'Retweeted_Count','descend');

    % plot each topic
    subplot(2,2,i)
    scatter(retweetedUsers.Followers(order),...
        retweetedUsers.Retweeted_Count(order),retweetedUsers.Freq(order)*50,...
        retweetedUsers.Freq(order),'fill')
    if ismember(i, [1,2])
        ylim([-20,90]);
        xpos = 2; ypos1 = 50; ypos2 = 40;
    elseif i == 3
        ylim([-1,7])
        xlabel('Follower Count (Log Scale)')
        xpos = 1010; ypos1 = 0; ypos2 = -1;
    else
        ylim([-5,23])
        xlabel('Follower Count (Log Scale)')
        xpos = 110; ypos1 = 20; ypos2 = 17;
    end

    % set x axis to log scale
    set(gca, 'XScale', 'log')
    if ismember(i, [1,3])
        ylabel('Retweeted Count')
    end
    title(sprintf('UK Tweets for: "%s"',uk_data(i).query.name))
end
```

Gephi imports an edge list in CSV format (<https://gephi.org/users/supported-graph-formats/csv-format/>). I added a new method `saveEdgeList` to `processTweets` that saves the screen names of the users as `source` and the hashtags and screen names they mention in their tweets as `target` in a Gephi-ready CSV file.

```
processTweets.saveEdgeList(uk_data(1).statuses,'edgeList.csv');
```

File "edgeList.csv" was successfully saved.

It is quite easy to get started with Twitter analytics in MATLAB, and hopefully you got a taste of what kinds of analyses are possible.

We only scratched the surface. Twitter offers many of the most interesting opportunities for data analytics. How would you use Twitter analytics? Check out some examples from this search result from PLOS ONE, which lists various papers that used Twitter in their studies. Tell us about your Twitty experiences here.

Get
the MATLAB code

Published with MATLAB® R2014a

The TV show "The Americans" is about Soviet secret agents pretending to be a normal American couple in 1981-82 or so. In the second season's final episode, which I watched last night, a military officer shows an FBI agent a computer screen that's supposed to be showing some code from a top-secret computer program called Echo. (Don't worry; this is not a significant spoiler.) When the code flashed by I thought it looked odd, more modern than I would have expected for an early-80s program. So I paused the show and took a closer look. I was very amused to see that the top half of the screen contains MATLAB code of a type that did not exist until about 20 years after the show's time period.

The MATLAB code was clearly generated using GUIDE, the MATLAB GUI Design Environment. (Click on the thumbnail above for an enlarged view in which the code is readable.)

When I showed this to my MathWorker friend Jason, he did a little investigating and discovered the code's origin. It's from an 11-year-old File Exchange contribution called MATLAB Simulations for Radar Systems Design by Bassem Mahafza.

So congratulations, Bassem! Your old MATLAB code is now famous (sort of).

]]>I gave it a try with the new LEGO MINDSTORMS EV3 Support Package, and I was really impressed by how easy it is to use.

Let's see how it works.

**Acquiring Data**

To identify a plant model, you first need experimental data. In my case, I created the following simple model that applies power to a motor and measures the resulting motion.

Using External Mode and To Workspace blocks, I acquire experimental data that will be used later.

**Control Model**

I want to control the motor in position, so I create the following model:

I open the dialog of the Discrete PID block, specify the type of controller I want and its sample time, and click the Tune button to launch the PID Tuner app.

**Identifying the plant**

As a first guess, the PID Tuner tries to obtain a plant model by linearizing the blocks present in the model. Obviously, in this case this will not work because the Simulink model does not contain a plant.

To identify a plant based on the data previously acquired, select the *Identify New Plant* option:

A Plant Identification tab will appear where you will be able to import the previously saved data.

Theoretically, a motor like this one should be modeled using a second-order transfer function with an integrator (see this tutorial for an example). However, based on past experience I know that the effect of the second pole is almost negligible for this LEGO motor. In the Plant Identification tab I specify a one-pole system with an integrator and click the **Auto Estimate** button. As you can see, this does a pretty good job. I zoomed in on one of the transitions to show that the plant output is not exactly identical to the experimental data... which would be impossible.

**Tuning the Controller**

Once you are satisfied with the identified plant, click the **Save Plant** button and switch to the PID Tuner tab. Play with the sliders until you get a satisfying response, and once this is done click **Update Block** to apply the tuned gains to the controller block in your model.

**Deploying the Controller**

I tried running the control model on the EV3 and I was really impressed to see that the results were almost identical to what I could see in the PID Tuner:

**Now it's your turn**

This shows only a small part of what this app can do. Here are a few additional resources for you to learn more:

- For a more detailed explanation of this workflow, watch the PID Controller Tuning Based on Measured Input-Output Data video.
- To see how this workflow can help if you are designing PID controller for a Simulink model with discontinuities such as MOSFETs and PWMs, watch the PID Controller Tuning for a Model with Discontinuities video.
- If you are interested in a more comprehensive overview of PID control design with MATLAB and Simulink, watch the PID Control Made Easy webinar.

Try the PID Tuner app with integrated system identification capabilities for yourself and let us know what you think by leaving a comment here.

]]>Here’s a quick example to show you how it works. Let’s suppose you have some data. Say, the per capita consumption of margarine in the US (in pounds). Mmmm. Margarine.

`yrs = 2000:2009;`

margarineConsumption = [8.2 7 6.5 5.3 5.2 4 4.6 4.5 4.2 3.7];

plot(yrs,margarineConsumption)

As soon as you see the margarine data, you think to yourself “Wait a second! Where have I seen that trend before?” A moment of reflection, and suddenly it hits you: Maine divorce rates! That margarine plot is a dead ringer for divorce rates in Maine. Let’s do a plotyy.

`maineDivorceRate = [5 4.7 4.6 4.4 4.3 4.1 4.2 4.2 4.2 4.1];`

plotyy(yrs,margarineConsumption,yrs,maineDivorceRate)

Sure enough, the fit is remarkable*. Now let’s say we want to share our plot as an interactive web page. After adding the Plotly API code to your path, just type

`fig2plotly`

and you get a link to a web page like this:

Cool! Follow the link to try out the interaction.

* The remarkable margarine-divorce correlation was unearthed by Tyler Vigen on his fascinating Spurious Correlations page.

]]>The functions `ode23` and `ode45` are the principal MATLAB and Simulink tools for solving nonstiff ordinary differential equations.

The two functions `ode23` and `ode45` are single step ODE solvers. They are also known as Runge-Kutta methods. Each step is almost independent of the previous steps. Two important pieces of information are passed from one step to the next. The step size $h$ expected to achieve a desired accuracy is passed from step to step. And, in a strategy known as FSAL, for First Same as Last, the final function value at the end of a successful step is used as the initial function value at the following step.

The BS23 algorithm is due to Larry Shampine and his student Przemyslaw Bogacki. Przemyslaw is now a Professor at Old Dominion University. The " *23* " in the function name indicates that two simultaneous single-step formulas, one of second order and one of third order, are involved.

The method has three stages, but there are four slopes $s_i$ because, after the first step, the $s_1$ for one step is the $s_4$ from the previous step. The essentials are

$$ s_1 = f(t_n,y_n) $$

$$ s_2 = f\left(t_n+{h \over 2},y_n+{h \over 2} s_1\right) $$

$$ s_3 = f\left(t_n+{3 \over 4} h, y_n+{3 \over 4} h s_2\right) $$

$$ t_{n+1} = t_n + h $$

$$ y_{n+1} = y_n + {h \over 9} (2 s_1 + 3 s_2 + 4 s_3) $$

$$ s_4 = f(t_{n+1},y_{n+1}) $$

$$ e_{n+1} = {h \over 72} (-5 s_1 + 6 s_2 + 8 s_3 - 9 s_4) $$

Here is a graphical view of the steps.

ode23steps

These graphics show the starting situation and the three stages. We start at a point $(t_n,y_n)$ with an initial slope $s_1 = f(t_n,y_n)$ and an estimate of a good step size, $h$. Our goal is to compute an approximate solution $y_{n+1}$ at $t_{n+1} = t_n+h$ that agrees with the true solution $y(t_{n+1})$ to within the specified tolerances.

The first stage uses the initial slope $s_1$ to take an Euler step halfway across the interval. The function is evaluated there to get the second slope, $s_2$. This slope is used to take an Euler step three-quarters of the way across the interval. The function is evaluated again to get the third slope, $s_3$. A weighted average of the three slopes is used to step all the way across the interval to get a tentative value for $y_{n+1}$.

$$ s = {1 \over 9} (2 s_1 + 3 s_2 + 4 s_3) $$

The function is evaluated once more to get $s_4$. The error estimate then uses all four slopes.

$$ e_{n+1} = {h \over 72} (-5 s_1 + 6 s_2 + 8 s_3 - 9 s_4) $$

If the error is within the specified tolerance, the step is successful, the tentative value of $y_{n+1}$ is accepted, and $s_4$ becomes the $s_1$ of the next step. If the error is too large, then the tentative $y_{n+1}$ is rejected and the step must be redone. In either case, the error estimate $e_{n+1}$ provides the basis for determining the step size $h$ for the next step.
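Here is a minimal sketch of a single BS23 step written directly from the formulas above; it omits all of the step-size control and rejection logic, so it is illustrative only, not the actual `ode23` source.

```
% One BS23 step. s1 is the slope carried over from the previous step (FSAL);
% the returned s4 becomes the s1 of the next step. Illustrative sketch only.
function [ynp1,enp1,s4] = bs23_step(f,tn,yn,h,s1)
    s2 = f(tn + h/2,   yn + h/2*s1);
    s3 = f(tn + 3*h/4, yn + 3*h/4*s2);
    tnp1 = tn + h;
    ynp1 = yn + h/9*(2*s1 + 3*s2 + 4*s3);         % third order result
    s4 = f(tnp1,ynp1);                            % evaluated after y_{n+1}
    enp1 = h/72*(-5*s1 + 6*s2 + 8*s3 - 9*s4);     % error estimate
end
```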

Notice that $y_{n+1}$ is computed before $s_4$. The final function evaluation is used for the error estimate, but not for the step itself. But this $s_4$ is the $s_1$ in the next step.

The coefficients in the error estimate $e_{n+1}$ result from the difference between the third order formula that is used to compute $y_{n+1}$ and an independent second order formula that involves the same function values. That second order result is not actually computed because its value is not needed.

All the solvers in the suite provide an interpolant that can generate values approximating the solution to the differential equation to the desired accuracy anywhere in the interval without requiring further evaluation of the function defining the ode. In the case of `ode23` this interpolant happens to be "free". It turns out that the Hermite cubic polynomial defined by the four values

$$ y_n, \ F(t_n,y_n), \ y_{n+1}, \ F(t_{n+1},y_{n+1}) $$

does the job. For other solvers in the suite, providing the accompanying interpolant is an important aspect of the algorithm derivation.
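For the scalar case, evaluating that Hermite cubic looks roughly like this (an illustrative sketch; `ode23`'s built-in interpolant is the same idea):

```
% Evaluate the cubic Hermite interpolant defined by the values and slopes at
% both ends of the step, at points t inside [tn, tn+h]. Scalar sketch only.
function y = hermite_interp(tn,h,yn,fn,ynp1,fnp1,t)
    s = (t - tn)/h;                       % normalized position within the step
    h00 = (1 + 2*s).*(1 - s).^2;          % standard Hermite basis functions
    h10 = s.*(1 - s).^2;
    h01 = s.^2.*(3 - 2*s);
    h11 = s.^2.*(s - 1);
    y = h00*yn + h10*(h*fn) + h01*ynp1 + h11*(h*fnp1);
end
```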

Before today's version of `ode45`, there was an earlier one. In a 1969 NASA report, Erwin Fehlberg introduced a so-called six stage Runge-Kutta method that requires six function evaluations per step. These function values can be combined with one set of coefficients to produce a fifth-order accurate approximation and with another set of coefficients to produce an independent fourth-order accurate approximation. Comparing these two approximations provides an error estimate and resulting step size control.

Notice that it takes six stages to get fifth order. It is not possible to get fifth order with only five function evaluations per step. The combinatorial complexity of the Taylor series in two variables for $F(t,y)$ overpowers the information available from the function evaluations. In fact, if you continue to investigate the development of Runge-Kutta methods, you will find that, for example, with ten stages it is only possible to achieve seventh order.

In the early 1970's Shampine and his colleague H. A. (Buddy) Watts at Sandia Laboratories published a Fortran code, RKF45, based on Fehlberg's algorithm. In 1977, we made RKF45 the ODE solver in our text book *Computer Methods for Mathematical Computations*, by Forsythe, Malcolm and Moler. This book is now out of print so I can't provide a decent link to it. But the Fortran source code for RKF45 is still available from netlib.

One thing that I will always remember about RKF45 is ${12 \over 13}$. The fourth function evaluation, $s_4$, is made at the point

$$ t_n + {12 \over 13} h $$

I doubt that you will ever come across ${12 \over 13}$ anyplace else in mathematical software.

RKF45 became the basis for the first version of ODE45 in MATLAB in the early 1980s and for early versions of Simulink. The Fehlberg (4,5) pair did a terrific job for almost fifteen years until the late 1990s when Shampine and MathWorker Mark Reichelt modernized the suite and introduced a more efficient algorithm.

The new `ode45` introduced in the late 1990s is based on an algorithm of Dormand and Prince. It uses six stages, employs the FSAL strategy, provides fourth and fifth order formulas, has local extrapolation and a companion interpolant.

The new `ode45` is so effective that it is the only function in the suite where the default value of the `refine` argument is set to 4. This means that the step size the algorithm naturally wants to choose is so large that the output is more widely spaced than most people prefer. So the interpolant is called up to produce more finely spaced output at no additional cost measured in function evaluations.

The codes for the two routines `ode23` and `ode45` are very similar. The most important place they differ is the portion that sets the key parameters. Here are the parameters in `ode23`.

dbtype 200:209 ode23

200     % Initialize method parameters.
201     pow = 1/3;
202     A = [1/2, 3/4, 1];
203     B = [
204     1/2         0           2/9
205     0           3/4         1/3
206     0           0           4/9
207     0           0           0
208     ];
209     E = [-5/72; 1/12; 1/9; -1/8];

The parameter `pow` is used in the step size calculation. We see that `ode23` is a third order method. The array `A` gives the fractions for each partial step. The array `B` provides the weights for the slopes. And the array `E` provides the coefficients in the error estimate. Here are the corresponding parameters for the Dormand-Prince algorithm used in `ode45`.

dbtype 201:213 ode45

201     % Initialize method parameters.
202     pow = 1/5;
203     A = [1/5, 3/10, 4/5, 8/9, 1, 1];
204     B = [
205     1/5         3/40        44/45       19372/6561      9017/3168       35/384
206     0           9/40        -56/15      -25360/2187     -355/33         0
207     0           0           32/9        64448/6561      46732/5247      500/1113
208     0           0           0           -212/729        49/176          125/192
209     0           0           0           0               -5103/18656     -2187/6784
210     0           0           0           0               0               11/84
211     0           0           0           0               0               0
212     ];
213     E = [71/57600; 0; -71/16695; 71/1920; -17253/339200; 22/525; -1/40];

`ode23` is a three-stage, third-order, Runge-Kutta method. `ode45` is a six-stage, fifth-order, Runge-Kutta method. `ode45` does more work per step than `ode23`, but can take much larger steps. For differential equations with smooth solutions, `ode45` is often more accurate than `ode23`. In fact, it may be so accurate that the interpolant is required to provide the desired resolution. That's a good thing.

`ode45` is the anchor of the differential equation suite. The MATLAB documentation recommends `ode45` as the first choice. And Simulink blocks set `ode45` as the default solver. But I have a fondness for `ode23`. I like its simplicity. I particularly like it for graphics. The natural step size that `ode23` chooses is frequently just right for display purposes. For example, the plot of the Lorenz chaotic attractor at the end of my previous post is done with `ode23` choosing the step size.

So give `ode23` a try.

Get
the MATLAB code

Published with MATLAB® R2014a

I'd like to introduce this week's guest blogger Alan Weiss. Alan writes documentation for mathematical toolboxes here at MathWorks.

Do you use `GlobalSearch` or `MultiStart` for finding multiple local solutions to your optimization problems? Both of these solvers can report the locations and objective function values of the local solutions they find, as well as the starting points that led to each local solution.

Sometimes, though, the solvers report too many local solutions. Their definition of what constitutes "distinct" solutions can differ from yours. This article shows the problem, and two solutions. One solution involves setting `GlobalSearch` or `MultiStart` tolerances. But this can be difficult if you don't know beforehand how close solutions can be. The other involves the `clumpthem` function; download it here.

For example, look at the six-hump camelback function described in the documentation. Get the `sixhumps` code.

sixmin = sixhumps

sixmin = @(x)(4*x(:,1).^2-2.1*x(:,1).^4+x(:,1).^6/3+x(:,1).*x(:,2)-4*x(:,2).^2+4*x(:,2).^4)

You can see in the contour plot that there are six local minima. The colored asterisks mark the minima.

Take a look at the number of local minima reported by running `MultiStart` with the `fmincon` local solver and active-set algorithm.

```
ms = MultiStart;
opts = optimoptions(@fmincon,'Algorithm','active-set');
problem = createOptimProblem('fmincon','x0',[-1,2],...
    'objective',sixmin,'lb',[-3,-3],'ub',[3,3],...
    'options',opts);
rng default % for reproducibility
[xminm,fminm,flagm,outptm,manyminsm] = run(ms,problem,50); % run fmincon 50 times
size(manyminsm)
```

MultiStart completed the runs from all start points.

All 50 local solver runs converged with a positive local solver exit flag.

ans =

     1    50

How could there be 50 separate minima reported when you know that there are only six points that are local minima? The answer is that many of these minima are close to each other. For example, the first three points look the same.

manyminsm(1).X
manyminsm(2).X
manyminsm(3).X

ans =

   -0.089843      0.71266

ans =

   -0.089844      0.71265

ans =

   -0.089834      0.71265

Of course, these reported minima aren't really the same.

norm(manyminsm(1).X - manyminsm(2).X)

ans = 7.6691e-06

The point is, its default tolerances cause `MultiStart` to report different minima if they differ by more than 1e-6 in value or position. And these minima are more than 1e-6 apart.

You can loosen the default `MultiStart` tolerances to have `MultiStart` automatically combine similar minima.

```
rng default % for reproducibility
ms.TolX = 1e-3;
ms.TolFun = 1e-4;
[xminloose,fminloose,flagloose,outptloose,manyminsloose] = run(ms,problem,50);
size(manyminsloose)
```

MultiStart completed the runs from all start points.

All 50 local solver runs converged with a positive local solver exit flag.

ans =

     1     6

Loosening the tolerances caused `MultiStart` to give just the six truly different local minima. But this process involved rerunning the solver.

Wouldn't it be better to clump the minima without running the solver again? That is the point of the `clumpthem` function. It takes `MultiStart` or `GlobalSearch` solutions and clumps them to the tolerances you want. For example

```
clumped = clumpthem(manyminsm,1e-3,1e-4); % the first tolerance is TolX, then TolFun
size(clumped)
```

ans = 1 6

The answers you get either way are identical.

isequal(manyminsloose(1).X, clumped(1).X)

ans = 1

How does `clumpthem` work? It uses the fact that `MultiStart` and `GlobalSearch` return solutions in order, from best (lowest objective function value) to worst. It takes the first solution and compares it to the second, both in terms of objective function values and distance between the solutions. If both comparisons are below the tolerances, `clumpthem` removes the second solution, adding its starting point to the list of `X0` starting points. If the objective function value of the second solution is too high, then `clumpthem` starts a new clump. If the objective function difference is small enough, but the distance between solutions is too large, `clumpthem` proceeds to compare the first solution with the third. It continues in this way until it runs out of solutions to clump.
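In rough outline, that procedure reads something like the sketch below. This is only an illustration of the logic just described; it ignores the `X0` starting-point bookkeeping that the real `clumpthem` also performs, so download the actual function rather than using this.

```
% Rough sketch of the clumping procedure described above (illustration only).
% sols is the vector of solutions, already ordered from best to worst.
function clumped = clump_sketch(sols,tolX,tolFun)
    keep = true(size(sols));
    for i = 1:numel(sols)
        if ~keep(i), continue, end
        for j = i+1:numel(sols)
            if ~keep(j), continue, end
            if abs(sols(j).Fval - sols(i).Fval) > tolFun
                break                     % objective too different: a new clump starts
            end
            if norm(sols(j).X - sols(i).X) <= tolX
                keep(j) = false;          % close in value and position: fold into i
            end
        end
    end
    clumped = sols(keep);
end
```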

This procedure is quite speedy, as you will see in the next section.

Check the speed of `clumpthem`.

tic; clumped = clumpthem(manyminsm,1e-3,1e-4); toc

Elapsed time is 0.006322 seconds.

`MultiStart` is much slower, even with this fairly simple function:

```
rng default % for reproducibility
tic;
[xminloose,fminloose,flagloose,outptloose,manyminsloose] = run(ms,problem,50);
toc
```

MultiStart completed the runs from all start points.

All 50 local solver runs converged with a positive local solver exit flag.

Elapsed time is 0.863509 seconds.

Do you use `MultiStart` or `GlobalSearch` to search for multiple local solutions? Have you been annoyed by finding spurious differences in solutions? Tell us about it here.

Get
the MATLAB code

Published with MATLAB® R2014a

Today I have for you an insider's view of a subtle aspect of testing image processing software (such as the Image Processing Toolbox!).

I've written several times in this blog about testing software. Years ago I wrote about the testing framework I put on the File Exchange, and more recently (12-Mar-2013) I've promoted the new testing framework added to MATLAB a few releases ago.

In an article I wrote for the IEEE/AIP magazine *Computing in Science and Engineering* a few years ago, I described the difficulties of comparing floating-point values when writing tests. Because floating-point arithmetic operations are subject to round-off error and other numerical issues, you generally have to use a tolerance when checking an output value for correctness. Sometimes it might be appropriate to use an *absolute tolerance*:

$$|a - b| \leq T$$

And sometimes it might be more appropriate to use a *relative tolerance*:

$$ \frac{|a - b|}{\max(|a|,|b|)} \leq T $$

But for testing image processing code, this isn't the whole story, as I was reminded a year ago by a question from Greg Wilson. Greg, the force behind Software Carpentry, was asked by a scientist in a class about how to test image processing code, such as a "simple" edge detector. Greg and I had an email conversation about this, which Greg then summarized on the Software Carpentry blog.

This was my initial response:

*Whenever there is a floating-point computation that is then quantized to produce an output image, comparing actual versus expected can be tricky. I had to learn to deal with this early in my MathWorks software developer days. Two common scenarios in which this occurs:*

- *Rounding a floating-point computation to produce an integer-valued output image*
- *Thresholding a floating-point computation to produce a binary image (such as many edge detection methods)*

*The problem is that floating-point round-off differences can turn a floating-point value that should be a 0.5 or exactly equal to the threshold into a value that's a tiny bit below. For testing, this means that the actual and expected images are exactly the same...except for a small number of pixels that are off by one. In a situation like this, the actual image can change because you changed the compiler's optimization flags, used a different compiler, used a different processor, used a multithreaded algorithm with dynamic allocation of work to the different threads, etc. So to compare actual against expected, I wrote a test assertion function that passes if the actual is the same as the expected except for a small percentage of pixels that are allowed to be different by 1.*

Greg immediately asked the obvious follow-up question: What is a "small percentage"?

*There isn't a general rule. With filtering, for example, some choices of filter coefficients could lead to a lot of "int + 0.5" values; other coefficients might result in few or none. I start with either an exact equality test or a floating-point tolerance test, depending on the computation. If there are some off-by-one values, I spot-check them to verify whether they are caused by a floating-point round-off plus quantization issue. If it all looks good, then I set the tolerance based on what's happening in that particular test case and move on. If you tied me down and forced me to pick a typical number, I'd say 1%.*
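In code, such an assertion might look roughly like the following sketch (the function name and interface are made up for illustration; this is not the actual MathWorks test utility):

```
function tf = imagesAlmostEqual(actual, expected, maxFraction)
% Sketch: pass if actual equals expected except for a small fraction of
% pixels that differ by exactly 1 (round-then-quantize effects).
% maxFraction is the largest allowed fraction of differing pixels, e.g. 0.01.
d = abs(double(actual) - double(expected));
if any(d(:) > 1)
    tf = false;                              % differences bigger than 1 always fail
else
    tf = nnz(d) / numel(d) <= maxFraction;   % allow a few off-by-one pixels
end
end
```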

PS. Greg gets some credit for indirectly influencing the testing tools in MATLAB. He's the one who prodded me to turn my toy testing project into something slightly more than a toy and make it available to MATLAB users. The existence and popularity of that File Exchange contribution then had a positive influence on the decision of the MathWorks Test Tools Team to create a full-blown testing framework for MATLAB. Thanks, Greg!

Get
the MATLAB code

Published with MATLAB® R2014a

I look up at the sky just after sunset and I see an especially bright star. It's probably a planet. But which one?

This question gives me a good opportunity to play around with MATLAB. Let's do a visualization that shows where the planets are relative to the earth and the sun. In the process, we'll use JSON services, the File Exchange, MATLAB graphics, and 3-D vector mathematics cribbed from Wikipedia.

Here is the basic grade-school illustration of the solar system, the one that shows the planets rolling around the sun like peas on a plate. For simplicity, we're just showing the sun, the earth, the moon, Venus, and Mars.

But we never see anything like this with our own eyes. Instead, we see bright spots on a dark background somewhere "up there." So let's simplify our problem to determining what direction each naked-eye planet is in. This leads to an image like this.

Our goal is to make an accurate up-to-date version of this diagram. Specifically, relative to the sun, where should we look to find the moon and the naked-eye planets (Mercury, Venus, Mars, Jupiter, and Saturn)? To do this, we need to solve a few problems.

- Find the planets
- Find the unit vector pointing from earth to each planet
- Squash all these vectors onto a single plane
- Visualize the resulting disk

First of all, where are the planets? There's a free JSON service for just about everything these days. I found planetary data hosted on Davy Wybiral's site here:

http://davywybiral.blogspot.com/2011/11/planetary-states-api.html

```
url = 'http://www.astro-phys.com/api/de406/states?bodies=sun,moon,mercury,venus,earth,mars,jupiter,saturn';
json = urlread(url);
```

Now we can use Joe Hicklin's JSON parser from the File Exchange. It returns a well-behaved MATLAB structure.

data = JSON.parse(json)

```
data = 
       date: 2.4568e+06
    results: [1x1 struct]
       unit: 'km'
```

The payload is in the "results" field. Each entry has three position components and three velocity components.

data.results

```
ans = 
    mercury: {{1x3 cell}  {1x3 cell}}
        sun: {{1x3 cell}  {1x3 cell}}
       moon: {{1x3 cell}  {1x3 cell}}
    jupiter: {{1x3 cell}  {1x3 cell}}
       mars: {{1x3 cell}  {1x3 cell}}
      earth: {{1x3 cell}  {1x3 cell}}
      venus: {{1x3 cell}  {1x3 cell}}
     saturn: {{1x3 cell}  {1x3 cell}}
```

The distances are in kilometers, and I'm not even sure how this representation is oriented relative to the surrounding galaxy. But it doesn't really matter, because all I care about is the relative positions of the bodies in question.

Side note: We could also have used the Aerospace Toolbox to get the same information.

`[pos,vel] = planetEphemeris(juliandate(now),'Sun','Earth')`

```
% List of bodies we care about
ssList = {'sun','moon','mercury','venus','earth','mars','jupiter','saturn'};
ss = [];
for i = 1:length(ssList)
    ssObjectName = ssList{i};
    ss(i).name = ssObjectName;
    ssData = data.results.(ssObjectName);
    ss(i).position = [ssData{1}{:}];
    ss(i).velocity = [ssData{2}{:}];
end
```

```
% Plot in astronomical units
au = 149597871;
k = 5;
cla
for i = 1:length(ss)
    p = ss(i).position/au;
    line(p(1),p(2),p(3), ...
        'Marker','.','MarkerSize',24)
    text(p(1),p(2),p(3),[' ' ss(i).name]);
end
view(3)
grid on
axis equal
```

This is accurate, but not yet very helpful. Let's now calculate the geocentric position vectors of each planet. To do this, we'll put the earth at the center of the system. Copernicus won't mind, because A) he's dead, and B) we admit this reference frame orbits the sun.

We're also going to use another file from the File Exchange. Georg Stillfried's mArrow3 will help us make nice 3-D arrows in space.

```
clf
pEarth = ss(5).position;
for i = 1:length(ss)
    % pe = position relative to earth
    % (i.e. a vector pointing from earth to body X)
    pe = ss(i).position - pEarth;
    % pne = normalized position relative to earth
    pne = pe/norm(pe);
    ss(i).pne = pne;
    mArrow3([0 0 0],pne, ...
        'stemWidth',0.01,'FaceColor',[1 0 0]);
    text(pne(1),pne(2),pne(3),ss(i).name, ...
        'HorizontalAlignment','center');
    hold on
end
light
hold off
axis equal
axis off
axis([-1.2 1.2 -0.8 1.1 -0.2 0.8])
```

These are unit vectors pointing out from the center of the earth towards each of the other objects. It's a little hard to see, but these vectors are all lying in approximately the same plane.

If we change our view point to look at the system edge-on, we can see the objects are not quite co-planar. For simplicity, let's squash them all into the same plane. For this, we'll use the plane defined by the earth's velocity vector crossed with its position relative to the sun. This defines "north" for the solar system.

```
pEarth = ss(5).position;
pSun = ss(1).position;
vEarth = ss(5).velocity;
earthPlaneNormal = cross(vEarth,pSun - pEarth);
% Normalize this vector
earthPlaneNormalUnit = earthPlaneNormal/norm(earthPlaneNormal);
mArrow3([0 0 0],earthPlaneNormalUnit, ...
    'stemWidth',0.01,'FaceColor',[0 0 0]);
view(-45,15)
axis([-1.2 1.2 -0.8 1.1 -0.2 0.8])
```

Now we project the vectors onto the plane defined by earth's motion around the sun. I learned what I needed from Wikipedia here: Vector Projection.

Since we are working with the normal, we are technically doing a vector rejection. Using Wikipedia's notation, this is given by

$$ \mathbf{a_2} = \mathbf{a} - \frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{b} \cdot \mathbf{b}} \mathbf{b} $$

```
hold on
for i = 1:length(ss)
    pne = ss(i).pne;
    pneProj = pne - dot(pne,earthPlaneNormalUnit) / ...
        dot(earthPlaneNormalUnit,earthPlaneNormalUnit) * earthPlaneNormalUnit;
    ss(i).pneProj = pneProj;
    mArrow3([0 0 0],pneProj, ...
        'stemWidth',0.01,'FaceColor',[0 0 1]);
end
hold off
axis equal
```

We're close to the end now. Let's just calculate the angle between the sun and each element. Then we'll place the sun at the 12:00 position of our planar visualization and all the other planets will fall into place around it.

Calculate the angle between the sun and each of the bodies. Again, from the Wikipedia article, we have

$$ \cos \theta = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|} $$

```
sun = ss(1).pneProj;
ss(1).theta = 0;
for i = 1:length(ss)
    pneProj = ss(i).pneProj;
    cosTheta = dot(sun,pneProj)/(norm(sun)*norm(pneProj));
    theta = acos(cosTheta);
    % The earth-plane normal vector sticks out of the plane. We can use it
    % to determine the orientation of theta
    x = cross(sun,pneProj);
    theta = theta*sign(dot(earthPlaneNormalUnit,x));
    ss(i).theta = theta;
end
```

```
cla
k1 = 1.5;
k2 = 1.2;
for i = 1:length(ss)
    beta = ss(i).theta + pi/2;
    x = cos(beta);
    y = sin(beta);
    mArrow3([0 0 0],[x y 0], ...
        'stemWidth',0.01, ...
        'FaceColor',[0 0 1]);
    line([0 k1*x],[0 k1*y],'Color',0.8*[1 1 1])
    text(k1*x,k1*y,ss(i).name,'HorizontalAlignment','center')
end
t = linspace(0,2*pi,100);
line(k2*cos(t),k2*sin(t),'Color',0.8*[1 1 1])
line(0,0,1,'Marker','.','MarkerSize',40,'Color',[0 0 1])
axis equal
axis(2*[-1 1 -1 1])
```

And there you have it: an accurate map of where the planets are in the sky for today's date. In this orientation, planets "following" the sun through the sky are on the left side of the circle. So Jupiter will be high in the sky as the sun sets.

And that is a very satisfying answer to my question, by way of vector math, JSON feeds, MATLAB graphics, and the File Exchange.

Get
the MATLAB code

Published with MATLAB® R2014a

MATLAB and Simulink have a powerful suite of routines for the numerical solution of ordinary differential equations. Today's post offers an introduction. Subsequent posts will examine several of the routines in more detail.

MATLAB started its life as a "Matrix Laboratory." The capability for the numerical solution of ordinary differential equations was soon added. This capability, together with the block diagram programming environment, forms the basis for the multidomain simulation companion product, Simulink. The linear equation solvers from the matrix library underlie the stiff solvers in the ode suite.

Larry Shampine is a highly regarded mathematician and computational scientist. He is a Professor, now Emeritus, at Southern Methodist University in Dallas. Before SMU, Larry was with Sandia National Laboratory in Albuquerque. He has been a consultant to the MathWorks for many years. He has written several books and many research and expository papers. His software, originally in Fortran and more recently in MATLAB, has been the basis for our ODE codes, as well as others in the industry. Two of Larry's PhD students, Jacek Kierzenka and Michael Hosea, are on the staff of the MathWorks.

There are seven routines in the MATLAB suite. Their names all begin with " `ode` ". This is followed by two or three digits, and then perhaps one or two letters. The digits indicate the *order*. Roughly, higher order routines work harder and deliver higher accuracy per step. A suffix "s", "t", or "tb" designates a method for *stiff* equations. Here is the list of the functions in the suite.

- `ode45` --- The first choice for most nonstiff problems.
- `ode23` --- Less stringent accuracy requirements than `ode45`.
- `ode113` --- More stringent accuracy requirements than `ode45`.
- `ode15s` --- The first choice for most stiff problems.
- `ode23s` --- Less stringent accuracy requirements than `ode15s`.
- `ode23t` --- Moderately stiff problems without numerical damping.
- `ode23tb` --- Less stringent accuracy requirements than `ode15s`.

Here is a simple code, `ode2`, which I will use to discuss *order* and the digits in the suite naming convention. It uses a fixed step size and evaluates the function defining the differential equation twice per step. The first function evaluation at the start of the step provides the slope for a half-step to the midpoint of the interval. The second function evaluation at the midpoint provides the slope for the step all the way across the interval.

```
type ode2
```

```
function ode2(F,t0,h,tfinal,y0)
% ODE2  A simple ODE solver.
%   ODE2(F,t0,h,tfinal,y0) uses a midpoint rule
%   with fixed step size h on the interval
%      t0 <= t <= tfinal
%   to solve
%      dy/dt = F(t,y)
%   with y(t0) = y0.

y = y0;
for t = t0 : h : tfinal-h
    s1 = F(t,y);
    ymid = y + h*s1/2;
    s2 = F(t+h/2, ymid);
    y = y + h*s2;
    disp(y)
end
```

At first glance you might think that the "2" in the name `ode2` just comes from the fact that the function `F(t,y)` is evaluated twice each step. But the reason for the 2 is deeper than that. It is actually because this is a *second order* solver. Let's see what *second order* means. Try solving one of the world's easiest differential equations,

$$ dy/dt = 1 $$

With initial condition $y(0) = 0$, the solution is

$$ y(t) = t $$

I'm going to solve this on the interval $0 \le t \le 10$ with a step size $h = 1$, so the result displays as integers.

```
format short
F = @(t,y) 1;
ode2(F,0,1,10,0)
```

1 2 3 4 5 6 7 8 9 10

Well, we got that right. Now try

$$ dy/dt = 2t $$

With initial condition $y(0) = 0$, the solution is

$$ y(t) = t^2 $$

```
F = @(t,y) 2*t;
ode2(F,0,1,10,0)
```

1 4 9 16 25 36 49 64 81 100

Again, we got that right. So try

$$ dy/dt = 3t^2 $$

Do we generate

$$ y(t) = t^3 $$

```
F = @(t,y) 3*t.^2;
ode2(F,0,1,10,0)
```

0.7500 7.5000 26.2500 63 123.7500 214.5000 341.2500 510 726.7500 997.5000

No, we do not get $t^3$ exactly.

So, our simple function `ode2` computes the exact answer for an ODE whose solution is a polynomial in $t$ of degree 2, but not 3. This is what we mean by a solver of *second* order.

Here's a homework exercise. Write the other simple `ode2`, the one based on the trapezoid rule. Evaluate `F(t,y)` at the start of the interval. Then take a step all the way across. Evaluate `F(t,y)` a second time at the other end of the interval. Then use the average of the two slopes to take the actual step. You should find that the trapezoid version of `ode2` has the same behavior as the midpoint version. It solves polynomials of degree 1 and 2 exactly, but not polynomials of degree 3.
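If you want to check your answer later, here is one possible version (a spoiler for the exercise above; the name `ode2trap` is mine):

```
function ode2trap(F,t0,h,tfinal,y0)
% ODE2TRAP  Trapezoid-rule variant of ode2 with fixed step size h.
y = y0;
for t = t0 : h : tfinal-h
    s1 = F(t,y);             % slope at the start of the step
    s2 = F(t+h, y + h*s1);   % slope at the tentative endpoint
    y = y + h*(s1 + s2)/2;   % step with the average of the two slopes
    disp(y)
end
```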

In both of these cases, it turns out that "2" is both the number of function evaluations per step and the order. This is not always the case. In later posts, we will see that as we strive for higher accuracy and more efficient methods, the number of function evaluations required increases faster than the order.

The classical Runge-Kutta method was developed independently by a famous German applied mathematician, Carl Runge, in 1895, and another German applied mathematician, who was not quite as famous, Wilhelm Kutta, in 1901. It was used extensively for hand computations and is still probably in widespread use on digital computers today. You can see from this MATLAB version, a function called `ode4`, that it evaluates `F(t,y)` four times per step.

```
type ode4
```

```
function ode4(F,t0,h,tfinal,y0)
% ODE4  Classical Runge-Kutta ODE solver.
%   ODE4(F,t0,h,tfinal,y0) uses the classical Runge-Kutta
%   method with fixed step size h on the interval
%      t0 <= t <= tfinal
%   to solve
%      dy/dt = F(t,y)
%   with y(t0) = y0.

y = y0;
for t = t0 : h : tfinal-h
    s1 = F(t,y);
    s2 = F(t+h/2, y+h*s1/2);
    s3 = F(t+h/2, y+h*s2/2);
    s4 = F(t+h, y+h*s3);
    y = y + h*(s1 + 2*s2 + 2*s3 + s4)/6;
    disp(y)
end
```

You will find that this function integrates $1$, $t$, $t^2$ and $t^3$ exactly. But how about $t^4$?

```
format long e
F = @(t,y) 5*t^4;
ode4(F,0,1,10,0)
```

1.041666666666667e+00 3.208333333333334e+01 2.431250000000000e+02 1.024166666666667e+03 3.125208333333333e+03 7.776250000000000e+03 1.680729166666666e+04 3.276833333333333e+04 5.904937500000000e+04 1.000004166666667e+05

The exact solution would be the integers, $t^5$, but we didn't quite get them. So, classical Runge-Kutta computes the exact answer for an ODE whose solution is a polynomial in $t$ of degree 4, but not 5. This is what we mean by a solver of *fourth* order, and is the reason why this function is called `ode4`.

The name of `ode4` has only one digit, not two, and classical Runge-Kutta does not qualify for a spot in the MATLAB suite because the method has a fixed step size. There is no error estimate. Modern numerical methods, and modern mathematical software, include error estimates and automatic step size control. This is the subject of my next blogs.

I want to include some graphics in today's blog, so here let's use `ode23` to plot the three components of the Lorenz chaotic differential equation described in my previous blog post.

```
type lorenzplot
lorenzplot
```

```
function lorenzplot
% LORENZPLOT  Plot the three components of the solution
%   to the Lorenz equations.
sigma = 10;
beta = 8/3;
rho = 28;
eta = sqrt(beta*(rho-1));
A = [ -beta    0     eta
         0  -sigma  sigma
      -eta    rho    -1  ];
v0 = [rho-1 eta eta]';
y0 = v0 + [3 2 -4]';
tspan = [0 30];
[t,y] = ode23(@lorenzeqn, tspan, y0);
plot(t,[y(:,1)+15 y(:,2) y(:,3)-40]);
axis([0 30 -80 80])
set(gca,'ytick',[-40 0 40],'yticklabel','y3|y2|y1')
xlabel('t')
title('Lorenz Attractor')

% ------------------------------------

    function ydot = lorenzeqn(t,y)
        A(1,3) = y(2);
        A(3,1) = -y(2);
        ydot = A*y;
    end
end
```

Get
the MATLAB code

Published with MATLAB® R2014a

Here at the MathWorks we have a great staff to support the financial industry. We also have our fair share of DIYers who like to follow the market and manage their own investments, such as this edition's guest blogger, Steve Cruickshank, an admitted "cocktail party economist".

During his day job, Steve is the marketing manager for our MATLAB Application Deployment products.

If you’re an amateur investor like me, you probably experienced great frustration during late 2008 and early 2009, watching the value of your various stock portfolios, IRAs, and 401K accounts plummet. Sure, we lived the high life during the run-ups of the late 90s dot-com boom and subsequent gains during 2003 – 2008, but the Great Recession of late 2008 and early 2009 made me take a closer look at the S&P 500, especially the long term trends.

Caveat emptor - I’m not a financial industry professional and couldn’t tell you the difference between GARCH and Black-Scholes without using Wikipedia, but I do enjoy managing my own investments and crunching numbers to gain perspective. As long as we’re in “full disclosure” mode, I borrowed the phrase “cocktail party economist” from my grad school Economics professor.

When looking at the S&P 500 trends over various timeframes, or reading web pages from the various pundits, it’s easy to see that you can cherry pick a time period to validate virtually any hypothesis. Think the market is below a reasonable level? Then start your analysis fifteen years ago … 1999 … the height of the dot com boom. Since then the S&P has grown a paltry 1.4% annually, far below the historical average annual growth rate of ~7.5%. Plenty of room for growth, so buy everything in sight.

Want to espouse the theory that the market has grown way too fast and is in yet another bubble just waiting to burst? Sure, we can do that! Start your analysis five years ago … early 2009 … the bottom of the financial crisis. Since then the S&P has grown at a ridiculous rate, over 20% annually, roughly the same as during the dot com boom years. Time to sell everything.

Given those obvious extremes, perhaps the best way to eliminate the false conclusions reached by cherry picking time periods is to analyze data from as far back as possible and evaluate the current market based on the long term trends. The example uses Yahoo Finance to provide data for the S&P 500 since 1950, but you could use other data sources and time frames as you see fit.

The MathWorks' software used in this example is MATLAB and the Datafeed Toolbox. Home users may want to use MATLAB Home and hard-core number crunchers may prefer the advanced functionality provided by the Curve Fitting Toolbox.

I am using `fetch` from the Datafeed Toolbox to pull data from Yahoo based on a user-specified timeframe, i.e., how many years back from today. This example uses the 'w' option to get *weekly* data, but could also use 'm' for *monthly* or 'd' for *daily* data without materially affecting the results and conclusions.

```
% Set Yahoo as data source.
y = yahoo;
% Determine how far back to look. Yahoo provides data starting at 1950,
% so 65 years gathers all the available data.
years = 65;
start = now - (365.25 * years);
% Get data from Yahoo.
CloseData = fetch(y, '^GSPC', 'Close', start, now, 'w');
```

We can then extract the closing price and date for each point, using the `polyfit` and `polyval` functions to determine the best fit line through the data.

```
% Extract data of interest from Yahoo results
ClosePrice = CloseData(:,2);
CloseDate = CloseData(:,1);
% Curve fit using a polynomial fit to create curve fit item and trend data
% based on curve fit result.
TrendLine = polyfit(CloseDate,log(ClosePrice), 1);
TrendData = polyval(TrendLine,CloseDate);
```

Option: If your analysis requires other types of curve fitting, you could also use Curve Fitting Toolbox, as shown here:

```
TrendLine2 = fit(CloseDate, log(ClosePrice), 'poly1');
TrendData = feval(TrendLine2, CloseDate);
```

Plotting the variables `CloseDate`, `ClosePrice`, and `TrendData` on a log plot provides the following image. The variables plotted along the y-axis (`ClosePrice` and `TrendData`) are expressed as exponential functions, such as $ClosePrice = e^y$. Plotting data using exponents and/or log plots provides a better visual representation of average growth rates over long periods of time.

```
% Plotting the actual performance against best fit line
plot(CloseDate,log(ClosePrice),CloseDate,TrendData);
title(['S&P 500 Performance and Best Fit Curve, last ',num2str(years),' years'])
ylabel('Closing and Best Fit Prices (e^y)')
xlabel('Closing Dates')
datetick('x');
```

This plot provides an effective visual mechanism to tell if the market is currently above or below the long term average. Interestingly, the current market (as of late-April 2014) is landing very close to the best fit curve, so perhaps the market is just about where it “should” be. You probably won’t hear that sentiment from market prognosticators because bulls, bears, and bubbles generate more page views.

Using the `axis` function to zoom in on the previous plot, we can highlight two easily recognizable data points: the peak of the late ‘90’s dot com bubble and the financial crisis roughly ten years later. In this example, the `annotation` function provides the graphics.

```
% Setting appropriate zoom coordinates
axis([729661, 736053, 6.4, 7.6])
% Create ellipses to highlight areas of interest
annotation('ellipse',[0.168, 0.652, 0.149, 0.152], 'LineWidth',2, 'Color','r');
annotation('ellipse',[0.592, 0.183, 0.078, 0.431], 'LineWidth',2, 'Color','r');
```

Visuals are nice, but what if you, like so many of us here at MathWorks, prefer numbers and want to quantify the information rather than eyeball it? The first step in quantifying this data is to determine the variance, the variable `Delta` in my example, between the actual closing price and the projected price based on the best fit line.

```
% Calculate the difference between actual results and best fit curve
Delta = log(ClosePrice) - TrendData;
```

The calculations above transform the sloped best fit line into a horizontal line, but the familiar data points from the previous figure (the dot com boom and the financial crisis) remain clear. We can then plot `Delta` relative to the best fit curve using the functions below. The `zeros` function converts a single '0' into a vector for effective plotting.

```
% Clear the figure then plot the raw data. Note that the BestFitCurve is
% multiplied by zeros to get reasonable legend
clf
h1 = plot(CloseDate, Delta);
title(['S&P 500 Variance from Best Fit Curve, last ',num2str(years),' years'])
datetick('x');
BestFitCurve = line(CloseDate, zeros(size(CloseDate)), 'Color', 'k');
ylabel('Variance from Best Fit Line')
xlabel('Closing Dates')
% Create ellipses to highlight areas of interest
annotation('ellipse', [0.753, 0.155, 0.077, 0.364], 'LineWidth',2, 'Color','r');
annotation('ellipse', [0.631, 0.793, 0.104, 0.131], 'LineWidth',2, 'Color','r');
```

To continue quantifying the results, we can use the `std` function to determine the standard deviation, variable name `Sigma`, for the variable `Delta`. To cover a broader range of variances we can then apply multipliers (2, 1, ½, -½, -1, -2) to `Sigma`, showing statistically how far above or below the market is with respect to the long term average. Thankfully none of the market fluctuations reached the Six Sigma levels.

```
% Define Sigma as the standard deviation of Delta, then convert to a vector
% using a matrix of ones
Sigma = std(Delta)*ones(size(CloseDate));
% Clear the previous figure, define the various Sigma's, then plot the data
% with a legend
clf
h1 = plot(CloseDate, Delta);
title(['S&P 500 Variance from Best Fit Curve, last ',num2str(years),' years'])
datetick('x');
BestFitCurve = line(CloseDate, zeros(size(CloseDate)), 'Color', 'k');
TwoSigma = line(CloseDate, 2*Sigma, 'Color', 'g', 'LineStyle', ':');
OneSigma = line(CloseDate, Sigma, 'Color', 'g', 'LineStyle', '--');
HalfSigma = line(CloseDate, 0.5*Sigma, 'Color', 'g');
NegHalfSigma = line(CloseDate, -0.5*Sigma, 'Color', 'r');
NegOneSigma = line(CloseDate, -Sigma, 'Color', 'r', 'LineStyle', '--');
NegTwoSigma = line(CloseDate, -2*Sigma, 'Color', 'r', 'LineStyle', ':');
ylabel('Variance from Best Fit Line')
xlabel('Closing Dates')
legend([h1 TwoSigma OneSigma HalfSigma BestFitCurve ...
    NegHalfSigma NegOneSigma NegTwoSigma], ...
    {'Delta', '+ 2\sigma', '+ \sigma', '+ \sigma/2', 'Best Fit', ...
    '- \sigma/2', '- \sigma', '- 2\sigma'}, 'Location', 'NorthEastOutside')
```

With data and analysis like the above, hindsight is always 20/20. In a perfect world, most investors like myself probably wish they had sold everything when the S&P went above the +2Sigma level near the end of the dot com bubble, and bought as much as possible on margin during the extremely short window below -2Sigma in early 2009.

We all have our own risk profiles, so I won’t pretend to tell you what level represents an attractive buying or selling opportunity ... you need to do your own homework and decide for yourself. The example in this article is just one method to cut through the propaganda and utilize math to provide relatively unbiased data.

One final question for the readers ... do you agree with the statement made earlier that the current level of S&P 500 is about where it "should" be given the long term data? If not, what's your expected level and rationale? Please let me know here.

Get
the MATLAB code

Published with MATLAB® R2014a

That’s awesome. I was always hoping for that. I just linked my Github Repository to File Exchange :)

This is a nice feature, and thanks for the information. Github is friendly for development and bug fixing and MATLAB Central is professional for publishing MATLAB packages.

This is great news! Updating and maintaining the code was an issue with File Exchange. Thank you for making this connection!

That’s great news and a very logical step towards more open source collaboration in MATLAB. Good job!

Since we launched the feature, we’ve received an average of two GitHub links per day, for a total of 28 repositories so far. Thanks to all of you who have added a link to your GitHub project.

If you want to keep track of the progress, File Exchange guru Josh Natanson has created a trend for it over on Trendy. Watch the number grow! And if you want to see the actual list of GitHub projects, follow this link:

http://www.mathworks.com/matlabcentral/fileexchange/?term=is_github:true

Changing the value of a parameter in the equations that produce the famous Lorenz chaotic attractor yields nonlinear ordinary differential equations that have periodic solutions.

(This section is adapted from chapter 7 of my book Numerical Computing with MATLAB, published by MathWorks and SIAM.)

The Lorenz chaotic attractor was first described in 1963 by Edward Lorenz, an M.I.T. mathematician and meteorologist who was interested in fluid flow models of the earth's atmosphere. An excellent reference is the book by Colin Sparrow cited at the end of this blog.

I have chosen to express the Lorenz equations in a somewhat unusual way involving a matrix-vector product:

$$ \dot{y} = A y $$

Here $y$ is a vector-valued function of $t$ with three components and $A$ is a 3-by-3 matrix that depends on $y$. Seven of the nine elements in $A$ are constant, but the other two depend on $y_2(t)$:

$$A = \begin{bmatrix} -\beta & 0 & y_2 \\ 0 & -\sigma & \sigma \\ -y_2 & \rho & -1 \end{bmatrix}$$

The first component of the solution, $y_1(t)$, is related to the convection in the atmospheric flow, while the other two components are related to horizontal and vertical temperature variation. The parameter $\sigma$ is the Prandtl number, $\rho$ is the normalized Rayleigh number, and $\beta$ depends on the geometry of the domain. The most popular values of the parameters are $\sigma = 10$, $\beta = 8/3$, and $\rho = 28$. These are outside the ranges associated with the earth's atmosphere.

The deceptively simple nonlinearity introduced by the presence of $y_2$ in the system matrix $A$ changes everything. There are no random aspects to these equations, so the solutions $y(t)$ are completely determined by the parameters and the initial conditions, but their behavior is very difficult to predict. For some values of the parameters, the orbit of $y(t)$ in three-dimensional space is known as a *strange attractor*. It is bounded, but not periodic and not convergent. It never intersects itself. It ranges chaotically back and forth around two different points, or attractors. For other values of the parameters, the solution might converge to a fixed point, diverge to infinity, or oscillate periodically.

Think of $\eta = y_2$ as a free parameter, restrict $\rho$ to be greater than one, and study the matrix

$$A = \begin{bmatrix} -\beta & 0 & \eta \\ 0 & -\sigma & \sigma \\ -\eta & \rho & -1 \end{bmatrix}$$

It turns out that $A$ is singular if and only if

$$ \eta = \pm \sqrt{\beta (\rho-1)} $$

The corresponding null vector, normalized so that its second component is equal to $\eta$, is

$$ [\ \rho-1 \ \ \eta \ \ \eta \ ]^T $$

With the two different signs for $\eta$, this defines two points in three-dimensional space. These points are fixed points for the differential equation. If either of these points is taken as the initial condition, then the initial derivative $\dot{y}(0) = 0$ and so $y(t)$ never changes.
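Here is a quick numerical check of this fixed-point claim (just a sketch, using the popular parameter values mentioned above):

```
% Verify that the null vector is a fixed point of the Lorenz system
sigma = 10; beta = 8/3; rho = 28;
eta = sqrt(beta*(rho-1));
A = [-beta 0 eta; 0 -sigma sigma; -eta rho -1];   % A evaluated at y2 = eta
yfix = [rho-1; eta; eta];
A*yfix    % essentially the zero vector (up to roundoff), so ydot = 0 here
```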

Fix $\sigma = 10$, $\beta = 8/3$, and investigate the effect of the parameter $\rho$. For $\rho$ less than about 24.7, the fixed points are stable. If the initial value is close enough, the orbit will spiral in to the fixed point. However, for larger values of $\rho$ these points are unstable. If $y(t)$ does not start at one of these points, it will never reach either of them. For most values of $\rho$ greater than 24.7, including the popular $\rho = 28$, the orbit is chaotic.

In his comprehensive study of the Lorenz equations, Colin Sparrow found four exceptional values of $\rho$ where a stable periodic orbit arises out of the chaos. If the initial point happens to be anywhere on the orbit, the solution will return to that point in finite time. If the initial point is not on the orbit, the trajectory will converge towards the periodic orbit. This behavior is unusual for nonlinear equations.

Each orbit has what might be called a characteristic *signature*. Each of the four orbits has a different signature. Use "+" and "-" to denote the sign of $\eta$ in the null vector. In the three-dimensional plots below, the red dots are these attractors. The point with the "+" is on the upper left, while the "-" is on the lower right.

This is the most complex orbit. Its signature is +--+--. It circles the "+" fixed point once and then the "-" fixed point twice. Then it circles the "+" fixed point once again and the "-" point twice again on a slightly different trajectory before repeating the orbit.

The signature is ++-. Circle "+" twice, then "-" once.

The signature is ++--. Circle each fixed point twice before heading towards the other.

The simplest orbit. The signature is +-. No fuss, no bother.

In case you haven't seen it before, here is a plot for the value of $\rho$ that is usually studied, $\rho = 28$. This is the well known Lorenz butterfly. To really appreciate it, you should literally get hold of a three-dimensional plot and view it from different angles. This trajectory goes on forever, never intersecting itself, and never getting too close to, or too far away from, the two unstable fixed points. Most values of $\rho$ near $\rho = 28$ produce similar chaotic behavior.

If you want to see these orbits in action, in 3-D, download and run lorenzgui.m.

Colin Sparrow, *The Lorenz Equations: Bifurcations, Chaos, and Strange Attractors*, Springer, 1982, <http://link.springer.com/book/10.1007%2F978-1-4612-5767-7>

James Gleick on Chaos: Making a New Science, YouTube interview, https://www.youtube.com/watch?v=3orIIcKD8p4

James Gleick, *Chaos: Making a New Science*, Penguin Books, 1987, <http://www.penguin.com/book/chaos-by-james-gleick/9780143113454>

Get
the MATLAB code

Published with MATLAB® R2014a

This is a mid-term election year here in the United States, and today's guest blogger, Toshi Takeuchi, invites you to try election poll analysis with MATLAB for fun.

- Why election polls?
- Does national politics affect local election?
- Could this have been caused by national politics?
- All politics is perhaps local
- MATLAB example with Pollster API and JSONlab
- Convert the data into a table
- Remove missing values
- Plotting Obama Job Approval
- Automate the process with object oriented programming
- Have I whetted your appetite?

The availability of abundant data, coupled with the very impressive success of a political outsider, Nate Silver, who made correct calls in the last two presidential elections, has turned election poll analysis into a fertile playground for hobbyists to apply their data analytics skills for fun.

In this post, I would like to share an example of such analysis using a recent outcome of the special congressional election in Florida, along with a simple example of getting election poll data from Pollster website in JSON format and automating the data pull process with object oriented programming.

There was a race in Florida recently that was supposedly a test case for how Obama’s healthcare law impacts the mid-term election. Or was it?

What you can see in this plot is that the number of undecided voters suddenly dropped, and both Sink (D) and Jolly (R) benefited from it, but a larger percentage of those voters actually ended up voting for Jolly, rather than Sink. This rapid shift happened around Feb 5 – 12. What I expected was a smoother decline of undecided voters over time, perhaps more accelerated toward the election day.

If you believe the pundits, then national issues like the healthcare law affect local politics. Let’s use Obama’s job approval rating as a proxy to check it out.

As you can see in the plot, Obama’s national poll was actually going up towards the end of this election. This should have benefited Sink.

Actually, it is more important to look at the local trend than the national trend, so let's use the polls from Florida alone to see the local Obama Job Approval trend.

Obama's Job Approval was recovering at the national level, but his approval was actually going down in Florida during this election. I am wondering what was happening around the time that the undecideds suddenly decided in the beginning of February. But can you really say this was the test of Obamacare?

National news headlines around that time:

- Philip Seymour Hoffman died
- Sochi Olympics coverage
- Farm bill passing Senate
- House approved debt limit ceiling hike

Let me know if you have good data sources to test this claim. In my opinion, Obamacare alone doesn't seem to fully explain this rapid shift.

Now I would like to address the programming aspect of this post. The Pollster API provides convenient access to the data from election polls. There are other websites that aggregate election polls, but this API was the easiest to use. JSON is a popular format for web services like this, and you can install an appropriate library from the File Exchange to process it. My favorite is JSONlab by Qianqian Fang.

Let’s start out with a simple example of getting data for Obama Job Approval Ratings.

```
baseUrl = 'http://elections.huffingtonpost.com/pollster/api/charts';
slug = 'obama-job-approval';
respFormat = 'json';
fullUrl = sprintf('%s/%s.%s',baseUrl,slug,respFormat);
% add jsonlab to the search path
addpath jsonlab;
% get the poll data
data = loadjson(urlread(fullUrl));
```

JSON stores data in a nested tree structure like XML, so we need to convert it into a table in order to use the data in MATLAB. `table` is a new feature introduced in R2013b that I like quite a lot.

```
% initialize variables
estimates = data.estimates_by_date;
date = zeros(length(estimates),1);
approve = zeros(length(estimates),1);
disapprove = zeros(length(estimates),1);
undecided = zeros(length(estimates),1);
% loop over JSON tree
for i = 1:length(estimates)
    date(i) = datenum(estimates{i}.date);
    for j = 1:length(estimates{i}.estimates)
        if strcmpi('approve',estimates{i}.estimates{j}.choice)
            approve(i) = estimates{i}.estimates{j}.value;
        elseif strcmpi('disapprove',estimates{i}.estimates{j}.choice)
            disapprove(i) = estimates{i}.estimates{j}.value;
        else
            undecided(i) = estimates{i}.estimates{j}.value;
        end
    end
end
% consolidate the data into a table
estimates = table(date,approve,disapprove,undecided);
disp(estimates(1:5,:))
```

```
       date       approve    disapprove    undecided
    __________    _______    __________    _________
    7.3571e+05     44          51.5            0
    7.3571e+05     44          51.6          4.8
    7.3571e+05     43.9        51.6            0
     7.357e+05     43.9        51.6          4.8
     7.357e+05     43.9        51.6          4.8
```

Real data is never perfect, so we need to check for missing values and remove affected rows.

```
% get the indices of zero values
isMissing = table2array(estimates) == 0;
% get the count of missing values by variable
disp('number of missing values by variable')
disp(array2table(sum(isMissing),'VariableNames',estimates.Properties.VariableNames));
obamaDecided = estimates(~(isMissing(:,2) | isMissing(:,3)),1:3);
obamaUndecided = estimates(~isMissing(:,4),[1 4]);
```

```
number of missing values by variable
    date    approve    disapprove    undecided
    ____    _______    __________    _________
      0        1           1            205
```

In the final step, let's validate the data processing so far by plotting the data and comparing it to the chart on the Pollster website.

```
figure
plot(obamaDecided.date,obamaDecided.approve,'k-','LineWidth',2)
hold on
plot(obamaDecided.date,obamaDecided.disapprove,'r-','LineWidth',2)
h = plot(obamaUndecided.date,obamaUndecided.undecided,'b-','LineWidth',2);
set(h, 'color', [0.7 0.7 0.7])
datetick
xlabel('Date')
ylabel('Estimate')
legend('Approve','Disapprove','Undecided','Location','East')
title(data.title)
xlim([datenum('2009-01-01') Inf])
hold off
```

As you can see, this is an iterative process, so it is a good idea to automate some of the steps. Let’s use object oriented programming to facilitate the data pull using a custom class called `myPollster`. This way, all the processed data is encapsulated in the object itself, and you don’t run into namespace issues.

The `myPollster` class also provides a utility method to return the logical indices of missing values in the table.
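The post doesn't show the class definition itself; below is a minimal sketch of how `myPollster` might be put together, reusing the JSON-to-table conversion from the Obama Job Approval example. The property and method names match the usage that follows, but everything else here is an assumption, not the actual implementation.

```
classdef myPollster < handle
    % myPollster  Sketch of a small wrapper around the Pollster charts API.
    properties
        title       % chart title returned by the API
        T           % table of estimates, one row per date
        isMissing   % logical array marking zero (missing) values in T
    end
    methods
        function getChartData(obj, slug)
            % Fetch one chart as JSON and flatten it into a table
            baseUrl = 'http://elections.huffingtonpost.com/pollster/api/charts';
            data = loadjson(urlread(sprintf('%s/%s.json', baseUrl, slug)));
            obj.title = data.title;
            est = data.estimates_by_date;
            % Use the choices in the first entry as column names (e.g. Sink, Jolly)
            choices = cellfun(@(e) e.choice, est{1}.estimates, 'UniformOutput', false);
            dates = zeros(length(est),1);
            vals  = zeros(length(est), length(choices));
            for i = 1:length(est)
                dates(i) = datenum(est{i}.date);
                for j = 1:length(est{i}.estimates)
                    k = strcmpi(est{i}.estimates{j}.choice, choices);
                    vals(i,k) = est{i}.estimates{j}.value;
                end
            end
            obj.T = array2table([dates vals], ...
                'VariableNames', [{'Date'}, choices(:)']);
            obj.isMissing = table2array(obj.T) == 0;
        end
    end
end
```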

```
% instantiate the object
FL13 = myPollster();
% specify the slug for the data pull
slug = '2014-florida-house-13th-district-special-election';
% call the API and store the result in the object.
FL13.getChartData(slug);
% check the result
disp(FL13.title)
disp(FL13.T(1:5,:))
% Check for missing values
disp('check which variable contains missing values...')
disp(array2table(sum(FL13.isMissing),'VariableNames',FL13.T.Properties.VariableNames))
```

```
2014 Florida House: 13th District Special Election

       Date        Sink    Jolly    Overby    Undecided
    __________     ____    _____    ______    _________
    7.3567e+05     46      44.3      6.4        3.3
    7.3567e+05     46      44.3      6.4        3.3
    7.3567e+05     46      44.3      6.4        3.4
    7.3567e+05     45.9    44.3      6.4        3.4
    7.3567e+05     45.9    44.3      6.4        3.4

check which variable contains missing values...
    Date    Sink    Jolly    Overby    Undecided
    ____    ____    _____    ______    _________
     0       0       0        28          0
```

Hopefully this simple example was sufficient to get you interested in trying it yourself. In this example, I simply took the smoothed trend lines provided by Pollster, but you could also get the individual poll data and build a more complex model to make some predictions yourself.

Toshi

Get
the MATLAB code

Published with MATLAB® R2014a

We’ve added a direct connection between File Exchange and GitHub. In practice, this means that if your code already lives on GitHub, you can leave it there. The File Exchange is now smart enough to distribute the code directly from GitHub to anyone who asks for it from MATLAB Central. You get all the benefits of collaborative development in GitHub while community members get access to the latest version of your projects. No need to maintain the files in two different locations. Push your changes up to GitHub, and seconds later that’s what gets served from the File Exchange.

We’re very excited about this feature, because we know there are already a lot of excellent MATLAB-related repos on GitHub. With this feature, we join two strong communities and open a new era on MATLAB Central.

The Singular Value Decomposition of the digram frequency matrix of a text coded with a simple substitution cypher can reveal information about the vowels and consonants in the text.

I have mentioned Don Morrison before in Cleve's Corner, in my post on Pythagorean Addition. He was the mathematician and computer scientist who recruited me to the University of New Mexico in 1972. He was remarkably imaginative and today's post is about another one of his creative ideas. We wrote a paper about it, cited below, in the *American Math Monthly* thirty years ago.

In English, and in many other languages, vowels are usually followed by consonants and consonants are usually followed by vowels. This fact is reflected by the principal component analysis of what I will call the digram frequency matrix for a sample of text. (I am using the term *digram* to refer to any pair of letters, although in the discipline of orthography the term more precisely refers to a pair of characters that represents one phoneme.)

English text uses 26 letters, so the digram frequency matrix is a 26-by-26 matrix, $A$, with counts of pairs of letters. Blanks and all other punctuation are removed from the text and the entire sample is thought of as circular or periodic, so the first letter follows the last letter. The matrix entry $a_{i,j}$ is the number of times the $i$-th letter is followed by the $j$-th letter. The row and column sums of $A$ are the same; they count the number of times individual letters occur in the sample. The fifth row and fifth column usually have the largest sums because the fifth letter, which is "E," is usually the most frequent.

A principal component analysis of $A$ produces a first component,

$$ A \approx \sigma_1 u_1 v_1^T $$

that reflects the individual letter frequencies. The first right- and left-singular vectors, $u_1$ and $v_1$, have elements that are all of the same sign and that are roughly proportional to the corresponding frequencies.

We are primarily interested in the second principal component,

$$ A \approx \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T $$

The second term has positive entries in vowel-consonant and consonant-vowel positions and negative entries in vowel-vowel and consonant-consonant positions. The sign patterns adjust the frequency counts to reflect the vowel-consonant properties of the language.

My Numerical Computing with MATLAB toolbox `ncm` includes a function, `digraph.m`, that carries out this analysis. Let's run the function on Lincoln's Gettysburg Address, `gettysburg.txt`.

```
fid = fopen('gettysburg.txt');
txt = fread(fid);
fclose(fid);
```

Convert to integers between 1 and 26.

```
k = upper(char(txt)) - 'A' + 1;
k(k < 1 | k > 26) = [];
```

Use the counting feature of `sparse` to generate the digram frequency matrix.

```
j = k([2:length(k) 1]);
A = full(sparse(k,j,1,26,26));
cnt = sum(A);
```

Compute the SVD.

[U,S,V] = svd(A);
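To make the connection with the rank-one and rank-two approximations above explicit, here is a small sketch (the variable names `A1` and `A2` are mine):

```
% First and second principal components of the digram matrix
A1 = S(1,1)*U(:,1)*V(:,1)';        % captures individual letter frequencies
A2 = A1 + S(2,2)*U(:,2)*V(:,2)';   % adds the vowel/consonant sign pattern
```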

Plot the second left and right singular vectors. The $i$-th letter of the alphabet is plotted at coordinates $(u_{i,2},v_{i,2})$. The distance of each letter from the origin is roughly proportional to its frequency, and the sign patterns cause the vowels to be plotted in one quadrant and the consonants to be plotted in the opposite quadrant.

There is more detail. The letter "N" is usually preceded by a vowel and often followed by another consonant, like "D" or "G," and so it shows up in a quadrant pretty much by itself, sort of a "left vowel, right consonant". On the other hand, "H" is often preceded by another consonant, namely "T," and followed by a vowel, "E," so it also gets its own quadrant, a "right vowel, left consonant".

```
for i = 1:26
    if cnt(i) > 0
        text(U(i,2),V(i,2),char('A'+i-1))
    end
end
s = 4/3*max(max(abs(U(:,2))),max(abs(V(:,2))));
axis(s*[-1 1 -1 1])
axis square
line([0 0],[-s s],'color','b')
line([-s s],[0 0],'color','b')
box
title(sprintf('%d characters',length(k)))
```

A simple substitution cipher is obtained by permuting the letters of the alphabet. The digram matrix $A$ is replaced by $PAP^T$ where $P$ is a permutation matrix. The singular vectors are simply multiplied by $P$. Letters in the plot are permuted, but the locations and quadrants remain the same. So it is possible to identify the vowels and consonants, and perhaps -- if the sample is large enough -- the letters standing for "N" and "H" in the cryptogram.
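To see this numerically, here is a small sketch that enciphers the digram matrix with a random key and repeats the analysis (an illustration only; it assumes the singular values are distinct):

```
% Encipher by permuting the alphabet, then compare the second singular vectors
p = randperm(26);                 % a random substitution key
P = eye(26); P = P(p,:);          % the corresponding permutation matrix
Ac = P*A*P';                      % digram matrix of the enciphered text
[Uc,Sc,Vc] = svd(Ac);
% Up to sign, the second singular vectors are just permuted copies
norm(abs(Uc(:,2)) - abs(P*U(:,2)))   % essentially zero
```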

I admit that simple substitution ciphers are not the world's most difficult codes to crack, but I found this to be an unexpected use of the SVD.

Cleve Moler and Donald Morrison, "Singular Value Analysis of Cryptograms", *The American Mathematical Monthly*, Vol. 90, No. 2, 1983, pp. 78-87, <http://www.jstor.org/stable/2975804>

Get
the MATLAB code

Published with MATLAB® R2014a

With spring comes the tax filing deadline. This post is also about taxes. I'd like to introduce this week's guest blogger, Toshi Takeuchi. Toshi analyzes web data and runs online ad campaigns here at MathWorks.

Hi, I am Toshi. I am a big fan of Nate Silver who made analyzing data very cool and mainstream. Because I analyze data a lot, it bugs me when I see questionable analyses passed around in the news media.

So when I saw this CNBC post on Google+, my “bogus data analysis” radar started sending high alerts.

This map shows you the ranking of states based on average tax amount, adjusted by the cost of living index. Let me pretend to do some data journalism here.

Well, I happen to think that the tax amount is correlated more directly with income than with cost of living. The average tax amount should be higher if you live in a state with high median income. Cost of living may also be higher in those states, but that's a secondary effect.

In order to understand the true picture, you actually need to think in terms of tax to income ratio instead. This is what you get when you use this metric.

You can see that the color shifted in a number of states if you compare the first map and the second map. Massachusetts, where I live, actually looks pretty good, as do the states in the Mid-Atlantic region, even though they were red in the first map. On the other hand, the color flips in the other direction in some Gulf Coast states in the South.

If you believed in the original analysis and moved from Massachusetts to one of those states, then your taxes may go down, but your income may also go down even more. Not a good move, IMHO.

```
% Disclaimer: don't trust my analysis, either - I only did it for
% debunking the original story; it is not meant as a robust analysis.
% Please don't plan your relocation based on this analysis, just in case.
```

If you are interested in playing a data journalist, this type of analysis is fairly easy with MATLAB.

All I had to do was to download the median income dataset from the Census Bureau website and merge two datasets with the newly introduced table (available in MATLAB since R2013b). I also used Mapping Toolbox to visualize the data.

First, I went to the data sources to get the data. You can use Excel to import HTML tables into a spreadsheet directly. Census data is also available in Excel format. To match the time period with the original analysis, I used

Historical (1984 to 2012): Median Household Income by State - Single-Year Estimates [XLS - 98k]

Data sources

```
% load data from files into tables
Tax = readtable('bestWorstStateTax.csv');
Income = readtable('medianIncome.csv');
% inspect the content
disp Tax
disp(Tax(1:5,:))
disp Income
disp(Income(1:5,:))
```

```
Tax
    Rank        State         AvgAnnualStateLocalTaxes
    ____    ______________    ________________________
     1      'Wyoming'         2365
     2      'Alaska'          2791
     3      'Nevada'          3370
     4      'Florida'         3648
     5      'South Dakota'    3766

    PercentDiffFromNationalAvg    AdjRankCostOfLivingIdx
    __________________________    ______________________
    -0.66                         1
    -0.66                         4
    -0.52                         2
    -0.48                         3
    -0.46                         5

Income
       State        MedianIncome    StandardError
    ____________    ____________    _____________
    'Alabama'       43464           2529.4
    'Alaska'        63648           2839.1
    'Arizona'       47044           2921.7
    'Arkansas'      39018           2811.5
    'California'    57020           1237.5
```

Now we have two tables in the workspace, and you can also see that each column has a header and can contain a different data type. Both tables contain a column called "State" with the text string of state names. We can use that as the key to join those two tables. We don't need all the columns for this analysis, so I will join just the columns I need.

```
% |table| is smart - it automatically uses the "State" column as the key.
% Just using |State| and |AvgAnnualStateLocalTaxes|, and |State| and
% |MedianIncome|.
T1 = join(Tax(:,2:3),Income(:,1:2));
% rename columns
T1.Properties.VariableNames = {'State','Tax','Income'};
% compute tax to income ratio
T1.Ratio = T1.Tax./T1.Income;
% create a new table ranked by tax to income ratio
T2 = sortrows(T1,{'Ratio'});
% inspect the new table
disp T2
disp(T2(1:5,:))
```

```
T2
        State          Tax     Income     Ratio
    ______________    ____    ______    ________
    'Wyoming'         2365    57512     0.041122
    'Alaska'          2791    63648     0.043851
    'Washington'      3823    62187     0.061476
    'Nevada'          3370    47333     0.071197
    'South Dakota'    3766    49415     0.076212
```

Check whether the new metric produced any meaningful differences.

```
disp('Top 20')
disp('By Avg. Tax        By Avg. Ratio')
disp([T1.State(1:20) T2.State(1:20)])
```

```
Top 20
    By Avg. Tax        By Avg. Ratio
    'Wyoming'          'Wyoming'
    'Alaska'           'Alaska'
    'Nevada'           'Washington'
    'Florida'          'Nevada'
    'South Dakota'     'South Dakota'
    'Washington'       'Florida'
    'Texas'            'Colorado'
    'Delaware'         'Texas'
    'North Dakota'     'North Dakota'
    'Colorado'         'Utah'
    'New Mexico'       'Delaware'
    'Alabama'          'Massachusetts'
    'Arizona'          'New Hampshire'
    'Utah'             'Virginia'
    'Mississippi'      'Maryland'
    'Indiana'          'District of Columbia'
    'Louisiana'        'Rhode Island'
    'West Virginia'    'Arizona'
    'Montana'          'Hawaii'
    'Oklahoma'         'New Jersey'
```

Now we will start using the functions from Mapping Toolbox. First we will assemble the required pieces of data to prepare a map.

Note: I also used the Bioinformatics Toolbox function redgreencmap to create the colormap that goes from green to red, mirroring the scheme in the original map. If you don't have this toolbox, you can easily create a custom colormap in MATLAB. Colormaps are arrays of RGB values (triplets) in the range from 0 to 1.
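For example, one plausible stand-in (an assumption, not the toolbox function) is a simple linear ramp from green to red, one row per ranking:

```
% A minimal green-to-red colormap with 51 rows
n = 51;
t = linspace(0,1,n)';
greenToRed = [t, 1-t, zeros(n,1)];   % row 1 is pure green, row n is pure red
```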

```
% get the US geography data as a structure array
states = shaperead('usastatelo', 'UseGeoCoords', true);
% Get the state names as a cell array of strings.
names = {states.Name};
% This is a vector that stores the ranking of each state.
ranking = zeros(length(names),1);
for i = 1:length(names)
    ranking(i) = find(strcmpi(names(i),T2.State));
end
% Create a colormap that goes from green to red in 51 steps.
colors = redgreencmap(length(ranking));
% Sort colors by state ranking.
stateColors = colors(ranking,:);
% Separate Hawaii and Alaska from the Continental US.
indexHawaii = strcmp('Hawaii',names);
indexAlaska = strcmp('Alaska',names);
indexConus = 1:numel(states);
indexConus(indexHawaii|indexAlaska) = [];
```

Now we are ready to draw the map.

```
% This creates a figure with axes of US geography.
% It contains three axes - Continental US, Alaska and Hawaii.
figure;
ax = usamap('all');
% We don't need the axes, so turn them off.
set(ax, 'Visible', 'off')
% Draw the states with specified color within the Continental US.
for j = 1:length(indexConus)
    geoshow(ax(1), states(indexConus(j)),'FaceColor',stateColors(indexConus(j),:))
end
% Now do the same for Alaska and Hawaii.
geoshow(ax(2), states(indexAlaska),'FaceColor',stateColors(indexAlaska,:))
geoshow(ax(3), states(indexHawaii),'FaceColor',stateColors(indexHawaii,:))
% We don't need geographical details, so turn them off for each axes.
for k = 1:3
    setm(ax(k), 'Frame', 'off', 'Grid', 'off',...
        'ParallelLabel', 'off', 'MeridianLabel', 'off')
end
% Add a colorbar.
colormap(flipud(colors))
c = colorbar('YTickLabel',...
    {'51','41',...
     '31','21','11','1'});
ylabel(c,'Ranking')
```

You can download the data files from these links:

Did you enjoy seeing how to use MATLAB for debunking some bad news analysis? Would you like to try? Perhaps you already do this yourself. Let us know about it here.

Get
the MATLAB code

Published with MATLAB® R2014a

Since this is the first major overhaul of the Blogs site, we’re all pleased to see the new changes released. In fact, we’d like to give you a tour of the Blogs site to show off some of the new features that we’ve added. So sit back and enjoy the view as we take a trip through the new Blogs site.

If you’re a regular reader of the blogs, chances are you haven’t been to the landing page in a while. The last time you went, it probably looked like this:

Now, the landing page looks like this:

In addition to sporting big, colorful images of our bloggers, there are a few other things we’d like to point out:

- **Highlights.** The most recent blog posts and Technical Articles are highlighted.
- **Synopsis.** Mouse over any of the images and you’ll get a brief write-up about the blogger and the focus of their blog. Click and you’ll go to the blog home page, which also happens to be our next stop.

Go to the home page of Loren Shure’s blog, and you’ll see several new features:

- **Blog post previews.** Quickly scan through the titles and preview text of the 10 most recent posts by your favorite blogger.
- **Comments icon.** A lot of posts generate interesting discussions among the readers and many of the bloggers. Just click the gray comments icon next to the title to see what’s going on.
- **Share a post.** Now you can easily share a blog post with your friends on Twitter, Google+, and Facebook or send them an email link.
- **Left hand navigation.** This area is chock-full of goodness. Want to find Loren’s 2011 post on Intersecting Lines? Click on the Archive link next to Recent Posts. Interested in reading posts on a particular topic? Check out Categories. You’ll also see Recent Comments and helpful links.

Okay, let’s get to the good stuff and see Loren’s post on Intersecting Lines.

In addition to this terrific post by Loren, you’ll notice a couple of other things:

- **Links to the next/previous posts.** Bloggers will sometimes do a series of posts on the same topic. If you liked Loren’s Intersecting Lines post, you’ll be happy to see at the top of the page that she followed it up with a part 2.
- **Cross blog links.** Perhaps you also enjoy reading Cleve’s posts on scientific computing and interesting mathematics, or Steve’s writings on image processing. Just go to the top of the page, click All MathWorks Blogs, and go to their home page or their most recent post.

What do you think of the new Blogs site? What do you like about it? What’s still missing? Leave us a comment below.

Flappy Bird is currently the most popular file on the File Exchange, unseating Oliver Woodford’s venerable champion export_fig for the first time in years. How much longer until export_fig is back on top? Or does the ‘Bird have real staying power?
