Module Reference: Operator Modules

These modules transform and filter data flowing through your Pipe.

Module Input: items | Module Output: number

Count Module

This module counts the number of items in the input feed, and outputs that number.

Example: Run this Pipe | View Source

This example uses Count, together with a Simple Math module, to truncate a feed to half its original size.

First, we pipe our feed into a Split module. This gives us two identical feeds to work with. One feed is piped into Count, which outputs a number: the number of items in the feed. We pipe this value into a Simple Math module and divide it in half.

The other feed (after the split) is piped into a Truncate module, which allows us to limit the number of items in the output feed. Truncate requires a number value (the number of items you want output), so we just wire the Simple Math output into Truncate's numeric field, and we've cut the output feed in half.
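
The wiring described above can be approximated in ordinary code. The Python sketch below only illustrates the data flow and is not how Pipes itself is implemented. The function name halve_feed is hypothetical, and rounding down for odd counts is an assumption, since the text does not say how fractions are handled.

```python
def halve_feed(items):
    """Mimic Split -> Count -> Simple Math -> Truncate on a list of items."""
    count = len(items)       # Count: number of items in the feed
    half = count // 2        # Simple Math: divide by 2 (rounding down is assumed)
    return items[:half]      # Truncate: keep only the first `half` items


feed = [{"title": f"Story {n}"} for n in range(1, 7)]
print([item["title"] for item in halve_feed(feed)])
```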

Module Input: items | Module Output: items

Filter Module

Sometimes you have more data than you need. The Filter module lets you include or exclude items from a feed. With Filter you create rules that compare feed elements to values you specify.

So, for example, you may create a rule that says "permit items where the item.description contains 'kittens'". Or a rule that says "omit any items where the item.y:published is before yesterday".

A single Filter module can contain multiple rules (just click the + button next to "Rules"). You can choose whether those rules will Permit or Block items that match those rules. Finally, you can choose whether an item must match all the rules, or if it can just match any rule.

Example: Run this Pipe | View Source

This example inputs a feed of science-related news stories, then pipes them through Filter. Two rules are defined: we want to Permit any items where item.description contains the word "mars" or the item.description contains the word "shark". In this example, you express an interest in both space exploration and aquatic creatures. If you modified Filter to match all rules, you would only see items about sharks on Mars.
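
To make the difference between "any" and "all" concrete, here is a small Python sketch of the rule logic. It is an illustration under assumptions, not the module's actual code: filter_items is a hypothetical name, only the "contains" comparison is modeled, and case-insensitive matching is assumed.

```python
def filter_items(items, rules, mode="permit", match="any"):
    """rules is a list of (field, text) pairs, each meaning 'field contains text'."""
    combine = any if match == "any" else all

    def matches(item):
        return combine(
            text.lower() in str(item.get(field, "")).lower()
            for field, text in rules
        )

    if mode == "permit":
        return [item for item in items if matches(item)]
    return [item for item in items if not matches(item)]  # "block" mode


news = [
    {"description": "Mars rover sends new photos"},
    {"description": "Shark spotted near the coast"},
    {"description": "Sharks on Mars, a thought experiment"},
    {"description": "Election results announced"},
]
rules = [("description", "mars"), ("description", "shark")]
either = filter_items(news, rules, match="any")   # space or sea creatures
both = filter_items(news, rules, match="all")     # only sharks on Mars
```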

Module Input: items | Module Output: items

Location Extractor Module

This module examines the input feed, looking for information that indicates a geographic location. If it finds geographic data, the module adds a y:location element to the output feed. This element contains several sub-elements, including:

		lat
		lon
		quality
		country
		state
		city
		street
		postal
		

Some of the sub-elements above may not be included in the resulting feed - this depends on the amount of information that can be derived from the input feed.

Location Extractor can glean location data from some URLs, such as those from maps.yahoo.com, maps.google.com, and mapquest.com.

A wide range of location mark-up is recognized, including GML (Geography Markup Language), Abbreviated GML, W3C Basic Geo, Abbreviated W3C Basic Geo, Simple GeoRSS, Yahoo! Local format, and KML LookAt and Point tags.

Additionally, Location Extractor will recognize "lat", "long", "latitude", and "longitude" as attributes.

If a y:location element is found in the pipe output, the Pipe Preview feature will optionally display the feed on an interactive map.

Example: Run this Pipe | View Source

This example takes comma-separated value (CSV) data and plots the track of Hurricane Katrina. A sample of the weather data shows its format:

			TROPICAL STORM KATRINA, 2005-08-25 05:00:00.0, 26.2, -78.7, 45, 1000
			TROPICAL STORM KATRINA, 2005-08-25 11:00:00.0, 26.2, -79.3, 50, 997
			HURRICANE KATRINA, 2005-08-25 17:00:00.0, 26.1, -79.9, 65, 985
			HURRICANE KATRINA, 2005-08-25 23:00:00.0, 25.5, -80.7, 65, 984
			HURRICANE KATRINA, 2005-08-26 05:00:00.0, 25.3, -81.5, 65, 987
			

This data includes the storm name, a date/time, latitude, longitude, wind speed, and pressure. The Fetch CSV module can read this data and convert it to a useable feed.

Fetch CSV lets us name the elements of the output feed. We make sure to name them latitude and longitude, two element names Location Extractor recognizes.

We then run the feed through a Filter module, allowing through the pipe only those items with item.title values we expect. This filters out any unrecognized data we might encounter.

Finally, we pipe the data through Location Extractor, which adds the y:location element to the output. When we run this pipe via Pipe Preview, we see a map of the hurricane's track.
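
The same three steps (Fetch CSV, Filter, Location Extractor) can be sketched in Python. This is a rough model only: the function names, the field names other than title, latitude, and longitude, and the set of expected titles are all assumptions made for illustration.

```python
import csv
import io

# Three rows copied from the sample data above.
SAMPLE = (
    "TROPICAL STORM KATRINA, 2005-08-25 05:00:00.0, 26.2, -78.7, 45, 1000\n"
    "TROPICAL STORM KATRINA, 2005-08-25 11:00:00.0, 26.2, -79.3, 50, 997\n"
    "HURRICANE KATRINA, 2005-08-25 17:00:00.0, 26.1, -79.9, 65, 985\n"
)
# "date", "wind" and "pressure" are assumed names; the text only fixes
# title, latitude and longitude.
FIELDS = ["title", "date", "latitude", "longitude", "wind", "pressure"]
EXPECTED_TITLES = {"TROPICAL STORM KATRINA", "HURRICANE KATRINA"}


def fetch_csv(text):
    """Turn CSV text into a list of items with named elements."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, skipinitialspace=True)
    return [dict(row) for row in reader]


def extract_location(item):
    """Add a y:location element when latitude and longitude can be read."""
    out = dict(item)
    try:
        out["y:location"] = {
            "lat": float(item["latitude"]),
            "lon": float(item["longitude"]),
        }
    except (KeyError, ValueError):
        pass  # no usable coordinates: leave the item unchanged
    return out


track = [
    extract_location(item)
    for item in fetch_csv(SAMPLE)
    if item["title"] in EXPECTED_TITLES  # the Filter step
]
```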

Module Input: items | Module Output: items

Loop Module

The Loop module introduces the idea of sub-modules to Yahoo! Pipes. You can insert another module inside the Loop module. When you connect a data feed into the top of the Loop module, the sub-module is run once for each item in the input feed.

To add a sub-module to a Loop module, select the module from the toolbar. If the selected module is a valid sub-module, a dotted black border surrounds the "Drop Module or Pipe Here" section of Loop. (Most modules, with the exception of User inputs and Operators, can be sub-modules.)

Drag the module onto the Loop module. When the dotted border turns to solid red, release the mouse button and the sub-module is installed inside Loop.

Some sub-modules (Term Extractor, String Regex, etc.) have a connector for input. When these modules become sub-modules, the drop-down list at the top of Loop (For each ... in input feed) is populated with fields from Loop's input feed. Select one of these fields to set the input for your sub-module.

Loop has two options that handle what data passes out of the Loop. The first is "emit results". If this option is selected, only data output from the sub-module is included in the output.

If "assign results to" is selected, all the data from the original input is included in the output, and output from the sub-module is assigned to a new or existing data element.

The Loop module is a general-purpose replacement for the now-deprecated For Each: Annotate and For Each: Replace modules.

Example: Run this Pipe | View Source

This example uses the Loop module to combine elements of an XML data stream to create the title element for an RSS feed. By assigning the output of a String Builder sub-module to a new element (in this case title) we can add elements to the data stream.

Our input data is from a government emergency warning system in California. It's plain old XML, not RSS, but with a little help, we can convert it. The String Builder sub-module combines two elements, headline and severity, to create our new title.

We select the "assign results to" option and type in "title". We also have the option to select an existing data element from the list box.

After the Loop, we pass the data through a Rename module, where we convert the expires field into a pubDate field expected for RSS. Luckily, a description field already existed in the input data, so no other conversion is needed.
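
The two output options of Loop, and the example above, can be sketched in Python as follows. Treat this as an illustration under assumptions: loop and string_builder are hypothetical names, the sample alert is invented, and the way the headline and severity are joined is a guess, since the text does not give the String Builder settings.

```python
def loop(items, sub_module, mode, target=None):
    """Run sub_module once per item; mode is "emit" or "assign"."""
    output = []
    for item in items:
        result = sub_module(item)
        if mode == "emit":
            output.append(result)      # only the sub-module output survives
        else:
            updated = dict(item)       # keep all of the original data
            updated[target] = result   # and store the result in `target`
            output.append(updated)
    return output


def string_builder(item):
    # Assumed format for combining the two elements.
    return f"{item['headline']} ({item['severity']})"


alerts = [{"headline": "Flood Warning", "severity": "Severe",
           "expires": "2008-01-05T12:00:00", "description": "Sample text"}]

with_titles = loop(alerts, string_builder, "assign", "title")

# Rename step: expires becomes pubDate, everything else is kept.
rss_items = [
    {("pubDate" if key == "expires" else key): value for key, value in item.items()}
    for item in with_titles
]
```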

Module Input: items | Module Output: items

Regex Module

The Regex module modifies fields in an RSS feed using regular expressions, a powerful type of pattern matching. Think of it as search-and-replace on steroids.

You can define multiple Regex rules. Each has the general format: "In [field] replace [regex pattern] with [text]". Entire books have been written about regular expressions, so we'll only discuss the basics here. You can read about them in more depth here.

Example: Run this Pipe | View Source

The Internet Movie Database (imdb.com) provides a daily feed of movie stars' birthdays (http://rss.imdb.com/daily/born/). The description field looks like this:

			In 1959, Hugh Laurie was born on this date in Oxford, Oxfordshire, England, UK
			

The feed seems to be in somewhat random order. Let's say you want to order the feed from youngest to oldest. Let's do that by using regular expressions to extract the birth year from the description and add it as its own field: birth_year. Then we'll just sort that.

The first step, after fetching the IMDB feed, is to create a new field we'll call birth_year. We use the Rename module, and copy item.description into our new item.birth_year field.

Initially, our birth_year field contains the entire text of our description. This is where Regex does its magic. We know that each description has the format:

			In [year], [person] was born on this date in [location]
			

We need to extract that year from the pattern. In our Regex rule, we first select the field we want to modify, item.birth_year. Next we define a regular expression pattern. It looks like this:

			^In\s+(\d{4}),.+
			

If you've never encountered a regex before, this may look like gibberish. But it becomes easier to understand if we break it down into parts.

The "^", when it's the first character in a regex, says that what follows it should be matched at the beginning of the text we're searching. Next the "In" is just what it looks like, the word "In". So far, our regex will match any text where the characters "In" appear at the start of the text.

Next, we come to "\s+". The "\s" part of this is what's known as a character class. Character classes are shorthands that represent things like "any digit" (\d), "any word character" (\w), or "any whitespace character" (\s). The plus sign modifies whatever it follows: "+" means "one or more". So "\s+" means "one or more whitespace characters".

So far, our pattern "^In\s+" says: match any string that starts with "In", followed by one or more whitespace characters.

The second half of our regex pattern is "(\d{4}),.+". The parentheses group part of a regex for later reference. More on that in a moment. The "\d" is a character class meaning "any digit character". The "{4}" is similar to the earlier plus sign: it modifies whatever precedes it. In this case "\d{4}" means "exactly four digit characters". Then there's the comma. It just means "match a comma". That's followed by ".+". The period is a special character that matches any single character (in most regex flavors, any character except a newline), so ".+" means "match one or more of any character at all".

So if we put it all together, the regex matches any string that begins with "In" followed by one or more whitespace characters, followed by four digits, followed by a comma, and then followed by anything.

So now that we've made a match, we need to do something with it. The Regex module rules have the form: In [element] replace [regex pattern] with [value]. So what goes in that value? Since our regex pattern matches the entire item.birth_year string, whatever value we specify will replace it completely.

If we used the value "foobar", then every item's birth_year would be set to just "foobar". What we use instead is "$1". And this brings us back to those parentheses we saw in the regex. For every pair of parentheses in the regex, a numbered reference is created. That reference, preceded by a dollar sign, can be placed in the rule's value field. In our case, we only have one pair of parentheses, so we only have $1. Its value is the item.birth_year text matched by the part of the regex inside the parentheses. That was "\d{4}", or "exactly four digits". That is the year value we're looking for.

So after passing through the Regex module, item.birth_year contains only the four digits matched by $1.

Now we can just pass the feed through a Sort module, sort on descending item.birth_year, and we're done.
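
The same pattern can be tried outside Pipes. The Python sketch below is an approximation for experimenting, not the module's implementation. Note that Python's re module writes the group reference as \1 where the Regex module uses $1. The function names are hypothetical and the second description is invented sample data.

```python
import re

PATTERN = re.compile(r"^In\s+(\d{4}),.+")


def add_birth_year(item):
    """Copy description into birth_year, then keep only the captured year."""
    out = dict(item)
    out["birth_year"] = PATTERN.sub(r"\1", item["description"])
    return out


def youngest_first(items):
    """Sort on descending birth_year, as the Sort module does in the example."""
    return sorted(items, key=lambda item: item["birth_year"], reverse=True)


feed = [
    {"description": "In 1959, Hugh Laurie was born on this date in Oxford, Oxfordshire, England, UK"},
    {"description": "In 1975, Jane Example was born on this date in Springfield"},  # invented
]
ordered = youngest_first([add_birth_year(item) for item in feed])
```

A description that does not match the pattern is left unchanged by the substitution, so birth_year would still hold the full text for that item.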

Module Input: items | Module Output: items

Rename Module

This module can rename elements of the input feed. There are several cases when this is useful, for example when the input data is not in RSS format (e.g., elements are not named title, link, description, etc.) and you want to output it as RSS, or when the input data contains geographic data whose element names aren't recognized by the Location Extractor module.

You rename an element by creating a mapping between the original name and a new element name. Additional mappings can be created by clicking the + button next to the "Mappings" label.

To define a mapping rule, first select an existing element name from the drop-down list provided. Next you need to select the type of mapping. The rename option replaces the original element with a new element name you provide. The copy as option leaves the original element intact, but creates a duplicate of it with the new name. Finally, you enter the new name of the element.
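
The difference between the rename and copy as options can be shown with a short Python sketch. It is a simplified model: apply_mappings is a hypothetical name, and only top-level element names are handled, not nested paths.

```python
def apply_mappings(item, mappings):
    """mappings is a list of (original_name, action, new_name) tuples."""
    out = dict(item)
    for original, action, new_name in mappings:
        if original not in out:
            continue                          # nothing to map for this item
        if action == "rename":
            out[new_name] = out.pop(original)  # original element is replaced
        elif action == "copy":
            out[new_name] = out[original]      # original element stays intact
    return out


item = {"headline": "Road closed", "lat": 38.9}
mapped = apply_mappings(item, [
    ("headline", "rename", "title"),
    ("lat", "copy", "latitude"),
])
```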

Example: Run this Pipe | View Source

Many city and local governments now make their crime statistics available on-line. This data is provided in a wide range of formats and with varying levels of detail. The city of Washington, D.C. maintains an XML file of their crime data. This example uses the Rename module to convert this XML into a valid RSS feed.

First, we fetch the XML crime data and pipe it into a Filter module. The Washington, D.C. statistics include a lot of detail, including where the crime was committed (including latitude and longitude, though it's buried deep in the XML), and what type of crime. For our example, we'll just extract the bicycle thefts from the file by filtering on the item.content.dcst:ReportedCrime.dcst:method element.

Next, we pipe this subset of data into the Rename module. A valid RSS feed will have title, link, and description elements, so we write rules to rename appropriate XML elements to create these.

Since we have geolocation data buried in the XML, we'll also create two copy as rules to create top-level latitude and longitude elements. This lets us pipe the output into a Location Extractor module to create a y:location element.

When we run this pipe, the geographic data is recognized by Pipe Preview, and we see a map of all the bicycle thefts in Washington, D.C.

Module Input: items | Module Output: items

Reverse Module

Sometimes feeds are ordered - by publication date, alphabetically, or by other criteria. And sometimes that order is exactly opposite of what you want. The Reverse module provides a simple way to fix that, by flipping the order of all items in a data feed.

Example: Run this Pipe | View Source

Sometimes a feed is ordered, but it's not easily sortable. Turner Classic Movies (TCM) has an RSS feed of the movies showing on their channel over the next 24 hours. The feed is ordered chronologically, starting with the film currently showing. But there's no attribute, like pubDate, that would let you list the movies in reverse-chronological order using the Sort module. Piping the feed through Reverse flips it into that order without needing any such attribute.

Module Input: items | Module Output: items

Sort Module

This module sorts a feed by any item element, such as title or description. You can sort items in either ascending or descending order.

Example: Run this Pipe | View Source

The Internet Movie Database provides several RSS feeds - one is a daily list of celebrity birthdays. The list items are in no particular order, making it harder to look for a particular name.

We can use the Sort module to alphabetize the list - each item title contains the person's name and their age (if they're still alive). But there's a hitch: the names are formatted like this:

John Smith (27)

If we alphabetize this list, they're all in first name order - not quite what we want! Luckily, we have the Regex module. With a little pattern matching magic, we can rearrange this name to look like this:

Smith, John (27)

Now we just need to pass the data through the Sort module, sort by item.title in ascending order, and we have a list arranged by last name.

If we wanted to retain the original title format, we could have (before running through Regex) used the Rename module to create a copy of the title (perhaps called sorted_title), then used Regex to modify and Sort to sort the new sorted_title element.
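
The text does not show the pattern used for the rearrangement, so the Python sketch below uses one possible pattern as an assumption. It only handles simple two-word names, the names in the sample list are invented, and Python writes group references as \1, \2 where the Regex module uses $1, $2.

```python
import re

# Assumed pattern: first word, second word, then whatever follows (the age).
NAME = re.compile(r"^(\S+)\s+(\S+)(.*)$")


def last_name_first(title):
    return NAME.sub(r"\2, \1\3", title)


def sort_by_last_name(items):
    """Rearrange each title, then sort on it in ascending order."""
    rearranged = [{**item, "title": last_name_first(item["title"])} for item in items]
    return sorted(rearranged, key=lambda item: item["title"])


birthdays = [
    {"title": "John Smith (27)"},
    {"title": "Alice Zed (30)"},
    {"title": "Bob Adams"},
]
by_last_name = sort_by_last_name(birthdays)
```

Names with more than two parts would need a more careful pattern than this one.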

Module Input: items | Module Output: two identical lists of items

Split Module

This module receives an RSS feed and splits it into two identical output feeds. Use Split when you want to perform different operations on data from the same feed.

The Union module is the reverse of Split: it merges multiple input feeds into a single combined feed.

Example: Run this Pipe | View Source

This example splits a Yahoo! Health News feed in two, using the Split module. Each feed is then sent through a Filter module. The first Filter looks for stories on weight, while the second Filter looks for stories on drugs. After filtering, each feed is then sent through its own Regex module, which we use to annotate each item title with its corresponding category. So a story titled "Overweight kids face widespread stigma (AP)" becomes "[Weight] Overweight kids face widespread stigma (AP)".

Finally, we merge the two feeds together using a Union module, giving us a single, categorized, news feed.
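
As a rough model of this pipe, the Python sketch below splits a list of items, filters and tags each copy, and joins the results. The function names are hypothetical, filtering on the title element is an assumption (the text does not say which element the Filters inspect), and the second story title is invented.

```python
import copy
import re


def split(items):
    """Return two independent, identical copies of the feed."""
    return copy.deepcopy(items), copy.deepcopy(items)


def keep_matching(items, word):
    return [item for item in items if word.lower() in item["title"].lower()]


def tag(items, label):
    """Prefix each title with its category by replacing the start-of-string anchor."""
    return [{**item, "title": re.sub(r"^", f"[{label}] ", item["title"])} for item in items]


health_news = [
    {"title": "Overweight kids face widespread stigma (AP)"},
    {"title": "New drugs trial begins"},          # invented
    {"title": "Hospital opens new wing"},         # invented
]
first, second = split(health_news)
combined = tag(keep_matching(first, "weight"), "Weight") + tag(keep_matching(second, "drugs"), "Drugs")
```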

Module Input: items | Module Output: items

Sub-Element Module

Sometimes the data you need from a feed is buried deep in its hierarchy. You need to extract just those select sub-elements from the feed. This is what the Sub-element module is for.

Let's suppose we have the Sonnets of William Shakespeare rendered as XML, with the structure as shown in this (abbreviated) example.

		<SONNET>
		  <AUTHOR>William Shakespeare</AUTHOR>
		  <TITLE>Sonnet 21</TITLE>
		  <STANZA id="st1"> 
		    <VERSE>So is it not with me as with that Muse</VERSE>
		    <VERSE>Stirr'd by a painted beauty to his verse,</VERSE>
		    ...
		  </STANZA>
		  <STANZA id="st2"> 
		  ...
		  </STANZA>
		  ...
		</SONNET>
		

With the Sub-element module, just the verses from each stanza can be extracted, and all the XML levels above them (STANZA, TITLE, AUTHOR, and SONNET) can be discarded.

When a data feed is wired into Sub-element, its list box is populated with all the available sub-elements. In the example above, you would select item.STANZA.VERSE as the path to your sub-elements. Only those VERSE elements (and any child elements) are included in the resulting feed.

Example: Run this Pipe | View Source

The example uses the feed described above and extracts each VERSE sub-element. To make the resulting data more readable in the Pipes Preview, we also pass the data through a Rename module to copy the verse into a title element.
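
For comparison, the same extraction can be sketched with Python's standard xml.etree.ElementTree module. This is only an analogy to what Sub-element does: the function name and the "content" key are assumptions, and the XML contains just the two verses quoted above.

```python
import xml.etree.ElementTree as ET

SONNET_XML = """<SONNET>
  <AUTHOR>William Shakespeare</AUTHOR>
  <TITLE>Sonnet 21</TITLE>
  <STANZA id="st1">
    <VERSE>So is it not with me as with that Muse</VERSE>
    <VERSE>Stirr'd by a painted beauty to his verse,</VERSE>
  </STANZA>
</SONNET>"""


def extract_verses(xml_text):
    """Keep only STANZA/VERSE elements, then copy each verse into a title."""
    root = ET.fromstring(xml_text)
    items = [{"content": verse.text} for verse in root.findall("./STANZA/VERSE")]
    # Equivalent of the Rename step: copy the verse text into a title element.
    return [{**item, "title": item["content"]} for item in items]


verses = extract_verses(SONNET_XML)
```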

Module Input: items | Module Output: items

Tail Module

This module truncates a feed to the last N items, where N is a number you specify. Contrast this with the Truncate module, which limits the output to the first N items.

Example: Run this Pipe | View Source

This example truncates National Weather Service short-term forecasts to the last 5 reports issued. The issue dates are included near the beginning of each description element, so before using the Tail module we first use Sort to force the feed into ascending date order.
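
In code, Tail corresponds to taking a slice from the end of a list. The Python sketch below assumes each item already carries its issue date in a hypothetical "issued" element. Pulling that date out of the description is left out, and the dates are invented.

```python
def tail(items, n):
    """Keep only the last n items of the feed."""
    return items[-n:] if n > 0 else []


def last_reports(items, n=5):
    ordered = sorted(items, key=lambda item: item["issued"])  # ascending date order
    return tail(ordered, n)


reports = [{"issued": f"2008-01-0{day}"} for day in (3, 1, 7, 5, 2, 6, 4)]
recent = last_reports(reports)
```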

Module Input: items | Module Output: items

Truncate Module

This module returns a specified number of items from the top of a feed. This lets you limit the number of items in the output feed.

Contrast this with the Tail module, which also limits the number of feed items, but returns items from the bottom of the feed.

Example: Run this Pipe | View Source

Some feeds have more items than you need. If a news feed has twenty items, but you only want the five most recent articles, then this example can help.

First, we fetch the feed (in this case the Yahoo! World News feed). Since the feed may or may not be in any particular order, we pipe it through a Sort module, and sort on the item.y:published.utime element. (The utime element of any datetime data represents the date and time as the number of seconds that have elapsed since the beginning of the "epoch" - defined as January 1, 1970.) By sorting in descending order, the most recent articles move to the top of the feed.

Then we just pipe the output into a Truncate module. We wire a Number Input module into Truncate's value field to make it easy for the user to configure the number of fresh news items they want. The default is five items.
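
A minimal Python sketch of this sort-then-truncate pipe follows. The function name newest and the sample utime values are assumptions. The limit parameter stands in for the Number Input module, with the same default of five.

```python
def newest(items, limit=5):
    """Sort on y:published.utime in descending order, then keep the first `limit` items."""
    ordered = sorted(items, key=lambda item: item["y:published"]["utime"], reverse=True)
    return ordered[:limit]


news = [
    {"title": "Older story", "y:published": {"utime": 1200000000}},
    {"title": "Newest story", "y:published": {"utime": 1200007200}},
    {"title": "Middle story", "y:published": {"utime": 1200003600}},
]
top_two = newest(news, limit=2)
```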

Module Input: up to 5 different sources of items | Module Output: items

Union Module

This module merges up to five separate sources of items into a single list of items.

Example: Run this Pipe | View Source

This example shows how you can process multiple feeds separately, then merge them together at the end using the Union module.

Let's say we're interested in squids. We want a feed that aggregates information about squids from our favorite sources: Wikipedia and National Geographic.

To get squid references out of Wikipedia, we use the Yahoo! Search source module, and add a site restriction for http://en.wikipedia.org. We don't want too many articles, so we pipe this feed through a Truncate module, taking only the top two articles.

Next, we want articles from the National Geographic web site, but we only want recent articles. They have an RSS feed, so we use a Fetch Feed module to grab it. We pipe this into a Filter module that only returns items with "squid" in their description.

Finally, we pipe both feeds (Wikipedia and National Geographic) into a Union module, merging both into a single, combined, all-squid feed. If we find other squid sources (perhaps a Google Base search of recipes for calamari) we can pipe them into the same Union module to include them as well.
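
Conceptually, Union is list concatenation with a limit on the number of inputs. In the Python sketch below, the function name is hypothetical, the sample items are invented, and keeping the feeds in input order is an assumption, since the text does not say how merged items are ordered.

```python
def union(*feeds):
    """Merge up to five lists of items into one list."""
    if len(feeds) > 5:
        raise ValueError("Union accepts at most five input feeds")
    merged = []
    for feed in feeds:
        merged.extend(feed)   # assumed: items keep the order of their inputs
    return merged


wikipedia = [{"title": "Squid"}, {"title": "Giant squid"}]
national_geographic = [{"title": "Squid filmed in deep water"}]
all_squid = union(wikipedia, national_geographic)
```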

Module Input: items | Module Output: items

Unique Module

This module removes items that contain duplicate strings. You select the element to filter on, and Unique removes the duplicates - if the original feed has five items with the same title, you can configure Unique so only one of these items is included in the output feed.
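
The idea can be sketched in a few lines of Python. This is an illustration only: unique is a hypothetical name, and keeping the first item of each group of duplicates is an assumption, since the text does not say which one survives.

```python
def unique(items, field):
    """Drop items whose `field` value has already been seen."""
    seen = set()
    output = []
    for item in items:
        key = item.get(field)
        if key in seen:
            continue            # duplicate: skip it
        seen.add(key)
        output.append(item)     # assumed: the first occurrence is kept
    return output


feed = [
    {"title": "Race tightens", "link": "http://example.com/1"},
    {"title": "Race tightens", "link": "http://example.com/2"},
    {"title": "Budget passes", "link": "http://example.com/3"},
]
deduplicated = unique(feed, "title")
```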

Example: Run this Pipe | View Source

Some feeds contain multiple items on the same subject. In a political news feed, for example, we may not want to see all nine items on a particular political race.

This example uses the Content Analysis and Unique modules to whittle down the feed and eliminate similar stories. The Content Analysis module evaluates a feed and tries to reduce the gist of the item to a few words. We first run our political news feed through Content Analysis, and it creates a y:content_analysis element containing those key words.

Then we run the feed through Unique, and configure it to remove any duplicate y:content_analysis items. What's left is a smaller feed with fewer repetitive articles.

Module Input: items | Module Output: items

Web Service Module

This module lets you send Yahoo! Pipes data to an external web service for special processing.

You may have special data processing needs that Yahoo! Pipes doesn't provide. The Web Service module will send the Pipes data, as JSON data, into an external web service you specify.

If you're writing a custom web service, it must receive data via HTTP POST in JSON format. Here's an example of a minimal data feed in the format your web service can expect:

		data={
		    "items":[
		       {
		           "title": "First Title",
		           "link": "http://example.com/first",
		           "description": "First Description"
		       },
		       {
		           "title": "Last Title",
		           "link": "http://example.com/last",
		           "description": "Last Description"
		       }
		       ]
		}
		

Your custom web service should send data back in a similar JSON format (omitting the "data=", of course), or in RSS XML format.

Example: Run this Pipe | View Source

This example runs an input feed through a very simple web service. The web service appends the string " (This text was added by the external Web Service)" to the end of each title element, and returns the data to Yahoo! Pipes in JSON format. The sample code (in Java) for this web service is here.
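
The linked sample is written in Java and is not reproduced here. As a separate illustration, the core of such a service can be sketched in Python. The function below handles only the request body, with no HTTP server around it. It assumes the body arrives form-encoded as "data=" followed by the JSON, which the text implies but does not state outright, and the name handle_post is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode

SUFFIX = " (This text was added by the external Web Service)"


def handle_post(body):
    """Take the raw POST body and return the JSON text to send back."""
    form = parse_qs(body)                  # assumed: form-encoded "data=..."
    feed = json.loads(form["data"][0])
    for item in feed["items"]:
        item["title"] = item.get("title", "") + SUFFIX
    return json.dumps(feed)                # the reply omits the "data=" prefix


# Simulate the request Pipes would send, using the minimal feed shown earlier.
request_body = urlencode({"data": json.dumps({"items": [
    {"title": "First Title", "link": "http://example.com/first",
     "description": "First Description"},
]})})
reply = json.loads(handle_post(request_body))
```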

Module Input: items | Module Output: items

Create RSS Module

This operator module makes it easy to convert an entire list of items into an RSS stream when the input data is not in RSS format, e.g., the fields are not named correctly for RSS display and output. You can set both the commonly required fields and the optional, but frequently used, media extensions.

To rename non-RSS elements to an RSS structure, simply select an existing element name from the drop-down list provided.

Example: Run this Pipe | View Source

This example uses the YQL module to get the top music artists of the week. Because the data isn't in an RSS structure, we'll use the Create RSS Module to convert the data to RSS.

This is all done easily by mapping the item.element to the RSS Element you want. For example, I want item.name as my Title, item.ItemInfo.ChartPosition as my Description field, and item.website as my Link. Simply find the item in the dropdown next to the RSS Item you want it defined as.

If you want to further customize your fields, using the Regex Module is the easiest way. For example, in my description I want to prepend this text: "This weeks Chart Position: " in front of the text currently being rendered as the description field. I first find the item.description field and I use "^" in the "replace" text area to signify that I want to start at the beginning of the string. In the "with" text area I enter "This weeks Chart Position: ". Check the debugger pane to confirm that it came out correctly and we're done!
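
The mapping and the "^" replacement can both be modeled in a short Python sketch. The helper names and the sample artist record are invented for illustration. Only the element paths (item.name, item.ItemInfo.ChartPosition, item.website) and the prepended text come from the example above.

```python
import re


def get_path(item, path):
    """Follow a dotted path such as "ItemInfo.ChartPosition" into nested dicts."""
    value = item
    for key in path.split("."):
        value = value[key]
    return value


def create_rss(item, mapping):
    """Build an RSS-style item from {rss_field: source_path} pairs."""
    return {rss_field: get_path(item, path) for rss_field, path in mapping.items()}


MAPPING = {
    "title": "name",
    "description": "ItemInfo.ChartPosition",
    "link": "website",
}

artist = {  # invented sample record
    "name": "Example Artist",
    "ItemInfo": {"ChartPosition": "3"},
    "website": "http://example.com/artist",
}
rss_item = create_rss(artist, MAPPING)

# Replacing "^" inserts text at the very start of the string.
rss_item["description"] = re.sub(r"^", "This weeks Chart Position: ", rss_item["description"])
```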

Copyright © 2014 Yahoo! Inc. All rights reserved.