
Module Reference: Source Modules

Most Pipes begin with a data source. These modules grab data from somewhere on the internet and bring it into your Pipe for processing.

Module Output: items

Feed Auto-Discovery Module

It isn't always obvious where the RSS feeds for a web site are. You might find a link on the home page, but often you have to hunt for it. The Feed Auto-Discovery module was designed to save you the hassle.

Feed Auto-Discovery lets you enter one or more web site URLs into the module. It then examines those pages for information (like link rel tags) about available feeds. If information about more than one feed is found, then multiple items are returned by Feed Auto-Discovery. Because more than one feed can be returned, the output from this module is often fed into a Loop module with a Fetch Feed sub-module.
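
If you're curious what auto-discovery involves under the hood, the sketch below shows the general idea in Python: fetch a page and collect the feed URLs advertised by its link rel tags. This is an illustration only, not the module's actual implementation, and the URL is a placeholder.

	# The gist of feed auto-discovery (illustration only, not the module's code):
	# fetch a page and collect the feed URLs its link rel tags advertise.
	from html.parser import HTMLParser
	from urllib.parse import urljoin
	from urllib.request import urlopen

	FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

	class FeedLinkParser(HTMLParser):
	    def __init__(self, base_url):
	        super().__init__()
	        self.base_url = base_url
	        self.feeds = []

	    def handle_starttag(self, tag, attrs):
	        a = dict(attrs)
	        if (tag == "link" and (a.get("rel") or "").lower() == "alternate"
	                and a.get("type") in FEED_TYPES):
	            self.feeds.append(urljoin(self.base_url, a.get("href", "")))

	def discover_feeds(url):
	    html = urlopen(url).read().decode("utf-8", errors="replace")
	    parser = FeedLinkParser(url)
	    parser.feed(html)
	    return parser.feeds   # one item per discovered feed, like the module's output

	print(discover_feeds("http://www.example.com/"))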

Example: Run this Pipe | View Source

This example returns up to 10 items from an auto-discovered feed. A single web site URL is fed into the Auto-Discovery module using a URL Input module. The discovered feed items are piped through a Truncate module that lets only the first feed through.

That feed is passed into a Loop module with a Fetch Feed sub-module, where the actual feed is read. Finally, the resulting feed is truncated to 10 items.

Module Output: items

Fetch CSV Module

CSV was originally an acronym for "Comma-Separated Values", a text-file format in which each line of the file is a separate record and the individual fields of that record are separated by commas.

These days, the CSV definition has widened to encompass a wide variety of delimited text formats. The Fetch CSV module was designed to parse these formats.

Many desktop applications, like spreadsheet programs, can output data in CSV or other delimited text formats. It's a common method for data interchange between different computer systems.

The Fetch CSV module has more options than the average Yahoo! Pipes module. To make sense of these, we'll first present a short CSV example file:

		"Works of William Shakespeare"
		"last updated 10/21/2006"
		"Title","Type","Year"
		"A Midsummer Night's Dream",COMEDY,1595
		"All's Well That Ends Well",COMEDY,1602
		"Antony and Cleopatra",TRAGEDY,1606
		"As You Like It",COMEDY,1599
		"Comedy of Errors",COMEDY,1589
		"Coriolanus",TRAGEDY,1607
		"Cymbeline",HISTORY,1609
		"Hamlet",TRAGEDY,1600
		"Henry IV,Part I",HISTORY,1597
		"Henry IV,Part II",HISTORY,1597
		"Henry V",HISTORY,1598
		"Henry VI,Part I",HISTORY,1591
		

Start configuring Fetch CSV by entering the URL of a CSV file. Next, choose the character that separates fields in each row. By default this is a comma, but a list box provides options for tabs, spaces, semi-colons, and the vertical bar ("|") character. If none of these match the delimiter, you can type in the separator character.

Often CSV files have several lines of extraneous data before the field data begins. In our example above, the data we want doesn't really start until the third line. You can configure Fetch CSV to ignore a specified number of rows. In our example, we'd skip the first two rows.

Usually, the first data row of a CSV file contains the column names. The third line in our example contains those column names, so we can tell Fetch CSV to "use rows 3 to 3 as column names". When the CSV data is parsed, each subsequent data line will be assigned to a feed element named "Title", "Type", or "Year".

Sometimes no column names are defined in the CSV file, so Fetch CSV also lets you define those names if necessary. You can also use this option if the columns are defined but you want to rename them.
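
To see how these settings fit together, here is a rough Python sketch of the same parsing steps applied to the Shakespeare sample above. The file URL is a placeholder; the module performs all of this for you.

	# The Fetch CSV settings, applied by hand to the Shakespeare sample above.
	# The URL is a placeholder; the module handles all of this internally.
	import csv
	import io
	from urllib.request import urlopen

	raw = urlopen("http://example.com/shakespeare.csv").read().decode("utf-8")
	rows = list(csv.reader(io.StringIO(raw), delimiter=","))   # separator character

	rows = rows[2:]      # skip the first two rows (title and "last updated" lines)
	columns = rows[0]    # "use rows 3 to 3 as column names" -> ["Title", "Type", "Year"]
	items = [dict(zip(columns, row)) for row in rows[1:]]

	print(items[0])      # {'Title': "A Midsummer Night's Dream", 'Type': 'COMEDY', 'Year': '1595'}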

Example: Run this Pipe | View Source

CSV data is very compact compared to formats like XML, because it doesn't suffer the overhead of all the element tags required to define each record. This compactness has made it popular as a data archiving format, particularly for scientific applications.

Government scientific institutions collect a lot of data, and much of this data is available on the internet. Our example uses some data collected by the U.S. Coast Guard's ship Polar Star. This ship supports scientific and ice escort missions in the Arctic and Antarctic.

The CSV dataset contains various environmental readings recorded by the Polar Star in the Antarctic. This data is somewhat more interesting because it includes geolocation data. This lets us display our final feed on a map.

This sample data is almost textbook CSV - the data is comma-separated, there are no extraneous lines at the beginning of the file, and the first line defines the subsequent column names. We only need to enter the URL into Fetch CSV - no other configuration is required.

To make the data look more like an RSS feed, we run it through a Rename module to create feed elements named title, description, latitude, and longitude.

Next, we use a Regex module to modify some of our newly-created elements. The title is set to the data's date and time, and the description is set to a more human-friendly version of some of the environmental data.

The original CSV data includes readings taken at five-minute intervals - this gives us well over a thousand records. To reduce the size of the data we run the data through a Unique module. Unique eliminates repeated values from a feed. We configured Unique to look at the Date element, reducing the feed to one reading per day.

The feed is finally passed through a Location Extractor module, creating a y:location element and allowing us to display the course of the Polar Star on a map in Pipe Preview.

Module Output: items

Fetch Data Module

There's more data on the web than just RSS and Atom feeds. This module can access and extract data from XML and JSON (JavaScript Object Notation) data sources on the web. This data can then be converted into an RSS feed or merged with other data in your Pipe.

To use Fetch Data, first enter the URL of a data file you want. The module will read the file and attempt to parse it. Click on your Fetch Data module (its title bar will turn orange), then click "Refresh" in the Pipes Debugger. If the data file was found and Fetch Data could make sense of it, you'll see a "0" in the Debugger pane. This denotes the root of the XML (or JSON) data hierarchy. You can click on "0" to expand and view the details of this hierarchy.

You may want to extract just a portion of the data. That's what the "Path to item list" field is for. You can zero in on the section of data you want by listing the nested XML elements, separating each with a dot (".").

If your XML file is structured like this:

		<metadata>
		  <idinfo>
		    <citation>
		      <citeinfo>
		      ...
		      </citeinfo>
		    </citation>
		    <descript>
		    </descript>
		    <keywords>
		      <theme>
		      </theme>
		      <theme>
		      </theme>
		      <place>
		      </place>
		      <place>
		      </place>
		      <place>
		        <placekey>Alabama</placekey>
		        <placekey>Alaska</placekey>
		        ...
		        <placekey>West Virginia</placekey>
		        <placekey>Wisconsin</placekey>
		        <placekey>Wyoming</placekey>
		        <placekey>Puerto Rico</placekey>
		        <placekey>U.S. Virgin Islands</placekey>
		      </place>
		    </keywords>
		  </idinfo>
		</metadata>
		

then you can extract just the place elements and their child elements by entering idinfo.keywords.place in the Path to item list field of the Fetch Data module. Separate each element level with a dot ("."). Don't include the top-level element (metadata in the example above).
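
The sketch below illustrates how such a dot-separated path walks the element hierarchy, using a trimmed-down version of the XML above. It's an illustration of the idea only, not how Pipes implements the module.

	# How a dot-separated "Path to item list" walks the hierarchy (illustration only).
	import xml.etree.ElementTree as ET

	xml_text = ("<metadata><idinfo><keywords>"
	            "<place><placekey>Alabama</placekey><placekey>Alaska</placekey></place>"
	            "<place><placekey>Wyoming</placekey></place>"
	            "</keywords></idinfo></metadata>")

	def items_at_path(root, path):
	    nodes = [root]   # start below the top-level element, as the module does
	    for part in path.split("."):
	        nodes = [child for node in nodes for child in node.findall(part)]
	    return nodes

	places = items_at_path(ET.fromstring(xml_text), "idinfo.keywords.place")
	print(len(places))                      # 2 -> two items in the module's output
	print([key.text for key in places[0]])  # ['Alabama', 'Alaska']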

Example: Run this Pipe | View Source

This example gets current weather data from the U.S. National Weather Service. This data is available as an XML file, so we can use Fetch Data to grab the data and convert parts of it.

This example illustrates several other modules too. Since the XML file contains geographical information (in this case the latitude and longitude of the weather station), we can run it through the Location Extractor module. This gives us the option (when we run the Pipe on the Pipe Preview page) to display the output on a map.

We also use the Location Builder module, along with the String Concatenate module, to build the URL we wire into Fetch Data. The National Weather Service XML files have a standard naming convention. For example, the file for Richmond, Virginia is:

			http://www.weather.gov/data/current_obs/KRIC.xml
			

where KRIC is a four-character code identifying the weather station. We want to allow the Pipe to show any weather station, so we need to construct the URL from pieces. The piece that changes is provided by the user using a Text Input module.

We also use the Rename module to remap various elements to make the resulting feed more like an RSS feed. The XML file had no title, link, or description elements, but we can create these using Rename.

Finally, we use the Regex module to modify and expand the information in the item.description element. This also shows that the replacement string in Regex can include references to other feed elements. In our example, we are able to embed temperature_string and relative_humidity in the modified description element. Note that you embed an element reference in the string by surrounding it with ${}, as in: ${relative_humidity}.
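
As a rough analogy, the ${} reference works much like template substitution in other languages. The Python sketch below uses string.Template with invented sample values to show the idea; it is not the Regex module's actual engine.

	# The ${element} reference, approximated with Python's string.Template.
	# Sample values are invented; the Regex module does this substitution itself.
	from string import Template

	item = {
	    "temperature_string": "71 F (22 C)",
	    "relative_humidity": "45",
	}

	template = Template("Currently ${temperature_string}, humidity ${relative_humidity}%")
	print(template.substitute(item))   # Currently 71 F (22 C), humidity 45%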

Module Output: items

Fetch Feed Module

The Fetch Feed module lets you specify one or more RSS news feeds as input to your Pipe. The module understands feeds in RSS, Atom, and RDF formats.

Feeds contain one or more items. When you add more feed URLs, you get a single feed combining all the items from the individual feeds.
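
The combining behavior is easy to picture with a short Python sketch using the third-party feedparser library. The feed URLs are examples only; this is an illustration, not the module's code.

	# Combining several feeds into one list of items, sketched with the
	# third-party feedparser library (pip install feedparser). URLs are examples.
	import feedparser

	urls = [
	    "http://news.ycombinator.com/rss",
	    "http://feeds.bbci.co.uk/news/technology/rss.xml",
	]

	items = []
	for url in urls:
	    parsed = feedparser.parse(url)   # understands RSS, Atom, and RDF
	    items.extend(parsed.entries)     # everything ends up in a single feed

	print(len(items), "items from", len(urls), "feeds")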

To add more feeds, click on the "plus" button at the top of the module, then enter the feed's URL in the new text box that appears. URLs can also be "wired in" to the module from other modules (like the URL Builder).

To delete a feed, just click on the "X" button left of the URL's text box.

Example: Run this Pipe | View Source

The Problem: You read news from several sites, but you're only interested in certain types of stories.

Use the Fetch Feed module to aggregate several feeds together, then use the Filter module to see just the subjects that interest you. This example uses Fetch Feed to grab news from two tech sites, then uses the Filter module to show only those stories that match your interests.

Module Output: items

Find First Site Feed (was Fetch Site Feed) Module

Find First Site Feed uses a web site's auto-discovery information to find an RSS or Atom feed. If multiple feeds are discovered, only the first one is fetched.

With Find First Site Feed, you won't have to search a web site looking for feed URLs. And if a site changes its feed URL in the future, this module can discover the new URL for you (as long as the site updates its auto-discovery links). For sites with only one feed, this module provides a good alternative to the Fetch Feed module.

Note, however, that not all sites provide auto-discovery links on their home pages.

The Find First Site Feed Module provides a simpler alternative to the Feed Auto-Discovery Module. The latter module returns a list of information about all the feeds discovered at a site, but doesn't fetch the feed data itself.

Example: Run this Pipe | View Source

This example uses Find First Site Feed to retrieve the first five items from a feed. The feed is auto-discovered, with the site's URL provided by the user via a URL Input module.

Module Output: items

Flickr Module

Flickr is a popular web site for storing and sharing photographs. The Flickr Pipes module lets you search for photographs by keyword and geographic location.

First specify the number of images you want your search to return. Then enter one or more keywords, like "horses" or "artichokes". You can optionally specify a geographic location the photo is near, like "Chicago, IL" or "Hawaii".

This module returns a lot of data you don't find in a regular RSS feed. A y:flickr tag provides URLs for the Flickr page and Flickr static image file. The page URL is also created as the more RSS-standard link element.

A tags element lists all the Flickr category tags assigned to the photo. And you'll often get geo-location data in a y:location element if location data is available for the photo. This gives you the option to display Flickr results on a map in Pipe Preview.
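
For the curious, the module's search is roughly comparable to calling Flickr's public REST API method flickr.photos.search yourself, as sketched below. The API key and search parameters are placeholders you would supply; this is not the module's actual implementation.

	# Roughly what a keyword + place search against Flickr's REST API looks like.
	# The API key is a placeholder -- you must supply your own.
	import json
	from urllib.parse import urlencode
	from urllib.request import urlopen

	params = urlencode({
	    "method": "flickr.photos.search",
	    "api_key": "YOUR_API_KEY",
	    "tags": "chicago,skyline",
	    "tag_mode": "all",          # require both tags
	    "per_page": 10,
	    "format": "json",
	    "nojsoncallback": 1,
	})
	resp = json.load(urlopen("https://api.flickr.com/services/rest/?" + params))
	for photo in resp["photos"]["photo"]:
	    print(photo["title"])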

Example: Run this Pipe | View Source

Our example looks for photos of the Chicago skyline. The Flickr module is set to return up to 200 images tagged with the words "chicago" and "skyline". We'll also ask for photos that are near Chicago, IL.

To further hedge our bets, we run these results through a Filter module. We only want those photos with a valid geo-location, so one filter rule blocks items with bad latitudes. A second rule eliminates any item with a title that doesn't include the word "Chicago".

Module Output: items

Item Builder Module

With the Item Builder module, you can create a single-item data source by assigning values to one or more item attributes. The module lets you assign a value (on the right side of the equal sign) to an attribute (on the left side of the equal sign).

Item Builder's strength is its ability to restructure and rename multiple elements in a feed. When Item Builder is used as a sub-module to the Loop module, the values (right side of equals sign) can be existing attributes of the input feed. These attributes can be reassigned or used to create entirely new attributes.
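
The sketch below shows the Item Builder idea in miniature: each row assigns a value (right side) to an attribute (left side), and inside a Loop those values can come from the current input item. The sample item and field names are invented for illustration.

	# A miniature of the Item Builder idea (illustration only): each rule assigns
	# a value to an attribute; inside a Loop, values come from the current item.
	csv_row = {"Title": "Hamlet", "Type": "TRAGEDY", "Year": "1600"}  # invented sample input

	def build_item(source):
	    # attribute = value, one line per Item Builder row
	    return {
	        "title": source["Title"],
	        "description": f"{source['Type']} written around {source['Year']}",
	    }

	print(build_item(csv_row))
	# {'title': 'Hamlet', 'description': 'TRAGEDY written around 1600'}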

Example: Run this Pipe | View Source

This example uses a CSV file as input, and uses a Loop module with an Item Builder sub-module to convert it into an RSS feed. Parts of the CSV data are renamed to title, link, and description, corresponding to the expected RSS element names.

Module Output: items

Yahoo! Local Module

Yahoo! Local is a service that lets you search for businesses and services in a particular area. You can search for Thai restaurants in San Francisco, or Shakespeare plays in New York, or an auto mechanic in Mechanicsville, Virginia.

The Yahoo! Local module lets you build a custom Pipe around this service. You give this module a search string and a location (or wire them in from Text Input and Location Input modules), and a search radius between one and 20 miles.

The output feed contains data beyond what you'd expect in an RSS feed. The y:location element gives the latitude and longitude of the item. You'll get rating data (if the service has been rated) including the number of ratings and the average rating. You'll get information on how the item is categorized, and a URL for the business if one's available.

Output from this feed is limited to 20 items.

Example: Run this Pipe | View Source

This example creates a generalized Yahoo! Local service by wiring a Text Input module to the Yahoo! Local module's search term, and a Location Input module to the location field. The search radius is set to 10 miles.

Module Output: items

Fetch Page Module (deprecated)

There's more data on the web than just RSS and Atom feeds. This module fetches the source of a given web site as a string. This data can then be converted into an RSS feed or merged with other data in your Pipe using the Regex module.

To use the Fetch Page module, first enter the URL of the page you want. The module will read the page's source as a string. You can choose to get only part of the page by setting the starting point with the 'Cut content from' field and the end point with the 'to' field. Only the part of the page between these two strings will be returned.

Many pages have repeating elements that you'd like to process separately in the rest of the Pipe. The Fetch Page module lets you specify a delimiter that cuts the string into a sequence of separate items; this works in the same way as the String Tokenizer module.
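
The three settings together amount to something like the following Python sketch. The URL, marker strings, and delimiter are placeholders, not taken from the example Pipe.

	# A hand-rolled version of "Cut content from" / "to" plus a split delimiter.
	# The URL and the marker strings are placeholders.
	from urllib.request import urlopen

	html = urlopen("http://www.example.com/schedule.html").read().decode("utf-8", errors="replace")

	start_marker = '<table id="schedule">'   # "Cut content from"
	end_marker = "</table>"                   # "to"
	delimiter = "<tr>"                        # split marker, as in String Tokenizer

	start = html.find(start_marker) + len(start_marker)
	end = html.find(end_marker, start)
	section = html[start:end]                 # only the text between the two markers

	items = [chunk for chunk in section.split(delimiter) if chunk.strip()]
	print(len(items), "items")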

Note on usage: The module will only fetch HTML pages under 200K, and the page must also be indexable (i.e., allowed by the site's robots.txt file). If you do not want your page made available to this module, please exclude it via your robots.txt file, or add the following tag to the page's <head> element:

<META NAME="ROBOTS" CONTENT="NOINDEX">

Example: Run this Pipe | View Source

This example uses the Fetch Page module to retrieve an HTML page and parse out the local train schedule for the city the user enters.

Module Output: items

XPATH Fetch Page Module

There's more data on the web than just RSS and Atom feeds. This module fetches the source of a given web page as DOM nodes or as a string. This data can then be converted into an RSS or JSON feed, or merged with other data in your Pipe using modules such as Regex and String Builder.

To use the XPATH Fetch Page module, first enter the URL of the site you want. By default, the module will output the DOM elements as items in the preview pane. You can optionally check the "Emit items as string" checkbox if you need the HTML as a string.

You can use the "Extract using XPATH" field to fine-tune what you need from the HTML page. For example, to grab all the links in the page you can simply use "//a"; to grab all the images, use "//img". Read more on XPath. You can also use Firebug to find XPath expressions that target the data you want in an HTML page. This makes the new Fetch Page module more powerful and more in line with today's standards.
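
If you want to experiment with the same XPath expressions outside of Pipes, the sketch below uses the third-party lxml library as a stand-in for the module's parser. The URL is a placeholder.

	# Trying the same XPath expressions with the third-party lxml library
	# (pip install lxml). The URL is a placeholder.
	import lxml.html

	doc = lxml.html.parse("http://www.example.com/").getroot()

	links = doc.xpath("//a")     # every link on the page
	images = doc.xpath("//img")  # every image

	print(len(links), "links and", len(images), "images")
	print([a.get("href") for a in links[:5]])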

Currently this module extracts the page and fixes malformed tags using Tidy. The parser uses HTML4 support by default; check the "Use HTML5 parser" checkbox to use the HTML5 parser instead.

Note on usage: The module will only fetch HTML pages under 1.5MB, and the page must also be indexable (i.e., allowed by the site's robots.txt file). If you do not want your page made available to this module, please exclude it via your robots.txt file, or add the following tag to the page's <head> element:

<META NAME="ROBOTS" CONTENT="NOINDEX">

Example: Run this Pipe | View Source

This example uses the XPATH Fetch Page module to retrieve an HTML page and parse its data into an RSS feed.

Module Output: items

YQL

YQL complements Pipes by giving you another way to fetch data and feeds into your Pipe.

YQL exposes a SQL-like SELECT syntax that is both familiar to developers and expressive enough for getting the right data. See the YQL documentation to find out more.

To use YQL, simply enter a YQL statement (such as select * from feed where url='http://digg.com/rss/index.xml') into the text area. To drill down further into the result set, you can use either the Pipes Sub-element module or projection in the YQL statement. For example: select title from feed where url='http://digg.com/rss/index.xml' returns only the titles from the Digg RSS feed.
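
The YQL statement you enter is, in effect, run against YQL's public REST web service, which is roughly what the module does for you. The sketch below shows how you could call that service yourself from Python; the endpoint and query are shown for illustration only.

	# Calling YQL's public REST endpoint with the same statement (illustration).
	import json
	from urllib.parse import urlencode
	from urllib.request import urlopen

	yql = "select title from feed where url='http://digg.com/rss/index.xml'"
	params = urlencode({"q": yql, "format": "json"})
	url = "https://query.yahooapis.com/v1/public/yql?" + params

	results = json.load(urlopen(url))["query"]["results"]
	print(results)   # just the item titles, thanks to the projection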

The YQL module has two viewing modes: "Results only" or "Diagnostics and results". Diagnostics provides additional data such as the result count, language, type, and more.

You can test your query in the YQL console by clicking on the "Try in the console" link. The YQL console provides sample queries and shows what data tables are available to query against.

Here are some interesting queries to get you started:

  • Fetch two RSS feeds, Digg and Mixx, and sort them by pubDate
    select * from rss where url in ('http://digg.com/rss/index.xml','http://feeds.mixx.com/MixxPopular') | sort(field="pubDate")
  • Find Flickr photos that are tagged "fog" and are in San Francisco
    select * from flickr.photos.info where photo_id in (select id from flickr.photos.search where woe_id in (select woeid from geo.places where text="san francisco, ca") and tags = "fog")

Example: Run this Pipe | View Source

This example uses the YQL module to bring in Flickr data using a YQL statement and converts it into an RSS feed.

Module Output: items

RSS Item Builder Module

With the RSS Item Builder module, you can create a single-item RSS data source by assigning values to one or more RSS attributes.

RSS Item Builder can be used to create a single new RSS item from scratch, or to reformat and restructure an existing item into an RSS structure. When the RSS Item Builder is used as a sub-module to the Loop module, the values from each original item can be converted to RSS. Unlike in the CreateRSS module, each attribute within the RSS Item Builder can be "wired", so you can take values from other modules.

Example: Run this Pipe | View Source

In this example, we'll use the RSS Item Builder to create a traffic RSS feed for San Francisco.

First we'll use the YQL module to get the traffic data for zip code 94123. We'll use the Loop module and embed an RSS Item Builder sub-module inside by dragging and dropping it onto the target area.

We'll map item.Title to Title and item.Description to Description, wire in an email address (built with the String Builder module) as the Author field, and map item.UpdateDate to the GUID.

We also want the media:content image URL to map to item.ImageUrl, so we have that extra metadata that certain RSS readers can use.

Since we just want an RSS structure, we'll select the "emit all results" radio button.

We also want the Description to include a map of where the traffic is. Using the Regex module, we select item.description in the drop-down, and in the "replace" textbox we use "$", which in regex anchors the end of the string, so our replacement is appended. We use the String Builder again to create the image markup that gets appended.

Check the debugger pane and we're done!

Copyright © 2014 Yahoo! Inc. All rights reserved.