Starting to Think About a Yahoo Pipes Code Generator

Following a marathon session demoing Yahoo Pipes yesterday (the slides, which I didn’t really use but pretty much covered, are available here), I thought I’d start to look at what would be involved in building a Pipes2PHP, Pipes2Py, or Pipes2JS conversion tool, as I’ve alluded to before (What Happens If Yahoo! Pipes Dies?)…

So how are pipes represented within the Yahoo Pipes environment? With a little bit of digging around using the Firebug extension to Firefox, we can inspect the Javascript object representation of a pipe (that is, the thing that is used to represent the pipework and that gets saved to the server whenever we save a pipe).

So to start, let’s look at the following simple pipe:

Simple pipe

Here’s a Firebug view showing the path to the representation of a pipe (the view shows editor.pipe.definition, but the path should be editor.pipe.working):

And here’s what we see being passed to the Yahoo Pipes server when the pipe is saved…

Here’s how it looks as a Javascript object:

{"modules":[{"type":"fetch","id":"sw-502","conf":{"URL":{"value":"http://writetoreply.com/feed","type":"url"}}},{"type":"output","id":"_OUTPUT","conf":{}},{"type":"filter","id":"sw-513","conf":{"MODE":{"type":"text","value":"permit"},"COMBINE":{"type":"text","value":"and"},"RULE":[{"field":{"value":"description","type":"text"},"op":{"type":"text","value":"contains"},"value":{"value":"the","type":"text"}}]}}],"terminaldata":[],"wires":[{"id":"_w3","src":{"id":"_OUTPUT","moduleid":"sw-502"},"tgt":{"id":"_INPUT","moduleid":"sw-513"}},{"id":"_w6","src":{"id":"_OUTPUT","moduleid":"sw-513"},"tgt":{"id":"_INPUT","moduleid":"_OUTPUT"}}]}

Let’s try to pick that apart a little… firstly, all the modules are defined. Here’s the Fetch module:

{
 "type":"fetch",
 "id":"sw-502",
 "conf":{
  "URL":{
   "value":"http://writetoreply.com/feed",
   "type":"url"
  }
 }
}

The output module:

{
 "type":"output",
 "id":"_OUTPUT",
 "conf":{}
}

The filter module:

{
 "type":"filter",
 "id":"sw-513",
 "conf":{
  "MODE":{"type":"text","value":"permit"},
  "COMBINE":{"type":"text","value":"and"},
  "RULE":[{
   "field":{"value":"description","type":"text"},
   "op":{"type":"text","value":"contains"},
   "value":{"value":"the","type":"text"}
  }]
 }
}
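To make the filter configuration a little more concrete, here is a rough JavaScript sketch of how a conf block like this might be applied to a list of feed items. The helper names (ruleMatches, applyFilter), the shape of the sample items, and the meaning assumed for "permit", "and" and "contains" (including case-sensitive matching) are all guesses based on the labels in the definition, not anything confirmed about how Pipes itself behaves.

```javascript
// Sketch only: the semantics of MODE, COMBINE and "contains" are assumed
// from the labels in the definition, not taken from Pipes itself.
const filterConf = {
  MODE: { type: "text", value: "permit" },
  COMBINE: { type: "text", value: "and" },
  RULE: [{
    field: { value: "description", type: "text" },
    op: { type: "text", value: "contains" },
    value: { value: "the", type: "text" }
  }]
};

// Test a single rule against a single item (only "contains" is handled,
// and the match is assumed to be case-sensitive).
function ruleMatches(rule, item) {
  const fieldValue = String(item[rule.field.value] || "");
  if (rule.op.value === "contains") {
    return fieldValue.includes(rule.value.value);
  }
  throw new Error("Unsupported op: " + rule.op.value);
}

// Hypothetical helper: "permit" keeps matching items, any other mode drops them.
// "and" requires every rule to match; anything else is treated as "any rule".
function applyFilter(conf, items) {
  const combine = conf.COMBINE.value === "and" ? "every" : "some";
  return items.filter((item) => {
    const matched = conf.RULE[combine]((rule) => ruleMatches(rule, item));
    return conf.MODE.value === "permit" ? matched : !matched;
  });
}

// Made-up sample items, standing in for whatever the Fetch module returns.
const sampleItems = [
  { title: "A", description: "Comment on the report" },
  { title: "B", description: "No match here" }
];
const kept = applyFilter(filterConf, sampleItems);
console.log(kept.map((item) => item.title));
```

Only the first sample item has "the" in its description, so it is the only one kept.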

Each of these blocks (that is, modules) has a unique id. The wires then specify how these modules are connected.

So here’s the wire that connects the output of the fetch block to the input of the filter module:

{
 "id":"_w3",
 "src":{
  "id":"_OUTPUT",
  "moduleid":"sw-502"
 },
 "tgt":{
  "id":"_INPUT",
  "moduleid":"sw-513"
 }
}

And here we connect the output of the filter to the input of the output block:

{
 "id":"_w6",
 "src":{
  "id":"_OUTPUT",
  "moduleid":"sw-513"
 },
 "tgt":{
  "id":"_INPUT",
  "moduleid":"_OUTPUT"
 }
}
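As a quick check on that reading of the wires, here is a small JavaScript sketch that resolves each wire's moduleid values back to the modules they refer to. The describeWires helper is a made-up name, and the pipe object is a cut-down copy of the definition above with the conf details left out.

```javascript
// A cut-down copy of the pipe definition shown above (conf omitted).
const pipe = {
  modules: [
    { type: "fetch", id: "sw-502" },
    { type: "output", id: "_OUTPUT" },
    { type: "filter", id: "sw-513" }
  ],
  wires: [
    { id: "_w3", src: { id: "_OUTPUT", moduleid: "sw-502" }, tgt: { id: "_INPUT", moduleid: "sw-513" } },
    { id: "_w6", src: { id: "_OUTPUT", moduleid: "sw-513" }, tgt: { id: "_INPUT", moduleid: "_OUTPUT" } }
  ]
};

// Hypothetical helper: describe each wire as "sourceType -> targetType".
function describeWires(definition) {
  // Index the modules by id so each wire end can be looked up directly.
  const modulesById = new Map(definition.modules.map((m) => [m.id, m]));
  return definition.wires.map((w) => {
    const src = modulesById.get(w.src.moduleid);
    const tgt = modulesById.get(w.tgt.moduleid);
    return src.type + " -> " + tgt.type;
  });
}

const connections = describeWires(pipe);
console.log(connections.join("\n"));
```

For this pipe the two wires come out as fetch -> filter and filter -> output, which matches the picture of the pipe.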

***UPDATE – I’m not sure if we also need to look at the terminaldata information. I seem to have lost sight of where the multiple “RULES” that might appear inside a block are described…? Ah… editor.pipe.module_info? Hmm, no, that is more the UI side of things, so where are the actual pipe RULEs defined (e.g. the rules in a Regular Expression block)?***

***UPDATE 2 – Found it… I should be using editor.pipe.working, NOT editor.pipe.definition.***

So what would a code generator need to do? I’m guessing one way would be to do something like this…

  • for each module, create an equivalent function by populating a templated function with the appropriate arguments, e.g.
    function f_sw_502() { return fetchURL("http://writetoreply.com/feed"); }
    (the hyphens in the module ids become underscores, since f_sw-502 would not be a legal function name)
  • for each wire, nest the calls so that the output of the source module feeds the target module, along the lines of f_sw_513(f_sw_502()). It’s been a long day, so I’m not sure yet how to deal with modules that have multiple inputs. But this is just the start, right…? (If anyone else is now intrigued enough to start thinking about building a code generator from a pipes representation, please let me know…;-)
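One possible shape for the wiring step, sketched in JavaScript: start at the output module and follow the wires backwards, turning each module into a function call whose arguments are the modules that feed it. The functionName and callExpression helpers are invented for this sketch, the module ids have their hyphens swapped for underscores so they make legal identifiers, and a module with several incoming wires is simply assumed to take several arguments. The sketch also assumes the pipe has no loops and ignores which terminal (the tgt id) a wire plugs into.

```javascript
// A cut-down copy of the pipe definition from earlier (conf omitted).
const pipe = {
  modules: [
    { type: "fetch", id: "sw-502" },
    { type: "output", id: "_OUTPUT" },
    { type: "filter", id: "sw-513" }
  ],
  wires: [
    { id: "_w3", src: { id: "_OUTPUT", moduleid: "sw-502" }, tgt: { id: "_INPUT", moduleid: "sw-513" } },
    { id: "_w6", src: { id: "_OUTPUT", moduleid: "sw-513" }, tgt: { id: "_INPUT", moduleid: "_OUTPUT" } }
  ]
};

// Turn a module id such as "sw-502" into a legal identifier ("f_sw_502").
function functionName(moduleId) {
  return "f_" + moduleId.replace(/[^A-Za-z0-9_]/g, "_");
}

// Build the call expression for a module by recursing over the wires that
// feed it; several incoming wires simply become several arguments.
// Assumes the wiring is acyclic, otherwise this would recurse forever.
function callExpression(definition, moduleId) {
  const args = definition.wires
    .filter((w) => w.tgt.moduleid === moduleId)
    .map((w) => callExpression(definition, w.src.moduleid));
  return functionName(moduleId) + "(" + args.join(", ") + ")";
}

const generated = callExpression(pipe, "_OUTPUT");
console.log(generated);
```

For the simple pipe this produces f__OUTPUT(f_sw_513(f_sw_502())). A real generator would still need the per-module function bodies, and would probably want to avoid calling the same upstream module twice when its output is wired to more than one place.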

As to why this approach might be useful?
– saving a copy of the Javascript representation of a pipe gives us an archival copy of the algorithm, albeit in a javascripty objecty way…
– if we have a code generator, we can use Yahoo Pipes as a rapid prototyping tool to create code that can be locally hosted.
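For the archival idea, a minimal sketch, assuming the pipe representation is plain data (strings, arrays and nested objects, as in the example above): serialise it to JSON text that can be saved anywhere, then parse it back later. The working variable here is a hand-made, cut-down stand-in for the object found in the Pipes editor, since the editor global only exists on a Pipes page.

```javascript
// Stand-in for the object found at editor.pipe.working in the Pipes editor;
// cut down to a single module for brevity.
const working = {
  modules: [
    { type: "fetch", id: "sw-502", conf: { URL: { value: "http://writetoreply.com/feed", type: "url" } } }
  ],
  terminaldata: [],
  wires: []
};

// Serialise to indented JSON text that could be saved to a file or a gist.
const archived = JSON.stringify(working, null, 1);

// Later on, the saved text can be parsed back into an equivalent object.
const restored = JSON.parse(archived);
console.log(restored.modules[0].conf.URL.value);
```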

PS a question that was raised a couple of times in the session yesterday was whether or not Yahoo Pipes can be run behind a corporate firewall. I don’t think it can, but does anyone know for sure? Is there a commercial offering available, for example, so corporate folk can run their own instance of Pipes in the privacy of their own network?

PPS here’s a handy trick… when in a Yahoo Pipes page, pop up the description of the pipe with this javascript call in the Firefox location bar (using editor.pipe.working, as per UPDATE 2 above):
javascript:alert(editor.pipe.working.toSource());

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

15 thoughts on “Starting to Think About a Yahoo Pipes Code Generator”

  1. The neatest way of achieving this to my mind would be to use Pipes to build it!
    – Get the JSON representation
    – Pass it through pipe(s) to generate single column CSV of the code.
    Needs more thought.

  2. Informative as always.
    Here’s a pipe I use to extract the URLs used in Fetch Feed modules from a pipe.
    http://pipes.yahoo.com/pipes/pipe.edit?_id=7001dd696969da25c150550bef16df39

    I didn’t know this until a few months ago, but there are actually 2 possible versions of a pipe, the “working” version and the “published” version. After a pipe has been published all changes to the pipe are reflected in the working version, but the published version stays unchanged until it is published again. The user sees the working version and everybody else sees the published version. That can be a real headache when trying to debug someone else’s pipe.

  3. Superb, and the stub of a really valuable piece of work, because as you say the threat of Pipes disappearing is a risk we need to be aware of and, if possible, mitigate. It would be great if we could settle on a language (shall we say PHP, since it’s so widely used?) and divvy up the modules for people to build and keep a “shadow Pipes” in GitHub or somewhere. Pipes will always be valuable despite the risks of using it – the hosting, sharing and authoring aspects of it are all strong selling points; but it would probably attract still more developers if this fall-back existed, and if the paradigm were extended so people could use it simply as an authoring/prototyping tool.

    1. Hi Jeremy – thanks for the supportive comment:-)

      PHP and Python could both be great starting points… I’ll post a few more doodles looking at other blocks to try and come up with some sort of style guide for describing the modules, their representation, and what functionality we might expect, and then maybe migrate to a project wiki somewhere? If you’d like to get involved, and maybe even set up a GitHub repository, that would be great (I can cope with gists but am still not a user of code versioning or project management tools – it’s just me, my text editor, browser and dozens of messy directories all over my laptop…)

  4. @hapdaniel oooh, that PIPE.working is new to me… are there any other tricks like that?:-)

    I know this is unlikely, but is there a way of grabbing the representation of a pipe from within the pipe….?

    …GULP – I just looked properly at your follow up… so we can look up a pipe’s internals, within the Pipes environment:-)

    Have you blogged that anywhere? If not, do you fancy doing a guest post here about how we can publish a description of a pipe from within Pipes?

  5. No I haven’t blogged it. Please feel free to have a go yourself.

    As I said in my initial comment, I’ve mainly made use of the PIPE.working to extract URLs from a Fetch Feed. Quite often people build huge Fetch Feed modules and then have a question on the message boards where my reply is for them to put the feeds in a Google spreadsheet. With my pipe they can use CSV output, save the CSV and then import into the spreadsheet.

    BTW, don’t forget that Pipes can output PHP as well as JSON. I don’t know whether there are any problems with the PHP output or not.

  6. I’ve started a project to extract and run Yahoo pipe definitions using Python. It uses pipelines of generators to closely match the original pipelines and so far the concept seems to work well. I’ve coded a small number of modules so far, and there’s lots more to do.

    The code is here: http://github.com/ggaughan/pipe2py

  7. Just to say that I’ve had to make a change to the pipe I mentioned in an earlier comment. The output from YQL had changed which resulted in the YQL output becoming a single string. The pipe now uses YQL and a custom open table to process a pipe’s run page JSON.
