Using gRSShopper feed filter rules

Each harvested resource is saved as a 'Link'. Rules are used to change Link values (such as the link 'title', the link 'category', etc.) and to use harvested Link data to create a new 'Post'.

Rule

Rules are composed of conditions and actions. If the condition is satisfied, the rule is 'triggered', that is, the actions are performed. The general syntax is this:

condition => action;

The symbol => is used to separate the condition and the action. The semi-colon is used to terminate the rule.

A series of rules can be evaluated. Rules are evaluated in the order they are listed, and by default every rule is evaluated. However, if a rule begins with the word 'else', it is evaluated only if no previous rule has been triggered. For example:

condition => action;
else condition2 => action2;

In this example, the first rule is evaluated. Then, the second rule is evaluated only if the first rule was not triggered.
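
The evaluation order described above can be sketched in Python. This is only an illustration of the semantics, not gRSShopper's own code: the names run_rules and set_field, and the tuple form used to hold a rule, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Link = Dict[str, str]
# (is_else, condition, action): a hypothetical in-memory form of one rule
Rule = Tuple[bool, Callable[[Link], bool], Callable[[Link], None]]


def run_rules(link: Link, rules: List[Rule]) -> List[int]:
    """Apply rules in order; return the indexes of the rules that triggered."""
    triggered: List[int] = []
    for index, (is_else, condition, action) in enumerate(rules):
        # An 'else' rule is skipped once any earlier rule has triggered.
        if is_else and triggered:
            continue
        if condition(link):
            action(link)
            triggered.append(index)
    return triggered


def set_field(field: str, value: str) -> Callable[[Link], None]:
    """Build an action that overwrites one field of the link."""
    def action(link: Link) -> None:
        link[field] = value
    return action


rules: List[Rule] = [
    (False, lambda item: "canada" in item["title"].lower(), set_field("category", "Canada")),
    (True, lambda item: "france" in item["title"].lower(), set_field("category", "France")),
]

link = {"title": "Canada and France sign accord", "category": ""}
fired = run_rules(link, rules)
print(fired, link["category"])
```

Here the title matches both conditions, but only the first rule fires, because the second rule is marked 'else' and a previous rule has already been triggered.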

Condition

A condition queries the data in a harvested link to see whether it matches a given value. If the value is matched, the condition is 'true', and the rule is triggered. If the value is not matched, the condition is 'false', and the rule is not triggered. For example:

title = A Tale of Two Cities

In this case, the condition is true if the title is the string 'A Tale of Two Cities'. By default, rules are not case sensitive, so the condition is also true if the title is the string 'a tale of two cities'. It is *not* true if the title is 'A tale', or anything else.

title ~ A tale

In this case, the condition is true if the title *contains* the string 'A tale'. Thus, the condition is true if the title is 'A tale of two cities' and also if the title is 'A tale of the tape', but it is not true if the title is 'Great Expectations'.
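
The difference between '=' and '~' can be sketched in Python as follows. The function name matches is hypothetical, and the sketch assumes that case-insensitive comparison means simply lower-casing both sides; gRSShopper itself may compare differently.

```python
def matches(field_value: str, operator: str, pattern: str) -> bool:
    """Evaluate one simple condition, ignoring case as the text describes."""
    value = field_value.lower()
    wanted = pattern.lower()
    if operator == "=":
        return value == wanted      # the whole value must match
    if operator == "~":
        return wanted in value      # the value must contain the pattern
    raise ValueError(f"unknown operator: {operator!r}")


print(matches("a tale of two cities", "=", "A Tale of Two Cities"))  # True
print(matches("A tale", "=", "A Tale of Two Cities"))                # False
print(matches("A tale of the tape", "~", "A tale"))                  # True
print(matches("Great Expectations", "~", "A tale"))                  # False
```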

Disjunction

A rule can evaluate several data elements (known as 'fields') at once. For example, the following rule will be true if either the title or the description contains the phrase 'A tale':

title|description ~ A tale

Notice that the pipe character '|' is used to indicate a disjunction, that is, to indicate that the rule is true if *either* 'title' or 'description' contains the phrase 'A tale'.

Conjunction

A rule can contain more than one conjunct, such that *all* conjuncts must be true for the rule to be true. Conjuncts are indicated with an ampersand. Hence, for example:

title~Canada & description~Lakes

This rule is true only if the title contains the phrase 'Canada' *and* the description contains the phrase 'Lakes'.

Negation

A negation can be expressed using the bang character: ! For example:

title ~ !Canada

This condition is true only if the title does not contain the phrase 'Canada'.

Mixing and Matching conditions

In any rule, negations are evaluated first, disjunctions are evaluated second, and conjunctions are evaluated third. This is reflected in the syntax of complex rules. For example:

title|description ~ France|Canada & link ~ !twitter

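This condition is true if either the title or the description contains 'France' or 'Canada', and the link does not contain 'twitter'.

Brackets are not used; if more complex formulations are required, additional rules should be added.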
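
The precedence described above (negation, then disjunction, then conjunction) can be sketched in Python. This is a hedged illustration, not gRSShopper's parser: the names term_matches and evaluate are hypothetical, and the sketch assumes that a '!' applies to the single term it precedes and that a disjunction holds if any listed field matches any listed term.

```python
import re
from typing import Dict


def term_matches(value: str, operator: str, term: str) -> bool:
    """Test one field value against one term; a leading '!' negates the test."""
    term = term.strip()
    negated = term.startswith("!")
    wanted = (term[1:] if negated else term).strip().lower()
    value = value.lower()
    hit = (value == wanted) if operator == "=" else (wanted in value)
    return not hit if negated else hit


def evaluate(condition: str, link: Dict[str, str]) -> bool:
    """Evaluate a condition such as 'title|description ~ France & link ~ !twitter'."""
    # Conjunction binds loosest: every '&'-separated part must hold.
    for conjunct in condition.split("&"):
        parsed = re.match(r"\s*(.+?)\s*([=~])\s*(.+?)\s*$", conjunct)
        if parsed is None:
            raise ValueError(f"cannot parse condition: {conjunct!r}")
        fields, operator, terms = parsed.groups()
        # Disjunction: any listed field matching any listed term is enough.
        if not any(
            term_matches(link.get(field.strip(), ""), operator, term)
            for field in fields.split("|")
            for term in terms.split("|")
        ):
            return False
    return True


condition = "title|description ~ France|Canada & link ~ !twitter"
blog_post = {"title": "Trip to France", "description": "", "link": "http://example.com/a"}
tweet = {"title": "Trip to France", "description": "", "link": "http://twitter.com/a"}
print(evaluate(condition, blog_post), evaluate(condition, tweet))
```

Because brackets are not supported, the sketch never needs a recursive parser: splitting on '&' first and on '|' second is enough to reproduce the stated order of evaluation.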

Actions

An action sets the value of a Link data element (or 'field') as desired. For example:

=> category=France

This action sets the value of link_category to 'France'.

A sequence of actions may be performed. This is indicated with the comma operator. For example:

=> category=France,title=Trips

This sequence sets the value of link_category to 'France' and the value of link_title to 'Trips'.
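
A sequence of assignments like the one above could be applied as in the following Python sketch. The function apply_actions is hypothetical, and the sketch assumes that a short field name such as 'category' refers to the stored field 'link_category'. It handles only field=value assignments, not functions such as autopost or extract.

```python
from typing import Dict


def apply_actions(actions: str, link: Dict[str, str]) -> None:
    """Apply a comma-separated list of field=value assignments to a link."""
    for assignment in actions.split(","):
        field, _, value = assignment.partition("=")
        field = field.strip()
        # Assumption: 'category' and 'link_category' name the same field.
        if not field.startswith("link_"):
            field = "link_" + field
        link[field] = value.strip()


link = {"link_title": "Untitled", "link_category": ""}
apply_actions("category=France,title=Trips", link)
print(link)
```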

Autopost

The 'autopost' function takes the values of 'Link' and uses them to create a new 'Post'. This is useful when creating pages containing multiple posts; only those links that execute the 'autopost' action will appear on such a page.

Extract

The extract function selects a substring from the value of a field. Extract is expressed using three parameters:

=> extract(field,start,finish)

- 'field' is the name of the field you want to extract from (eg. 'link_title', or just 'title')
- 'start' is a string that delimits the beginning of whatever text you want extracted; use ^ to indicate the beginning of the text
- 'finish' is a string that delimits the end of whatever text you want extracted; use $ to indicate the very end of the text

For example, suppose you want to extract proper URLs from a feed that harvests links from Google News Alerts. Here is what Google sends you:

http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNEqTrLJKkuQiCMkIAAlEO7lBXLGuw&url=http://www.thedp.com/article/2012/10/brian-collopy-coursera-and-the-competition

and here is what you want:

http://www.thedp.com/article/2012/10/brian-collopy-coursera-and-the-competition

In other words, you want the substring between 'url=' and the end of the string. So you would create an extract command as follows:

=> extract(link,url=,$)
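
The behaviour of this command on the Google News example can be sketched in Python. The function below is a hypothetical stand-in for the extract action, not gRSShopper's implementation. What it does when a delimiter is not found is an assumption, since the text above does not say.

```python
def extract(text: str, start: str, finish: str) -> str:
    """Return the part of text between the start and finish delimiters."""
    if start == "^":
        begin = 0
    else:
        found = text.find(start)
        if found == -1:
            return text  # assumption: leave the value unchanged if start is absent
        begin = found + len(start)
    if finish == "$":
        end = len(text)
    else:
        end = text.find(finish, begin)
        if end == -1:
            end = len(text)  # assumption: run to the end if finish is absent
    return text[begin:end]


google_url = (
    "http://news.google.com/news/url?sa=t&fd=R"
    "&usg=AFQjCNEqTrLJKkuQiCMkIAAlEO7lBXLGuw"
    "&url=http://www.thedp.com/article/2012/10/"
    "brian-collopy-coursera-and-the-competition"
)
print(extract(google_url, "url=", "$"))
```

Note that the earlier 'url?' in the Google address is not matched, because the start delimiter includes the '=' character.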

Empty conditions

The action in a rule expressed with an empty condition is always performed. Thus, for example, if a feed has a rule:

=> autopost;

every Link harvested from the feed will be converted into a post. Note that you should execute autopost only after you've made any changes to the data that you want to make.

Empty conditions with 'else'

The following is a common sequence:

title|description ~ Moncton => category=City,autopost;
else => autopost;

This sequence will evaluate the title and description, and if either contains the string 'Moncton', the category will be set to 'City' and the Link autoposted. Otherwise (if the first rule is *not* triggered) the Link will be autoposted as is (probably with a different category value).