Using gRSShopper feed filter rules
Each harvested resource is saved as a 'Link'. Rules are used to change Link values (such as the link 'title', the link 'category', etc.) and to use harvested Link data to create a new 'Post'.
Rule
Rules are composed of conditions and actions. If the condition is satisfied, the rule is 'triggered', that is, the actions are performed. The general syntax is this:
condition => action;
The symbol => is used to separate the condition and the action. The semi-colon is used to terminate the rule.
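The syntax above can be illustrated with a short Python sketch that splits one rule into its condition and its action. This is not gRSShopper's own parser; the function name split_rule is hypothetical and the error handling is an assumption made for illustration.

```python
def split_rule(rule: str) -> tuple[str, str]:
    """Split 'condition => action;' into (condition, action).

    Hypothetical helper for illustration only.
    """
    text = rule.strip()
    if not text.endswith(";"):
        raise ValueError("a rule must be terminated with a semi-colon")
    text = text[:-1]  # drop the terminating semi-colon
    condition, separator, action = text.partition("=>")
    if not separator:
        raise ValueError("a rule must contain the '=>' symbol")
    return condition.strip(), action.strip()


print(split_rule("title ~ A tale => category=Books;"))
```

An empty condition (as in '=> autopost;') comes back as an empty string in this sketch.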
A series of rules can be evaluated. Rules are evaluated in the order they are listed, and all rules are evaluated. However, if a rule begins with the word 'else', it is evaluated only if no previous rule has been triggered. For example:
condition => action;
else condition2 => action2;
In this example, the first rule is evaluated. Then, the second rule is evaluated only if the first rule was not triggered.
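The evaluation order described above might be sketched in Python as follows. The representation of a rule as a tuple (is_else, condition, action) and the name run_rules are assumptions made for this example, not gRSShopper internals.

```python
def run_rules(rules, link):
    """Evaluate rules in listed order against one link (a dict).

    Each rule is a hypothetical tuple: (is_else, condition, action),
    where condition and action are callables taking the link.
    """
    any_triggered = False
    for is_else, condition, action in rules:
        # An 'else' rule is skipped once any earlier rule has been triggered.
        if is_else and any_triggered:
            continue
        if condition(link):
            action(link)
            any_triggered = True
    return link


rules = [
    (False, lambda l: "moncton" in l["title"].lower(), lambda l: l.update(category="City")),
    (True, lambda l: True, lambda l: l.update(category="Other")),
]
print(run_rules(rules, {"title": "Moncton news"}))
```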
Condition
A condition queries the data in a harvested link to see whether it matches a given value. If the value is matched, the condition is 'true', and the rule is triggered. If the value is not matched, the condition is 'false', and the rule is not triggered. For example:
title = A Tale of Two Cities
In this case, the condition is true if the title is the string 'A Tale of Two Cities'. By default, rules are not case sensitive, so the condition is also true if the title is the string 'a tale of two cities'. It is *not* true if the title is 'A tale', or anything else.
title ~ A tale
In this case, the condition is true if the title *contains* the string 'A tale'. Thus, the condition is true if the title is 'A tale of two cities' and also if the title is 'A tale of the tape', but it is not true if the title is 'Great Expectations'.
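The difference between the two operators can be sketched in Python. The function names equals and contains are hypothetical, and the sketch assumes that '~' is case-insensitive in the same way as '=' and that surrounding whitespace is ignored.

```python
def equals(field_value: str, wanted: str) -> bool:
    """Sketch of 'field = value': whole-string match, ignoring case."""
    return field_value.strip().lower() == wanted.strip().lower()


def contains(field_value: str, wanted: str) -> bool:
    """Sketch of 'field ~ value': substring match, ignoring case (assumed)."""
    return wanted.strip().lower() in field_value.lower()


print(equals("a tale of two cities", "A Tale of Two Cities"))
print(contains("A tale of the tape", "A tale"))
```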
Disjunction
A condition can evaluate several data elements (known as 'fields') at once. For example, the following condition will be true if either the title or the description contains the phrase 'A tale':
title|description ~ A tale
Notice that the pipe character '|' is used to indicate a disjunction, that is, to indicate that the condition is true if *either* 'title' or 'description' contains the phrase 'A tale'.
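A disjunction over fields can be pictured as an 'any' test. The following Python sketch assumes a link is a plain dict keyed by field name; the function name any_field_contains is hypothetical.

```python
def any_field_contains(link: dict, fields: str, wanted: str) -> bool:
    """True if any field listed as 'a|b|c' contains `wanted` (ignoring case)."""
    names = [name.strip() for name in fields.split("|")]
    return any(wanted.lower() in link.get(name, "").lower() for name in names)


link = {"title": "Great Expectations", "description": "Not a tale of two cities"}
print(any_field_contains(link, "title|description", "A tale"))
```

Here the title does not match but the description does, so the disjunction as a whole is true.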
Conjunction
A rule can contain more than one conjunct, such that *all* conjuncts must be true for the rule to be true. Conjuncts are indicated with an ampersand. Hence, for example:
title~Canada & description~Lakes
This rule is true only if the title contains the phrase 'Canada' *and* the description contains the phrase 'Lakes'.
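A conjunction can be pictured as an 'all' test over its conjuncts. This Python sketch handles only simple 'field~value' conjuncts and assumes a link is a plain dict; the function name all_conjuncts_true is hypothetical.

```python
def all_conjuncts_true(link: dict, condition: str) -> bool:
    """True only if every 'field~value' conjunct separated by '&' matches."""
    for conjunct in condition.split("&"):
        field, _, wanted = conjunct.partition("~")
        value = link.get(field.strip(), "")
        if wanted.strip().lower() not in value.lower():
            return False  # one failed conjunct makes the whole condition false
    return True


link = {"title": "Visit Canada", "description": "A tour of the Great Lakes"}
print(all_conjuncts_true(link, "title~Canada & description~Lakes"))
```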
Negation
A negation can be expressed using the bang character: ! For example:
title ~ !Canada
This condition is true only if the title does not contain the phrase 'Canada'.
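Negation simply inverts the result of the match. A minimal Python sketch, assuming the bang appears at the start of the value as in the example above (the function name matches is hypothetical):

```python
def matches(field_value: str, pattern: str) -> bool:
    """Sketch of 'field ~ pattern' where the pattern may start with '!'."""
    pattern = pattern.strip()
    negated = pattern.startswith("!")
    if negated:
        pattern = pattern[1:].strip()
    found = pattern.lower() in field_value.lower()
    return not found if negated else found


print(matches("A trip to France", "!Canada"))
```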
Mixing and Matching conditions
In any rule, negations are evaluated first, disjunctions are evaluated second, and conjunctions are evaluated third. This is reflected in the syntax of complex rules. For example:
title|description ~ France|Canada & link ~ !twitter
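Here the condition is true if the title or the description contains 'France' or 'Canada', and the link does not contain 'twitter'. Brackets are not used; if more complex formulations are required, additional rules should be added.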
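The stated order (negation, then disjunction, then conjunction) can be mirrored by nesting three small functions. This Python sketch supports only the '~' operator and assumes that '|' on the value side works like '|' on the field side; all function names are hypothetical.

```python
def term_matches(value: str, term: str) -> bool:
    """Innermost step: a single value, optionally negated with '!'."""
    term = term.strip()
    if term.startswith("!"):
        return term[1:].strip().lower() not in value.lower()
    return term.lower() in value.lower()


def conjunct_true(link: dict, conjunct: str) -> bool:
    """Middle step: any listed field matching any listed value."""
    fields, _, terms = conjunct.partition("~")
    return any(
        term_matches(link.get(field.strip(), ""), term)
        for field in fields.split("|")
        for term in terms.split("|")
    )


def condition_true(link: dict, condition: str) -> bool:
    """Outermost step: every conjunct separated by '&' must hold."""
    return all(conjunct_true(link, part) for part in condition.split("&"))


link = {"title": "Visiting France", "description": "", "link": "http://example.com/a"}
print(condition_true(link, "title|description ~ France|Canada & link ~ !twitter"))
```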
Actions
An action sets the value of a Link data element (or 'field') as desired. For example:
=> category=France
This action sets the value of link_category to 'France'.
A sequence of actions may be performed. This is indicated with the comma operator. For example:
=> category=France,title=Trips
This sequence sets the value of link_category to 'France' and the value of link_title to 'Trips'.
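A comma-separated sequence of assignments might be applied as in the Python sketch below. It assumes a link is a plain dict and that short field names map to 'link_' names, as in the category example; the function name apply_actions is hypothetical, and actions without '=' (such as autopost) are skipped here.

```python
def apply_actions(link: dict, actions: str) -> dict:
    """Apply 'field=value,field=value' assignments to a link dict."""
    for action in actions.split(","):
        field, separator, value = action.partition("=")
        if not separator:
            continue  # not an assignment; outside the scope of this sketch
        field = field.strip()
        if not field.startswith("link_"):
            field = "link_" + field  # 'category' is shorthand for 'link_category'
        link[field] = value.strip()
    return link


print(apply_actions({}, "category=France,title=Trips"))
```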
Autopost
The 'autopost' function takes the values of 'Link' and uses them to create a new 'Post'. This is useful when creating pages containing multiple posts; only those links that execute the 'autopost' action will appear on such a page.
Extract
The extract function selects a substring from the value of a field. Extract is expressed using three parameters:
=> extract(field,start,finish)
- 'field' is the name of the field you want to extract from (e.g. 'link_title', or just 'title')
- 'start' is a string that delimits the beginning of whatever text you want extracted; use ^ to indicate the beginning of the text
- 'finish' is a string that delimits the end of whatever text you want extracted; use $ to indicate the very end of the text
For example, suppose you want to extract proper URLs from a feed that harvests links from Google News Alerts. Here is what Google sends you:
http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNEqTrLJKkuQiCMkIAAlEO7lBXLGuw&url=http://www.thedp.com/article/2012/10/brian-collopy-coursera-and-the-competition
Here is the URL you actually want:
http://www.thedp.com/article/2012/10/brian-collopy-coursera-and-the-competition
The following action extracts it:
=> extract(link,url=,$)
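The behaviour of extract on this example can be approximated in Python. This is a sketch, not the gRSShopper implementation: it assumes the delimiters themselves are excluded from the result (as the example suggests), and that the value is left unchanged when the start delimiter is not found.

```python
def extract(value: str, start: str, finish: str) -> str:
    """Return the text between `start` and `finish`; '^' and '$' mean the ends."""
    if start == "^":
        begin = 0
    else:
        index = value.find(start)
        if index == -1:
            return value  # assumption: no start delimiter, leave value as is
        begin = index + len(start)
    if finish == "$":
        end = len(value)
    else:
        end = value.find(finish, begin)
        if end == -1:
            end = len(value)  # assumption: no finish delimiter, run to the end
    return value[begin:end]


google_url = (
    "http://news.google.com/news/url?sa=t&fd=R"
    "&usg=AFQjCNEqTrLJKkuQiCMkIAAlEO7lBXLGuw"
    "&url=http://www.thedp.com/article/2012/10/brian-collopy-coursera-and-the-competition"
)
print(extract(google_url, "url=", "$"))
```

Note that the earlier 'url?' in the Google address does not match the delimiter 'url=', so the first match is the one that precedes the target address.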
Empty conditions
The action in a rule expressed with an empty condition is always performed. Thus, for example, if a feed has a rule:
=> autopost;
every Link harvested from the feed will be converted into a post. Note that you should execute autopost only after you've made any changes to the data that you want to make.
Empty conditions with 'else'
The following is a common sequence:
title|description ~ Moncton => category=City,autopost;
else => autopost;
This sequence will evaluate the title and description, and if either contains the string 'Moncton', the category will be set to 'City' and the Link autoposted. Otherwise (if the first rule is *not* triggered) the Link will be autoposted as is (probably with a different category value).
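Written out by hand, the two rules above behave roughly like the following Python sketch. It assumes a link is a plain dict and models autopost as appending a copy of the link to a list of posts; the function name process and that model of autopost are assumptions for illustration.

```python
def process(link: dict, posts: list) -> None:
    """Hand translation of the Moncton / else sequence for one link."""
    triggered = False

    # Rule 1: title|description ~ Moncton => category=City,autopost;
    fields = (link.get("title", ""), link.get("description", ""))
    if any("moncton" in text.lower() for text in fields):
        link["category"] = "City"
        posts.append(dict(link))
        triggered = True

    # Rule 2: else => autopost;  (empty condition, so always true when reached)
    if not triggered:
        posts.append(dict(link))


posts = []
process({"title": "Moncton council meets", "description": "", "category": "News"}, posts)
process({"title": "Weather report", "description": "", "category": "News"}, posts)
print([post["category"] for post in posts])
```

Every link is posted exactly once; only the category differs.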