Advanced Chartlet usage
If you are already familiar with Chartlet and want to collect data from a source for which no existing chartlet is available, this section is for you.
First, switch to Advanced mode: in the chartlets window, click the Configure button to open the preferences window.


The Display help field shows or hides the help icons. When it is enabled, most windows of the Chartlet extension display question-mark icons in several places; clicking one opens a contextual help window. You can navigate the help system by following its hyperlinks.
The Maximum depth value limits how far chartlets crawl when following "next page" links.
Check the Advanced mode field to display additional buttons in most windows.
Back in the chartlets window, additional Create and Create copy buttons are now available.

Create copy (visible only when a chartlet is selected in the list) copies the selected chartlet to a new one. The new chartlet is identical to the original, except that its name changes (Copy of ...) and that you own the copy, which means you can modify its low-level settings. How to make those modifications is described below.
Click on the Create button to create a new chartlet from scratch.

You are then asked to select a mode for the new chartlet. Note that the mode of a chartlet cannot be changed afterwards. Available modes are:
- Direct: extracts a numeric value displayed in a Web page, for instance a stock price or a temperature.
- Rank: extracts a numeric value as the rank of an item in a list (the list may span several pages), for instance the rank of a site on a search engine.
- Popularity: extracts a numeric value as the number of occurrences of a given expression in one or several Web pages, for instance how many times a famous singer is mentioned on a celebrity news site.

The chartlet window then opens in edit mode. You can switch between the edit and production display modes using the Edit and Edit done buttons. In edit mode, the chartlet window is divided into three sections:
- Chartlet information: lets you rename the chartlet and provide a description.
- Chartlet variables: defines variables to be used in channels.
- Collect information: defines the XPath and regular expressions used to extract data.
The Collect information section defines the URL to start extracting from, which items in the page are taken into consideration, which link is followed to reach the next page, and the maximum number of pages to crawl (this last value is also capped globally by the Maximum depth preference). The exact meaning of each field depends on the mode chosen when creating the chartlet.
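The crawl described above can be sketched in Python as follows. This is not the extension's code: `crawl`, `fetch` and `find_next` are hypothetical names, and the sketch assumes that the effective limit is the smaller of the chartlet's Maximum depth and the preference value, counted in pages loaded (starting page included).

```python
from typing import Callable, Iterator, Optional


def crawl(start_url: str,
          fetch: Callable[[str], str],
          find_next: Callable[[str], Optional[str]],
          chartlet_depth: int,
          preference_depth: int) -> Iterator[str]:
    """Hypothetical sketch: yield pages, following next-page links."""
    # Assumption: the global preference caps the per-chartlet value.
    max_pages = min(chartlet_depth, preference_depth)
    url: Optional[str] = start_url
    loaded = 0
    while url is not None and loaded < max_pages:
        page = fetch(url)
        loaded += 1
        yield page
        # An empty Next page link field behaves like find_next returning None.
        url = find_next(page)


# Fake three-page site: each "page" is just its own URL.
links = {"p1": "p2", "p2": "p3", "p3": None}
print(list(crawl("p1", lambda u: u, links.get, 10, 2)))  # ['p1', 'p2']
```

Because `crawl` is a generator, a caller can stop iterating as soon as the searched item is found, so later pages are never loaded.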
Prerequisites
If you want to define your own chartlets, you need to have at least a basic knowledge of XPath and regular expressions.
XPath is a language for querying data within a tree structure, such as an HTML document. For instance, the expression /html/body//a selects all <A> elements (links to other Web resources) in the document body.
Although some XPath queries can be complex, most cases are simple to handle, and the Chartlet extension includes tools that help you create XPath expressions. In the main browser window, right-click an HTML element to open the context menu, then choose Chartlet tools/Copy XPath to copy an XPath expression for that element to the clipboard. You can then paste it into a collect field with CTRL+V. You can also test an XPath expression with the Chartlet tools/Apply XPath entry of the context menu: matching elements are highlighted in the window. Reload the page to remove the highlights.
See the XPath specifications.
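To experiment with a query like /html/body//a outside the browser, Python's standard xml.etree.ElementTree module can be used. It implements only a limited XPath subset, evaluated relative to an element, and requires well-formed markup, so the small document below is a made-up, XHTML-like sample.

```python
import xml.etree.ElementTree as ET

DOC = """
<html>
  <head><title>Links</title></head>
  <body>
    <p>See <a href="https://example.com/a">A</a>.</p>
    <div><ul><li><a href="https://example.com/b">B</a></li></ul></div>
  </body>
</html>
"""

root = ET.fromstring(DOC.strip())  # root is the <html> element
# Relative to <html>, "./body//a" corresponds to the absolute /html/body//a.
links = root.findall("./body//a")
hrefs = [a.get("href") for a in links]
print(hrefs)  # ['https://example.com/a', 'https://example.com/b']
```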
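A regular expression is a tool for recognizing or extracting data from text. It is defined by a pattern that describes the text to match; the parts of the pattern enclosed in parentheses (capturing groups) mark what should be captured.
For instance, the regular expression x(\d+)= applied to the text 3x5=15 extracts the value 5: \d+ matches one or more digits, and the parentheses capture them.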
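The same example can be reproduced with Python's re module, which uses the same notation for these basic constructs (the extension itself runs in the browser, so this is only an illustration):

```python
import re

# x(\d+)= : a literal "x", one or more digits (captured), then a literal "=".
match = re.search(r"x(\d+)=", "3x5=15")
value = match.group(1) if match else None
print(value)  # 5
```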
Most characters in a regular expression match the same character in the text, but some have a special meaning. To match such a character literally, escape it by placing a \ immediately before it. For instance, to extract the price from the text Price is 12.95$, use the regular expression Price is (\d+\.\d+)\$.
The characters that most commonly need to be escaped are . ( ) [ ] { } ^ $ * + ? | and \ itself.
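The effect of escaping can be checked with Python's re module. The sketch below compares the escaped price pattern with an unescaped one, where $ acts as an end-of-text anchor instead of matching the dollar sign, and shows re.escape(), a Python helper that escapes every special character of a literal string.

```python
import re

text = "Price is 12.95$"

# Escaped: "\." matches a literal dot and "\$" a literal dollar sign.
escaped = re.search(r"Price is (\d+\.\d+)\$", text)
print(escaped.group(1))  # 12.95

# Unescaped "$" means "end of text"; the text still ends with "$" after
# the digits, so this pattern finds no match.
unescaped = re.search(r"Price is (\d+.\d+)$", text)
print(unescaped)  # None

# re.escape() backslash-escapes the special characters of a literal string.
print(re.escape("12.95$"))  # 12\.95\$
```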
For more information about regular expressions, you may want to check this tutorial.
Direct mode
- Starting address: This field indicates the URL where the search for the value starts. It should be a well-formed HTTP or HTTPS URL.
- Container element: This field selects the HTML element containing the numeric text value to extract. It is expressed as an XPath expression.
- Extract pattern: This field extracts the numeric value when it is surrounded by other text. It is expressed as a regular expression, which is applied to the text of the element selected by the Container element field. If the regular expression contains no capturing group, the whole matching text is extracted. For instance, the expression \d+ applied to price is 12$ extracts 12. If the expression contains a capturing group, that group represents the captured value. For instance, the expression is (\d+)\$ applied to price of 2 bottles of wine is 12$ extracts 12.
- Next page link: This field selects an HTML <A> element linking to the page where the search continues. The selection is performed through an XPath expression. The process is iterative: new pages are loaded until the item is found or the number of scanned pages exceeds the value of the Maximum depth field. If the field is left empty, no link is followed; if it is set, the XPath expression must select an HTML <A> element.
- Maximum depth: This value controls how many links are followed through the Next page link selection. If it is too high, it is capped by the maximum value set in the user's preferences.
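The way the Container element and Extract pattern fields combine on a single page can be sketched with Python's xml.etree.ElementTree (a limited XPath subset that needs well-formed markup) and re. `direct_value` is a hypothetical helper, not the extension's code; it assumes the first matching element is used, that the pattern is non-empty, and that the extracted text parses as a number.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def direct_value(page: str, container_xpath: str, extract_pattern: str) -> Optional[float]:
    """Hypothetical sketch of Direct mode applied to one page."""
    root = ET.fromstring(page)
    # Container element: assume the first element matching the XPath is used.
    element = root.find(container_xpath)
    if element is None:
        return None
    text = "".join(element.itertext())
    # Extract pattern: search the element text with the regular expression.
    match = re.search(extract_pattern, text)
    if match is None:
        return None
    # No capturing group: the whole match is the value; otherwise group 1.
    raw = match.group(1) if match.re.groups else match.group(0)
    return float(raw)


page = "<html><body><p id='price'>price of 2 bottles of wine is 12$</p></body></html>"
print(direct_value(page, "./body/p[@id='price']", r"\d+"))         # 2.0
print(direct_value(page, "./body/p[@id='price']", r"is (\d+)\$"))  # 12.0
```

The first call shows why a capturing group is useful here: without context, \d+ stops at the first number in the text.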
Rank mode
- Starting address: This field indicates the URL where the search for the value starts. It should be a well-formed HTTP or HTTPS URL.
- Counted elements: This field indicates which elements in the page are counted, selected with an XPath expression. The element at which counting stops, and whose position gives the rank, is specified in the Searched item field.
- Searched item: This field specifies the item whose rank you want to obtain. It is expressed as a regular expression, which is checked against the text of the elements matching the XPath expression defined in the Counted elements field.
- Next page link: This field selects an HTML <A> element linking to the page where the search continues. The selection is performed through an XPath expression. The process is iterative: new pages are loaded until the item is found or the number of scanned pages exceeds the value of the Maximum depth field. If the field is left empty, no link is followed; if it is set, the XPath expression must select an HTML <A> element.
- Maximum depth: This value controls how many links are followed through the Next page link selection. If it is too high, it is capped by the maximum value set in the user's preferences.
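Rank mode can be sketched as a running count over the counted elements of successive pages. `rank` is a hypothetical helper; it assumes ranks start at 1, that the count continues across pages, and that `pages` is any iterable of page sources (for instance produced lazily by a next-page crawl, so later pages are not loaded once the item is found).

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def rank(pages: Iterable[str], counted_xpath: str, searched_item: str) -> Optional[int]:
    """Hypothetical sketch: 1-based position of the first counted element
    whose text matches the Searched item pattern, counted across pages."""
    pattern = re.compile(searched_item)
    position = 0
    for page in pages:
        root = ET.fromstring(page)
        for element in root.findall(counted_xpath):
            position += 1
            if pattern.search("".join(element.itertext())):
                return position
    return None  # item not found within the scanned pages


p1 = "<html><body><ol><li>alpha.org</li><li>beta.org</li></ol></body></html>"
p2 = "<html><body><ol><li>gamma.org</li><li>example.com</li></ol></body></html>"
print(rank([p1, p2], "./body/ol/li", r"example\.com"))  # 4
```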
Popularity mode
- Starting address: This field indicates the URL where the search for the value starts. It should be a well-formed HTTP or HTTPS URL.
- Counted elements: This field indicates which elements in the page are counted, selected with an XPath expression. An element is counted only if its text matches the regular expression defined in the Searched item field.
- Searched item: This field complements Counted elements by specifying, as a regular expression, the pattern an element's text must match to be counted.
- Next page link: This field selects an HTML <A> element linking to the page where the search continues. The selection is performed through an XPath expression. The process is iterative: new pages are loaded until the item is found or the number of scanned pages exceeds the value of the Maximum depth field. If the field is left empty, no link is followed; if it is set, the XPath expression must select an HTML <A> element.
- Maximum depth: This value controls how many links are followed through the Next page link selection. If it is too high, it is capped by the maximum value set in the user's preferences.
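Following the Counted elements description, Popularity mode can be sketched as counting the elements whose text matches the Searched item pattern, summed over all scanned pages. `popularity` is a hypothetical helper; it assumes that an element containing the expression several times still counts once and that every page in `pages` is scanned.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable


def popularity(pages: Iterable[str], counted_xpath: str, searched_item: str) -> int:
    """Hypothetical sketch: number of counted elements matching the pattern."""
    pattern = re.compile(searched_item)
    total = 0
    for page in pages:
        root = ET.fromstring(page)
        for element in root.findall(counted_xpath):
            # Each matching element counts once, however many matches it holds.
            if pattern.search("".join(element.itertext())):
                total += 1
    return total


p1 = ("<html><body><div>Singer on tour</div><div>Weather</div>"
      "<div>Singer, Singer and more</div></body></html>")
p2 = "<html><body><div>New Singer album</div></body></html>"
print(popularity([p1, p2], "./body/div", "Singer"))  # 3
```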
Variables
Variables allow some customization at the channel level. They are defined in the chartlet and given values in the definition of each channel. To define one, click the Create button of the Chartlet variables section.

In the window that opens, define the name and description of the variable. The encoding field specifies an optional transformation applied to the variable value before it is used.
Variables can be used in any Collect information field except Maximum depth, through placeholders of the form ${variable-name}. For instance: http://www.google.com/search?q=${query}&num=100
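Placeholder substitution can be sketched with a regular expression that finds each ${name} and replaces it with the channel's value after applying that variable's encoding. `expand` is a hypothetical helper, and urllib.parse.quote_plus merely stands in for a URL-style encoding; the encodings actually offered by the encoding field are not listed here.

```python
import re
from typing import Callable, Dict, Optional
from urllib.parse import quote_plus

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(template: str,
           values: Dict[str, str],
           encoders: Optional[Dict[str, Callable[[str], str]]] = None) -> str:
    """Hypothetical sketch: replace each ${name} with its encoded value."""
    encoders = encoders or {}

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        encode = encoders.get(name, lambda v: v)  # no encoding by default
        return encode(values[name])  # an undefined variable raises KeyError

    return PLACEHOLDER.sub(substitute, template)


url = expand("http://www.google.com/search?q=${query}&num=100",
             {"query": "chartlet firefox"}, {"query": quote_plus})
print(url)  # http://www.google.com/search?q=chartlet+firefox&num=100
```

Encoding matters most in URLs: without it, a value such as a&b would be read as two separate query parameters.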