Welcome to the MetGem 1.3 Manual!

Welcome to MetGem’s documentation page.

User Manual

Discover MetGem features through an online manual.

Tutorials and Howto’s

Learn through tutorials to see MetGem in action.

Getting Started

New to MetGem and don’t know where to start?

General Concepts

Learn about general concepts that are not specific to MetGem.

MetGem’s FAQ

Find answers to the most common questions about MetGem.

Index

An index of the manual for looking up terms by browsing.

User Manual

Discover MetGem’s features through an online manual.

Getting Started

Welcome to the MetGem Manual! In this section, we’ll try to get you up to speed.

Installation

Windows

Windows users can download MetGem’s installer from the website.

Note

MetGem requires Windows 7 or newer.

OS X

You can download the latest binary from our website.

Note

The binaries only work with Mac OS X 10.12 and newer.

Linux

Linux binaries are available from the website as an AppImage. Running it should be as simple as click and run on any distribution.

Starting MetGem

When you start MetGem for the first time, no project window is open by default. You are greeted by a welcome screen, which offers the option to create a new project from MS/MS data or to open an existing project.

The first step is to import data from the File ‣ Import spectra list menu or by clicking on Import Data under the Start section of the welcome screen. This opens the import data file dialog box. If you want to open an existing project, either use File ‣ Open Project… or drag the project file from your computer into MetGem’s window.

Welcome screen

Saving and opening files

Once you have figured out how to import data, you may want to save your project. The save option is in the same place as in most other programs: the File menu at the top, then Save. Select the folder where you want to store your project and the file format you want to use (.mnz is MetGem’s default format and saves everything), then hit Save.

Check out Global Overview for further basic information, General Concepts for an introduction on concepts MetGem is built on, or just go out and explore MetGem!

Global Overview

Global overview of Main Window

MetGem’s interface is divided into four main parts:

Three dialogs are available to import data:

Toolbars

Each toolbar can be hidden/shown using the View ‣ Toolbars menu. It can also be moved to a different position in the main window, or even detached from it, by simply dragging it to the new position with its left handle.

Toolbars menu

File Toolbar

File Toolbar
Usage:
  • Create new project, save/open projects to/from file

  • Compute network, load metadata table or load group-mappings

  • Add Network View

  • Show current parameters or change application settings

View Toolbar

View Toolbar
Usage:
  • Adjust zoom or switch application to fullscreen

See Navigation

Network Toolbar

Network Toolbar
Usage:

Export Toolbar

Export Toolbar
Usage:
  • Export networks to Cytoscape or as images

  • Export metadata or database results to text files

Databases Toolbar

Databases Toolbar
Usage:
  • Download, create and explore Databases

Search Toolbar

Search Toolbar
Usage:
  • Search in nodes or edges tables

Network views

Example of a View

Data can be visualised using different views. MetGem offers two types of visualisations:

  • Network: A classical Molecular Network view, like the one that can be obtained from the GNPS platform. In this view, each node represents an MS/MS spectrum and each edge represents the distance between two nodes (obtained via a modified cosine-score calculation). Distance between clusters is arbitrary and has no special meaning.

  • 2-D Projections: A view obtained using a dimension reduction algorithm. This is a 2-D projection of the multidimensional space, so no edge is shown but distance between clusters is informative. To simplify projection, isolated nodes are excluded from the processing and are displayed below the projection. They are arbitrarily distributed and their positions have no special meaning.
    MetGem can use several algorithms to create a visualisation:
    • t-SNE: t-SNE (t-distributed Stochastic Neighbor Embedding) tends to preserve local distances and distort global distances. This means that two clusters that are close to each other in the original space are statistically more likely than distant clusters to be close in the t-SNE projection.

    • U-MAP: UMAP (Uniform Manifold Approximation and Projection) is a relatively recent algorithm (2018) that is very similar to t-SNE but claims to preserve both local and most of the global structure in the data.

    • MDS: MDS (MultiDimensional Scaling) is the algorithm on which t-SNE and UMAP are both based. MDS does not try to preserve local distances over global distances.

    • Isomap: Isomap (Isometric mapping) is an extension of the MDS algorithm based on the spectral theory which tries to preserve the geodesic distances in the lower dimension.

    • PHATE: PHATE (Potential of Heat-diffusion for Affinity-based Trajectory Embedding) is a tool for visualizing high dimensional data. PHATE uses a novel conceptual framework for learning and visualizing the manifold to preserve both local and global distances.

Changed in version 1.3: To make clear that the positions of isolated nodes are not meaningful in projections, a horizontal line is drawn between projected nodes and isolated nodes.
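The separation between projected and isolated nodes can be illustrated with a minimal Python sketch. It assumes that a node counts as isolated when no edge links it to another node; the function name `split_isolated` and the sample data are hypothetical, and the criterion MetGem actually applies may differ.

```python
def split_isolated(node_ids, edges):
    """Return (connected, isolated) node lists given an undirected edge list."""
    linked = set()
    for source, target in edges:
        if source != target:  # a self-loop does not link a node to another one
            linked.add(source)
            linked.add(target)
    connected = [n for n in node_ids if n in linked]
    isolated = [n for n in node_ids if n not in linked]
    return connected, isolated


# Hypothetical data: five nodes, two real edges and one self-loop.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (5, 5)]
connected, isolated = split_isolated(nodes, edges)
print(connected)  # [1, 2, 3]
print(isolated)   # [4, 5]
```

In such a scheme, only the `connected` nodes would be passed to the dimension reduction algorithm, while the `isolated` ones would be laid out arbitrarily below the horizontal line.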

Toolbar

Network View Toolbar
  • settings It is possible to change parameters for each visualisation. The visualisation will automatically be recomputed and updated to match the new parameters.

  • scale If nodes are too close to each other, you can change the scale of the visualisation using the Scale Slider found in the dropdown menu.

  • lock This option is only available for the classical Molecular Network view. By default, nodes can’t be moved, to prevent changing their positions by accident. This function lets you unlock the view.

Adding Views

You can add a view during the data import process, or later by using the add-view Add View menu in the File Toolbar.

Add View Menu

Interaction

Selection

Selection can be done with a left click on a node/edge or by selecting a region with the right mouse button. Selected nodes turn yellow while selected edges are highlighted in red. Multiple selections can also be made by holding down the Ctrl key while left-clicking.

Another way to select nodes is the select-neighbors Select neighbors button in the Network Toolbar.

Selecting nodes or edges will automatically filter metadata tables to show only metadata from selected nodes/edges (See Metadata Tables).

By default, selection is linked between views, i.e. when a node is selected in a view, the corresponding node is automatically selected in all other views. This is useful to see correspondences between views. This behavior can be deactivated using the link Link selection between views button from the Network Toolbar.

View MS/MS spectrum

When a node is selected, the MS/MS spectrum it represents can be loaded in the Spectrum View (See Spectrum View).

Nodes visibility

Selected nodes and edges can be temporarily hidden using the eye-closed Hide selected nodes and edges button from the Network Toolbar. Bring them back using the eye Show all nodes and edges button.

Sometimes you may want to hide isolated nodes because they are not really informative and they take up a lot of space on the screen. This is a job for Clyde, our cute little ghost standing on the hide-isolated-nodes Show/hide isolated nodes button!

Mappings

Node metadata can be used to modify how the nodes look (See Mapping section). It is also possible to bypass these mappings:

  • color Set the color of the selected node(s). You can use the current color (visible in the top-left corner of the button) or choose another color using the dropdown window.

  • node-size Adjust the size of the selected node(s). Select the desired size using the dropdown menu or type it in the text box. Default node size is 30.

If you added pie charts to nodes, you might want to temporarily disable them. The node-pie Hide Pie Charts button will be of great help in this task.

Keyboard shortcuts

All these shortcuts apply to the active view.

| Shortcut | Description |
| --- | --- |
| M | Show/hide the Minimap |
| S | Show the spectrum associated to the selected node in the Spectrum View |
| C | Compare the spectrum associated to the selected node to another one in the Spectrum View |
| Ctrl + C | Copy as image the visible part of the active view to the clipboard |
| Ctrl + Shift + C | Copy as image the full active view to the clipboard |

Parameters

Cosine-score computations

Molecular Networking

t-SNE

Metadata Tables

Nodes and edges tables contain metadata. When nodes or edges are selected in a Network View, only those nodes/edges will appear in the corresponding tables. Filtering can also be performed using the search toolbar.

These tables can be hidden/shown using the View ‣ Toolbars menu. They can also be moved to a different position in the main window, or even detached from it, by simply dragging them to the new position or double-clicking on their title bar.

Data menu

Note

Columns can be selected via the right mouse button while the left mouse button is used to change ordering.

Upper Left Corner of Metadata Table

Default ordering can be reset by right-clicking in the empty region on the upper left of the table (to the left of the first column and above the first line).

Nodes

The nodes table lists all nodes and their associated metadata. There is one line per node and one column per metadata type.

The first column of this table is always m/z parent and contains the m/z ratios of ions loaded from the import data dialog.

The second one is reserved for Database search results. This column will contain a list of standards found in the databases (See Databases Query). It is hidden by default and will be visible only if there is at least one result to show.

Nodes Toolbar

Nodes Table Toolbar (Mapping, Nodes and Tools sections)

The toolbar located on top of the nodes table is divided into three sections.

Mapping section

Mapping Section of Nodes Table Toolbar

The first section is dedicated to changing how nodes are represented in views. Just select one or more metadata column(s) with a right click, then use one of these buttons. Each dropdown menu includes a menu item to undo these changes.

  • node-label Use data from the selected column as labels for nodes in the views.

  • node-pie Represents the data in the selected columns in the form of pie charts on the nodes.

  • node-size Adjusts each node size based on the data from the selected column.

  • node-color Defines color of the nodes from the selected column data.

Nodes section

Nodes Section of Nodes Table Toolbar

In this second section, you can interact with nodes and their associated MS/MS spectra:

  • highlight-yellow Highlight in the views the nodes selected in the table. You can then use the Zoom to the selected region function (see View Toolbar) to easily locate these nodes.

  • spectrum Views the MS/MS spectrum associated to the selected node (shortcut S). You can also compare this spectrum to the one associated to a second node using the Compare spectrum function accessible from the dropdown menu (shortcut C). See Spectrum View.

  • query You can try to find similar spectra in databases by using this function. See Databases.

Note

Functions from this section are also accessible from a context menu that will pop up if you right-click in a cell of the nodes table.

Tools section

Tools Section of Nodes Table Toolbar

The last section defines tools that will end up adding columns to the table:

  • formula This tool allows you to create new data columns by combining existing columns using mathematical functions. See Formulae.

  • cluster This one uses a clustering algorithm (HDBSCAN) to find clusters in the visualisation and color nodes accordingly. See Cluster.

  • remove Not really a tool, but you can use it to remove unwanted columns from the table.

Edges

The edges table lists all edges and the nodes they link. This table has meaning only for the classical network view as the other views don’t use edges.

There is one line per edge and the following columns are available:

  • Source: The first node that the edge links. Source and target are arbitrarily defined and this has no special meaning since the graph is not directed.

  • Target: The second node that the edge links. The same remark as for the Source column applies.

  • Delta MZ: Difference of m/z ratio between the parent ions associated to the two nodes. Sign is irrelevant.

  • Cosine: The similarity score calculated to compare the spectra associated to the two nodes. Value lies between 0 and 1, the higher it is, the closer the spectra are to each other.

  • Possible interpretation: Possible interpretation of the Delta MZ based on the exact mass.
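As an illustration of how the columns relate to each other, here is a minimal Python sketch that builds one line of such a table. Everything in it is an assumption made for the example: the `KNOWN_DELTAS` lookup table, the `interpret_delta` and `edge_row` helpers, the tolerance and the sample m/z values are hypothetical and do not describe how MetGem computes the Possible interpretation column.

```python
# Hypothetical lookup table: monoisotopic mass differences in Da.
KNOWN_DELTAS = {
    "H2O": 18.010565,
    "CH2": 14.015650,
    "O": 15.994915,
}


def interpret_delta(delta_mz, tolerance=0.01):
    """Return the names of known mass differences matching |delta_mz|."""
    delta = abs(delta_mz)  # the sign is irrelevant
    return [name for name, mass in KNOWN_DELTAS.items()
            if abs(delta - mass) <= tolerance]


def edge_row(source, target, parent_mz, cosine):
    """Build one line of an edges table as a dict keyed by column name."""
    delta = parent_mz[source] - parent_mz[target]
    return {
        "Source": source,
        "Target": target,
        "Delta MZ": round(delta, 4),
        "Cosine": cosine,
        "Possible interpretation": interpret_delta(delta),
    }


# Hypothetical parent ion m/z values for two nodes.
parent_mz = {1: 301.1410, 2: 319.1516}
row = edge_row(1, 2, parent_mz, 0.82)
print(row["Delta MZ"])                 # -18.0106
print(row["Possible interpretation"])  # ['H2O']
```

Swapping source and target only flips the sign of Delta MZ, which is why the interpretation is computed on the absolute value.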

Edges Toolbar

The toolbar on top of the edges table offers a few functions.

Edge Table Toolbar
  • highlight-yellow Highlight in the views the edges selected in the table.

  • highlight-red Highlight in the views the nodes connected by the edges selected in the table.

Note

These functions are also accessible from a context menu that will pop up if you right-click in a cell of the edges table.

Spectrum View

The spectrum view is used to visualize nodes’ spectra.

Spectrum View Window

The left pane shows the loaded spectra. You can visualize up to two spectra for comparison. The top spectrum (in red) is the first one loaded while the second one appears upside-down in the bottom part (in blue).

On the top-right corner of the spectrum, you can see the computed similarity score between the two spectra. Value lies between 0 and 1, the higher it is, the closer the spectra are to each other.

In the lowest part of the window, the legend gives information about which spectra are shown (node index and m/z of parent ion).

Matching Peaks List

When two spectra are loaded, the right-hand pane contains information about which fragments or neutral losses are common to both spectra:

  • The top table lists matching fragments,

  • The lower table is for matching neutral losses.

Each table has three columns:

  • Top Spectrum: m/z of fragment or mass of neutral loss in the first spectrum

  • Bottom Spectrum: m/z of fragment or mass of neutral loss in the second spectrum matching the one of the first spectrum

  • Partial score: Contribution of these fragments/losses to the overall matching score. These partial scores add up to the total score.

When a line in this table is selected, the corresponding peaks in the spectra are highlighted. Click anywhere in the table outside any cell to deselect all lines.
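The way partial scores add up to the total score can be sketched in Python as follows. This is a simplified plain cosine score over matching fragments only, written for illustration: it ignores neutral losses, it is not MetGem's modified cosine-score calculation, and the function names, the greedy matching strategy and the m/z tolerance are assumptions.

```python
import math


def normalise(peaks):
    """Scale intensities so that their squares sum to 1."""
    norm = math.sqrt(sum(intensity ** 2 for _, intensity in peaks))
    return [(mz, intensity / norm) for mz, intensity in peaks]


def match_fragments(top, bottom, tolerance=0.02):
    """Greedily pair peaks whose m/z differ by at most `tolerance`.

    Returns (top m/z, bottom m/z, partial score) rows, best match first.
    """
    top, bottom = normalise(top), normalise(bottom)
    candidates = [
        (wa * wb, i, j)
        for i, (mza, wa) in enumerate(top)
        for j, (mzb, wb) in enumerate(bottom)
        if abs(mza - mzb) <= tolerance
    ]
    rows, used_top, used_bottom = [], set(), set()
    for partial, i, j in sorted(candidates, reverse=True):
        if i not in used_top and j not in used_bottom:  # each peak is used once
            used_top.add(i)
            used_bottom.add(j)
            rows.append((top[i][0], bottom[j][0], partial))
    return rows


# Hypothetical spectra as (m/z, intensity) pairs.
top = [(100.0, 3.0), (150.0, 4.0)]
bottom = [(100.01, 4.0), (200.0, 3.0)]
rows = match_fragments(top, bottom)
total = sum(partial for _, _, partial in rows)
print(len(rows))        # 1
print(round(total, 2))  # 0.48
```

Because the intensities are normalised first, comparing a spectrum with itself yields a total of 1, and each row's partial score shows how much a single pair of peaks contributes to the total.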

Toolbar

Spectrum View Toolbar

The Spectrum View comes with a navigation toolbar on top of the window, which can be used to navigate through the spectrum. Here is a description of each of the buttons:

  • home back forward Home, Forward and Back buttons are akin to a web browser’s home, forward and back controls. Forward and Back are used to navigate back and forth between previously defined views. They have no meaning unless you have already navigated somewhere else using the pan and zoom buttons. This is analogous to trying to click Back on your web browser before visiting a new page or Forward before you have gone back to a page – nothing happens. Home always takes you to the first, default view of your data. Again, all of these buttons should feel very familiar to any user of a web browser.

  • move The Pan/Zoom button has two modes: pan and zoom. Click the toolbar button to activate panning and zooming, then put your mouse somewhere over an axes. Press the left mouse button and hold it to pan the figure, dragging it to a new position. When you release it, the data under the point where you pressed will be moved to the point where you released. Press the right mouse button to zoom, dragging it to a new position. The x axis will be zoomed in proportionately to the rightward movement and zoomed out proportionately to the leftward movement. The point under your mouse when you begin the zoom remains stationary, allowing you to zoom in or out around that point as much as you wish.

  • zoom Click the Zoom-to-rectangle button to activate this mode. Put your mouse somewhere over an axes and press the left mouse button. Drag the mouse while holding the button to a new location and release. The axes view limits will be zoomed to the rectangle you have defined. There is also an experimental ‘zoom out to rectangle’ in this mode with the right button, which will place your entire axes in the region defined by the zoom out rectangle. You can also zoom in and out using the mouse wheel.

  • save Click the Save button to launch a file save dialog. You can save as images with the following extensions: jpg, png, ps, eps, svg, pdf, pgf, tif, raw, and rgba. You can also save spectra as text in the following formats: mgf and msp.

  • reset The Reset button will simply unload any previously loaded spectrum.

Load spectra

To load a spectrum, simply select a node in a view and use the View ‣ Spectrum menu or the S shortcut. You can also compare two spectra: select a different node and use the View ‣ Compare Spectrum menu or the C shortcut. The second spectrum will appear upside-down. Spectra can also be loaded from the nodes metadata table.

Shortcuts

| Shortcut | Description |
| --- | --- |
| H, R, Home | Home |
| Left, C, Backspace | Back |
| Right, V | Forward |
| P | Pan/Zoom |
| Shift | Hold to temporarily activate Pan/Zoom |
| O | Zoom To Rect |
| Ctrl | Hold to temporarily activate Zoom To Rect |
| Ctrl + S | Save |
| g when mouse is over an axis | Toggle major grids |
| G when mouse is over an axis | Toggle minor grids |

Import data

Data

Networks can be created from a data file (in mgf or msp format) using the Import Data icon. This will open the following dialog:

Process File Dialog

This dialog lets you open a data file and, optionally, a metadata table (csv file) using the Browse buttons. The separator for the csv file should be auto-detected, but you can change this parameter directly in this dialog, and more parameters are available via the Options button. See Metadata.

Parameters used by MetGem for the cosine computations step can be tuned in the Cosine Score Computing section. See Cosine-score computations.

In the Add Views section, you can optionally add visualisations like Molecular Network or 2-D projections. See Add visualisations.

When you have loaded a data file and you are satisfied with the parameters, you can click OK to start the process.
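To give an idea of what an mgf data file contains, here is a minimal Python sketch of a reader. It assumes the usual layout of the format (spectra delimited by `BEGIN IONS` and `END IONS`, `KEY=value` parameter lines such as `PEPMASS`, then one "m/z intensity" pair per line). The `parse_mgf` function and the sample values are hypothetical; this is not the reader MetGem uses.

```python
def parse_mgf(lines):
    """Yield one dict per spectrum: {'params': {...}, 'peaks': [(mz, intensity), ...]}."""
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comments
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None:
            if "=" in line:
                # Parameter line, e.g. PEPMASS=301.1410
                key, value = line.split("=", 1)
                spectrum["params"][key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z and intensity separated by whitespace
                fields = line.split()
                if len(fields) >= 2:
                    spectrum["peaks"].append((float(fields[0]), float(fields[1])))


# Hypothetical one-spectrum file content.
example = """\
BEGIN IONS
PEPMASS=301.1410
CHARGE=1+
85.0284 120.0
153.0182 950.5
END IONS
"""
spectra = list(parse_mgf(example.splitlines()))
print(spectra[0]["params"]["PEPMASS"])  # 301.1410
print(len(spectra[0]["peaks"]))         # 2
```

Each parsed spectrum would correspond to one node, with the `PEPMASS` value playing the role of the parent ion m/z.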

Add visualisations

MetGem offers two types of views that you can add to your project. See Network views.

To add a view, use the dropdown menu next to the add-view button. Choose the desired visualisation and a dialog will open to let you set a few parameters. See Parameters for more information about the parameters.

Example of an add visualisation dialog

Metadata

You can associate metadata with the spectra loaded during the Import data step. Metadata can be loaded from a csv file or from a spreadsheet file (such as LibreOffice Calc or Microsoft Excel). If you did not load metadata during import, or want to load new metadata, you can do so using the Import Metadata tool button from the File Toolbar. The following dialog will pop up:

Import Metadata Dialog

The metadata file (csv) can be selected using the Browse button. The separator for the csv file should be auto-detected, but this can be changed in the Options section. More parameters, such as whether the file contains headers, can also be tuned in this section.

You can see the first 100 lines that will be imported in the Preview section. You can select which column to import by clicking on the corresponding headers or via the upper toolbar. If no column is selected, all columns will be imported.

The Refresh button can be used to reload file from disk using the parameters defined in the Options section.
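Separator auto-detection and the limited preview can be approximated with Python's standard `csv` module, as in the sketch below. The `preview_csv` function, its parameters and the sample text are hypothetical, and the detection method MetGem relies on may be different.

```python
import csv
import io


def preview_csv(text, has_header=True, max_rows=100):
    """Guess the separator of a csv text and return (separator, header, first rows)."""
    # Restricting the candidates keeps the guess among common separators.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    rows = list(csv.reader(io.StringIO(text), dialect))
    header = rows[0] if has_header and rows else None
    body = rows[1:] if has_header else rows
    return dialect.delimiter, header, body[:max_rows]


# Hypothetical metadata table using ';' as separator.
sample = "id;name;area\n1;A;10.5\n2;B;20.1\n"
separator, header, rows = preview_csv(sample)
print(separator)  # ;
print(header)     # ['id', 'name', 'area']
print(rows)       # [['1', 'A', '10.5'], ['2', 'B', '20.1']]
```

Exposing `has_header` as an explicit parameter mirrors the dialog's Options section: a guess can be wrong, so the user keeps the final say.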

Group Mappings

A group mappings file can be used to group columns and sum the values they contain. You can load such a file via the Import Mapping tool button.

Group mapping files are simple text files that should follow the following scheme:

GROUP_group1=filename1.mzXML
GROUP_group2=filename2.mzXML;filename3.mzXML

The example above can be translated as:

Create a group named group1 containing column filename1.mzXML and a group named group2 containing columns filename2.mzXML and filename3.mzXML.

If a column does not exist, it is simply ignored. Groups can be empty. Group columns are identified with the Import Mapping icon.
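The scheme described above is simple enough to be read with a few lines of Python. The sketch below parses group mapping lines and sums the matching columns for one node; the function names and the sample values are hypothetical, and it only illustrates the documented behaviour (missing columns are ignored, groups can be empty).

```python
def parse_group_mappings(lines):
    """Map each group name to the list of column names it contains."""
    groups = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("GROUP_") or "=" not in line:
            continue  # not a group definition
        name, _, members = line[len("GROUP_"):].partition("=")
        groups[name.strip()] = [m.strip() for m in members.split(";") if m.strip()]
    return groups


def sum_groups(row, groups):
    """Sum the values of each group's columns; columns missing from `row` are ignored."""
    return {name: sum(row[c] for c in columns if c in row)
            for name, columns in groups.items()}


mapping = [
    "GROUP_group1=filename1.mzXML",
    "GROUP_group2=filename2.mzXML;filename3.mzXML",
]
groups = parse_group_mappings(mapping)
print(groups["group2"])  # ['filename2.mzXML', 'filename3.mzXML']

# Hypothetical metadata values for one node; filename3.mzXML is absent.
row = {"filename1.mzXML": 10.0, "filename2.mzXML": 5.0}
print(sum_groups(row, groups))  # {'group1': 10.0, 'group2': 5.0}
```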

Databases

Download

User databases

Explore

Query

Tools

MetGem includes a few tools to manipulate data.

Cluster

Formulae

This tool is designed to let you combine multiple columns using simple or complex formulae. See Usage for practical information and Syntax for a description of the formulae’s syntax.

Usage

The window is divided in two parts:

  • Available columns: on the left is a table of the metadata columns, used to create aliases for the metadata column titles,

  • Mappings: on the right, you can create new columns by combining existing ones.

Creating a new formula takes two steps:
  1. To be used in a formula, a column needs an alias (which must be a valid Python identifier). Double-click inside a cell of the Alias column to define an alias for the corresponding column. Validate by hitting Enter.

  2. The last step is to add a new column by clicking on the add button, set its name (a column with this name will be overwritten if it already exists) and define a formula that includes constants and/or aliases set in the previous step.

Toolbar

The toolbar located on top right of the dialog includes a few buttons:

  • add Add new formula: Add a new empty line in the Mappings table. It’s up to you to fill Name and Formula cells,

  • remove Remove selected formulae: Remove the selected rows on the Mappings table.

  • functions Add Function: A drop-down sectional list of available functions. A function is used by adding a comma-separated list of arguments in parentheses after its name, e.g. mean(a,b).

  • constants Add Constant: A list of available constants. A constant is just a replacement for

Syntax

The syntax used to describe formulae is a subset of the Python programming language. As MetGem uses the pandas library internally, the following operations are supported:

  • Arithmetic operations except for the left shift (<<) and right shift (>>) operators, e.g., x + 2 * pi / y ** 4 % 42 - pi

  • Comparison operations, including chained comparisons, e.g., 2 < x < y

  • Boolean operations, e.g., x < y and x < y or not column1

  • List and tuple literals, e.g., [1, 2] or (1, 2)

  • Math functions: sum, mean, median, prod, std, var, quantile, min, max, sin, cos, exp, log, expm1, log1p, sqrt, sinh, cosh, tanh, arcsin, arccos, arctan, arccosh, arcsinh, arctanh, abs, arctan2 and log10

  • Constants: pi, e

This Python syntax is not allowed:

  • Expressions:
    • Function calls other than math functions.

    • is/is not operations

    • if expressions

    • lambda expressions

    • list/set/dict comprehensions

    • Literal dict and set expressions

    • yield expressions

    • Generator expressions

    • Boolean expressions consisting of only scalar values

    • Attribute access, e.g., df.a

    • Subscript expressions, e.g., df[0]

  • Statements
    • Neither simple nor compound statements are allowed. This includes things like for, while, and if.
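The restrictions listed above can be approximated with Python's standard `ast` module, as in the following sketch. It is only an illustration of the rules, not the check MetGem or pandas performs: the `check_formula` function is hypothetical, `ALLOWED_FUNCTIONS` holds only a few of the listed math functions, and the rule about Boolean expressions made of scalar values only is not covered.

```python
import ast

# Assumption: a small subset of the math functions listed in this section.
ALLOWED_FUNCTIONS = {"sum", "mean", "median", "min", "max", "sqrt", "log", "abs"}
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.Name, ast.Load, ast.Constant, ast.List, ast.Tuple, ast.Call,
    ast.operator, ast.unaryop, ast.boolop, ast.cmpop,
)
FORBIDDEN_OPERATORS = (ast.LShift, ast.RShift, ast.Is, ast.IsNot)


def check_formula(formula):
    """Return a list of problems found in `formula`; an empty list means none."""
    try:
        # In "eval" mode, statements such as for/while/if fail to parse.
        tree = ast.parse(formula, mode="eval")
    except SyntaxError as exc:
        return [f"not a valid expression: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, FORBIDDEN_OPERATORS):
            problems.append(f"operator not allowed: {type(node).__name__}")
        elif isinstance(node, ast.Call):
            if not (isinstance(node.func, ast.Name)
                    and node.func.id in ALLOWED_FUNCTIONS):
                problems.append("only calls to the listed math functions are allowed")
        elif not isinstance(node, ALLOWED_NODES):
            problems.append(f"syntax not allowed: {type(node).__name__}")
    return problems


print(check_formula("x + 2 * pi / y ** 4 % 42 - pi"))  # []
print(check_formula("mean(a, b)"))                      # []
print(check_formula("x << 2"))   # ['operator not allowed: LShift']
print(check_formula("df.a"))     # ['syntax not allowed: Attribute']
```

A whitelist is used rather than a blacklist so that any construct not explicitly listed (lambda, comprehensions, subscripts, and so on) is rejected by default.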

Note

A Python identifier is a name used to identify a variable, function, class, module or other object. An identifier starts with a letter A to Z or a to z or an underscore (_) followed by zero or more letters, underscores and digits (0 to 9).

Python does not allow punctuation characters such as @, $, and % within identifiers. Python is a case sensitive programming language. Thus, Manpower and manpower are two different identifiers in Python.
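Whether a candidate alias is acceptable can be tested with Python's standard library, as sketched below. The helper names `is_valid_alias` and `make_alias` are hypothetical and only illustrate the identifier rules given in this note; note that `str.isidentifier()` also accepts non-ASCII letters, which is broader than the description above.

```python
import keyword
import re


def is_valid_alias(alias):
    """True if `alias` is a valid Python identifier and not a reserved keyword."""
    return alias.isidentifier() and not keyword.iskeyword(alias)


def make_alias(title):
    """Derive a valid alias from a column title (hypothetical helper)."""
    alias = re.sub(r"\W", "_", title)  # replace punctuation and spaces
    if not alias or alias[0].isdigit():
        alias = "_" + alias            # identifiers cannot start with a digit
    if keyword.iskeyword(alias):
        alias += "_"                   # avoid reserved words such as "for"
    return alias


print(is_valid_alias("Manpower"))     # True
print(is_valid_alias("area%"))        # False
print(is_valid_alias("2nd"))          # False
print(make_alias("filename1.mzXML"))  # filename1_mzXML
print(make_alias("2nd sample"))       # _2nd_sample
```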

Settings

Theme

Edges

General Concepts

Tutorials and Howto’s

MetGem’s FAQ