Automatic subgraph extraction (BETA)

This feature is currently in BETA and could undergo major updates.

AWS Blu Insights can automatically extract subgraphs from a set of criteria that you specify. This removes the tedious work of manually selecting files and nodes when splitting a large graph.

First, navigate to the dependencies tab of Assets, and make sure the dependencies graph is generated. Then, either in the Table view or the Graph view, select the files and nodes to define the scope of the extraction. Generated subgraphs will never include files and nodes outside that selection.

Pro tip: To quickly select all files and nodes, use the “Ctrl+A” shortcut.

Once the files and nodes are selected, a menu of options appears at the bottom of the screen. Click “Generate subgraphs” to start configuring the extraction criteria.

A multi-step pop-up opens. All steps have pre-filled default values, so any step can be skipped to speed up the configuration. The number of selected files and nodes is also displayed in the header.

General

The generated subgraphs are all saved inside a new group created for this extraction. By default, the group is named after the current timestamp. Both the group name and the number of subgraphs to extract can be changed.

Files and nodes to include

The exploration of the graph starts from files and nodes with no inbound links (also known as entrypoints).
If needed, other files and/or nodes can be added as entrypoints for the graph exploration using the familiar BQL filter syntax. These files and/or nodes will be included, along with all their dependencies, in all generated subgraphs.
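
To make the notion of entrypoints concrete, here is a minimal Python sketch. It is not the product's implementation: the graph, the file names, and the helper functions `find_entrypoints` and `collect_dependencies` are all hypothetical and only illustrate "no inbound links" and "included along with all their dependencies".

```python
from collections import deque

def find_entrypoints(graph):
    """Return the nodes that no other node links to (no inbound links)."""
    targets = {dep for deps in graph.values() for dep in deps}
    return sorted(node for node in graph if node not in targets)

def collect_dependencies(graph, start):
    """Return `start` plus every node reachable from it (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# Hypothetical dependency graph: each key depends on the listed values.
graph = {
    "JOB1.jcl": ["PGM1.cbl"],
    "JOB2.jcl": ["PGM2.cbl"],
    "PGM1.cbl": ["COPY1.cpy", "PGM2.cbl"],
    "PGM2.cbl": ["COPY1.cpy"],
    "COPY1.cpy": [],
}

entrypoints = find_entrypoints(graph)
job2_closure = collect_dependencies(graph, "JOB2.jcl")
print(entrypoints)
print(sorted(job2_closure))
```

In this toy graph only the two JCL files have no inbound links, so they are the default starting points. Adding `PGM1.cbl` as an extra entrypoint would correspond to forcing it, and everything it depends on, into each result.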

Constraints

This is the most important step of the configuration. Constraints are hard rules that every generated subgraph must follow. For example, setting the maximum number of files to 20 means that every generated subgraph will have 20 files or fewer.

Pro tip: If the constraints are too restrictive, the extraction might fail to find enough valid subgraphs. Conversely, if the constraints are not restrictive enough, the generated subgraphs will all tend to converge toward the largest possible subgraph in the selection. In short, constraints need to be balanced to get the best results.
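
The effect of a hard constraint can be pictured with a short Python sketch. This is only an illustration under assumptions: the candidate subgraphs, the `max_files` parameter name, and the `satisfies_constraints` helper are hypothetical and do not reflect how the service is implemented.

```python
def satisfies_constraints(files, max_files):
    """A hard rule: a candidate is valid only if it stays within the limit."""
    return len(files) <= max_files

# Hypothetical candidate subgraphs, represented as lists of file names.
candidates = {
    "small": [f"file{i}" for i in range(5)],
    "at_limit": [f"file{i}" for i in range(20)],
    "too_big": [f"file{i}" for i in range(21)],
}

def valid_candidates(candidates, max_files):
    """Keep only the names of candidates that respect the constraint."""
    return [
        name
        for name, files in candidates.items()
        if satisfies_constraints(files, max_files)
    ]

balanced = valid_candidates(candidates, max_files=20)
too_strict = valid_candidates(candidates, max_files=3)
print(balanced)
print(too_strict)
```

With a limit of 20, the 21-file candidate is rejected outright. With a limit of 3, no candidate survives, which mirrors the case where overly restrictive constraints leave too few valid subgraphs.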

Ranking

Behind the scenes, more subgraphs than the number requested might be generated. The optional ranking step lets you pick a distribution of file types. This distribution is used as one ranking criterion, alongside others applied internally, to score the subgraphs and propose the most suitable ones.

Note: Unlike the constraints from the previous step, which are hard rules that are always respected, ranking criteria are only used to score subgraphs that have already been generated. They affect which subgraphs are chosen and in what order. As a result, the generated subgraphs should not be expected to closely follow the proposed distribution.
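
The distinction between filtering and scoring can be sketched in Python as follows. The scoring formula here (one minus half the absolute difference between the actual and target distributions) is an assumption chosen for illustration, as are the function names and sample data. The actual criteria used internally are not documented here.

```python
from collections import Counter

def type_distribution(file_types):
    """Share of each file type in a subgraph (assumes a non-empty list)."""
    counts = Counter(file_types)
    total = sum(counts.values())
    return {file_type: n / total for file_type, n in counts.items()}

def distribution_score(file_types, target):
    """Score in [0, 1]: 1.0 when the distribution matches the target exactly."""
    actual = type_distribution(file_types)
    keys = set(actual) | set(target)
    distance = sum(abs(actual.get(k, 0.0) - target.get(k, 0.0)) for k in keys)
    return 1.0 - distance / 2.0

# Hypothetical target distribution and already generated (valid) subgraphs.
target = {"cbl": 0.5, "cpy": 0.5}
candidates = {
    "A": ["cbl", "cpy"],
    "B": ["cbl", "cbl", "cbl", "cpy"],
    "C": ["jcl", "jcl"],
}

# Ranking never rejects a candidate; it only orders them.
ranked = sorted(
    candidates,
    key=lambda name: distribution_score(candidates[name], target),
    reverse=True,
)
top_two = ranked[:2]
print(ranked)
print(top_two)
```

Candidate "B" does not match the target distribution, yet it is still selected when two subgraphs are requested. This is why the results may deviate from the proposed distribution.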

Once the “Generate” button is clicked, the subgraph extraction starts. The pop-up displays a progress bar and can safely be closed if needed.

When the extraction is finished, the generated subgraphs and their group can be found in the subgraphs drawer on the right. They can be browsed and modified like any manually extracted subgraph.