Enterprise Recon 2.12.1
Distributed Scan
This section covers the following topics:
- How a Distributed Scan Works
- Distributed Scan Requirements
- Start a Distributed Scan
- Monitor a Distributed Scan Schedule
You can use ER2 to perform a distributed scan on a Target or Target location using a group of Proxy Agents. Distributed scans allow you to:
- Improve scanning time by having multiple scanning processes executed in parallel.
- Optimize resources by distributing the scanning load across multiple Proxy Agent hosts that might otherwise be underutilized.
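To make the parallelism concrete, here is a minimal Python sketch of the idea. It is not ER2's implementation. The Target's locations (for example, mailboxes) are placed in a shared queue, and each Proxy Agent, modelled here as a thread, repeatedly takes the next unscanned location. The pull-from-a-queue model, `run_distributed`, and the `scan_fn` callback are all hypothetical; the actual mechanism is described in Scanning - How A Distributed Scan Works.

```python
import queue
import threading
from collections import defaultdict


def run_distributed(locations, agents, scan_fn):
    """Hypothetical model of a distributed scan.

    Each agent (a thread here) pulls the next location from a shared queue
    until the queue is empty, so agents that finish early take more work.
    """
    work = queue.Queue()
    for location in locations:
        work.put(location)

    results = {}                  # location -> result of scanning it
    assigned = defaultdict(list)  # agent -> locations it scanned
    lock = threading.Lock()

    def worker(agent):
        while True:
            try:
                location = work.get_nowait()
            except queue.Empty:
                return  # no locations left for this agent
            result = scan_fn(agent, location)
            with lock:
                results[location] = result
                assigned[agent].append(location)

    threads = [threading.Thread(target=worker, args=(agent,)) for agent in agents]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, dict(assigned)


if __name__ == "__main__":
    mailboxes = [f"mailbox-{i}" for i in range(1000)]
    results, assigned = run_distributed(
        mailboxes,
        ["proxy-a", "proxy-b", "proxy-c"],
        lambda agent, loc: f"{loc} scanned by {agent}",
    )
    print(len(results))  # 1000
    for agent, locs in assigned.items():
        print(agent, len(locs))
```

A shared queue is used here instead of a fixed up-front split so that an agent that finishes early keeps picking up remaining locations; this page does not state whether ER2 balances load in this way.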
Distributed scans are particularly useful for scanning Targets that have a vast number of locations, for example:
- An Exchange Server with thousands of mailboxes.
- A Microsoft SQL Server with hundreds of databases, with thousands of tables per database.
For more information, see Distributed Scan Requirements below.
How a Distributed Scan Works
For a more detailed explanation of distributed scans, see Scanning - How A Distributed Scan Works.
Distributed Scan Requirements
Proxy Agent Requirements
To perform a distributed scan on a Target or group of Targets, you must create an Agent Group and assign it to the Target or Target location (see Create an Agent Group). Ensure that all Proxy Agents in the Agent Group:
- Have been upgraded to version 2.1 or above.
- Support scanning of the Target platform. If any Proxy Agent in the Agent Group does not support scanning of the Target, none of the sub-scans assigned to that Proxy Agent are executed, which causes the scan schedule to fail. To check which Agents are supported for a Target, see the respective pages under Target Type.
To run a distributed scan on a MySQL database, ensure that the Agent Group assigned to the scan contains only Windows Proxy Agents or Linux Proxy Agents. If the Agent Group includes a Solaris Proxy Agent, the scan schedule is marked as "Failed" due to incomplete sub-scans.
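The two Proxy Agent requirements above can be expressed as a pre-flight check on an Agent Group. This is a hypothetical Python sketch: the `ProxyAgent` record, the platform labels, and `check_agent_group` are illustrative names, and `SUPPORTED_PLATFORMS` holds only the MySQL rule stated on this page. The authoritative support matrix is on the pages under Target Type.

```python
from dataclasses import dataclass

MIN_VERSION = (2, 1)  # Proxy Agents must be version 2.1 or above


@dataclass(frozen=True)
class ProxyAgent:
    name: str
    version: tuple  # e.g. (2, 12, 1); compared element by element
    platform: str   # hypothetical labels: "windows", "linux", "solaris"


# Hypothetical support table: Target type -> agent platforms that can scan it.
# Only the MySQL entry comes from this page; other types are not checked here.
SUPPORTED_PLATFORMS = {
    "MySQL": {"windows", "linux"},
}


def check_agent_group(agents, target_type):
    """Return a list of problems; an empty list means the group passes."""
    problems = []
    allowed = SUPPORTED_PLATFORMS.get(target_type)
    for agent in agents:
        if agent.version < MIN_VERSION:
            version = ".".join(map(str, agent.version))
            problems.append(f"{agent.name}: version {version} is below 2.1")
        # An unsupported agent never runs its sub-scans, so the schedule fails.
        if allowed is not None and agent.platform not in allowed:
            problems.append(f"{agent.name}: {agent.platform} agents cannot scan {target_type}")
    return problems


if __name__ == "__main__":
    group = [
        ProxyAgent("win-01", (2, 12, 1), "windows"),
        ProxyAgent("sol-01", (2, 12, 1), "solaris"),
    ]
    print(check_agent_group(group, "MySQL"))
    # ['sol-01: solaris agents cannot scan MySQL']
```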
Supported Targets
You can run a distributed scan on the following supported Target types:
| Target Type | Description | 
|---|---|
| Windows Share | Scans are distributed across the folders and files under the Path of the network storage location specified in the scan schedule. For example, if the Path is specified as MyFolder, the scan is distributed across all files and folders within the MyFolder directory. If the number of files under the Path exceeds a certain limit, … |
| Remote Access via SSH | Scans are distributed across the folders and files under the Path of the network storage location specified in the scan schedule. For example, if the Path is specified as MyFolder, the scan is distributed across all files and folders within the MyFolder directory. If the number of files under the Path exceeds a certain limit, … |
| IBM DB2 | Scans are distributed across the tables in the database. | 
| InterSystems Caché | Scans are distributed across the tables in the database. | 
| MongoDB | Scans are distributed across the collections in the MongoDB Server. | 
| MariaDB | Scans are distributed across the tables in the database. | 
| Microsoft SQL Server | Scans are distributed across the tables in the database. | 
| MySQL | Scans are distributed across the tables in the database. | 
| Oracle Database | Scans are distributed across the tables in the database. | 
| PostgreSQL | Scans are distributed across the tables in the database. | 
| SAP HANA | Scans are distributed across the tables in the database. | 
| Sybase / SAP ASE | Scans are distributed across the tables in the database. | 
| SharePoint Server | Scans are distributed across the sites in the SharePoint Server. | 
| Confluence On-Premises | Scans are distributed across the spaces, blog post folder, and/or top-level pages that are one level below the selected location(s). Example 1: When the entire Confluence domain (host name: my-confluence-server) is selected, the scans are distributed across each space in the domain, for example Space Engineering and Space Product. Example 2: The scans for Space Engineering are distributed across its blog post folder (Blog Post Folder, containing Blog Post January and Blog Post February) and its top-level page (Page Development, containing Page Bug Fixes and Page Enhancements). |
| Amazon S3 Buckets | Scans are distributed across the Amazon S3 Buckets in the Amazon account. | 
| Azure Storage | Scans are distributed across the Blobs, Tables or Queues in the Azure Storage account. | 
| Box Inc | Scans are distributed across the locations in the Box Inc domain that are selected for the scan schedule. For example, in a Box domain (example.app.box.com) containing Group Administration, Group Engineering (users user1@example.com and user2@example.com), Group Finance (users user3@example.com, user4@example.com and user5@example.com), Group Human Resource, and Group Sales, selecting four of these locations distributes the scans across those four locations. |
| Exchange Domain | Scans are distributed across the mailboxes in the Exchange domain. | 
| Exchange Online | Scans are distributed across the mailboxes in the Microsoft 365 domain. | 
| Google Workspace | Scans are distributed across the users in the Google Workspace domain. | 
| Google Cloud Storage | Scans are distributed across the buckets in the Google Cloud Storage project. | 
| Microsoft OneNote | Scans are distributed across the notebooks of each user or group in the Microsoft 365 domain. |
| Microsoft Teams | Scans are distributed across the (i) channels in a team, or (ii) users in a group within the Microsoft 365 domain. | 
| Rackspace Cloud | Scans are distributed across the cloud server regions in the Rackspace account. | 
| Salesforce | Scans are distributed across the objects in the Salesforce domain. | 
| SharePoint Online | Scans are distributed across the sites in the SharePoint Online domain. | 
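
The Confluence On-Premises rule, one sub-scan per item one level below the selected location, can be sketched as a short tree walk. This is a minimal Python sketch, assuming the location hierarchy is modelled as nested dictionaries; that representation and the `sub_scans` function are hypothetical and not how ER2 stores Targets. The data reproduces the Confluence example above.

```python
# Hypothetical location tree: each key is a location, each value holds its
# children (an empty dict is a leaf such as a single page or blog post).
CONFLUENCE = {
    "Space Engineering": {
        "Blog Post Folder": {"Blog Post January": {}, "Blog Post February": {}},
        "Page Development": {"Page Bug Fixes": {}, "Page Enhancements": {}},
    },
    "Space Product": {
        "Page Feature": {"Page Feature A": {}, "Page Feature B": {}},
        "Page Release": {"Page Release Q1": {}, "Page Release Q2": {}},
    },
}


def sub_scans(tree, selected):
    """Return one sub-scan path per child one level below `selected`.

    `selected` is a tuple of location names; an empty tuple means the whole
    domain. Treating a leaf as a single sub-scan is an assumption.
    """
    node = tree
    for name in selected:
        node = node[name]
    if not node:
        return [tuple(selected)]
    return [(*selected, child) for child in node]


if __name__ == "__main__":
    # Example 1: entire domain selected -> one sub-scan per space.
    print(sub_scans(CONFLUENCE, ()))
    # Example 2: Space Engineering -> blog post folder and top-level page.
    print(sub_scans(CONFLUENCE, ("Space Engineering",)))
```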
Start a Distributed Scan
Starting a distributed scan is the same as starting any other scan, except that you select an Agent Group as the proxy host.
- Log in to the ER2 Web Console.
- Navigate to the Select Locations page by clicking:
    - Scans > New Scan, or
    - the New Scan button on the Dashboard, Targets, or Scans > Schedule Manager page.
- On the Select Locations page, click + Add Unlisted Target. Follow the on-screen instructions to add a new Target.
- When prompted to select an Agent to act as proxy host, click the Select proxy agent menu and select a suitable Agent Group.
    If any Proxy Agent in the Agent Group does not support scanning of the Target, none of the sub-scans assigned to that Proxy Agent are executed, which causes the scan schedule to fail. To check which Agents are supported for a Target, see the respective pages under Target Type.
- Click Test, and then Commit.
- On the Select Data Types page, select the Data Type Profiles to be included in your scan and click Next. See Data Type Profiles.
- Set a scan schedule in the Set Schedule section. Click Next.
- Review your scan configuration. Once done, click Start Scan.
Monitor a Distributed Scan Schedule
Distributed scans appear on the Targets page and the Scans > Schedule Manager page in the Web Console, just like any other scan. See View and Manage Scans for more information.