Prometheus Integration

This feature is only available in Incredibuild's Enterprise Plan.

Prometheus is a free, open source tool that collects and visualizes data. It also integrates with many other tools that provide additional visualizations. Incredibuild integrates with Prometheus by exposing Incredibuild data that Prometheus can collect and display.

Once enabled, Incredibuild automatically exports data that Prometheus can pull every few seconds.

Note: If you disable the integration or change the port, all metrics are reset.

Configuring the Integration

  1. If you have not already done so, install Prometheus on any machine in your Incredibuild environment.

  2. In the Coordinator Manager, go to Settings > General and select the Enable Prometheus Integration checkbox. If you do not see this option, make sure you are using an Enterprise level Incredibuild license.

  3. Enter the port that will be used to manage communication between Incredibuild and Prometheus. This port must be open on both sides.

  4. On the machine that contains Prometheus, in the main Prometheus directory, open the prometheus.yml file.

  5. Below the section that contains job_name: "prometheus", add the following code. Replace <Incredibuild Coordinator Machine IP or name> with the IP address or hostname of the machine that contains your Incredibuild Coordinator, and replace <prometheus port> with the port you defined above. If the Coordinator is on the same machine as Prometheus, you can use "localhost" instead.

    If you have a Backup Coordinator, you can also specify that machine here to ensure data continuity in the event that the Primary Coordinator is down.

         - job_name: "IB_coordinator"
           static_configs:
             - targets: ["<Incredibuild Coordinator Machine IP or name>:<prometheus port>"]
                 

    For example: 

    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: "prometheus"

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
     
        static_configs:
          - targets: ["localhost:9090"]
      # The queries later in this article assume this job name.
      - job_name: "IB_coordinator"

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
          - targets: ["localhost:9096"]

  6. If the Prometheus service is not already running, start it. For details, see the Prometheus Documentation.

  7. To verify that the integration has been configured, open Prometheus and check that the Incredibuild metrics (names beginning with incredibuild_) appear in the search results.
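The check in the last step can also be scripted. The following is a minimal Python sketch, not an official Incredibuild tool. It assumes the Coordinator serves its metrics over plain HTTP at the default /metrics path (the defaults noted in the example configuration) and that the port is 9096 as in the example. The function names are hypothetical.

```python
import urllib.request


def incredibuild_metric_names(exposition_text):
    """Return the sorted, unique metric names that start with 'incredibuild_'."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith("incredibuild_"):
            names.add(name)
    return sorted(names)


def fetch_metrics(host="localhost", port=9096, timeout=5):
    """Fetch the raw metrics text. Host, port, and path are assumptions."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example usage (requires a reachable Coordinator):
#   for name in incredibuild_metric_names(fetch_metrics()):
#       print(name)
```

If the returned list is empty, the port or host is probably wrong, or the integration is not enabled.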

The integration is configured, but no data is sent until a build is run. Note that the data for the first few builds may look a bit unusual in Prometheus and Grafana due to the way their queries are calculated.

Viewing Data in Prometheus Graphs

You can use queries to create graphs in Prometheus based on the Incredibuild metrics. These queries assume that the job label defined in the prometheus.yml file is called "IB_coordinator".
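The queries in this section can also be run outside the Prometheus UI through the Prometheus HTTP API instant query endpoint (/api/v1/query). The following is a minimal Python sketch. It assumes Prometheus is reachable at http://localhost:9090, as in the example configuration. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(promql, base_url="http://localhost:9090"):
    """Build the instant-query URL for a PromQL expression."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def run_query(promql, base_url="http://localhost:9090", timeout=10):
    """Run an instant query and return the list of result series."""
    url = build_query_url(promql, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]["result"]


# Example usage (requires a running Prometheus server):
#   query = 'sum by (build_group)(incredibuild_busy_helper_cores{job="IB_coordinator"})'
#   for series in run_query(query):
#       print(series["metric"], series["value"])
```

For an instant query, each returned series carries its labels under "metric" and a [timestamp, value] pair under "value".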


Number of builds per hour per Build Group

sum by (build_group) (rate(incredibuild_builds_total{job="IB_coordinator"}[1h])) * 60 * 60

Number of builds per day broken down by build status (ended successfully or not)

sum by (ended_successfully) (rate(incredibuild_builds_total{job="IB_coordinator"}[1d])) * 24 * 60 * 60

Number of unsuccessful builds per day broken down by Build Group

sum by (build_group) (rate(incredibuild_builds_total{job="IB_coordinator", ended_successfully="failure"}[1d])) * 60 * 60 * 24

Number of successful builds per day broken down by Build Group

sum by (build_group) (rate(incredibuild_builds_total{job="IB_coordinator", ended_successfully="success"}[1d])) * 60 * 60 * 24

Number of builds per hour

sum(rate(incredibuild_builds_total{job="IB_coordinator"}[1h]) * 60 * 60)

Current busy Helper Cores per Build Group

sum by (build_group)(incredibuild_busy_helper_cores{job="IB_coordinator"})

Current Helper Cores in the Pool per Build Group

sum by (build_group)(incredibuild_helper_cores_in_pool{job="IB_coordinator"})

Current Licensed Helper Cores

sum by (coord_name)(incredibuild_core_licenses{job="IB_coordinator"})

Median build duration (0.5 quantile of the build duration histogram) broken down by Build Group

histogram_quantile(0.5, sum by(le, build_group) (rate(incredibuild_builds_seconds_histogram_bucket{job="IB_coordinator"}[1d])))

Average build duration per day

rate(incredibuild_builds_seconds_histogram_sum{job="IB_coordinator"}[1d]) / rate(incredibuild_builds_seconds_histogram_count{job="IB_coordinator"}[1d])
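The arithmetic behind the rate-based queries above can be illustrated with a short Python sketch. rate() yields an average per-second increase of a counter, which is why the queries multiply by the number of seconds in an hour or a day. The average duration query divides the increase in the histogram's _sum by the increase in its _count. The sketch is simplified: it works from two counter readings and ignores the counter-reset handling and extrapolation that Prometheus applies. The function names and sample values are hypothetical.

```python
def builds_per_window(count_start, count_end, elapsed_seconds, window_seconds):
    """Average per-second increase of a counter, scaled to a window length."""
    per_second = (count_end - count_start) / elapsed_seconds
    return per_second * window_seconds


def average_build_seconds(sum_start, sum_end, count_start, count_end):
    """Mean build duration: increase in _sum divided by increase in _count."""
    builds = count_end - count_start
    if builds == 0:
        # No builds finished in the window, so there is no average to report.
        return None
    return (sum_end - sum_start) / builds


# Hypothetical readings taken one hour (3600 seconds) apart:
# the builds counter moved from 120 to 132, which is about 12 builds per hour.
hourly_builds = builds_per_window(120, 132, 3600, 60 * 60)

# Over the same hour the duration sum moved from 6000 to 9000 seconds,
# so the 12 builds averaged 3000 / 12 = 250 seconds each.
mean_seconds = average_build_seconds(6000.0, 9000.0, 120, 132)
```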

Viewing Data in Grafana Dashboards

Prometheus can integrate with Grafana to display the data in dashboards. To integrate Incredibuild and Prometheus with Grafana, see this article. When you create a Data Source, make sure the name is Prometheus (that is the default).

You can download our sample JSON dashboards and import them directly into Grafana to view data about the number of concurrent builds, helper cores, build duration, and number of builds per day.

To import JSON dashboards into Grafana, see the Grafana documentation.

Data Included in the Integration

Data | Metric Type | Metric Name | Labels (details in table below) | Notes
Total number of completed builds | Counter | incredibuild_builds_total | coord_name, build_group, initiator_type, ended_successfully | This includes builds that failed as well as those that completed successfully.
Total number of builds broken down by initiator | Counter | incredibuild_builds_per_initiator_total | initiator_name, ended_successfully |
Total run time (seconds) of all successful builds | Counter | incredibuild_successful_builds_seconds_total | coord_name, build_group, initiator_type, distributed_to_helpers |
Total number of executed tasks | Counter | incredibuild_tasks_total | coord_name, build_group, initiator_type, distributed_to_helpers |
Total number of tasks distributed to a helper | Counter | incredibuild_distributed_tasks_total | coord_name, build_group, initiator_type |
The number of Initiators currently initiating a build | Gauge | incredibuild_active_initiators | coord_name, build_group, initiator_type |
The number of builds currently running | Gauge | incredibuild_running_builds | coord_name, build_group, initiator_type |
The number of helper cores that are currently busy | Gauge | incredibuild_busy_helper_cores | coord_name, build_group, helper_type |
The number of helper core licenses in your account | Gauge | incredibuild_core_licenses | coord_name | Includes both fixed and floating.
The number of helper cores that currently have been assigned a fixed or floating license | Gauge | incredibuild_helper_cores_in_pool | coord_name, build_group |
The number of helper cores currently online | Gauge | incredibuild_helper_cores_online | coord_name, build_group | Only includes helpers that are enabled as helpers, online, and have a license.
The total number of successful builds broken down in a histogram | Histogram | incredibuild_builds_seconds_histogram | coord_name, build, build_group | Broken down into the following ranges (seconds): 0-1, up to 10, up to 60, 60-300, 301-600, 601-1200, 1201-1800.
The number of helper agents currently helping in a build | Gauge | incredibuild_agents_helping | coord_name, build_group, helper_type | Does not include helper agents assisting initiators on the same machine.
The current number of disabled helper cores with a license | Gauge | incredibuild_disabled_helpers_with_license | coord_name, build_group |
The number of agents currently online | Gauge | incredibuild_online_agents | coord_name, build_group | Includes helper and initiator agents.
The number of agents currently offline | Gauge | incredibuild_offline_agents | coord_name, build_group | Includes helper and initiator agents.
The number of agents without any license | Gauge | incredibuild_unlicensed_agents | coord_name, build_group | Agents without initiator or helper license (called local in UI).
The number of helper agents currently online | Gauge | incredibuild_online_helpers | coord_name, build_group | Only includes helpers that are enabled as helpers, online, and have a license.
The number of initiator agents currently online | Gauge | incredibuild_online_initiators | coord_name, build_group |
The number of helper cores currently needed across your grid | Gauge | incredibuild_needed_helpers | coord_name, build_group | Only includes helper cores that meet all requirements to act as helpers (includes both helper cores currently helping and those that are ready to be assigned to help).
The number of helpers currently available across your grid | Gauge | incredibuild_available_helpers | coord_name, build_group |

Label Details

Label Name | Description
coord_name | The machine ID of your Coordinator.
build_group | The name of the build group.
initiator_type | The type of Initiator machine: CI, developer, or none.
ended_successfully | "Successful" if the build completed successfully, "failure" if it did not.
distributed_to_helpers | True if some tasks were distributed to helper agents, false if they were not.
helper_type | The type of helper machine: cloud or on-prem.
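The incredibuild_builds_seconds_histogram metric is queried through cumulative buckets, each identified by an le (less than or equal) label, which is what the histogram_quantile query in this article consumes. The following Python sketch shows a simplified version of the linear interpolation that Prometheus documents for histogram_quantile. The bucket bounds (1, 10, 60, 300, 600, 1200, 1800 seconds) are an assumption read from the Notes for this metric, and the counts and function name are hypothetical.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound, contain at least one finite
    bound, and end with (math.inf, total_count).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for index, (upper_bound, cumulative) in enumerate(buckets):
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # The quantile falls in the open-ended bucket:
                # report the highest finite bound instead.
                return buckets[index - 1][0]
            in_bucket = cumulative - count_below
            if in_bucket == 0:
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - count_below) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, count_below = upper_bound, cumulative
    return math.nan


# Hypothetical cumulative counts for 40 builds.
sample_buckets = [
    (1, 0), (10, 2), (60, 10), (300, 30),
    (600, 38), (1200, 40), (1800, 40), (math.inf, 40),
]
# The 20th build falls in the 60-300 bucket, halfway through its 20 builds,
# so the estimated median is 60 + 240 * 0.5 = 180 seconds.
median_seconds = estimate_quantile(0.5, sample_buckets)
```

Because the estimate interpolates inside a bucket, its precision depends on how wide that bucket is.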