Date: 07/16/2025
Vendor: Clockwork Systems
Technology/Topic: AI Networking Reinvented: Accelerate Smarter with Software
=====================================
Welcome to the Technical Exchange Meeting (TEM)!
Boost AI job speed and GPU utilization with blazing-fast efficiency. Clockwork’s pure software solution delivers zero-loss network performance at scale on commodity NICs and switches, with no proprietary hardware or expensive upgrades needed. Upgrade and future-proof existing networks to match the speed and scale of AI workloads, without proprietary, brittle add-ons for congestion control and load balancing.
With Clockwork’s software-based solution, get unmatched visibility and continuous high-performance networking that supercharges your AI jobs 24/7, whether on-prem or in the cloud. No bottlenecks. No GPU waste. No rigid, specialized hardware holding you back.
Clockwork’s approach is fundamentally different. Its software-based solution ensures reliability and accelerates the fabric without relying on custom hardware or in-band network telemetry. Compatible with standard Ethernet switches and NICs, it can scale beyond 100,000 GPU nodes while cutting costs, boosting flexibility, and enhancing resilience.
Clockwork’s solution is built on its breakthrough clock synchronization technology, which enables accurate measurement of true one-way delay on each data path at scale, without specialized hardware or network support. Its “data-path aware” software turns traditional networks into high-performance systems built for the demands of today’s AI workloads.
Clockwork Systems is the first to market with fully distributed, software-defined clock synchronization accurate to tens of nanoseconds. Clockwork uses this precise timing to monitor, infer, and control network latency, correlating the behavior of distributed applications with conditions in the network by accurately measuring one-way delays from sender to receiver. It can reduce or eliminate packet drops in TCP/IP networks, turning them into a virtually lossless fabric, and it provides application QoS by allocating or limiting available bandwidth per application on a policy basis. All of this is done from the edge, on servers, VMs, or containers, without any configuration of network switches.
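To illustrate why synchronized clocks matter here: if the sender's and receiver's clocks agree, one-way delay is simply the receive timestamp minus the send timestamp, and any uncorrected clock offset adds directly to the error. The sketch below is a generic illustration under stated assumptions, not Clockwork's implementation; the `Probe` record, the nanosecond timestamps, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One packet's timestamps in nanoseconds (hypothetical record format)."""
    send_ns: int  # sender's local clock at transmit
    recv_ns: int  # receiver's local clock at arrival


def one_way_delay_ns(p: Probe, receiver_offset_ns: int = 0) -> int:
    """One-way delay, correcting for a known receiver clock offset.

    receiver_offset_ns is how far the receiver's clock runs ahead of the
    sender's. With perfectly synchronized clocks it is 0; any uncorrected
    offset shows up one-for-one as error in the result.
    """
    return (p.recv_ns - receiver_offset_ns) - p.send_ns


def queuing_delay_ns(probes: list[Probe], receiver_offset_ns: int = 0) -> list[int]:
    """Delay above the path's observed minimum, a common congestion signal."""
    owds = [one_way_delay_ns(p, receiver_offset_ns) for p in probes]
    base = min(owds)
    return [d - base for d in owds]


if __name__ == "__main__":
    probes = [Probe(1_000, 6_200), Probe(2_000, 7_100), Probe(3_000, 11_500)]
    print(queuing_delay_ns(probes))  # [100, 0, 3400]
```

Note that a constant clock offset cancels out of the relative queuing delay but not out of the absolute one-way delay, and an offset that drifts over time corrupts both; this is why tight, continuous synchronization is the foundation for per-path delay measurement.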
Clockwork has now focused its technology on GPU clusters built for large language model training, fine-tuning, and inference. Clockwork is deployed within GPU infrastructure as an NCCL (NVIDIA Collective Communications Library) plugin. By combining nanosecond clock sync with NCCL, Clockwork provides a real-time observability platform for all east-west communication within the cluster, which is critical for AI/ML collectives such as AllReduce and All-to-All. These collectives can account for 35% to 70% of total job completion time for AI model training. Clockwork provides three critical capabilities for AI workloads:
[1] Resiliency: Through NCCL, Clockwork can detect a link or transceiver failure within the GPU cluster and dynamically fail over the AI job's traffic to a healthy NIC, so the job does not crash and have to restart from a checkpoint. When the failed NIC is reset, Clockwork dynamically fails the workload back, restoring full throughput.
[2] Fleet Monitoring & Workload Performance Management: Clockwork identifies errors, misconfigurations, contention, and congestion bottlenecks that limit performance within the cluster. Visibility extends down to individual queue pairs (QPs), correlated to the model workload. No other solution on the market today provides this level of GPU fabric visibility at sub-microsecond intervals.
[3] Performance Optimization: Clockwork’s NCCL plugin load-balances AI job traffic in real time by distinguishing the least-congested paths from congested ones. It dynamically shifts traffic onto better paths to reduce job completion time.
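The failover and load-balancing behavior described in [1] and [3] can be illustrated with a small host-side sketch: keep a smoothed queuing-delay estimate per path, give failed paths zero weight, and split traffic across healthy paths in inverse proportion to their delay. This is a generic illustration, not Clockwork's actual algorithm; `PathState`, `traffic_weights`, the EWMA smoothing factor, and the inverse-delay weighting are all assumptions chosen for clarity.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    """Hypothetical per-path state kept by a host-side agent (e.g. one per NIC)."""
    name: str
    healthy: bool = True
    ewma_queue_ns: float = 0.0  # smoothed queuing delay on this path

    def observe(self, queue_ns: float, alpha: float = 0.2) -> None:
        """Fold a new queuing-delay sample into an exponentially weighted moving average."""
        self.ewma_queue_ns = alpha * queue_ns + (1 - alpha) * self.ewma_queue_ns


def traffic_weights(paths: list[PathState]) -> dict[str, float]:
    """Split traffic across healthy paths, favoring less-congested ones.

    Failed paths get weight 0 (failover); once a path is marked healthy
    again it automatically regains a share (failback).
    """
    healthy = [p for p in paths if p.healthy]
    if not healthy:
        raise RuntimeError("no healthy path available")
    # Inverse-delay weighting; the +1 ns keeps idle paths from dividing by zero.
    inv = {p.name: 1.0 / (p.ewma_queue_ns + 1.0) for p in healthy}
    total = sum(inv.values())
    return {p.name: inv.get(p.name, 0.0) / total for p in paths}


if __name__ == "__main__":
    paths = [PathState("nic0"), PathState("nic1")]
    paths[0].observe(495.0)      # nic0 sees queuing; nic1 stays idle
    print(traffic_weights(paths))  # nic1 gets most of the traffic
    paths[1].healthy = False     # simulate a link/transceiver failure on nic1
    print(traffic_weights(paths))  # {'nic0': 1.0, 'nic1': 0.0}
```

Weighting by inverse delay, rather than sending everything to the single best path, avoids herding all traffic onto one link and immediately congesting it; a production system would also need to handle measurement noise and reordering, which this sketch ignores.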
=====================================
To join the DISA TEM mailing list, please contact: disa.tem@mail.mil
=====================================
Disclaimer:
— TEMs do not serve as a marketing venue or request for proposal actions.
— TEMs shall not be interpreted as a commitment by the Government to issue a solicitation or ultimately award a contract.
— TEMs do not serve as an endorsement of any presented technologies or capabilities.
— Presentations will not be considered as proposals nor will any awards be made as a result of a TEM session.
— TEMs are open public forums; no proprietary or sensitive information should be presented during TEM sessions. Only publicly releasable content is permissible in DISA TEM sessions.