Blockpaca: Enabling Concurrent Trade Execution for the Alpaca API
- Henry Salkever
- Dec 25, 2025
- 11 min read
Introduction
Hedging, pairs trading, and statistical arbitrage have traditionally been the domain of institutional investors. In recent years, the widespread availability of programmatic trading APIs has made these strategies accessible to retail traders as well. Implementing them typically requires the near-simultaneous execution of multiple trade legs. However, U.S. equities exchanges do not support arbitrary multi-symbol atomic orders, and many trading APIs do not provide a mechanism for submitting multiple orders in a single atomic request.
Like many broker APIs accessed through Python SDKs, Alpaca allows only a single trade to be submitted per API call. This limitation makes it difficult to implement strategies that require multiple trades to be executed concurrently. To address this, we present a system for Alpaca that enables concurrent execution, portfolio monitoring, trade status tracking, and real-time market data streaming.
In the following sections, we provide a visual and technical overview of the system architecture, explain how to import and use the system for retail trading, and present performance results showing near O(1) execution time for multi-leg trade submission, compared with O(N) time for a traditional loop.
Limitations
This system is designed for casual retail traders. While a C++ implementation would achieve lower execution latency, we chose Python to make the code easier to understand and modify. The system currently runs in an Alpaca paper trading portfolio. Switching to live trading is straightforward, but users are encouraged to thoroughly test the system before deploying real capital.
This code was developed with Alpaca free-tier users in mind and enforces a limit of 200 trade requests per minute. While the system does not explicitly restrict the number of tickers you can subscribe to, subscribing to more than 30 tickers may cause position values to stop updating and prevent quote data from being received. Alpaca offers several paid tiers that allow higher trade request limits and support a larger number of concurrent ticker subscriptions.
If your computer has five or fewer CPU cores, you may experience significant performance degradation. The system is designed to run on machines with at least six cores, since its core infrastructure alone occupies five processes (described under Execution Workers), leaving only one core for trade execution on a six-core machine.
Architectural Overview
The main components of the trading system are divided into two groups:
Group One (infrastructure and state):
Quote stream
Trade update stream
Portfolio state process
Group Two (execution):
Trading algorithm
Trading workers
Each group is designed with a distinct priority order.
The primary goal of every process in Group One is to avoid introducing latency into the operations of Group Two. Its secondary goal is to perform its own tasks as quickly as possible. In other words, infrastructure processes must never interfere with trade execution, even if that means they process updates slightly more slowly.
The primary goal of each process in Group Two is to execute its task as quickly as possible. Its secondary goal is to avoid introducing latency into other Group Two processes, including other trading workers.
For example, it is acceptable if a trade execution slightly delays the portfolio state process from recording a previous trade. However, it is unacceptable if portfolio state updates introduce overhead that slows down outgoing trade execution. Execution latency takes precedence over state consistency timing.
The system architecture is explicitly designed to enforce these priorities while still maximizing the performance of each individual component.
Parallelism and Hardware Considerations
This system was built using Python 3.12, which includes something called the Global Interpreter Lock, or GIL. The GIL means that, within a single Python program, only one thread can run Python code at a time. Even if your computer has multiple CPU cores, threads in the same program cannot truly run in parallel.
Python provides a threading library to create and manage threads, but because of the GIL, using threads does not allow CPU-heavy work to run faster on multiple cores. Threads are still useful for tasks like waiting on network requests, but they do not provide real parallel execution for computation.
To allow the system to run tasks in parallel, we use Python’s multiprocessing library instead. Multiprocessing works by starting multiple Python processes rather than multiple threads. Each process has its own GIL, which allows the operating system to run them at the same time on different CPU cores.
For this reason, this system uses multiprocessing to achieve true parallel execution across multiple cores.
Another constraint on concurrency is the number of CPU cores available on the machine running the code. At most one process can actively run on each core at a time, so the maximum level of true parallelism is limited by the number of cores. To make efficient use of system resources, the system automatically detects the number of available cores and spawns that many processes. You can manually specify the number of execution workers using a keyword argument, but setting this value higher than the number of available cores will not provide any additional performance benefits.
By default, separate Python processes do not share data, so each process maintains its own copy of state variables. To address this, we use shared memory objects and queues provided by the multiprocessing library, which exist outside any single process and allow data to be safely shared between processes.
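To make the shared-state idea concrete, here is a minimal sketch, assuming a fixed ticker universe and the five-core-process layout described later. The variable names are illustrative, not the package's internals.

```python
# Illustrative sketch (not the package's internal code): sharing quote data and
# messages across processes with multiprocessing primitives.
import multiprocessing as mp

NUM_TICKERS = 30  # assumed upper bound for reliable free-tier streaming

def make_shared_state():
    # Shared double arrays, passed to child processes when they are spawned.
    bid_prices = mp.Array('d', NUM_TICKERS, lock=False)
    ask_prices = mp.Array('d', NUM_TICKERS, lock=False)
    # Queues live outside any single process and are safe for cross-process use.
    result_queue = mp.Queue()   # stream messages and portfolio snapshots
    command_queue = mp.Queue()  # trading commands and state requests
    return bid_prices, ask_prices, result_queue, command_queue

if __name__ == "__main__":
    n_cores = mp.cpu_count()
    n_workers = max(1, n_cores - 5)  # five core processes, rest for execution
    print(f"Detected {n_cores} cores; would spawn {n_workers} execution workers")
```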
Data Streaming and Result Handling
The Alpaca API provides two separate data streams: quote data and trade update data. The quote stream delivers the latest bid and ask prices for one or more tickers, while the trade update stream provides real-time updates on the status of your orders. By default, bid and ask data is sourced from IEX. Future versions may include a built-in option to switch to the SIP feed, which is available only to paid Alpaca subscribers. For now, users who want SIP data must manually change the data stream URL.
To begin, separate Python processes are spawned to listen to the quote stream and the trade update stream simultaneously. The top priority for incoming data is making it available to the trading algorithm as quickly as possible. When a new message arrives, the system immediately updates the shared-memory bid and ask arrays so the algorithm always has access to the most recent data.
Next, messages from both streams are placed into a multiprocessing queue, in the order they arrive. We refer to this queue as the “result queue.”
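The quote-side listener can be sketched roughly as follows, using alpaca-py's StockDataStream. The real listener may use a different client, and index_of (a ticker-to-array-slot mapping) and the shared arrays are assumed to come from the setup sketch above.

```python
# Hypothetical quote listener process: update the shared bid/ask arrays first,
# then queue the raw message for the portfolio process.
from alpaca.data.live import StockDataStream

def run_quote_listener(api_key, secret_key, symbols, bid_prices, ask_prices,
                       result_queue, index_of):
    stream = StockDataStream(api_key, secret_key)

    async def on_quote(quote):
        i = index_of[quote.symbol]
        # Priority one: make the freshest prices visible to the algorithm.
        bid_prices[i] = float(quote.bid_price)
        ask_prices[i] = float(quote.ask_price)
        # Priority two: hand the message to the portfolio process.
        result_queue.put(("quote", quote.symbol, quote.bid_price, quote.ask_price))

    stream.subscribe_quotes(on_quote, *symbols)
    stream.run()  # blocks; intended to run inside its own Process
```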
Portfolio State
The portfolio state is managed by its own dedicated Python process. This process updates the portfolio by consuming messages from the result queue. When new quote data arrives, it updates position values and P&L. When trade status updates arrive, it adds or removes positions as needed. Position values are recorded using the filled prices from trade updates, while ongoing P&L calculations are based on live quote data.
The internal portfolio state is stored in a Polars DataFrame. We chose Polars over pandas due to its performance advantages and efficient vectorized operations.
After processing messages from the result queue, the portfolio process also listens to a second multiprocessing queue called the “command queue.” This queue is primarily used for trading-related commands, but it can also receive requests for the current portfolio state. When such a request is received, the portfolio process converts the current Polars DataFrame into a list of dictionaries and places the result back into the result queue.
The portfolio process also maintains a dictionary to track partially filled orders. With each trade update, it reduces the remaining share count for the corresponding order. The portfolio state reflects only shares that have been filled, which prevents double-counting. In addition, the portfolio tracks gross capital (the initial capital at instantiation), available capital (updated after each execution), unrealized P&L, and realized profit from closed positions.
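The loop below is a simplified, single-consumer sketch of this process, not the package's actual implementation: it drains the result queue into a Polars DataFrame and answers state requests from the command queue. Column names and message formats are assumptions.

```python
# Simplified portfolio loop: consume stream messages, keep positions in a
# Polars DataFrame, and answer state requests from the command queue.
import queue
import polars as pl

def run_portfolio(result_queue, command_queue):
    positions = pl.DataFrame(schema={"symbol": pl.Utf8, "qty": pl.Float64,
                                     "avg_fill": pl.Float64, "last": pl.Float64})
    while True:
        try:
            kind, *payload = result_queue.get(timeout=0.1)
        except queue.Empty:
            kind, payload = None, None
        if kind == "quote":
            symbol, bid, ask = payload
            mid = (bid + ask) / 2
            positions = positions.with_columns(
                pl.when(pl.col("symbol") == symbol)
                  .then(pl.lit(mid)).otherwise(pl.col("last")).alias("last"))
        elif kind == "trade_update":
            ...  # add/remove rows using filled prices from the update
        # The command queue carries trading commands and state requests.
        try:
            cmd = command_queue.get_nowait()
            if cmd == "get_state":
                # Snapshot is placed back on the result queue for the requester.
                result_queue.put(("portfolio", positions.to_dicts()))
        except queue.Empty:
            pass
```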
Submit Order Function
Trade execution is handled through the submit_order function. The function is exposed to the trading algorithm through the context object, and the system also calls it on the algorithm's behalf to execute any orders the algorithm returns rather than submitting directly. If the algorithm returns no orders, it has either chosen not to trade or has already submitted its orders within its own script; in that case, submit_order is not called again.
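To make the two usage patterns concrete, here is a hypothetical strategy. The order-dictionary fields and the get_quote helper are assumptions for illustration; the actual formats are defined in the GitHub documentation.

```python
# Hypothetical strategy showing both paths: calling context.submit_order
# directly, or returning an order list for the system to submit.
def my_strategy(context):
    bid, ask = context.get_quote("AAPL")   # assumed data-retrieval helper
    if ask < 190.0:
        # Option 1: submit directly through the context object.
        context.submit_order({"symbol": "AAPL", "qty": 1,
                              "side": "buy", "type": "market"})
        return None  # nothing returned, so the system will not submit again
    # Option 2: return orders and let the system submit them.
    return [{"symbol": "MSFT", "qty": 1, "side": "buy", "type": "limit",
             "limit_price": round(bid, 2)}]
```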
Rate Limit Handling
For free-tier users, Alpaca limits trade submissions to 200 per minute. In a system designed to submit many trades, this limit can be reached quickly. To centrally track remaining trade capacity, the system uses a semaphore initialized with 200 tokens. A semaphore acts like a counter: each trade submission consumes one token, and if no tokens remain, further submissions are paused until tokens are replenished. This rate-limiting logic runs in its own Python process to ensure consistent timing and to prevent delays caused by other system components.
Any task that intends to submit a trade first calls the acquire() function, which decrements the semaphore. If the semaphore reaches zero, additional tasks pause and wait until a token becomes available before proceeding with execution.
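A minimal sketch of this pattern is shown below, assuming a once-per-minute refill process; the package's actual replenishment logic may differ.

```python
# Token-style rate limit: a bounded semaphore holds the remaining capacity,
# and a dedicated process tops it back up once per minute.
import multiprocessing as mp
import time

RATE_LIMIT = 200  # free-tier trade requests per minute

def make_rate_limiter():
    return mp.BoundedSemaphore(RATE_LIMIT)

def refill_forever(semaphore):
    # Runs in its own process so refills are not delayed by other components.
    while True:
        time.sleep(60)
        try:
            for _ in range(RATE_LIMIT):
                semaphore.release()  # raises ValueError once the bucket is full
        except ValueError:
            pass

def submit_with_limit(semaphore, send_fn, order):
    semaphore.acquire()   # blocks if all 200 tokens are spent
    return send_fn(order)
```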
Execution Workers
Before continuing, it is helpful to define what we mean by an “execution worker.” In this system, an execution worker is a Python process responsible for submitting trades. If your computer has N CPU cores, the system always starts five core processes at runtime: the main process, the quote stream listener, the trade update listener, the portfolio process, and the rate limit tracker. The remaining N − 5 processes are created as execution workers.
Each execution worker connects to shared memory containing the latest bid and ask prices and runs an asynchronous event loop using the aiohttp library to submit batches of orders assigned to it.
Each order is submitted to Alpaca’s API as an HTTPS POST request. If orders are sent one at a time, a new network connection must be opened for each request, which is inefficient because connection setup is relatively slow. The aiohttp library provides a more efficient approach. While the event loop is running, it opens multiple network connections and keeps them alive for reuse. This allows many execution requests to be sent concurrently without waiting for new connections to be established.
Each execution worker can maintain up to 100 simultaneous connections. In practice, this limit is unlikely to be an issue for free-tier Alpaca users, since free-tier streaming becomes unreliable beyond roughly 30 monitored tickers.
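A rough sketch of the submission path inside a worker is shown below, assuming Alpaca's standard paper-trading REST orders endpoint and key-based headers; the real worker wires this into the ZMQ loop described next.

```python
# Per-worker submission loop: one pooled aiohttp session, orders POSTed
# concurrently to Alpaca's paper-trading REST API.
import asyncio
import aiohttp

ORDERS_URL = "https://paper-api.alpaca.markets/v2/orders"

async def submit_batch(orders, api_key, secret_key):
    headers = {"APCA-API-KEY-ID": api_key, "APCA-API-SECRET-KEY": secret_key}
    connector = aiohttp.TCPConnector(limit=100)  # up to 100 live connections
    async with aiohttp.ClientSession(connector=connector, headers=headers) as session:

        async def post_order(order):
            async with session.post(ORDERS_URL, json=order) as resp:
                return await resp.json()

        # All orders in the batch go out concurrently over pooled connections.
        return await asyncio.gather(*(post_order(o) for o in orders))

def run_worker_batch(orders, api_key, secret_key):
    return asyncio.run(submit_batch(orders, api_key, secret_key))
```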
ZeroMQ and Order Block Distribution
We explored several approaches for distributing orders across execution processes. These included using a lightweight regression model to estimate fill time and fill probability, as well as vectorized matrix operations to assign orders to workers. We also tested a transformer-based version of the prediction model, but found that it introduced significant latency and slowed down order routing.
Ultimately, we chose to use the ZeroMQ library for distributing and collecting execution tasks. ZeroMQ, or ZMQ, is a high-speed, lightweight messaging library designed for asynchronous communication. In this system, we use its push–pull pattern for work distribution. Orders are pushed into ZMQ and automatically distributed in a round-robin fashion to whichever execution worker is ready. This approach keeps all workers busy without requiring the system to track worker availability or manually assign orders.
At first glance, distributing a large block of orders could require multiple assignment rounds if there are more orders than execution workers. To avoid this and preserve parallelism, we split each block of orders into the same number of subgroups as there are execution workers. This guarantees that ZMQ assigns exactly one subgroup of orders to each execution worker, allowing all workers to execute in parallel with a single distribution step.
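Here is a simplified sketch of that push–pull split, with a placeholder address and the chunking done by ceiling division; it is meant to show the pattern, not the package's exact wiring.

```python
# PUSH/PULL distribution: split the order block into one subgroup per worker so
# a single round of round-robin delivery hands every worker one chunk.
import zmq

def split_block(orders, n_workers):
    # Near-even contiguous subgroups, at most one per execution worker.
    chunk = -(-len(orders) // n_workers)  # ceiling division
    return [orders[i:i + chunk] for i in range(0, len(orders), chunk)]

def distribute(orders, n_workers, addr="tcp://127.0.0.1:5555"):
    ctx = zmq.Context.instance()
    push = ctx.socket(zmq.PUSH)
    push.bind(addr)
    for subgroup in split_block(orders, n_workers):
        push.send_pyobj(subgroup)  # round-robin to whichever worker is ready

def worker_loop(addr="tcp://127.0.0.1:5555"):
    ctx = zmq.Context.instance()
    pull = ctx.socket(zmq.PULL)
    pull.connect(addr)
    while True:
        subgroup = pull.recv_pyobj()   # blocks until a subgroup arrives
        ...  # hand the subgroup to the aiohttp submission loop
```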
Concurrent Execution via ZMQ and aiohttp
The combination of ZMQ-based order distribution and aiohttp connection pooling allows up to M × 100 orders to be submitted in parallel, where M is the number of execution workers created at runtime. Aside from the initial step of splitting the block of orders into M groups, which has O(N) time complexity but very low per-order cost, all orders are submitted concurrently.
Trading Algorithm Development and Articulation
The system is run using the run_trading function. This function takes a callable trading strategy as one of its arguments. The strategy must accept a single argument, called the context object. The context object provides access to data retrieval functions, order submission, and the shared resources used by the system, including shared memory, the result queue, and the command queue. The strategy must read data snapshots and submit orders using the formats defined in the GitHub documentation.
When calling run_trading, you specify both the total runtime and the frequency at which the strategy is executed. If you define any data structures in the main scope of your script, rather than inside the strategy function itself, those structures can be used to store information across runs, such as moving averages or other persistent signals, and reused when making trading decisions.
The system currently supports market and limit orders, with order sides including buy, buy to cover, sell, and short sell. The eos_behavior parameter in run_trading controls what happens when the strategy finishes running. The available options are "liquidate", "hold", and "custom".
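A hypothetical invocation might look like the following. Apart from eos_behavior and live_display, the keyword names are assumptions, so consult the GitHub documentation for the actual signature.

```python
# Hypothetical run_trading invocation; keyword names other than eos_behavior
# and live_display are assumed for illustration.
from blockpaca import run_trading   # assumed import path

# Persistent state defined at module scope survives across strategy calls.
price_history = {}

def momentum_strategy(context):
    ...  # read quotes from context, update price_history, build orders
    return []  # no orders this cycle

if __name__ == "__main__":
    run_trading(
        momentum_strategy,
        runtime_minutes=60,      # assumed name for total runtime
        frequency_seconds=5,     # assumed name for strategy cadence
        eos_behavior="liquidate",
        live_display=False,
    )
```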
Live Visualization
If desired, you can set the live_display flag to True to enable a live terminal view of current positions and order status, built using the rich library. If live_display is left as False, order status updates and Polars portfolio snapshots are printed to the terminal as new messages, which requires scrolling to view older updates.
Enabling live_display uses one additional CPU core, since it runs in its own dedicated Python process to handle live updates.
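For intuition, a stripped-down version of such a display process might look like the sketch below, assuming it receives portfolio snapshots (lists of dictionaries) over a queue; the column names are placeholders and the real view is richer.

```python
# Minimal live terminal view built with rich's Live and Table widgets.
import queue
from rich.live import Live
from rich.table import Table

COLUMNS = ("symbol", "qty", "avg_fill", "last", "unrealized_pl")  # placeholders

def render(rows):
    table = Table(title="Blockpaca positions")
    for col in COLUMNS:
        table.add_column(col)
    for row in rows:
        table.add_row(*(str(row.get(col, "")) for col in COLUMNS))
    return table

def run_display(snapshot_queue):
    rows = []
    with Live(render(rows), refresh_per_second=4) as live:
        while True:
            try:
                rows = snapshot_queue.get(timeout=0.25)
            except queue.Empty:
                continue
            live.update(render(rows))
```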

Testing and Results
To evaluate the system, we defined a baseline case using a simple loop that submits each order one at a time. Since Alpaca does not support batch order submission, this loop-based approach is commonly used by retail traders when placing multiple orders. To demonstrate the performance gains of our system, we measured how long each approach took to submit blocks of orders of increasing size. For example, we tested the time required to submit buy orders for five different tickers, then repeated the test with six tickers, and continued increasing the block size in the same way.
To minimize order book effects on latency, all submitted orders were market buy orders with a quantity of one share, using the same set of tickers for both the loop-based and concurrent systems. For each test, the submission time for every trade in a block was defined as the moment the block was submitted. In the concurrent system, this corresponds to when the trading system received the block, while in the loop-based case it corresponds to when the first loop iteration began. As a result, all trades within a block shared the same submission timestamp.
We extracted timestamps for later stages of the submission process from the Alpaca trade update stream. Our primary focus was the time between submission and when Alpaca acknowledged the order, marked as pending_new. We also recorded the timestamp at which each trade was filled.
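For reference, the loop-based baseline is essentially the following sketch, which records a single block-level submission timestamp before iterating; the actual test harness may differ in details.

```python
# Loop-based baseline: one blocking POST per order, all orders in the block
# sharing the submission timestamp taken before the loop starts.
import time
import requests

ORDERS_URL = "https://paper-api.alpaca.markets/v2/orders"

def submit_block_sequentially(symbols, api_key, secret_key):
    headers = {"APCA-API-KEY-ID": api_key, "APCA-API-SECRET-KEY": secret_key}
    block_submitted_at = time.time()  # shared submission timestamp for the block
    for symbol in symbols:
        order = {"symbol": symbol, "qty": 1, "side": "buy",
                 "type": "market", "time_in_force": "day"}
        requests.post(ORDERS_URL, json=order, headers=headers, timeout=5)
    return block_submitted_at  # pending_new / filled times come from the stream
```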

Figure 1: Total latency from order submission to broker receipt for blocks of increasing size. Latency is defined as the time between the block submission timestamp and the latest pending_new timestamp within each order block, measured for both the concurrent and loop-based systems.
Other than minor variability introduced by an unstable home network and a slight upward drift due to batching overhead for larger order blocks, the execution time of the concurrent system remains nearly constant. In contrast, the loop-based approach exhibits a clear linear increase in latency as block size grows.

Figure 2: Total latency from order submission to order fill for blocks of increasing size. Latency is defined as the time between the block submission timestamp and the latest filled timestamp within each order block, measured for both the concurrent and loop-based systems.
Compared to submission latency, the fill latency plot shows a more noticeable upward trend for the concurrent system. This suggests the presence of order book–related delays on Alpaca’s side, potentially due to server-side latency or simulated slippage within the paper trading environment. Despite this effect, the concurrent system continues to significantly outperform the loop-based baseline across all tested block sizes.

Figure 3: Contributions to average per-order latency as block size increases for the loop-based execution model. Submission-to-broker receipt latency is computed as the average time between submission and pending_new across all orders in a block. Receipt-to-fill latency is computed as the average time between pending_new and filled timestamps for the same orders.
As block size increases, the time between submission and broker receipt becomes the dominant contributor to overall execution latency. This indicates that sequential order submission increasingly delays downstream execution as blocks grow larger. When the same decomposition is applied to the concurrent execution model (Figure 4), the submission-to-receipt component remains relatively stable, aside from a slight upward drift attributable to batching overhead for larger order blocks.

Figure 4: Calculated in the same manner as Figure 3, except using data collected from the concurrent execution system rather than the loop-based system.
Next Steps
We have several ideas for future improvements to the platform. Planned enhancements include automatic synthetic flattening as an additional eos_behavior option, as well as anomaly detection tools that can alert users when the system or a trading strategy behaves unexpectedly. We are also considering features such as automatic hedging, expanded support for Alpaca paid-tier accounts, and the addition of options trading support.
We welcome feedback and suggestions for future development and encourage users to reach out with ideas or requests.
Usage
Documentation and setup instructions are available on the project’s GitHub page. If you encounter any issues while using the package, feel free to reach out for assistance.