Cloth Simulation - Parallel Computing Project

We aim to implement a parallel cloth simulation using a Mass-Spring model. The simulation will run efficiently on GPUs using CUDA to accelerate computations. We started from the open-source library cuda_clothes, implemented basic self-collision detection, introduced a grid-based broad-phase collision detection method, and an iterative refinement strategy to enhance robustness.

Application Overview

In this project, we plan to perform a cloth simulation on an irregular 3D surface, which may also include collision simulations that require dynamic task allocation techniques. The simulation involves not only handling the deformation and forces acting on the cloth but also collision detection and response with surrounding objects. During the simulation, the system needs to update forces (such as gravity, elasticity, and air resistance) on every vertex of the cloth in real-time or near real-time, as well as detect and handle collisions between cloth vertices or triangles and external objects.

Because the cloth is typically composed of a large number of mesh vertices and triangular faces, and every vertex may be subject to distinct constraints and collisions when placed on an irregular surface, the computational workload can be substantial. Moreover, collision detection often requires repeated, intensive spatial queries for different regions or objects, with computational cost growing rapidly as the cloth mesh or scene complexity increases. To address this challenge, we plan to split the cloth simulation into multiple smaller tasks that can be processed in parallel, such as partitioning the cloth into sub-regions for force and collision computations, and dynamically merging or splitting these sub-tasks depending on the simulation load. This approach will make full use of multi-core or multi-threaded architectures to increase overall simulation performance.

We have updated the workflow to include synchronization, spatial grid construction, and iterative collision detection refinement:


initialize cloth mesh
while simulation running:
    update vertex positions (synchronized)
    construct/update spatial grid
    repeat until stable:
        detect collisions using spatial grid
        resolve collisions
    synchronize updated mesh state
render simulation results

Possible Parallel

By partitioning the cloth mesh into smaller sub-regions and processing them in parallel, each thread or computational unit only needs to handle a limited set of vertices or triangles, thus reducing the computational cost and memory access overhead for collision detection and other tasks. Furthermore, as the cloth moves or collision demands change, we can dynamically adjust the partitioning and scheduling strategy to prevent any thread from being either underutilized or overloaded, and thereby further optimize overall performance.

The Challenge

One of the main challenges of this cloth simulation project is the inherently irregular and dynamic nature of the problem. Each cloth vertex’s movement and collisions depend not only on its immediate neighbors for forces such as tension and bending but also on the state of the environment, which can vary widely across the surface. This leads to complex dependencies where results in one region of the cloth can impact calculations in another region. Additionally, frequent and unpredictable collision checks introduce irregular memory access patterns: data may need to be fetched from disparate memory locations, complicating caching and reducing spatial and temporal locality. As a result, the communication-to-computation ratio can become high if each sub-region must constantly exchange boundary information or resolve interactions with other sub-regions.

These factors make it difficult to parallelize the workload in a straightforward manner. Some parts of the cloth may experience heavy collisions or intricate deformations (thus requiring extensive computation), while others remain relatively stable—leading to divergent execution. Balancing this load evenly across multiple threads or computational units becomes a non-trivial task. Moreover, dynamically reallocating sub-regions or tasks at runtime to accommodate changing collision patterns introduces additional overhead and synchronization demands.

Constrains

Mapping the workload to a parallel system is further constrained by the need to maintain a globally consistent state. Each time step must correctly account for updates in all sub-regions, and any collisions across boundaries of these sub-regions must be handled without race conditions or data inconsistencies. On hardware with limited shared memory or certain GPU architectures, the irregular access patterns and sudden spikes in computational requirements can cause bottlenecks and underutilized resources. Consequently, one of the core lessons to be learned in this project is how to design dynamic, load-balanced partitioning strategies and synchronization mechanisms that accommodate frequent changes in the simulation state while still achieving efficient parallelization.

Additional concerns include collision tunneling and balancing accuracy with efficiency.<\p>

We plan to begin with a simple parallel cloth simulation code base—likely an open-source project that already supports basic multi-threading—so that we can focus our efforts on implementing and evaluating our own dynamic partitioning and task allocation strategies. We will run and test our code on standard multi-core CPUs and potentially on a GPU-equipped workstation if it proves beneficial for accelerating certain aspects of the simulation. As a reference, we will draw on established literature in cloth simulation, such as Baraff and Witkin’s seminal paper “Large Steps in Cloth Simulation” (Baraff, D., & Witkin, A. (1998). Large steps in cloth simulation. ACM SIGGRAPH), to inform our approaches to collision handling and numerical integration. Beyond these resources, we do not currently anticipate needing access to specialized hardware beyond high-core-count machines or general-purpose GPUs, though we remain open to exploring more advanced HPC clusters if our simulation’s requirements grow.

Plan to Achieve

These are the core objectives that we must accomplish to consider our project successful and meet our expected grading criteria:

Implement and Evaluate Dynamic Partitioning Strategies
- Integrate dynamic partitioning and task allocation methods into an existing parallel cloth simulation framework.
- Compare our approach against static partitioning to demonstrate performance benefits.
Collision Handling and Numerical Integration
- Implement efficient collision detection and response informed by Baraff and Witkin’s "Large Steps in Cloth Simulation."
- Optimize numerical integration techniques to ensure simulation stability at larger time steps.
Quantitative Performance Metrics
- Establish baseline performance metrics using the original simulation.
- Measure speedup, efficiency, and accuracy under different workload distributions.
Documentation and Code Organization
- Ensure the codebase is well-documented for reproducibility.
- Provide clear experimental methodologies and justifications.

Hope to Achieve

If the project progresses ahead of schedule, we aim to explore additional enhancements:

GPU Acceleration
- Investigate GPU offloading to accelerate computationally expensive simulation steps.
Adaptive Workload Distribution
- Implement a feedback mechanism that dynamically adjusts task allocation based on real-time performance metrics.
- Evaluate the benefits of adaptive workload balancing in varying simulation scenarios.

Live Demo

If feasible, we plan to showcase a live demo during the poster session:

A side-by-side comparison of different task allocation strategies visualized in real-time.
Performance charts illustrating speedup and efficiency improvements.
Interactive tuning of simulation parameters to highlight dynamic workload adjustments.

Key Questions to Address

How effectively does dynamic partitioning improve performance over static partitioning?
What are the primary bottlenecks in parallel cloth simulation on multi-core CPUs?
Can GPU acceleration provide a significant advantage for computationally intensive tasks?
How does the choice of partitioning strategy impact collision detection accuracy and numerical stability?

System Capabilities and Performance Goals

The system will support real-time or near-real-time cloth simulation using parallel processing.
We hope to achieve at least a 5x speedup over the baseline sequential implementation on a high-core-count CPU.
If GPU acceleration is explored, we aim for an additional 2x-3x speedup in specific workloads such as numerical integration and collision handling.

Platform Choice

We have chosen a standard multi-core CPU platform (with optional GPU acceleration) and a C++-based simulation framework because this setup aligns well with our parallelization goals and performance requirements. (Possibly GHC machine or our own computer)

C++ offers efficient low-level control over data structures and memory management, which is critical for irregular workloads like cloth simulation. Multi-core CPUs (and GPUs, if available) provide the parallel computing capabilities we need to divide and distribute the simulation tasks across multiple threads or compute units. Additionally, existing libraries and parallel frameworks (such as OpenMP, TBB, or CUDA) allow us to quickly prototype and optimize various load balancing and synchronization strategies. This combination makes it feasible to implement our dynamic partitioning approach, manage unpredictable collision workloads, and still maintain high performance and scalability.

Week of March 27 – April 2

Finalize code base choice and references
- Select an existing parallel cloth simulation code base to build upon.
- Gather and review key literature (e.g., Baraff & Witkin) to ensure understanding of collision handling and integration approaches.
Set up the development environment
- Configure compilers, libraries (e.g., OpenMP, TBB, or CUDA), and any dependencies needed for the cloth simulation.

Week of April 3 – April 9

Run and analyze the baseline code
- Compile and run the chosen parallel cloth simulation on sample scenes.
- Measure performance metrics (e.g., runtime, scaling) to establish a baseline.
Identify bottlenecks
- Perform a preliminary bottleneck analysis on collision detection, force calculation, and other high-cost operations.
- Draft ideas for dynamic task allocation and partitioning.

Week of April 10 – April 16 (Intermediate Milestone: April 15)

Implement first version of dynamic partitioning
- Introduce sub-region partitioning in the simulation loop.
- Implement a basic load-balancing mechanism (static or straightforward dynamic).
Validate correctness and performance
- Ensure cloth simulation still behaves physically correct.
- Collect initial performance results on test scenes.
- Document progress for the intermediate milestone due on April 15.

Week of April 17 – April 23

Refine and optimize the dynamic strategy
- Adjust partitioning granularity and update intervals. Try different cloth parameters and resolution.
- Improve collision handling and memory access patterns (e.g., bounding volume hierarchies, spatial hashing).
- Apply partial grid update method. Try dynamic work queue.
Conduct thorough performance testing
- Test on different cloth sizes and irregular surfaces.
- Compare improvements against the baseline in terms of speed, scalability, and load balance.

Week of April 24 – April 29 (Poster Due: April 29)

Finalize results and prepare the poster
- Summarize performance gains, scalability charts, and key findings.
- Highlight any remaining challenges or potential improvements for future work.
Present and submit the project poster
- Create a concise narrative of the work, focusing on the parallelization strategy, results, and lessons learned.
- Polish visualizations, diagrams, and data to include in the poster.