Harvard Data Science Initiative (HDSI) & Faculty of Arts and Sciences (FAS)
Running Jupyter Notebooks on the Harvard RC Cluster
For computationally intensive work, you can run Jupyter notebooks on Harvard's
Research Computing (RC) cluster, which provides CPU, GPU, and storage resources
well beyond a laptop. From those notebooks you can also call AI models such as
Claude through Harvard's HUIT Bedrock proxy.
What Is the RC Cluster?
Harvard Research Computing (FASRC) operates a high-performance computing
cluster, Cannon, which provides:
Powerful CPU and GPU resources for large-scale computation
Large-scale data storage including scratch and lab-specific filesystems
Pre-installed scientific software, loaded through a module system
Jupyter notebook access through the Open OnDemand web portal
Important: Never commit your API key to code repositories.
Keep it in environment variables or secure configuration files that are
excluded from version control.
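A small helper makes the environment-variable approach explicit: it reads a secret with `os.environ` and stops with a readable error if the variable is missing, instead of letting an empty key reach the API. This is a minimal sketch; the function name `require_env` and the variable name `HUIT_API_KEY` in the comment are hypothetical, so use whatever name your HUIT registration instructions specify.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, failing loudly if unset."""
    value = os.environ.get(name)
    if not value:
        # Fail early with a readable message instead of sending an empty key
        raise RuntimeError(
            f"Environment variable {name!r} is not set. Export it in your shell "
            "or load it from a config file excluded from version control."
        )
    return value


# Example (hypothetical variable name):
# api_key = require_env("HUIT_API_KEY")
```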
Example: AI-Assisted Data Analysis on RC
This example shows how to use Claude from within a Jupyter notebook running
on the RC cluster to get help analyzing a large dataset:
# In your RC Jupyter notebook
import anthropic
import pandas as pd
import numpy as np
import os
# Load large dataset (stored on RC cluster)
data = pd.read_csv('/n/holyscratch01/your_lab/large_dataset.csv')
# Initialize the Bedrock client. No API key appears in the notebook:
# credentials are read from the environment.
client = anthropic.AnthropicBedrock(
    aws_region="us-east-1"  # Bedrock region, set explicitly
)
prompt = f"""
I have a dataset with {len(data)} rows and columns: {list(data.columns)}.
Write Python code to:
1. Calculate correlation matrix
2. Identify top 5 most correlated variable pairs
3. Create a heatmap visualization
"""
response = client.messages.create(
    model="us.anthropic.claude-opus-4-5-20251101-v1:0",
    max_tokens=2000,
    messages=[{"role": "user", "content": prompt}]
)
print(response.content[0].text)
# Review the generated code, then run it in a new cell
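Before running anything Claude writes, it helps to pull the code out of the reply so you can read it in isolation. The sketch below assumes the reply uses Markdown-style fenced blocks (three backticks plus an optional language tag); `extract_code_blocks` is a hypothetical helper, and it deliberately returns the code as strings rather than executing it.

```python
from __future__ import annotations


def extract_code_blocks(text: str, languages: tuple[str, ...] = ("python", "py", "")) -> list[str]:
    """Return the bodies of fenced code blocks whose language tag is in `languages`.

    An empty string in `languages` matches fences that have no language tag.
    """
    blocks: list[str] = []
    current: list[str] | None = None  # lines of the block being read, or None outside a block
    lang = ""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            if current is None:
                # Opening fence: remember the language tag (e.g. "python")
                lang = stripped[3:].strip().lower()
                current = []
            else:
                # Closing fence: keep the block only if its language matches
                if lang in languages:
                    blocks.append("\n".join(current))
                current = None
        elif current is not None:
            current.append(line)
    return blocks
```

In the notebook above, `for code in extract_code_blocks(response.content[0].text): print(code)` prints each block for review; paste what you have checked into a new cell.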
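For comparison, here is one way to do the second task in the prompt (top correlated pairs) directly in pandas, which is useful for checking whatever code Claude returns. It is a sketch, not the code Claude will produce: `top_correlated_pairs` is a hypothetical name, it uses Pearson correlation on numeric columns only, and it ranks by absolute value so strong negative correlations count too.

```python
import pandas as pd


def top_correlated_pairs(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n pairs of numeric columns with the largest |Pearson r|."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    # Walk the upper triangle only: each pair once, no self-correlations
    rows = [
        (a, b, corr.loc[a, b])
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
    ]
    pairs = pd.DataFrame(rows, columns=["var1", "var2", "r"])
    # Sort by absolute value so strong negative correlations rank high too
    pairs = pairs.sort_values("r", key=lambda s: s.abs(), ascending=False)
    return pairs.head(n).reset_index(drop=True)
```

Calling `top_correlated_pairs(data)` on the dataset loaded above gives a reference answer to compare against the generated code's output.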
Why Use the RC Cluster?
Computational power: Handle large datasets that won't fit on your laptop
Parallel processing: Run multiple analyses simultaneously across many cores
GPU access: Speed up machine learning and deep learning workloads
Persistent storage: Keep large datasets and results on cluster filesystems
Long-running jobs: Leave analyses running without keeping your laptop on
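Extra cores only help if your code uses them. A notebook started through Open OnDemand typically runs inside a Slurm job, so one reasonable pattern is to size a worker pool from the job's allocation rather than from the node's total core count (which `os.cpu_count()` reports). This sketch assumes the job was requested with a CPU count so that Slurm sets `SLURM_CPUS_PER_TASK`, and falls back to the CPUs visible to the process otherwise; `analyze` and `run_in_parallel` are hypothetical placeholders for your own work.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def available_cpus() -> int:
    """CPUs this job may use: Slurm's per-task allocation if set, else what the OS allows."""
    slurm_cpus = os.environ.get("SLURM_CPUS_PER_TASK", "")
    if slurm_cpus.isdigit() and int(slurm_cpus) > 0:
        return int(slurm_cpus)
    if hasattr(os, "sched_getaffinity"):
        # Linux: count only the CPUs this process is allowed to run on
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def analyze(task_id: int) -> int:
    """Hypothetical placeholder for one independent analysis."""
    return task_id * task_id


def run_in_parallel(task_ids):
    """Run `analyze` over task_ids with one worker process per allocated CPU."""
    with ProcessPoolExecutor(max_workers=available_cpus()) as pool:
        return list(pool.map(analyze, task_ids))
```

For example, `run_in_parallel(range(8))` runs eight tasks across the allocated cores. If worker processes cannot find functions defined in a notebook cell, move those functions into a `.py` module and import it.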
Cost and Security Notes
RC cluster usage is billed to PI accounts through FASRC
API usage is billed to PI accounts through HUIT
Coordinate with your advisor about appropriate resource usage for both
Never commit API keys to code repositories; use environment variables instead
Set monthly spending limits when registering your HUIT API access
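The monthly limit is set when you register, but it is easy to also keep a running count in the notebook. Each Messages API response carries `usage.input_tokens` and `usage.output_tokens`; the sketch below sums them and converts the totals to dollars using per-million-token prices that you supply. `UsageTally` is a hypothetical helper, and the prices are parameters because the rates HUIT bills for a given model are not listed here.

```python
from dataclasses import dataclass


@dataclass
class UsageTally:
    """Running token totals across Messages API responses."""
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, response) -> None:
        # Each response reports its own token counts under response.usage
        self.input_tokens += response.usage.input_tokens
        self.output_tokens += response.usage.output_tokens

    def estimated_cost(self, usd_per_million_input: float, usd_per_million_output: float) -> float:
        """Dollar estimate from per-million-token prices that you supply."""
        return (
            self.input_tokens * usd_per_million_input
            + self.output_tokens * usd_per_million_output
        ) / 1_000_000
```

Create `tally = UsageTally()` once, then call `tally.add(response)` after each `client.messages.create(...)` call.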
Resources
RC Documentation: Comprehensive documentation for Harvard's Research Computing cluster
Jupyter on RC Guide: Official guide for running Jupyter notebooks on the RC cluster