Harvard Data Science Initiative Harvard Faculty of Arts and Sciences

Generative AI for Scholarship

Harvard Data Science Initiative (HDSI) & Faculty of Arts and Sciences (FAS)

Running Jupyter Notebooks on the Harvard RC Cluster

For computationally intensive work, you can run Jupyter notebooks on Harvard's Research Computing (RC) cluster, which provides powerful hardware and integrates with AI tools through Harvard's HUIT Bedrock proxy.

What Is the RC Cluster?

Harvard Research Computing provides a high-performance computing cluster (also known as Cannon/FASRC).

Prerequisites

Accessing Jupyter on the RC Cluster

  1. Log in to Open OnDemand.
  2. Launch a Jupyter notebook session.
  3. Connect to your notebook.

Setting Up AI Integration on RC

To use Claude from within your RC Jupyter notebooks, you need to install the Anthropic library and configure your API credentials.

Install the Anthropic library

Run the following in a terminal on the RC cluster (or in a notebook cell, prefixing the command with !):

pip install --user anthropic
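
To confirm that the install is visible to the same Python environment your notebook uses, a quick check along these lines can help. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of any Harvard or Anthropic tooling.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the current interpreter can locate the package."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("anthropic"):
    print("anthropic is available in this environment")
else:
    print("anthropic not found; rerun the pip install in this environment")
```

If the check fails inside a notebook but succeeds in a terminal, the two are probably using different Python environments.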

Configure your API key

Add the following to your ~/.bashrc on the RC cluster:

export ANTHROPIC_BEDROCK_BASE_URL=https://apis.huit.harvard.edu/ais-bedrock-llm/v2
export ANTHROPIC_API_KEY="your-harvard-api-key"

Then reload your shell configuration:

source ~/.bashrc

Important: Never commit your API key to code repositories. Keep it in environment variables or secure configuration files that are excluded from version control.
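
Before creating a client, it can be worth verifying that both variables are visible to the notebook kernel. A kernel that was started before you edited ~/.bashrc will not see the new values. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical, and it deliberately avoids printing the key itself.

```python
import os

# The two variables exported in ~/.bashrc above.
REQUIRED_VARS = ("ANTHROPIC_BEDROCK_BASE_URL", "ANTHROPIC_API_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    # Report only that the key exists; never print the key itself.
    print("Endpoint:", os.environ["ANTHROPIC_BEDROCK_BASE_URL"])
    print("API key is set")
```

If a variable is reported missing, restarting the Jupyter session after updating ~/.bashrc is a reasonable first step.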

Example: AI-Assisted Data Analysis on RC

This example shows how to use Claude from within a Jupyter notebook running on the RC cluster to get help analyzing a large dataset:

# In your RC Jupyter notebook
import anthropic
import pandas as pd

# Load large dataset (stored on RC cluster)
data = pd.read_csv('/n/holyscratch01/your_lab/large_dataset.csv')

# Initialize the client. The Harvard endpoint is read from the
# ANTHROPIC_BEDROCK_BASE_URL environment variable set in ~/.bashrc.
client = anthropic.AnthropicBedrock(
    aws_region="us-east-1"  # Required parameter
)

prompt = f"""
I have a dataset with {len(data)} rows and columns: {list(data.columns)}.
Write Python code to:
1. Calculate correlation matrix
2. Identify top 5 most correlated variable pairs
3. Create a heatmap visualization
"""

response = client.messages.create(
    model="us.anthropic.claude-opus-4-5-20251101-v1:0",
    max_tokens=2000,
    messages=[{"role": "user", "content": prompt}]
)

print(response.content[0].text)
# Review the generated code carefully, then run it in a new cell
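
The response text often mixes explanation with code, so it can help to pull out just the code before reviewing it. The sketch below assumes the model wraps code in Markdown-style fences, which is common but not guaranteed; the helper name `extract_code_blocks` and the sample text are hypothetical, and only the standard library is used.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

# Match an opening fence with an optional language tag, then capture
# everything (lazily) up to the next closing fence.
CODE_BLOCK_RE = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text: str) -> list[str]:
    """Return the contents of each fenced code block found in text."""
    return [block.strip() for block in CODE_BLOCK_RE.findall(text)]


# Stand-in for response.content[0].text
sample = (
    "Here is the analysis:\n"
    + FENCE + "python\n"
    + "corr = data.corr(numeric_only=True)\n"
    + FENCE + "\n"
    + "Let me know if you need more."
)

for block in extract_code_blocks(sample):
    print(block)
```

Printing the extracted blocks rather than executing them automatically keeps a human review step between the model's output and your data.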

Why Use the RC Cluster?

Cost and Security Notes

Resources