Generative AI for Scholarship

Harvard Data Science Initiative (HDSI) & Faculty of Arts and Sciences (FAS)

Running Jupyter Notebooks on the Harvard RC Cluster

For computationally intensive work, you can run Jupyter notebooks on Harvard's Research Computing (RC) cluster, which provides powerful hardware and integrates with AI tools through Harvard's HUIT Bedrock proxy.

What Is the RC Cluster?

Harvard Research Computing provides a high-performance computing cluster (also known as Cannon/FASRC) with:

- Powerful CPU and GPU resources for large-scale computation
- Large-scale data storage, including scratch and lab-specific filesystems
- Pre-installed scientific software and a module system for loading it
- Jupyter notebook access through the Open OnDemand web portal

Prerequisites

- Harvard RC account: request one at rc.fas.harvard.edu/account-request
- VPN connection if accessing from off-campus (vpn.harvard.edu)
- Basic familiarity with the Linux command line
- Harvard HUIT API key (if using AI integration): see the API setup guide

Accessing Jupyter on the RC Cluster

1. Log in to Open OnDemand:
   - Go to rcood.rc.fas.harvard.edu
   - Log in with your Harvard credentials
2. Launch a Jupyter notebook session:
   - Click "Interactive Apps" → "Jupyter Notebook"
   - Select a partition (e.g., "test" for quick jobs, "shared" for longer runs)
   - Request resources (number of CPUs, memory, time limit)
   - Click Launch and wait for the session to start
3. Connect to your notebook:
   - Once the session is running, click "Connect to Jupyter"
   - This opens JupyterLab (or classic Jupyter) in your browser
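Once the notebook is open, it can be useful to confirm that the session received the resources you requested. The following is a minimal sketch, assuming the Open OnDemand session runs inside a Slurm job so that Slurm's standard environment variables are visible to the notebook kernel. The helper name describe_allocation is hypothetical, and some variables may be unset depending on how the job was submitted.

```python
import os

# Assumption: the notebook kernel runs inside a Slurm job, so these
# standard Slurm environment variables may be present.
SLURM_VARS = {
    "job_id": "SLURM_JOB_ID",
    "partition": "SLURM_JOB_PARTITION",
    "cpus_per_task": "SLURM_CPUS_PER_TASK",
    "memory_mb_per_node": "SLURM_MEM_PER_NODE",
}


def describe_allocation(env=None):
    """Return the Slurm allocation details found in env (default: os.environ).

    Values are returned as strings; variables that are not set map to None.
    """
    env = os.environ if env is None else env
    return {label: env.get(name) for label, name in SLURM_VARS.items()}


if __name__ == "__main__":
    for label, value in describe_allocation().items():
        print(f"{label}: {value if value is not None else 'not set'}")
```

If every value prints as "not set", the kernel is probably not running under Slurm, or the variables are not exported to it.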

Setting Up AI Integration on RC

To use Claude from within your RC Jupyter notebooks, you need to install the Anthropic library and configure your API credentials.

Install the Anthropic library

In a terminal on the RC cluster (or in a notebook cell, prefixing the command with !):

pip install --user anthropic
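A notebook kernel does not always use the same Python environment as the terminal where pip was run, so it can help to check from inside the notebook that the package is visible. This is a small sketch using only the standard library; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this in a notebook cell to see what the current kernel can import.
print("anthropic:", installed_version("anthropic") or "not installed")
```

If this reports "not installed" after a successful pip install, the kernel is likely using a different Python environment than the one pip installed into.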

Configure your API key

Add the following to your ~/.bashrc on the RC cluster:

export ANTHROPIC_BEDROCK_BASE_URL=https://apis.huit.harvard.edu/ais-bedrock-llm/v2
export ANTHROPIC_API_KEY="your-harvard-api-key"

Then reload your shell configuration:

source ~/.bashrc

Important: Never commit your API key to code repositories. Keep it in environment variables or secure configuration files that are excluded from version control.
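Before making any API calls, you may want to confirm that the notebook kernel can actually see both variables. The sketch below checks for them and prints the key only in masked form; the helper names missing_credentials and mask are hypothetical. A kernel started before ~/.bashrc was edited may not see the new values until a new session is launched.

```python
import os

# The two variables exported in ~/.bashrc above.
REQUIRED_VARS = ("ANTHROPIC_BEDROCK_BASE_URL", "ANTHROPIC_API_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def mask(secret, visible=4):
    """Hide all but the last `visible` characters of a secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Endpoint:", os.environ["ANTHROPIC_BEDROCK_BASE_URL"])
        print("API key: ", mask(os.environ["ANTHROPIC_API_KEY"]))
```

Masking the key keeps it out of saved notebook output, which is easy to commit by accident.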

Example: AI-Assisted Data Analysis on RC

This example shows how to use Claude from within a Jupyter notebook running on the RC cluster to get help analyzing a large dataset:

# In your RC Jupyter notebook
import anthropic
import pandas as pd
import numpy as np
import os

# Load large dataset (stored on RC cluster)
data = pd.read_csv('/n/holyscratch01/your_lab/large_dataset.csv')

# Initialize client; the Harvard endpoint is read from ANTHROPIC_BEDROCK_BASE_URL (set in ~/.bashrc)
client = anthropic.AnthropicBedrock(
    aws_region="us-east-1"  # Required parameter
)

prompt = f"""
I have a dataset with {len(data)} rows and columns: {list(data.columns)}.
Write Python code to:
1. Calculate correlation matrix
2. Identify top 5 most correlated variable pairs
3. Create a heatmap visualization
"""

response = client.messages.create(
    model="us.anthropic.claude-opus-4-5-20251101-v1:0",
    max_tokens=2000,
    messages=[{"role": "user", "content": prompt}]
)

print(response.content[0].text)
# Review the generated code, then run it in a new cell
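Model replies often mix explanation with fenced code, so pulling the code out for review before running it can be convenient. The following is a minimal sketch, assuming the reply uses Markdown-style triple-backtick fences; the helper name extract_code_blocks is hypothetical, and the extracted code should still be read before it is executed.

```python
import re

# Build the fence marker programmatically so this example nests cleanly.
FENCE = "`" * 3

# Match an opening fence, an optional language tag, then the body up to
# the next closing fence.
CODE_BLOCK = re.compile(
    FENCE + r"[ \t]*([\w+-]*)[ \t]*\n(.*?)" + FENCE,
    re.DOTALL,
)


def extract_code_blocks(text):
    """Return (language, code) pairs for each fenced block in text.

    Blocks without a language tag are labeled "text".
    """
    return [(lang or "text", code.strip()) for lang, code in CODE_BLOCK.findall(text)]


if __name__ == "__main__":
    sample = "Try this:\n" + FENCE + "python\nprint('hello')\n" + FENCE + "\nDone."
    for lang, code in extract_code_blocks(sample):
        print(f"[{lang}]")
        print(code)
```

In the notebook, you would pass response.content[0].text to extract_code_blocks and paste the reviewed code into a new cell.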

Why Use the RC Cluster?

- Computational power: Handle large datasets that won't fit on your laptop
- Parallel processing: Run multiple analyses simultaneously across many cores
- GPU access: For machine learning and deep learning workloads that need acceleration
- Persistent storage: Keep large datasets and results on cluster filesystems
- Long-running jobs: Leave analyses running without keeping your laptop on

Cost and Security Notes

- RC cluster usage is billed to PI accounts through FAS-RC
- API usage is billed to PI accounts through HUIT
- Coordinate with your advisor about appropriate resource usage for both
- Never commit API keys to code repositories; use environment variables instead
- Set monthly spending limits when registering your HUIT API access
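To keep an eye on API consumption within a single notebook session, you could tally the token counts that each response reports. The sketch below is illustrative only: it assumes each response exposes usage.input_tokens and usage.output_tokens, as Message objects in the Anthropic Python SDK do, and the UsageTally class and the token budget are hypothetical. It counts tokens, not dollars, and does not replace the spending limit set with HUIT.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTally:
    """Running total of tokens used in this notebook session."""

    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, response):
        """Add the token counts reported on a response to the tally."""
        usage = response.usage
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens
        self.calls += 1

    def over_budget(self, max_total_tokens):
        """Return True once total tokens exceed a self-imposed budget."""
        return self.input_tokens + self.output_tokens > max_total_tokens


if __name__ == "__main__":
    # Stand-in for a real API response, used only to demonstrate the tally.
    fake_response = SimpleNamespace(
        usage=SimpleNamespace(input_tokens=120, output_tokens=450)
    )
    tally = UsageTally()
    tally.record(fake_response)
    print(tally)
    print("Over budget:", tally.over_budget(max_total_tokens=100_000))
```

In practice you would call tally.record(response) after each client.messages.create call and stop issuing requests once over_budget returns True.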

Resources

- RC Documentation: Comprehensive documentation for Harvard's Research Computing cluster
- Jupyter on RC Guide: Official guide for running Jupyter notebooks on the RC cluster
- RC Training Materials: Workshops and tutorials from the Harvard RC team
- Setting Up Anthropic API Access (Harvard): How to obtain and configure your HUIT API key