Get Started with the Dataset

Download the complete dataset and explore comprehensive documentation to start your research.

Overview of HDF5 File Structure
Group / Dataset        Description (Dimensions)
action                 Leader joint position data (14,)
observations           Sensor observations collected per timestep
  depth                Depth camera frames
    dcam_high          High-resolution depth image (480, 640)
    dcam_low           Wide-angle depth image (480, 640)
  images               RGB camera frames
    cam_high           Overhead RGB (480, 640, 3), JPEG-compressed with OpenCV
    cam_left_wrist     Left wrist RGB (480, 640, 3), JPEG-compressed with OpenCV
    cam_low            Wide-angle RGB (480, 640, 3), JPEG-compressed with OpenCV
    cam_right_wrist    Right wrist RGB (480, 640, 3), JPEG-compressed with OpenCV
  qpos                 Follower joint position data (14,)
  qvel                 Follower joint velocity data (14,)
text                   Textual metadata (prompts and embeddings)
  prompt               Text prompt (string)
  text_embedding       Text embedding vector (384,)
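
To verify the layout of a downloaded episode, h5py can walk the file tree directly. A minimal sketch (the file name episode_001.hdf5 is a placeholder):

import h5py

# Print every group and dataset in one episode file, with shapes and dtypes.
def print_node(name, obj):
    if isinstance(obj, h5py.Dataset):
        print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    else:
        print(f"{name}/ (group)")

with h5py.File('episode_001.hdf5', 'r') as f:  # placeholder file name
    f.visititems(print_node)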

Dataset Usage Example

Below is a simplified example of how to load and process the AIST Bimanual Manipulation Dataset for robot learning applications.

PyTorch Dataset Implementation

Core implementation for loading bimanual manipulation data


import cv2
import h5py
import numpy as np
import torch
from torch.utils.data import Dataset

class SampleLoader(Dataset):
    """Dataset loader for AIST Bimanual Manipulation data"""

    def __init__(self, episodes, camera_names=['cam_high', 'cam_low'],
                 obs_horizon=2, action_horizon=8):
        self.episodes = episodes
        self.camera_names = camera_names
        # Number of stacked observation frames / length of the predicted
        # action chunk (example defaults; tune for your policy)
        self.obs_horizon = obs_horizon
        self.action_horizon = action_horizon

        # RGB images are stored JPEG-compressed with OpenCV
        self.is_compressed = True

        # Load valid sample indices
        self._load_episodes()

    def _load_episodes(self):
        """Load episode information and calculate valid samples"""
        self.samples = []

        for episode_path in self.episodes:
            with h5py.File(episode_path, 'r') as f:
                episode_len = f['/observations/qpos'].shape[0]

            # Ensure we have enough data for observation and action sequences
            min_start = self.obs_horizon - 1
            max_start = episode_len - self.action_horizon

            for start_ts in range(min_start, max_start + 1):
                self.samples.append((episode_path, start_ts))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        episode_path, start_ts = self.samples[idx]

        with h5py.File(episode_path, 'r') as f:
            # Load observation sequence (multiple frames)
            obs_start = start_ts - self.obs_horizon + 1
            obs_end = start_ts + 1

            # Joint positions and velocities
            qpos = f['/observations/qpos'][obs_start:obs_end]
            qvel = f['/observations/qvel'][obs_start:obs_end]

            # Multi-camera images
            images = {}
            for cam in self.camera_names:
                images[cam] = f[f'/observations/images/{cam}'][obs_start:obs_end]

            # Decompress JPEG-encoded frames into (T, H, W, 3) arrays
            if self.is_compressed:
                for cam_name in images:
                    decompressed_images = []
                    for img_compressed in images[cam_name]:
                        decompressed_img = cv2.imdecode(img_compressed, cv2.IMREAD_COLOR)
                        decompressed_images.append(decompressed_img)
                    images[cam_name] = np.array(decompressed_images)

            # Action sequence
            actions = f['/action'][start_ts:start_ts + self.action_horizon]

            # Task description (if available)
            task_prompt = f['/text/prompt'][()].decode('utf-8')

        return {
            'qpos': torch.tensor(qpos, dtype=torch.float32),
            'qvel': torch.tensor(qvel, dtype=torch.float32),
            'images': {k: torch.tensor(v, dtype=torch.float32) / 255.0
                       for k, v in images.items()},
            'actions': torch.tensor(actions, dtype=torch.float32),
            'task_prompt': task_prompt,
        }

# Usage example
dataset = SampleLoader(
    episodes=['episode_001.hdf5', 'episode_002.hdf5'],
    camera_names=['cam_high', 'cam_low', 'cam_left_wrist', 'cam_right_wrist']
)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
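
Iterating the DataLoader yields batched tensors; the exact shapes depend on the obs_horizon and action_horizon configured above. A quick sanity check, assuming the example defaults:

# Fetch one batch and inspect tensor shapes.
batch = next(iter(dataloader))
print(batch['qpos'].shape)                # (32, obs_horizon, 14)
print(batch['actions'].shape)             # (32, action_horizon, 14)
print(batch['images']['cam_high'].shape)  # (32, obs_horizon, 480, 640, 3)
print(batch['task_prompt'][:2])           # list of prompt strings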
    

Key Implementation Notes

Temporal Sequences:

Use obs_horizon to stack multiple observation frames for temporal understanding

Multi-Camera Data:

Process multiple camera views simultaneously for comprehensive spatial understanding

Action Sequences:

Load action chunks for trajectory prediction and policy learning

Task Context:

Include text prompts and embeddings for language-conditioned learning
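
For language conditioning, the precomputed 384-dimensional vector under /text/text_embedding can be read directly instead of re-embedding the prompt. A minimal sketch (the file name is a placeholder, and the concatenation is one common conditioning pattern, not a prescribed recipe):

import h5py
import torch

# Read the per-episode prompt and its precomputed 384-d embedding.
with h5py.File('episode_001.hdf5', 'r') as f:  # placeholder file name
    prompt = f['/text/prompt'][()].decode('utf-8')
    text_emb = torch.tensor(f['/text/text_embedding'][()], dtype=torch.float32)

# Example: concatenate the text embedding with the robot state
# before feeding a policy network.
qpos = torch.zeros(14)  # placeholder for a real qpos observation
conditioned_input = torch.cat([qpos, text_emb])  # shape: (14 + 384,) = (398,)
print(prompt, conditioned_input.shape)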

Key Dataset Features

Advanced Bimanual Tasks: 117 episodes with natural human-like manipulation strategies
Multi-View Visual Data: 4-camera synchronized recording (480×640, 30 FPS) for robust visual learning
Precise Motion Tracking: 14-DoF joint data at 50 Hz with synchronized gripper states
Research-Ready Format: HDF5 and RMB formats compatible with standard APIs, with RLDS and LeRobot support coming soon
Diverse Skill Levels: Tasks ranging from basic to advanced complexity, supporting a broad range of robot learning approaches

Dataset Growth Overview