OpenClaw Skillv1.0.0

File Deduplicator

Michael-laffinby Michael-laffin
Deploy on EasyClawdfrom $14.9/mo

Find and remove duplicate files intelligently. Save storage space, keep your system clean. Perfect for digital hoarders and document management.

How to use this skill

OpenClaw skills run inside an OpenClaw container. EasyClawd deploys and manages yours — no server setup needed.

  1. Sign up on EasyClawd (2 minutes)
  2. Connect your Telegram bot
  3. Install File Deduplicator from the skills panel
Get started — from $14.9/mo
7stars
2,198downloads
8installs
0comments
1versions

Latest Changelog

Initial release: Find and remove duplicate files intelligently to save storage space

Tags

latest: 1.0.0

Skill Documentation

---
name: file-deduplicator
description: Find and remove duplicate files intelligently. Save storage space, keep your system clean. Perfect for digital hoarders and document management.
metadata:
  {
    "openclaw":
      {
        "version": "1.0.0",
        "author": "Vernox",
        "license": "MIT",
        "tags": ["deduplication", "storage", "cleanup", "file-management", "duplicate", "disk-space"],
        "category": "tools"
      }
  }
---

# File-Deduplicator - Find and Remove Duplicates

**Vernox Utility Skill - Clean up your digital hoard.**

## Overview

File-Deduplicator is an intelligent file duplicate finder and remover. Uses content hashing to identify identical files across directories, then provides options to remove duplicates safely.

## Features

### ✅ Duplicate Detection
- Content-based hashing (MD5) for fast comparison
- Size-based detection (exact match, near match)
- Name-based detection (similar filenames)
- Directory scanning (recursive)
- Exclude patterns (.git, node_modules, etc.)

### ✅ Removal Options
- Auto-delete duplicates (keep newest/oldest)
- Interactive review before deletion
- Move to archive instead of delete
- Preserve permissions and metadata
- Dry-run mode (preview changes)

### ✅ Analysis Tools
- Duplicate count summary
- Space savings estimation
- Largest duplicate files
- Most common duplicate patterns
- Detailed report generation

### ✅ Safety Features
- Confirmation prompts before deletion
- Backup to archive folder
- Size threshold (don't remove huge files by mistake)
- Whitelist important directories
- Undo functionality (log for recovery)

## Installation

```bash
clawhub install file-deduplicator
```

## Quick Start

### Find Duplicates in Directory

```javascript
const result = await findDuplicates({
  directories: ['./documents', './downloads', './projects'],
  options: {
    method: 'content',  // content-based comparison
    includeSubdirs: true
  }
});

console.log(`Found ${result.duplicateCount} duplicate groups`);
console.log(`Potential space savings: ${result.spaceSaved}`);
```

### Remove Duplicates Automatically

```javascript
const result = await removeDuplicates({
  directories: ['./documents', './downloads'],
  options: {
    method: 'content',
    keep: 'newest',  // keep newest, delete oldest
    action: 'delete',  // or 'move' to archive
    autoConfirm: false  // show confirmation for each
  }
});

console.log(`Removed ${result.filesRemoved} duplicates`);
console.log(`Space saved: ${result.spaceSaved}`);
```

### Dry-Run Preview

```javascript
const result = await removeDuplicates({
  directories: ['./documents', './downloads'],
  options: {
    method: 'content',
    keep: 'newest',
    action: 'delete',
    dryRun: true  // Preview without actual deletion
  }
});

console.log('Would remove:');
result.duplicates.forEach((dup, i) => {
  console.log(`${i+1}. ${dup.file}`);
});
```

## Tool Functions

### `findDuplicates`
Find duplicate files across directories.

**Parameters:**
- `directories` (array|string, required): Directory paths to scan
- `options` (object, optional):
  - `method` (string): 'content' | 'size' | 'name' - comparison method
  - `includeSubdirs` (boolean): Scan recursively (default: true)
  - `minSize` (number): Minimum size in bytes (default: 0)
  - `maxSize` (number): Maximum size in bytes (default: 0)
  - `excludePatterns` (array): Glob patterns to exclude (default: ['.git', 'node_modules'])
  - `whitelist` (array): Directories to never scan (default: [])

**Returns:**
- `duplicates` (array): Array of duplicate groups
  - `duplicateCount` (number): Number of duplicate groups found
  - `totalFiles` (number): Total files scanned
  - `scanDuration` (number): Time taken to scan (ms)
  - `spaceWasted` (number): Total bytes wasted by duplicates
  - `spaceSaved` (number): Potential savings if duplicates removed

### `removeDuplicates`
Remove duplicate files based on findings.

**Parameters:**
- `directories` (array|string, required): Same as findDuplicates
- `options` (object, optional):
  - `keep` (string): 'newest' | 'oldest' | 'smallest' | 'largest' - which to keep
  - `action` (string): 'delete' | 'move' | 'archive'
  - `archivePath` (string): Where to move files when action='move'
  - `dryRun` (boolean): Preview without actual action
  - `autoConfirm` (boolean): Auto-confirm deletions
  - `sizeThreshold` (number): Don't remove files larger than this

**Returns:**
- `filesRemoved` (number): Number of files removed/moved
- `spaceSaved` (number): Bytes saved
- `groupsProcessed` (number): Number of duplicate groups handled
- `logPath` (string): Path to action log
- `errors` (array): Any errors encountered

### `analyzeDirectory`
Analyze a single directory for duplicates.

**Parameters:**
- `directory` (string, required): Path to directory
- `options` (object, optional): Same as findDuplicates options

**Returns:**
- `fileCount` (number): Total files in directory
- `totalSize` (number): Total bytes in directory
- `duplicateSize` (number): Bytes in duplicate files
- `duplicateRatio` (number): Percentage of files that are duplicates

## Use Cases

### Digital Hoarder Cleanup
- Find duplicate photos/videos
- Identify wasted storage space
- Remove old duplicates, keep newest
- Clean up download folders

### Document Management
- Find duplicate PDFs, docs, reports
- Keep latest version, archive old versions
- Prevent version confusion
- Reduce backup bloat

### Project Cleanup
- Find duplicate source files
- Remove duplicate build artifacts
- Clean up node_modules duplicates
- Save storage on SSD/HDD

### Backup Optimization
- Find duplicate backup files
- Remove redundant backups
- Identify what's actually duplicated
- Save space on backup drives

## Configuration

### Edit `config.json`:
```json
{
  "detection": {
    "defaultMethod": "content",
    "sizeTolerancePercent": 0,  // exact match only
    "nameSimilarity": 0.7,  // 0-1, lower = more similar
    "includeSubdirs": true
  },
  "removal": {
    "defaultAction": "delete",
    "defaultKeep": "newest",
    "archivePath": "./archive",
    "sizeThreshold": 10485760,  // 10MB threshold
    "autoConfirm": false,
    "dryRunDefault": false
  },
  "exclude": {
    "patterns": [".git", "node_modules", ".vscode", ".idea"],
    "whitelist": ["important", "work", "projects"]
  }
}
```

## Methods

### Content-Based (Recommended)
- Fast MD5 hashing
- Detects exact duplicates regardless of filename
- Works across renamed files
- Perfect for documents, code, archives

### Size-Based
- Compares file sizes
- Faster than content hashing
- Good for media files where content hashing is slow
- Finds near-duplicates (similar but not exact)

### Name-Based
- Compares filenames
- Detects similar named files
- Good for finding version duplicates (file_v1, file_v2)

## Examples

### Find Duplicates in Documents
```javascript
const result = await findDuplicates({
  directories: '~/Documents',
  options: {
    method: 'content',
    includeSubdirs: true
  }
});

console.log(`Found ${result.duplicateCount} duplicate sets`);
result.duplicates.slice(0, 5).forEach((set, i) => {
  console.log(`Set ${i+1}: ${set.files.length} files`);
  console.log(`  Total size: ${set.totalSize} bytes`);
});
```

### Remove Duplicates, Keep Newest
```javascript
const result = await removeDuplicates({
  directories: '~/Documents',
  options: {
    keep: 'newest',
    action: 'delete'
  }
});

console.log(`Removed ${result.filesRemoved} files`);
console.log(`Saved ${result.spaceSaved} bytes`);
```

### Move to Archive Instead of Delete
```javascript
const result = await removeDuplicates({
  directories: '~/Downloads',
  options: {
    keep: 'newest',
    action: 'move',
    archivePath: '~/Documents/Archive'
  }
});

console.log(`Archived ${result.filesRemoved} files`);
console.log(`Safe in: ~/Documents/Archive`);
```

### Dry-Run Preview Changes
```javascript
const result = await removeDuplicates({
  directories: '~/Documents',
  options: {
    dryRun: true  // 
Read full documentation on ClawHub
Security scan, version history, and community comments: view on ClawHub