Expert pandas skill for data manipulation, cleaning, analysis, and transformation. Use this skill when working with tabular data, CSV/Excel files, data analy...
This skill provides comprehensive pandas data processing capabilities through executable scripts and reference documentation. Use this skill whenever tasks involve data manipulation, cleaning, analysis, or transformation of tabular data.
Activate this skill when the user requests:
Data cleaner (scripts/data_cleaner.py)
Handles common data cleaning tasks with a single command:
Usage:
python scripts/data_cleaner.py input.csv output.csv [options]
Available Options:
--remove-duplicates: Remove duplicate rows
--handle-missing [strategy]: Handle missing values (strategies: drop, fill, forward, backward, mean, median)
--fill-value [value]: Custom fill value for missing data
--remove-outliers: Remove outliers using the IQR or Z-score method
--outlier-method [method]: Choose iqr or zscore (default: iqr)
--standardize-columns: Standardize column names (lowercase, underscores)
Example:
python scripts/data_cleaner.py data.csv cleaned_data.csv \
--remove-duplicates \
--handle-missing mean \
--remove-outliers \
--standardize-columns
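The cleaner's options correspond to ordinary pandas calls. The sketch below is a minimal approximation under assumptions and is not the script's source. The helpers `standardize_columns`, `handle_missing` and `clean` are hypothetical names. The order standardize → deduplicate → fill is assumed. The mean/median strategies are assumed to fill only numeric columns.

```python
import pandas as pd


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Lowercase names and collapse runs of non-word characters into '_'."""
    out = df.copy()
    out.columns = (
        out.columns.astype(str)
        .str.strip()
        .str.lower()
        .str.replace(r"\W+", "_", regex=True)
        .str.strip("_")
    )
    return out


def handle_missing(df: pd.DataFrame, strategy: str, fill_value=None) -> pd.DataFrame:
    """Map each --handle-missing strategy name onto a pandas call (assumed mapping)."""
    if strategy == "drop":
        return df.dropna()
    if strategy == "fill":
        return df.fillna(fill_value)
    if strategy == "forward":
        return df.ffill()
    if strategy == "backward":
        return df.bfill()
    if strategy in ("mean", "median"):
        # Only numeric columns have a mean/median; other columns keep their NaNs
        stats = df.select_dtypes("number").agg(strategy)
        return df.fillna(stats)
    raise ValueError(f"unknown strategy: {strategy!r}")


def clean(df: pd.DataFrame, strategy: str = "mean", fill_value=None) -> pd.DataFrame:
    """Hypothetical pipeline: standardize names, drop duplicate rows, then fill gaps."""
    df = standardize_columns(df)
    df = df.drop_duplicates()
    return handle_missing(df, strategy, fill_value)
```

Note that `drop_duplicates()` treats NaN values as equal. Two rows that differ only in where they are blank are kept, but two identically blank rows are collapsed.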
Data analyzer (scripts/data_analyzer.py)
Generates comprehensive data analysis reports:
Usage:
python scripts/data_analyzer.py input.csv [options]
Available Options:
--output, -o [file]: Save report to file--format [format]: Output format (json or text, default: json)Report Includes:
Example:
python scripts/data_analyzer.py sales_data.csv -o report.json --format json
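The script itself defines what goes into its report. As a rough illustration only, the sketch below shows the kind of JSON-serializable profile pandas can produce in a few lines. The `profile` function and its keys are hypothetical and are not the analyzer's actual schema.

```python
import json

import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Build a small JSON-friendly summary of a DataFrame (hypothetical schema)."""
    return {
        "shape": {"rows": int(df.shape[0]), "columns": int(df.shape[1])},
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
        "duplicate_rows": int(df.duplicated().sum()),
        # describe() covers numeric columns by default; to_json makes the values serializable
        "numeric_summary": json.loads(df.describe().to_json()),
    }
```

`json.dumps(profile(df), indent=2)` gives a readable text dump, and `json.dump` writes the same structure to a file.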
Data transformer (scripts/data_transformer.py)
Performs data transformation operations through subcommands (convert, merge, filter, sort, select):
python scripts/data_transformer.py convert input.csv output.xlsx
Supports: CSV, Excel (.xlsx/.xls), JSON, Parquet, HTML
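Format conversion in pandas amounts to picking a reader and a writer by file extension. The sketch below follows that approach under assumptions. The `convert` helper and the extension tables are hypothetical, and JSON is assumed to be written as a list of records. Excel needs openpyxl (and xlrd for reading .xls), Parquet needs pyarrow, and `read_html` needs lxml or BeautifulSoup. Current pandas can read .xls but no longer writes it, so the writer table has no .xls entry.

```python
from pathlib import Path

import pandas as pd

READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
    ".json": lambda p: pd.read_json(p, orient="records"),
    ".parquet": pd.read_parquet,
    ".html": lambda p: pd.read_html(p)[0],  # first <table> on the page
}

WRITERS = {
    ".csv": lambda df, p: df.to_csv(p, index=False),
    ".xlsx": lambda df, p: df.to_excel(p, index=False),
    ".json": lambda df, p: df.to_json(p, orient="records"),
    ".parquet": lambda df, p: df.to_parquet(p, index=False),
    ".html": lambda df, p: df.to_html(p, index=False),
}


def convert(src, dst) -> None:
    """Read `src` and write it to `dst`, choosing formats from the file suffixes."""
    src, dst = Path(src), Path(dst)
    try:
        reader = READERS[src.suffix.lower()]
        writer = WRITERS[dst.suffix.lower()]
    except KeyError as exc:
        raise ValueError(f"unsupported extension: {exc.args[0]}") from None
    writer(reader(src), dst)
```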
python scripts/data_transformer.py merge file1.csv file2.csv file3.csv \
--output merged.csv \
--how outer \
--on key_column
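Merging more than two files on a key can be expressed as a chain of pairwise `pd.merge` calls. The sketch below assumes that is roughly what `--how` and `--on` control. The `merge_all` helper is hypothetical. Any non-key columns that share a name across files receive pandas' `_x`/`_y` suffixes.

```python
from functools import reduce

import pandas as pd


def merge_all(frames, on, how="outer"):
    """Fold a list of DataFrames into one by merging left to right on `on`."""
    return reduce(lambda left, right: pd.merge(left, right, on=on, how=how), frames)
```

With `how="outer"` every key from every file survives. With `how="inner"` only keys present in all files remain, so a key missing from any single file drops out of the result.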
python scripts/data_transformer.py filter data.csv \
--query "age > 18 and city == 'Beijing'" \
--output filtered.csv
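The `--query` string reads like a pandas `DataFrame.query` expression, which is presumably what the script evaluates (an assumption). The sketch below shows that call next to the equivalent boolean mask, using made-up sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Li", "Wang", "Zhang"],
        "age": [25, 17, 30],
        "city": ["Beijing", "Beijing", "Shanghai"],
    }
)

# Same expression style as the --query example
via_query = df.query("age > 18 and city == 'Beijing'")

# Equivalent boolean mask: use & / | with parentheses, not and / or
via_mask = df[(df["age"] > 18) & (df["city"] == "Beijing")]

# Inside query strings, local Python variables are referenced with @
min_age = 18
via_variable = df.query("age > @min_age")
```

Column names that contain spaces must be wrapped in backticks inside a query string, for example "`unit price` > 10".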
python scripts/data_transformer.py sort data.csv \
--by sales quantity \
--descending \
--output sorted.csv
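A multi-column sort maps onto `DataFrame.sort_values`. It is an assumption that `--descending` applies to every `--by` column. In pandas, `ascending` also accepts one boolean per column, as the second call shows.

```python
import pandas as pd

df = pd.DataFrame({"sales": [100, 300, 300], "quantity": [5, 1, 7]})

# Both keys descending: ties on sales are broken by quantity, largest first
all_desc = df.sort_values(by=["sales", "quantity"], ascending=False)

# Per-column direction: sales descending, quantity ascending
mixed = df.sort_values(by=["sales", "quantity"], ascending=[False, True])
```

Missing values are placed last by default (`na_position="last"`). Calling `.reset_index(drop=True)` renumbers the rows after sorting.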
python scripts/data_transformer.py select data.csv \
--columns name age city \
--output selected.csv
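Column selection in pandas is plain indexing, and the main design choice is what happens when a requested column is missing. In the sketch below, the hypothetical `select_columns` helper shows both behaviours. It is not known which one the script uses.

```python
import pandas as pd


def select_columns(df: pd.DataFrame, columns, strict: bool = True) -> pd.DataFrame:
    """Keep `columns` in the given order (hypothetical helper)."""
    if strict:
        # Raises KeyError if any requested column does not exist
        return df[list(columns)]
    # filter(items=...) silently skips names that are not present
    return df.filter(items=list(columns))
```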
The references/ directory contains detailed documentation:
references/common_operations.md
Comprehensive reference covering:
When to use: When Claude needs to understand pandas syntax or find the right method for a specific operation.
references/data_cleaning_best_practices.md
Best practices guide covering:
When to use: When designing a data cleaning workflow or deciding on the best approach for specific data quality issues.
Always start by analyzing the data:
python scripts/data_analyzer.py input_file.csv -o analysis_report.json
Review the report to understand data quality, types, missing values, and potential issues.
Based on the analysis report, choose a cleaning strategy for the issues it reveals (see references/data_cleaning_best_practices.md).
Run the data cleaner with appropriate options:
python scripts/data_cleaner.py input.csv cleaned.csv [options]
Apply any transformations (filtering, sorting, format conversion, merging):
python scripts/data_transformer.py [subcommand] [options]
Re-run analysis on the cleaned data to verify improvements:
python scripts/data_analyzer.py cleaned.csv -o final_report.json
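Comparing the two JSON reports by eye works. A quick side-by-side in pandas is another option. The sketch below uses a hypothetical `compare` helper. In practice the two frames would come from `pd.read_csv` on the input and cleaned files.

```python
import pandas as pd


def compare(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Tabulate a few quality metrics before and after cleaning (hypothetical helper)."""

    def metrics(df: pd.DataFrame) -> list:
        return [len(df), int(df.isna().sum().sum()), int(df.duplicated().sum())]

    return pd.DataFrame(
        {"before": metrics(before), "after": metrics(after)},
        index=["rows", "missing_cells", "duplicate_rows"],
    )
```

A large drop in the row count is worth checking. It can mean that outlier removal or the `drop` strategy discarded more data than intended.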
Quick exploration with a text-format report:
python scripts/data_analyzer.py data.csv --format text
Typical cleaning pipeline:
python scripts/data_cleaner.py raw_data.csv clean_data.csv \
--standardize-columns \
--remove-duplicates \
--handle-missing median \
--remove-outliers
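The two `--outlier-method` choices follow standard rules. The IQR rule keeps values within [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. The z-score rule keeps values with |z| ≤ 3. The sketch below applies both rules to numeric columns. The hypothetical `remove_outliers` helper, the 1.5 and 3.0 defaults, and the row-wise "drop if any column is an outlier" policy are all assumptions, not the script's confirmed behaviour.

```python
import pandas as pd


def remove_outliers(df, method="iqr", factor=1.5, threshold=3.0):
    """Drop rows where any numeric column falls outside the chosen bounds."""
    numeric = df.select_dtypes("number")
    if method == "iqr":
        q1 = numeric.quantile(0.25)
        q3 = numeric.quantile(0.75)
        iqr = q3 - q1
        keep = numeric.ge(q1 - factor * iqr) & numeric.le(q3 + factor * iqr)
    elif method == "zscore":
        # Population std (ddof=0); constant columns give 0/0 = NaN, treated as z = 0
        z = ((numeric - numeric.mean()) / numeric.std(ddof=0)).fillna(0)
        keep = z.abs().le(threshold)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Comparisons with NaN are False, so missing values are explicitly not outliers
    keep = keep | numeric.isna()
    return df[keep.all(axis=1)]
```

The z-score rule performs poorly on small samples. With n rows and ddof=0, no |z| can exceed √(n − 1). Six rows therefore never produce |z| above about 2.24, and a threshold of 3 removes nothing. The IQR rule has no such limit, which may be why it is the default.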
# Convert
python scripts/data_transformer.py convert data.xlsx data.csv
# Filter
python scripts/data_transformer.py filter data.csv \
--query "status == 'active'" \
--output filtered.csv
Merge every CSV file matched by the shell glob:
python scripts/data_transformer.py merge *.csv \
--output combined.csv
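This section does not say how the script handles merge when no --on key is given. If the goal is to stack files that share the same columns, the pandas idiom is `pd.concat`. There is also a shell pitfall: when combined.csv already exists from an earlier run, the *.csv glob passes it back in as an input. The hypothetical `stack_csvs` helper below skips the output file for that reason. It also records each row's source file in an assumed `source_file` column.

```python
import glob
import os

import pandas as pd


def stack_csvs(pattern, output=None):
    """Row-wise concatenation of every CSV matched by `pattern` (hypothetical helper)."""
    paths = sorted(glob.glob(pattern))
    if output is not None:
        # Skip a stale copy of the output file that the glob may have picked up
        paths = [p for p in paths if os.path.abspath(p) != os.path.abspath(output)]
    if not paths:
        raise FileNotFoundError(f"no CSV files match {pattern!r}")
    frames = [pd.read_csv(p).assign(source_file=os.path.basename(p)) for p in paths]
    combined = pd.concat(frames, ignore_index=True)
    if output is not None:
        combined.to_csv(output, index=False)
    return combined
```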
Ensure pandas is installed:
pip install pandas numpy openpyxl
Optional for specific formats:
pip install pyarrow # For Parquet support
pip install xlrd # For older Excel files (.xls)
Import errors: Ensure pandas and dependencies are installed
Memory errors: Process data in chunks or optimize dtypes (see references)
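Encoding issues: Pass an explicit encoding to pd.read_csv(), e.g. encoding='utf-8', 'utf-8-sig' (UTF-8 with a byte-order mark) or 'gbk' (common for Chinese-language exports)
Date parsing issues: Call pd.to_datetime() with an explicit format string, e.g. format='%Y-%m-%d'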
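The memory and date-parsing tips above can be sketched with standard pandas calls. Streaming a large CSV with `chunksize` keeps only one chunk in memory at a time. `pd.to_numeric(..., downcast=...)` and `astype("category")` shrink the columns that remain. An explicit `format` removes day/month ambiguity. The helpers `sum_by_group_in_chunks` and `shrink` are hypothetical illustrations, not part of the skill's scripts.

```python
import numpy as np
import pandas as pd


def sum_by_group_in_chunks(source, key, value, chunksize=100_000, encoding="utf-8"):
    """Aggregate a CSV too large for memory by streaming it in chunks."""
    totals = None
    reader = pd.read_csv(source, usecols=[key, value], chunksize=chunksize, encoding=encoding)
    for chunk in reader:
        part = chunk.groupby(key)[value].sum()
        # Groups missing from one side count as 0 instead of producing NaN
        totals = part if totals is None else totals.add(part, fill_value=0)
    return totals


def shrink(df, categorical=()):
    """Downcast numeric columns and convert the named columns to 'category'."""
    out = df.copy()
    for col in out.select_dtypes(np.integer).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(np.floating).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    for col in categorical:
        out[col] = out[col].astype("category")
    return out


# Explicit day-first format; unparseable strings become NaT instead of raising
dates = pd.to_datetime(pd.Series(["03/04/2024", "not a date"]), format="%d/%m/%Y", errors="coerce")
```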
For detailed pandas operations and troubleshooting, always refer to references/common_operations.md and references/data_cleaning_best_practices.md.