Data transformation and preprocessing

Author: Chuan Chen | Views: 17971 | Category: ECharts

In the process of data visualization, raw data often cannot be directly used for chart rendering. ECharts, as a popular visualization library, provides robust data processing capabilities. From data cleaning to format conversion, and then to aggregation calculations, each step impacts the final presentation.

Data Format Standardization

ECharts accepts several data formats, including plain value arrays, two-dimensional arrays, and arrays of key-value objects. Raw CSV or JSON data usually has to be reshaped into one of these first:

// Raw data
const rawData = [
  { year: '2020', sales: 1250 },
  { year: '2021', sales: 1870 },
  { year: '2022', sales: 2100 }
];

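// Convert to an ECharts option: category labels go on the x-axis, values in the series
const chartData = {
  xAxis: { type: 'category', data: rawData.map(item => item.year) },
  yAxis: { type: 'value' },
  series: [{
    type: 'bar',
    data: rawData.map(item => item.sales)
  }]
};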
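
Instead of splitting the records into separate axis and series arrays, the rows can also be passed to ECharts as a whole through dataset. The following is a minimal sketch, assuming a bar chart; toDatasetSource is a hypothetical helper name that simply builds a two-dimensional array with a header row, and salesRows repeats the sample data from above:

```javascript
// Hypothetical helper: turn an array of records into a 2D table
// whose first row holds the dimension names.
function toDatasetSource(rows, dimensions) {
  const header = [...dimensions];
  const body = rows.map(row => dimensions.map(dim => row[dim]));
  return [header, ...body];
}

const salesRows = [
  { year: '2020', sales: 1250 },
  { year: '2021', sales: 1870 },
  { year: '2022', sales: 2100 }
];

const datasetOption = {
  dataset: {
    sourceHeader: true, // the first row contains dimension names
    source: toDatasetSource(salesRows, ['year', 'sales'])
  },
  xAxis: { type: 'category' },
  yAxis: {},
  // encode tells the series which dimension feeds which axis
  series: [{ type: 'bar', encode: { x: 'year', y: 'sales' } }]
};
```

Keeping the data in one table means several series can share it and only differ in their encode settings.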

Time data needs a consistent format. Parse every value the same way, for example with native Date objects or a date library such as moment.js, and convert the result to timestamps:

const timeData = [
  { date: '2023-01', value: 42 },
  { date: '2023-02', value: 78 }
];

// Convert to millisecond timestamps (date-only ISO strings such as '2023-01-01' are parsed as UTC)
const processedData = timeData.map(item => ({
  ...item,
  date: new Date(item.date + '-01').getTime()
}));
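
When the month string is parsed by hand, the time zone can be made explicit instead of relying on how Date interprets the string. This is a sketch, assuming input in YYYY-MM form; parseMonthToUtc is a hypothetical helper name, not part of ECharts:

```javascript
// Hypothetical helper: 'YYYY-MM' -> millisecond timestamp at 00:00 UTC
// on the first day of that month.
function parseMonthToUtc(monthString) {
  const match = /^(\d{4})-(\d{2})$/.exec(monthString);
  if (!match) {
    throw new Error(`Unexpected month format: ${monthString}`);
  }
  const year = Number(match[1]);
  const month = Number(match[2]); // 1-12
  if (month < 1 || month > 12) {
    throw new Error(`Month out of range: ${monthString}`);
  }
  return Date.UTC(year, month - 1, 1); // Date.UTC uses 0-based months
}

// [timestamp, value] pairs, a shape that suits a time axis
const monthlyPoints = [
  { date: '2023-01', value: 42 },
  { date: '2023-02', value: 78 }
].map(item => [parseMonthToUtc(item.date), item.value]);
```

Rejecting malformed strings early keeps an Invalid Date from silently turning into NaN in the series data.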

Data Cleaning and Filtering

Outlier handling is a critical aspect of data preprocessing. Filter unreasonable data by setting thresholds:

const dirtyData = [12, 45, 999, 32, -5, 28];

// Keep only values within the 0-100 range
const cleanData = dirtyData.filter(
  value => value >= 0 && value <= 100
);
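
A fixed threshold works when the valid range is known in advance. When it is not, a statistical rule can be used instead. The sketch below applies the common interquartile-range (IQR) rule; the function names and the factor k = 1.5 are illustrative choices, not something prescribed by ECharts:

```javascript
// Linear-interpolated quantile of an ascending-sorted array (q in [0, 1]).
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  const weight = pos - lower;
  return sorted[lower] + weight * (sorted[upper] - sorted[lower]);
}

// Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is the usual convention.
function filterOutliersIqr(values, k = 1.5) {
  if (values.length === 0) return [];
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  return values.filter(v => v >= q1 - k * iqr && v <= q3 + k * iqr);
}

const readings = [12, 45, 999, 32, -5, 28];
const withoutOutliers = filterOutliersIqr(readings);
// 999 is dropped; -5 is kept because it is statistically unremarkable here
```

The IQR rule knows nothing about the domain, so a value such as -5 survives even if negative readings are impossible. In that case it can be combined with a range check like the one above.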

Missing values can be filled by interpolation. Linear interpolation example:

const incompleteData = [
  { x: 1, y: 10 },
  { x: 2, y: null },
  { x: 3, y: 30 }
];

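// Fill isolated gaps by linear interpolation between the two neighbors.
// Weighting by x keeps the result correct for unevenly spaced points.
for (let i = 1; i < incompleteData.length - 1; i++) {
  const prev = incompleteData[i - 1];
  const next = incompleteData[i + 1];
  // Skip the gap if either neighbor is missing too (null would be coerced to 0)
  if (incompleteData[i].y === null && prev.y !== null && next.y !== null) {
    const t = (incompleteData[i].x - prev.x) / (next.x - prev.x);
    incompleteData[i].y = prev.y + t * (next.y - prev.y);
  }
}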
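
The loop above looks only at the immediate neighbors, so it cannot correctly fill a run of several consecutive missing values. The following sketch interpolates between the nearest known points on each side. It assumes x is strictly increasing; interpolateGaps is a hypothetical name:

```javascript
// Returns a new array; points whose y is null are filled by linear
// interpolation between the nearest known points on each side.
// Leading and trailing gaps have only one known side and stay null.
function interpolateGaps(points) {
  const result = points.map(p => ({ ...p }));
  let lastKnown = -1; // index of the most recent point with a known y
  for (let i = 0; i < result.length; i++) {
    if (result[i].y === null) continue;
    if (lastKnown >= 0 && i - lastKnown > 1) {
      const a = result[lastKnown];
      const b = result[i];
      for (let j = lastKnown + 1; j < i; j++) {
        const t = (result[j].x - a.x) / (b.x - a.x);
        result[j].y = a.y + t * (b.y - a.y);
      }
    }
    lastKnown = i;
  }
  return result;
}

const gappy = [
  { x: 1, y: 10 },
  { x: 2, y: null },
  { x: 3, y: null },
  { x: 4, y: 40 },
  { x: 5, y: null } // trailing gap: left as null
];
const filled = interpolateGaps(gappy);
```

Returning a copy rather than mutating the input keeps the raw data available, which helps when the chart should distinguish measured from estimated points.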

Data Aggregation and Grouping

Large datasets often need to be aggregated before plotting. A typical example is rolling daily records up by month:

const dailyData = [
  { date: '2023-01-01', category: 'A', value: 10 },
  { date: '2023-01-01', category: 'B', value: 20 },
  // ...more data
];

// Aggregate by month
const monthlyData = dailyData.reduce((acc, curr) => {
  const month = curr.date.substring(0, 7);
  if (!acc[month]) acc[month] = 0;
  acc[month] += curr.value;
  return acc;
}, {});

// Convert to [month, total] pairs for ECharts
const seriesData = Object.entries(monthlyData);
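
The reduction above collapses all categories into a single total per month. When each category should become its own series, the grouping needs two keys. This is a sketch with illustrative sample records; sumByCategoryAndMonth is a hypothetical helper name:

```javascript
// Illustrative records, extending the shape used above
const dailyRecords = [
  { date: '2023-01-01', category: 'A', value: 10 },
  { date: '2023-01-01', category: 'B', value: 20 },
  { date: '2023-01-15', category: 'A', value: 5 },
  { date: '2023-02-03', category: 'A', value: 7 }
];

// Sum values per (category, month); Map preserves insertion order.
function sumByCategoryAndMonth(records) {
  const totals = new Map(); // category -> Map(month -> sum)
  for (const { date, category, value } of records) {
    const month = date.substring(0, 7);
    if (!totals.has(category)) totals.set(category, new Map());
    const perMonth = totals.get(category);
    perMonth.set(month, (perMonth.get(month) || 0) + value);
  }
  return totals;
}

// One line series per category, each holding [month, total] pairs.
const monthlySeries = [...sumByCategoryAndMonth(dailyRecords)].map(
  ([category, perMonth]) => ({
    name: category,
    type: 'line',
    data: [...perMonth]
  })
);
```

The resulting array can be assigned to the series field of an option whose x-axis is of type category.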

Grouped statistics for categorical data:

const products = [
  { category: 'Electronics', price: 999 },
  { category: 'Clothing', price: 199 },
  // ...more products
];

// Group by category and calculate average price
const categoryStats = products.reduce((acc, product) => {
  if (!acc[product.category]) {
    acc[product.category] = { sum: 0, count: 0 };
  }
  acc[product.category].sum += product.price;
  acc[product.category].count++;
  return acc;
}, {});

// Calculate averages
const result = Object.entries(categoryStats).map(
  ([category, stats]) => ({
    category,
    avgPrice: stats.sum / stats.count
  })
);

Data Mapping and Transformation

Mapping raw values to visual properties is a common requirement. Color mapping example:

const temperatureData = [12, 18, 25, 30, 15];

// Temperature-to-color mapping function
function tempToColor(temp) {
  if (temp < 15) return '#3498db';  // Cold
  if (temp < 25) return '#2ecc71';  // Comfortable
  return '#e74c3c';                // Hot
}

const coloredData = temperatureData.map(temp => ({
  value: temp,
  itemStyle: { color: tempToColor(temp) }
}));
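
The same thresholds can also be declared in the option rather than computed per data item, using a piecewise visualMap. This is a sketch, assuming a bar chart; the axis labels are placeholders and the colors repeat the ones used above:

```javascript
const temperatures = [12, 18, 25, 30, 15];

const temperatureOption = {
  xAxis: {
    type: 'category',
    data: temperatures.map((_, i) => `#${i + 1}`) // placeholder labels
  },
  yAxis: { type: 'value' },
  visualMap: {
    type: 'piecewise',
    // Each piece covers a half-open value interval
    pieces: [
      { lt: 15, color: '#3498db' },          // Cold
      { gte: 15, lt: 25, color: '#2ecc71' }, // Comfortable
      { gte: 25, color: '#e74c3c' }          // Hot
    ]
  },
  series: [{ type: 'bar', data: temperatures }]
};
```

With this approach the data stays a plain array of numbers, and the chart gets a legend for the three bands as a side effect.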

Value range normalization:

const rawValues = [50, 120, 80, 200];

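// Min-max normalize to the 0-1 range (smallest value -> 0, largest -> 1)
const min = Math.min(...rawValues);
const max = Math.max(...rawValues);
const normalized = rawValues.map(v => (v - min) / (max - min));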

// Map to specific range (e.g., 50-200 pixels)
const range = [50, 200];
const finalValues = normalized.map(
  v => range[0] + v * (range[1] - range[0])
);
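
Normalizing and then mapping to a target range can be wrapped into one reusable function, which also gives a place to handle the edge case where all values are equal. This is a sketch; createLinearScale and the 10-40 pixel range are illustrative assumptions:

```javascript
// Returns a function mapping [domainMin, domainMax] linearly onto
// [rangeMin, rangeMax]. A zero-width domain maps everything to the
// middle of the range instead of dividing by zero.
function createLinearScale([domainMin, domainMax], [rangeMin, rangeMax]) {
  const span = domainMax - domainMin;
  return value => {
    if (span === 0) return (rangeMin + rangeMax) / 2;
    const t = (value - domainMin) / span;
    return rangeMin + t * (rangeMax - rangeMin);
  };
}

const inputValues = [50, 120, 80, 200];
const toSymbolSize = createLinearScale(
  [Math.min(...inputValues), Math.max(...inputValues)],
  [10, 40]
);
const symbolSizes = inputValues.map(toSymbolSize);
```

The returned function could, for instance, be called inside a series symbolSize callback so that the scaling is applied at render time.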

Time Series Processing

Time data often has to be reshaped to suit the target coordinate system. Weekly data to indexed [x, y, label] tuples:

const weekData = [
  { day: 'Mon', value: 12 },
  { day: 'Tue', value: 19 },
  // ...complete week data
];

// Convert to [x, y, label] tuples for a Cartesian axis.
// Note: the ECharts calendar coordinate system expects [date, value] pairs instead.
const calendarData = weekData.map((item, index) => [
  index,        // x-axis coordinate
  item.value,   // y-axis value
  item.day      // Display label
]);
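
For the calendar coordinate system proper, each data item is a [date, value] pair. The sketch below builds such pairs and plugs them into a heatmap on a calendar. The daily values and the start date are made up for illustration, and toCalendarPairs is a hypothetical helper:

```javascript
// Illustrative daily values for one week
const dailyValues = [12, 19, 7, 23, 15, 9, 30];

// Build ['YYYY-MM-DD', value] pairs starting from a given date.
// All arithmetic is done in UTC, so daylight saving shifts cannot skew it.
function toCalendarPairs(startDate, values) {
  const start = new Date(`${startDate}T00:00:00Z`);
  const msPerDay = 24 * 60 * 60 * 1000;
  return values.map((value, offset) => {
    const day = new Date(start.getTime() + offset * msPerDay);
    return [day.toISOString().slice(0, 10), value];
  });
}

const calendarPairs = toCalendarPairs('2023-01-02', dailyValues);

const calendarOption = {
  calendar: { range: '2023-01' },
  // A heatmap needs a visualMap to turn values into colors
  visualMap: { min: 0, max: 30, orient: 'horizontal', left: 'center' },
  series: [{
    type: 'heatmap',
    coordinateSystem: 'calendar',
    data: calendarPairs
  }]
};
```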

Handling discontinuous time series:

const sparseTimeData = [
  { time: '2023-01', value: 10 },
  { time: '2023-03', value: 20 },
  { time: '2023-06', value: 15 }
];

// Fill missing months with 0 (use null instead if a gap should show as a break in the line)
const allMonths = ['01','02','03','04','05','06'].map(m => `2023-${m}`);
const completeData = allMonths.map(month => {
  const existing = sparseTimeData.find(d => d.time === month);
  return existing || { time: month, value: 0 };
});
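
Hard-coding the month list does not scale to longer or year-spanning ranges. The sketch below generates the list from a start and end month and fills gaps with a configurable placeholder. monthRange and fillMonths are hypothetical helper names, and the sample data is illustrative:

```javascript
// Hypothetical helper: list every 'YYYY-MM' from start to end, inclusive.
function monthRange(start, end) {
  const [startYear, startMonth] = start.split('-').map(Number);
  const [endYear, endMonth] = end.split('-').map(Number);
  const months = [];
  let year = startYear;
  let month = startMonth;
  while (year < endYear || (year === endYear && month <= endMonth)) {
    months.push(`${year}-${String(month).padStart(2, '0')}`);
    month += 1;
    if (month > 12) {
      month = 1;
      year += 1;
    }
  }
  return months;
}

// Fill gaps with a placeholder (null by default, so a line chart shows a break).
function fillMonths(sparse, start, end, fillValue = null) {
  const byMonth = new Map(sparse.map(d => [d.time, d]));
  return monthRange(start, end).map(
    month => byMonth.get(month) || { time: month, value: fillValue }
  );
}

const sparseMonths = [
  { time: '2022-11', value: 10 },
  { time: '2023-02', value: 20 }
];
const filledMonths = fillMonths(sparseMonths, '2022-11', '2023-02');
```

Using a Map for the lookup avoids scanning the sparse array once per month, which matters once the range covers several years.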

Multidimensional Data Pivoting

Multidimensional data has to be mapped onto the few visual channels a chart offers. With dataset, dimensions, and encode, each series declares which dimension drives which channel:

const multiDimData = [
  { product: 'Phone', region: 'East', sales: 1200 },
  { product: 'Computer', region: 'North', sales: 800 },
  // ...more data
];

const option = {
  dataset: {
    source: multiDimData,
    dimensions: ['product', 'region', 'sales']
  },
  series: {
    type: 'bar',
    encode: {
      x: 'product',
      y: 'sales',
      itemName: 'region'
    }
  }
};
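
To compare regions side by side, the long-format records can be pivoted into a wide table with one column per region. This is a sketch with illustrative records; pivot is a hypothetical helper, and if the same product and region appear twice the last record wins:

```javascript
// Illustrative records in the same shape as above
const salesRecords = [
  { product: 'Phone', region: 'East', sales: 1200 },
  { product: 'Phone', region: 'North', sales: 950 },
  { product: 'Computer', region: 'North', sales: 800 }
];

// Pivot long records into a wide table: one row per rowKey value,
// one column per colKey value; missing combinations become null.
function pivot(records, rowKey, colKey, valueKey) {
  const rowNames = [...new Set(records.map(r => r[rowKey]))];
  const colNames = [...new Set(records.map(r => r[colKey]))];
  const keyOf = (row, col) => `${row}\u0000${col}`;
  const lookup = new Map(
    records.map(r => [keyOf(r[rowKey], r[colKey]), r[valueKey]])
  );
  const header = [rowKey, ...colNames];
  const rows = rowNames.map(row => [
    row,
    ...colNames.map(col =>
      lookup.has(keyOf(row, col)) ? lookup.get(keyOf(row, col)) : null
    )
  ]);
  return { header, rows, colNames };
}

const pivoted = pivot(salesRecords, 'product', 'region', 'sales');

const pivotOption = {
  dataset: { source: [pivoted.header, ...pivoted.rows] },
  xAxis: { type: 'category' },
  yAxis: {},
  // One bar series per region column; without encode, each series is
  // assumed to take the next column of the table in order.
  series: pivoted.colNames.map(() => ({ type: 'bar' }))
};
```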

Performance Optimization

Large datasets should be downsampled before rendering. Equidistant sampling example:

const largeData = [...Array(10000)].map((_, i) => ({
  x: i,
  y: Math.sin(i / 100)
}));

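// Equidistant sampling to retain roughly 100 points
const sampleSize = 100;
// Math.max guards against a step of 0 (an infinite loop) when there are fewer points than sampleSize
const step = Math.max(1, Math.floor(largeData.length / sampleSize));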
const sampledData = [];
for (let i = 0; i < largeData.length; i += step) {
  sampledData.push(largeData[i]);
}
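
Equidistant sampling can skip short spikes that fall between the sampled indices. One alternative is to split the data into buckets and keep the lowest and highest point of each. The sketch below does that; downsampleMinMax is a hypothetical name and the spike in the test data is artificial:

```javascript
// Split the data into bucketCount contiguous buckets and keep the
// lowest and highest point of each, in their original order.
function downsampleMinMax(points, bucketCount) {
  if (bucketCount <= 0 || points.length <= bucketCount * 2) return [...points];
  const bucketSize = points.length / bucketCount;
  const sampled = [];
  for (let b = 0; b < bucketCount; b++) {
    const start = Math.floor(b * bucketSize);
    // The last bucket always runs to the end, avoiding rounding losses
    const end = b === bucketCount - 1
      ? points.length
      : Math.floor((b + 1) * bucketSize);
    let minIdx = start;
    let maxIdx = start;
    for (let i = start + 1; i < end; i++) {
      if (points[i].y < points[minIdx].y) minIdx = i;
      if (points[i].y > points[maxIdx].y) maxIdx = i;
    }
    if (minIdx === maxIdx) {
      sampled.push(points[minIdx]);
    } else {
      sampled.push(points[Math.min(minIdx, maxIdx)], points[Math.max(minIdx, maxIdx)]);
    }
  }
  return sampled;
}

const wave = Array.from({ length: 10000 }, (_, i) => ({
  x: i,
  y: i === 5050 ? 10 : Math.sin(i / 100) // one artificial spike
}));
const reduced = downsampleMinMax(wave, 100);
```

This keeps at most two points per bucket, so the output is up to twice the bucket count. ECharts line series also offer a built-in sampling option (for example 'lttb'), which may make manual downsampling unnecessary in simple cases.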

Incremental data update strategy:

// initialData and myChart are assumed to be defined elsewhere
let allData = [...initialData];

// When new data arrives
function handleNewData(newPoints) {
  // Append, then keep only the most recent 1000 points
  allData = allData.concat(newPoints).slice(-1000);

  // Update chart
  myChart.setOption({
    series: [{ data: allData }]
  });
}

Interactive Data Processing

Dynamic data filtering example:

const fullData = [
  { name: 'Beijing', value: 123 },
  { name: 'Shanghai', value: 156 },
  // ...more city data
];

function filterData(minValue) {
  return fullData.filter(item => item.value >= minValue);
}

// Slider interaction
document.getElementById('rangeSlider').addEventListener('input', (e) => {
  const filtered = filterData(parseInt(e.target.value, 10));
  myChart.setOption({
    series: [{ data: filtered }]
  });
});

Geographic Data Transformation

GeoJSON data often needs preprocessing before it can be drawn, for example computing its bounding box:

// Extract coordinate boundaries from GeoJSON features
// (assumes Turf.js is loaded and available as the global turf)
function getBounds(features) {
  return features.reduce((bounds, feature) => {
    const [minX, minY, maxX, maxY] = turf.bbox(feature);
    // Seeding with +/-Infinity avoids `|| Infinity`, which would discard a legitimate 0
    bounds.minX = Math.min(bounds.minX, minX);
    bounds.minY = Math.min(bounds.minY, minY);
    bounds.maxX = Math.max(bounds.maxX, maxX);
    bounds.maxY = Math.max(bounds.maxY, maxY);
    return bounds;
  }, { minX: Infinity, minY: Infinity, maxX: -Infinity, maxY: -Infinity });
}
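
If pulling in Turf.js only for the bounding box is undesirable, the same result can be computed by walking the coordinates directly. This is a sketch, assuming features with a plain geometry (geometry collections are not handled); eachPosition and computeBounds are hypothetical names and the sample features are illustrative:

```javascript
// Walk a GeoJSON coordinates array of any nesting depth and call
// visit(lng, lat) for every position.
function eachPosition(coords, visit) {
  if (typeof coords[0] === 'number') {
    visit(coords[0], coords[1]);
    return;
  }
  for (const child of coords) eachPosition(child, visit);
}

// Bounding box over an array of GeoJSON features.
function computeBounds(features) {
  const bounds = { minX: Infinity, minY: Infinity, maxX: -Infinity, maxY: -Infinity };
  for (const feature of features) {
    if (!feature.geometry || !feature.geometry.coordinates) continue;
    eachPosition(feature.geometry.coordinates, (lng, lat) => {
      bounds.minX = Math.min(bounds.minX, lng);
      bounds.minY = Math.min(bounds.minY, lat);
      bounds.maxX = Math.max(bounds.maxX, lng);
      bounds.maxY = Math.max(bounds.maxY, lat);
    });
  }
  return bounds;
}

const sampleFeatures = [
  {
    type: 'Feature',
    properties: {},
    geometry: {
      type: 'Polygon',
      coordinates: [[[0, 0], [10, 0], [10, 5], [0, 5], [0, 0]]]
    }
  },
  {
    type: 'Feature',
    properties: {},
    geometry: { type: 'Point', coordinates: [20, -3] }
  }
];
const sampleBounds = computeBounds(sampleFeatures);
```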

Coordinate transformation example:

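// WGS84 (EPSG:4326) to Web Mercator (EPSG:3857), valid for latitudes within about ±85.05°.
// 20037508.34 is half the Earth's circumference in meters (π × 6378137).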
function wgs84ToMercator(lng, lat) {
  const x = lng * 20037508.34 / 180;
  const y = Math.log(Math.tan((90 + lat) * Math.PI / 360)) / (Math.PI / 180);
  return [x, y * 20037508.34 / 180];
}
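
Converting back is needed when, for example, a click position in projected meters has to be reported as longitude and latitude. The sketch below adds the inverse transform and a round-trip check. The forward function is repeated under a different name only so the block is self-contained; all names are illustrative:

```javascript
const HALF_CIRCUMFERENCE = 20037508.34; // meters, approximately pi * 6378137

// Forward transform, same formula as above.
function toMercator(lng, lat) {
  const x = lng * HALF_CIRCUMFERENCE / 180;
  const yDeg = Math.log(Math.tan((90 + lat) * Math.PI / 360)) / (Math.PI / 180);
  return [x, yDeg * HALF_CIRCUMFERENCE / 180];
}

// Inverse transform: Web Mercator meters back to WGS84 degrees.
function mercatorToWgs84(x, y) {
  const lng = x / HALF_CIRCUMFERENCE * 180;
  const yDeg = y / HALF_CIRCUMFERENCE * 180;
  const lat =
    (2 * Math.atan(Math.exp(yDeg * Math.PI / 180)) - Math.PI / 2) * 180 / Math.PI;
  return [lng, lat];
}

const [mercX, mercY] = toMercator(116.4, 39.9); // roughly Beijing
const [backLng, backLat] = mercatorToWgs84(mercX, mercY);
// backLng and backLat match the input up to floating-point error
```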
