Started my Project on Energy-Efficient DevOps: Auto-Suspending Idle AWS Resources using ML


Week 2 – ML Modeling Phase (Log Extraction & Model Training)

Objective:

To extract the logged CPU and workload data from the EC2 instance, preprocess it, label idle/active states, and train a machine learning model to detect idle periods.


What I Did Today:

Stopped Data Collection

After 3 full days of continuous logging using cron, I safely downloaded the log files from EC2 to my local machine for analysis:

  • cpu_log.csv

  • workload_log.txt

I transferred the files with SCP; a manual download through an EC2 session also works.

Preprocessed the Data

Parsed the CPU logs and converted timestamps. Added engineered features:

  • Rolling average of CPU %

  • Hour of day

  • Minute

  • Is weekend/weekday

Labeled each row as Idle (1) or Active (0) based on a CPU usage threshold (e.g., 20%).
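A minimal sketch of the preprocessing and labeling step, using pandas. The column names `timestamp` and `cpu_percent`, the rolling window size, and the sample rows are assumptions for illustration; the real layout of cpu_log.csv may differ.

```python
import io

import pandas as pd

IDLE_THRESHOLD = 20.0  # percent; the example threshold mentioned above


def add_features(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Add a rolling CPU average, time-based features, and an idle label."""
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    out = out.sort_values("timestamp").reset_index(drop=True)
    # Rolling mean over the last `window` samples; min_periods=1 avoids NaN at the start.
    out["cpu_rolling_mean"] = out["cpu_percent"].rolling(window, min_periods=1).mean()
    out["hour"] = out["timestamp"].dt.hour
    out["minute"] = out["timestamp"].dt.minute
    # dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
    out["is_weekend"] = (out["timestamp"].dt.dayofweek >= 5).astype(int)
    # Idle (1) when CPU usage is below the threshold, Active (0) otherwise.
    out["idle"] = (out["cpu_percent"] < IDLE_THRESHOLD).astype(int)
    return out


# Hypothetical sample rows standing in for cpu_log.csv.
sample = io.StringIO(
    "timestamp,cpu_percent\n"
    "2024-01-06 10:00:00,5.0\n"
    "2024-01-06 10:01:00,35.0\n"
    "2024-01-06 10:02:00,12.5\n"
)
features = add_features(pd.read_csv(sample), window=2)
print(features)
```

For the real file, `pd.read_csv("cpu_log.csv")` would replace the in-memory sample.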

Trained an ML Model

Used a Decision Tree Classifier from scikit-learn. Steps included:

  • Feature scaling (optional; decision trees do not require scaled features)

  • Train-test split (80/20)

  • Model fitting
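The split-and-fit steps above can be sketched as follows. The data here is synthetic and only stands in for the engineered features; the column order, `max_depth=4`, and the random seeds are assumptions, not the settings used in the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Synthetic stand-in for the engineered features (NOT the real log data).
# Columns: cpu_percent, cpu_rolling_mean, hour, minute, is_weekend.
n = 500
cpu = rng.uniform(0, 100, n)
X = np.column_stack([
    cpu,
    np.clip(cpu + rng.normal(0, 5, n), 0, 100),
    rng.integers(0, 24, n),
    rng.integers(0, 60, n),
    rng.integers(0, 2, n),
])
y = (cpu < 20).astype(int)  # Idle (1) below the 20% threshold, Active (0) otherwise

# 80/20 split; stratify keeps the idle/active ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

Because the label in this sketch is derived directly from `cpu_percent`, the tree can recover the threshold almost exactly, so the accuracy printed here says nothing about how well a model would do on real logs.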

Evaluated Performance

Used:

  • Confusion Matrix

  • Accuracy Score

  • Classification Report

Also visualized feature importance to understand what influences predictions most.
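A small sketch of these evaluation calls in scikit-learn. The labels and the tiny training set below are toy values for illustration only, not results from the project, and the feature names are assumed. Importances are printed rather than plotted to keep the example short; the same values could be passed to a bar plot.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.tree import DecisionTreeClassifier

# Toy labels (1 = idle, 0 = active); not real predictions from the project.
y_true = np.array([1, 1, 0, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 0, 1, 0, 1, 1])

# Rows are actual classes, columns are predicted classes, in the order [active, idle].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
acc = accuracy_score(y_true, y_pred)
print(cm)
print("accuracy:", acc)
print(classification_report(y_true, y_pred, target_names=["active", "idle"]))

# Feature importance from a tree fitted on a tiny, made-up dataset.
feature_names = ["cpu_rolling_mean", "hour"]  # assumed names
X = np.array([[5, 2], [10, 14], [60, 3], [80, 15], [8, 9], [70, 21]])
y = np.array([1, 1, 0, 0, 1, 0])
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Sort features from most to least important.
ranked = sorted(zip(feature_names, tree.feature_importances_), key=lambda p: -p[1])
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```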


Google Colab Notebook: ML Summary (you can follow the same steps in your own notebook)

  1. Load CPU Log
    → Loaded the cpu_log.csv file using pandas and parsed the timestamp column for time-based analysis.

  2. Visualize CPU Usage
    → Created line plots of CPU usage over time to detect workload and idle patterns visually.

  3. Label Idle States
    → Labeled each row as Idle (1) or Active (0) based on whether CPU usage was below a threshold (e.g., 20%).

  4. Feature Engineering
    → Added features like rolling averages, hour, and minute to enrich the dataset for ML training.

  5. Train ML Model
    → Used a Decision Tree Classifier from scikit-learn to train on the labeled and feature-enhanced dataset.

  6. Evaluate Model
    → Assessed model performance using accuracy, confusion matrix, and classification report.

  7. Feature Importance
    → Visualized the most influential features in the prediction using bar plots.


Problems Faced:

  • The SSH connection to EC2 timed out temporarily. Two things to check: (1) verify that the public IP address in the SSH command you run in PowerShell matches the one shown for your AWS instance, and (2) configure the instance's security group inbound rules so that connections from your IP address are allowed.

  • Crontab logs needed manual checking to verify that logging was consistent.

  • Had to clean missing/garbled lines in cpu_log.csv before modeling.
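For the last point, a minimal sketch of how missing or garbled lines could be filtered out before modeling, using only the standard library. The two-column layout (`timestamp,cpu_percent`), the timestamp format, and the sample lines are assumptions; adjust them to match the actual cpu_log.csv.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of the logged timestamps


def clean_cpu_rows(lines):
    """Yield (timestamp, cpu_percent) for well-formed rows and skip the rest."""
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # blank or truncated line
        try:
            ts = datetime.strptime(row[0].strip(), TIMESTAMP_FORMAT)
            cpu = float(row[1])
        except ValueError:
            continue  # unparseable timestamp or CPU value (this also skips a header row)
        if 0.0 <= cpu <= 100.0:  # drop out-of-range or NaN readings
            yield ts, cpu


# Hypothetical log content with a header, a missing value, and garbled lines.
raw = io.StringIO(
    "timestamp,cpu_percent\n"
    "2024-01-06 10:00:00,5.0\n"
    "2024-01-06 10:01:00,\n"
    "2024-01-06 10:0\n"
    "\n"
    "2024-01-06 10:03:00,12.5\n"
    "garbage,###\n"
)
cleaned = list(clean_cpu_rows(raw))
print(cleaned)
```

With a real file, `open("cpu_log.csv", newline="")` would replace the in-memory sample.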

Tags:

#DevOps #MachineLearning #AWS #Python #Colab #CloudOptimization #cron #BuildInPublic

Want to Follow Along?

I’ll be sharing weekly progress: issues, logs, architecture, and ML models.

If you've solved similar problems (like automated cloud optimization), I’d love to hear your insight.


Sahil