Introduction:
This project (writing a data science blog post) is part of the Udacity Data Scientist Nanodegree Program.
Nowadays it is hard to get anywhere without a car, and a taxi or Uber is not always an option for reaching your destination, so choosing the best car for your essential needs is extremely important.
In this project, I analyzed the publicly available Car Evaluation dataset, downloaded from the UC Irvine Machine Learning Repository. The dataset consists of 1728 rows and 7 columns: buying, maintenance, doors, persons, luggage boot, safety, and the class. The class column has 4 values: Unacceptable, Acceptable, Good, and Very good. Of the rows, 70.023% are Unacceptable, 22.222% are Acceptable, 3.993% are Good, and 3.762% are Very good.
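As a quick sanity check on the numbers above, here is a minimal sketch that reproduces the row count and the class percentages. The attribute values and the per-class counts are not stated in this post; they are taken from the dataset's published description as I understand it, so treat them as assumptions.

```python
from itertools import product

# Attribute values as listed in the dataset description (assumption: not stated in the post).
ATTRIBUTES = {
    "buying": ["vhigh", "high", "med", "low"],
    "maint": ["vhigh", "high", "med", "low"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}

# Per-class row counts (assumption: taken from the dataset description).
CLASS_COUNTS = {"unacc": 1210, "acc": 384, "good": 69, "vgood": 65}

# The dataset covers every combination of the six attributes exactly once.
n_rows = len(list(product(*ATTRIBUTES.values())))

# Share of each class, as a percentage rounded to three decimals.
total = sum(CLASS_COUNTS.values())
shares = {label: round(100 * count / total, 3) for label, count in CLASS_COUNTS.items()}

print(n_rows)
print(shares)
```

The row count follows from 4 x 4 x 4 x 3 x 3 x 3 combinations, which is why the dataset has no duplicate or missing feature combinations.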
I tried to answer the following questions:
1 - Are very good cars expensive?
2 - Is high safety found only in expensive cars?
3 - Does the price of a car affect the cost of its maintenance?
Part 1. Business Understanding, Data Understanding, and Data Preparation:
The data has no missing values, and all the columns are categorical, so I had to re-encode them.
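The post does not say which encoding was used. One common option for ordered categories is ordinal encoding, sketched below under that assumption. The `ORDER` mapping, the `encode_row` helper, the `"?"` missing-value marker, and the sample row are all hypothetical and only serve to illustrate the idea.

```python
# Ordered category lists (assumption): a larger code means "more" of that attribute.
ORDER = {
    "buying": ["low", "med", "high", "vhigh"],
    "maint": ["low", "med", "high", "vhigh"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
    "class": ["unacc", "acc", "good", "vgood"],
}


def has_missing(row):
    """Return True if any value is empty, None, or the '?' placeholder."""
    return any(value in ("", None, "?") for value in row.values())


def encode_row(row):
    """Replace each category by its position in the ordered list for that column."""
    return {column: ORDER[column].index(value) for column, value in row.items()}


# Hypothetical sample row, written by hand for illustration.
sample = {
    "buying": "vhigh", "maint": "med", "doors": "2", "persons": "more",
    "lug_boot": "small", "safety": "high", "class": "unacc",
}

encoded = encode_row(sample)
print(has_missing(sample), encoded)
```

Ordinal codes keep the natural order of values such as low, med, and high. One-hot encoding would be an alternative if that order should not be imposed on the model.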
After that, I divided the data into 4 subsets, one per class, to help me answer my questions.
Here is what I found:
What is really interesting is that all the Very good cars have a low or medium buying price, yet their safety is high. Their maintenance price is also mostly low or medium, although for some of them it is high.
For the Unacceptable cars, the buying price is mostly high and the safety is low. This is the only class with low safety, and there is no difference in the maintenance price.
The Good cars are similar to the Very good ones, but their safety is more often medium than high.
For the Acceptable cars, the buying price is mostly medium and the safety is high, and there is no difference in the maintenance price.
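A per-class breakdown like the one described above can be produced by grouping the rows by class and counting the values of each feature within a group. The sketch below shows the idea with the standard library only. The rows are made up by hand for illustration and are not taken from the real dataset, and the helper names `split_by_class` and `value_counts` are hypothetical.

```python
from collections import Counter, defaultdict

# Hand-written toy rows (NOT real data), just enough to show the grouping.
rows = [
    {"buying": "low", "safety": "high", "class": "vgood"},
    {"buying": "med", "safety": "high", "class": "vgood"},
    {"buying": "vhigh", "safety": "low", "class": "unacc"},
    {"buying": "high", "safety": "low", "class": "unacc"},
    {"buying": "high", "safety": "med", "class": "unacc"},
    {"buying": "med", "safety": "high", "class": "acc"},
]


def split_by_class(rows, target="class"):
    """Group rows into one list per class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[target]].append(row)
    return dict(groups)


def value_counts(rows, column):
    """Count how often each value of `column` appears in `rows`."""
    return Counter(row[column] for row in rows)


groups = split_by_class(rows)
for label, members in groups.items():
    print(label, dict(value_counts(members, "buying")), dict(value_counts(members, "safety")))
```

Comparing the counts of buying, maintenance, and safety across the four groups is what supports statements such as "this is the only class with low safety".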
Part 2. Data Modeling and Evaluation of the Results:
Finally, I built a model to evaluate cars by predicting their class, using all the features. What is really interesting, as you can see in the bar chart below, is that the size of the luggage boot is the most important feature for the prediction, and safety is the least important.
The model performs well, with an accuracy of 0.9326 on the test set.
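Because the classes are imbalanced, it helps to read the accuracy next to a simple baseline. The sketch below shows how accuracy is computed and what a model that always predicts the majority class would score on the full dataset. The label lists are hand-written for illustration, the per-class counts are assumed from the dataset description, and the baseline on the actual test set depends on how the data was split, which the post does not state.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hand-written labels for illustration only (not model output from the post).
y_true = ["unacc", "unacc", "acc", "unacc", "good", "vgood", "unacc", "acc"]
y_pred = ["unacc", "unacc", "acc", "unacc", "acc", "vgood", "unacc", "acc"]
example_accuracy = accuracy(y_true, y_pred)

# Majority-class baseline on the full dataset (counts assumed from the dataset description).
class_counts = {"unacc": 1210, "acc": 384, "good": 69, "vgood": 65}
baseline = max(class_counts.values()) / sum(class_counts.values())

print(example_accuracy, round(baseline, 4))
```

Since about 70% of the cars are Unacceptable, always predicting that class already gives an accuracy near 0.70, so the reported 0.9326 is best judged against that figure rather than against zero.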
Conclusion:
At first, I thought that a high price meant high safety, but based on the data, most of the high-priced cars are evaluated as Unacceptable and have low safety, while the Very good cars are low in price with high safety.
However, I think the data sample is small, so we cannot make a decision based on it alone.
And from your point of view, what affects the evaluation of cars?
To see more details of the analysis, with the data and code, check the GitHub repository here.



