12. Medical Health Insurance Premium Prediction

Nihar javiya
4 min read · Oct 28, 2021


How does AWS SageMaker work?

SageMaker is a fully managed service that enables you to quickly and easily integrate machine learning-based models into your applications.

1) Work with SageMaker Studio

2) Explore, analyze, and process data

3) Train a model with Amazon SageMaker

4) Deploy a model in Amazon SageMaker

1] Linear Regression: Linear regression is one of the simplest and most popular machine learning algorithms. It makes predictions for continuous (real-valued or numeric) variables such as sales, salary, age, and product price.

The linear regression formula expresses a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Because the relationship is linear, the model describes how the value of the dependent variable changes as the value of the independent variable changes.
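To make the formula concrete, here is a minimal sketch of fitting a straight line y = b0 + b1 * x by ordinary least squares in plain Python. The function name `fit_line` and the age and charge numbers are made up for illustration; they are not taken from the project's dataset.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Illustrative values only (hypothetical ages and charges)
ages = [20, 30, 40, 50]
charges = [3000.0, 5000.0, 7000.0, 9000.0]

b0, b1 = fit_line(ages, charges)
predicted = b0 + b1 * 35  # predicted charge for a 35-year-old
```

Here the slope b1 is the change in the predicted charge for each additional year of age, which is the "how y changes with x" idea described above.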

2] Artificial Neural Network: An artificial neural network (ANN) is a component of a computing system designed to simulate the way the human brain analyzes and processes information. ANNs are a foundation of modern AI and can solve problems that would be very difficult or impractical to solve by hand or with conventional statistical methods. They have self-learning capabilities that enable them to produce better results as more data becomes available.
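As a rough illustration of how an ANN processes information, the sketch below runs one forward pass through a tiny network with a single hidden layer and a ReLU activation. The weights, biases, and function names are hypothetical and hand-picked; a real network learns its weights from data during training.

```python
def relu(values):
    """ReLU activation: negative values become 0."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs plus a bias per neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(x, w1, b1, w2, b2):
    """Forward pass: input -> hidden layer (ReLU) -> single linear output."""
    hidden = relu(dense(x, w1, b1))
    return dense(hidden, w2, b2)[0]


# Hand-picked illustrative parameters (not trained)
x = [1.0, 2.0]
w1 = [[0.5, -1.0], [1.0, 1.0]]  # two hidden neurons, two inputs each
b1 = [0.0, 0.5]
w2 = [[2.0, 1.0]]               # one output neuron
b2 = [0.5]

output = forward(x, w1, b1, w2, b2)
```

Training consists of adjusting `w1`, `b1`, `w2`, and `b2` so that the output moves closer to the known target values.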


In this project, I first took an insurance dataset from Kaggle. The dataset contains features such as age, sex, bmi, children, smoke, region, and charge. I then analyzed the data, checked how many null values the dataset contained, and removed them. To see the relationship between region and charges, I applied the groupby method, which showed that the southwest region has higher charge and BMI values. Machine learning models cannot work directly with string values, so these must be converted to numerical form. In this dataset, smoke, sex, and region are stored as strings. Using a lambda function, I converted sex and smoke to numeric values: sex is 1 for male and 0 for female, and smoke is converted in the same way. For region, I created dummy variables.
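The preprocessing steps above can be sketched with pandas roughly as follows. This is a minimal sketch, not the project's actual code: the column names follow the article, but the rows are invented, and the "yes"/"no" labels for smoke are an assumption about how the raw data is encoded.

```python
import pandas as pd

# Tiny made-up sample; values are illustrative, not from the Kaggle file
df = pd.DataFrame({
    "age": [19, 33, 45, 52],
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, 22.7, None, 30.8],
    "children": [0, 1, 2, 3],
    "smoke": ["yes", "no", "no", "yes"],  # assumed labels
    "region": ["southwest", "northwest", "southeast", "southwest"],
    "charge": [16884.9, 4449.5, 8240.6, 11090.7],
})

# Count null values per column, then drop the rows that contain them
null_counts = df.isnull().sum()
df = df.dropna().reset_index(drop=True)

# Mean charge and bmi per region
region_means = df.groupby("region")[["charge", "bmi"]].mean()

# Encode the binary string columns with lambda functions
df["sex"] = df["sex"].apply(lambda s: 1 if s == "male" else 0)
df["smoke"] = df["smoke"].apply(lambda s: 1 if s == "yes" else 0)

# One dummy (one-hot) column per region value
df = pd.get_dummies(df, columns=["region"])
```

After these steps every column is numeric, which is what the regression and neural network models need.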

Next, I used matplotlib to visualize the values of each feature. A regression plot, as the name suggests, draws a regression line between two parameters and helps to visualize their linear relationship. After that, the algorithms are applied to see how accurate the models are. For prediction, I dropped the charge column from the inputs and predicted it from the remaining features, including the encoded categorical ones. With the linear regression model I got an accuracy of 77%. I also applied the neural network model, which reached an accuracy of 84%. After running the models on my local machine, I ran the model with SageMaker and uploaded my training data to Amazon S3.
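The split, fit, and score workflow described above could look roughly like this with scikit-learn. This is a sketch under stated assumptions: the data is synthetic (a single made-up "age" feature and a noisy "charge" target), and the 80/20 split and random seeds are arbitrary choices, not the project's settings. Note that for regressors, scikit-learn's `score` method returns the R² coefficient of determination, so an "accuracy" figure for a regression model is often this value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(18, 64, size=(200, 1))               # hypothetical "age" feature
y = 250.0 * X[:, 0] + rng.normal(0, 1000, size=200)  # hypothetical "charge" target

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training rows only, then evaluate on the held-out rows
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R² on the test set
```

Evaluating on rows the model has not seen during fitting gives a more honest estimate of how it will behave on new data.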

Import dataset
Regression plot between age and charge
Train-test split of the dataset
Accuracy using linear regression
ANN model
Accuracy using ANN
Import training dataset into Amazon S3
Training the model with AWS SageMaker
Output and train subfolders for the dataset
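For the S3 step, a minimal sketch of uploading a training file under a "train" subfolder might look like the following. The bucket name, prefix, file name, and helper names are all hypothetical, and the upload function assumes that boto3 is installed and AWS credentials are already configured.

```python
import os


def s3_key(prefix, subfolder, filename):
    """Build an S3 object key such as 'insurance/train/train.csv'."""
    return "/".join(part.strip("/") for part in (prefix, subfolder, filename))


def upload_training_file(local_path, bucket, prefix):
    """Upload a local file to s3://<bucket>/<prefix>/train/<filename>."""
    import boto3  # imported lazily so the key helper works without AWS set up

    key = s3_key(prefix, "train", os.path.basename(local_path))
    boto3.client("s3").upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Example key layout for the two subfolders (hypothetical names)
train_key = s3_key("insurance/", "train", "train.csv")
output_prefix = s3_key("insurance/", "output", "")
```

Calling `upload_training_file("train.csv", "my-bucket", "insurance")` would then place the file under the train subfolder, while the output subfolder is left for the artifacts that the training job writes.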

Github link


