So far, we have directly entered data and created a DataFrame and learned how to analyse data in a DataFrame. However, in actual scenarios, data need not be typed or copy pasted everytime. Rather, data is available most of the time in a file (text or csv) or in a database. Thus, in real-world scenarios, we will be required to bring data directly from a database and load to a DataFrame. This is called importing data from a database. Likewise, after analysis, we will be required to store data back to a database. This is called exporting data to a database.
Data from DataFrame can be read from and written to MySQL database. To do this, a connection is required with the MySQL database using the pymysql database driver.
And for this, the driver should be installed in the python environment using the following command:
pip install pymysql
sqlalchemy is a library used to interact with the MySQL database by providing the required credentials.
This library can be installed using the following command:
Once it is installed, sqlalchemy provides a function create_engine() that enables this connection to be established. The string inside the function is known as connection string. The connection string is composed of multiple parameters like the name of the database with which we want to establish the connection, username, password, host, port number and finally the name of the database. And, this function returns an engine object based on this connection string.
The syntax for the same is discussed below:
engine=create_engine('driver://
username:password@host:port/name_of_
database',index=false)
where,
Driver = mysql+pymysql
username=User name of the mysql (normally it is root)
password= Password of the MySql
port = usually we connect to localhost with port number
3306 (Default port number)
Name of the Database = Your database
In the following subsections, importing and exporting data between Pandas and MySQL applications are demonstrated.
mysql> use CARSHOWROOM ;
Database changed
mysql> select * from INVENTORY;
Importing Data from MySQL to Pandas
Importing data from MySQL to pandas basically refers to the process of reading a table from MySQL database and loading it to a pandas DataFrame. After establishing the connection, in order to fetch data from the table of the database we have the following three functions
1) pandas.read_sql_query(query,sql_conn)
It is used to read an sql query (query) into a DataFrame using the connection identifier (sql_ conn) returned from the create_engine ().
2) pandas.read_sql_table(table_name,sql_conn)
It is used to read an sql table (table_name) into a DataFrame using the connection identifier (sql_ conn).
3) pandas.read_sql(sql, sql_conn)
It is used to read either an sql query or an sql table (sql) into a DataFrame using the connection identifier (sql_conn).
>>> import pandas as pd
>>> import pymysql as py
>>> import sqlalchemy
>>> engine=create_engine('mysql+pymysql://
root:smsmb@localhost:3306/CARSHOWROOM')
>>> df = pd.read_sql_query('SELECT * FROM
INVENTORY', engine)
>>> print(df)
Exporting Data from Pandas to MySQL
Exporting data from Pandas to MySQL basically refers to the process of writing a pandas DataFrame to a table of MySQL database.
For this purpose, we have the following function:
pandas.DataFrame.to_sql(table,sql_conn,if_
exists=”fail”,index=False/True)
• Table specifies the name of the table in which we want to create or append DataFrame values. It is used to write the specified DataFrame to the table the connection identifier (sql_conn) returned from the create_engine ().
• The parameter if_exists specifies “the way data from the DataFrame should be entered in the table. It can have the following three values: “fail”, “replace”, “append”.
“fail” is the default value that indicates a ValueError if the table already exists in the database.
“replace” specifies that the previous content of the table should be updated by the contents of the DataFrame.
“append” specifies that the contents of the DataFrame should be appended to the existing table and when updated the format must be the same (column name sequences).
Index — By default index is True means DataFrame index will be copied to MySQL table. If False, then it will ignore the DataFrame indexing.
#Code to write DataFrame df to database
>>> import pandas as pd
>>> import pymysql as py
>>> import sqlalchemy
>>> engine=create_engine('mysql+pymysql:// root:smsmb@localhost:3306/CARSHOWROOM')
>>> data={
'ShowRoomId':[1,2,3,4,5],
‘Location':[‘Delhi','Bangalore','Mumbai','Chand igarh','Kerala']}
>>> df=pd.DataFrame(data)
>>> df.to_sql('showroom_info',engine,if_
exists="replace",index=False)
After running this python script, a mysql table with the name “showroom_info” will be created in the database.