Rename columns in a pandas data-frame

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

To select a … in a pandas data-frame we use bracket notation with the full name of the column. Sometimes a column name is long or contains spaces, so it is convenient to rename it. We can do this with the following pandas commands.

import pandas as pd
ufo = pd.read_csv('http://bit.ly/uforeports')
ufo.head()
City Colors Reported Shape Reported State Time
0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00
1 Willingboro NaN OTHER NJ 6/30/1930 20:00
2 Holyoke NaN OVAL CO 2/15/1931 14:00
3 Abilene NaN DISK KS 6/1/1931 13:00
4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00
ufo.columns
Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'], dtype='object')
ufo.rename(columns={'Colors Reported': 'colors_reported',
                    'Shape Reported': 'shape_reported'}, inplace=True)

This renames the listed columns to the new names and leaves all other columns unchanged.
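A minimal sketch of the same rename on a small hand-built frame (the sample rows are copied from the output above; `renamed` is a name chosen here for illustration). Without `inplace=True`, `rename` returns a new data-frame and leaves the original untouched:

```python
import pandas as pd

# Small stand-in for the ufo data-frame, so no download is needed
ufo = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro'],
    'Colors Reported': [None, None],
    'Shape Reported': ['TRIANGLE', 'OTHER'],
})

# rename() without inplace=True returns a renamed copy
renamed = ufo.rename(columns={'Colors Reported': 'colors_reported',
                              'Shape Reported': 'shape_reported'})

print(list(renamed.columns))
print(list(ufo.columns))  # the original names are unchanged
```

Working on a copy like this can be easier to follow in a longer script, because each variable keeps one set of column names.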

We can also rename columns without specifying the old names. To do so, we create a Python list of new names and assign it to the columns attribute.

ufo_cols = ['city', 'colors reported', 'shape reported', 'state', 'time']
ufo.columns = ufo_cols

This replaces all the old column names with the new ones, so the list must contain one name per column.
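The sketch below illustrates what happens when the list has the wrong length. It assumes a tiny three-column stand-in frame instead of the downloaded data, and `error_message` is just an illustrative variable:

```python
import pandas as pd

# Three-column stand-in for the ufo data-frame
ufo = pd.DataFrame({'City': ['Ithaca'], 'State': ['NY'],
                    'Time': ['6/1/1930 22:00']})

error_message = None
try:
    # Two names for three columns: pandas refuses the assignment
    ufo.columns = ['city', 'state']
except ValueError as err:
    error_message = str(err)

print(error_message)

# One name per column works
ufo.columns = ['city', 'state', 'time']
print(list(ufo.columns))
```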

If a data-frame has too many columns to list by hand, we can use the string methods of the columns index to rename them all at once.

The following command converts the column names to lower case and replaces spaces with underscores:

ufo.columns = ufo.columns.str.lower().str.replace(' ', '_')
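As a small variation, the sketch below also strips stray leading and trailing spaces before replacing the inner ones. The column names are made up for the example, and `regex=False` is passed so that the space is treated as a literal string:

```python
import pandas as pd

# Empty frame with untidy column names, for illustration only
ufo = pd.DataFrame(columns=['City', 'Colors Reported', ' Shape Reported '])

# strip -> lower -> replace, applied to every column name at once
ufo.columns = (ufo.columns.str.strip()
                          .str.lower()
                          .str.replace(' ', '_', regex=False))

print(list(ufo.columns))
```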

Create a new column in a pandas data-frame

For data analysis we sometimes need to create a new, derived column in an existing data-frame. We can do that easily with the following commands:

import pandas as pd
ufo = pd.read_table('http://bit.ly/uforeports', sep=',')

Alternatively, we can use the read_csv() function, which uses a comma as the separator by default.

ufo = pd.read_csv('http://bit.ly/uforeports')

ufo.head()
City Colors Reported Shape Reported State Time
0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00
1 Willingboro NaN OTHER NJ 6/30/1930 20:00
2 Holyoke NaN OVAL CO 2/15/1931 14:00
3 Abilene NaN DISK KS 6/1/1931 13:00
4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00

To create a new column by concatenating two other columns:

ufo['Location'] = ufo['City'] +', '+ ufo['State']
ufo.head()
City Colors Reported Shape Reported State Time Location
0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00 Ithaca, NY
1 Willingboro NaN OTHER NJ 6/30/1930 20:00 Willingboro, NJ
2 Holyoke NaN OVAL CO 2/15/1931 14:00 Holyoke, CO
3 Abilene NaN DISK KS 6/1/1931 13:00 Abilene, KS
4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00 New York Worlds Fair, NY
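If the original data-frame should stay unchanged, `DataFrame.assign` is one alternative: it returns a copy with the extra column. A minimal sketch on a two-row stand-in frame (`with_location` is a name chosen for this example):

```python
import pandas as pd

# Two rows copied from the output above
ufo = pd.DataFrame({'City': ['Ithaca', 'Willingboro'],
                    'State': ['NY', 'NJ']})

# assign() builds the new column on a copy of the frame
with_location = ufo.assign(Location=ufo['City'] + ', ' + ufo['State'])

print(with_location)
print('Location' in ufo.columns)  # the original frame has no Location column
```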

Selecting series in a data-frame

The most commonly used data structure in pandas is the data-frame. We can select values from a data-frame with a few basic commands.

import pandas as pd
ufo = pd.read_table('http://bit.ly/uforeports', sep=',')

Alternatively, we can use the read_csv() function, which uses a comma as the separator by default.

ufo = pd.read_csv('http://bit.ly/uforeports')

ufo.head()
City Colors Reported Shape Reported State Time
0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00
1 Willingboro NaN OTHER NJ 6/30/1930 20:00
2 Holyoke NaN OVAL CO 2/15/1931 14:00
3 Abilene NaN DISK KS 6/1/1931 13:00
4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00

We can select a series with bracket notation:

ufo['City']
0                      Ithaca
1                 Willingboro
2                     Holyoke
3                     Abilene
4        New York Worlds Fair
5                 Valley City

We can also concatenate two columns with a simple Python operation.

ufo['City'] +', '+ ufo['State']
0                      Ithaca, NY
1                 Willingboro, NJ
2                     Holyoke, CO
3                     Abilene, KS
4        New York Worlds Fair, NY
5                 Valley City, ND
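The difference between selecting one column and several can be sketched as follows, using a small stand-in frame rather than the downloaded data. A single name in the brackets gives a Series, while a list of names gives a data-frame:

```python
import pandas as pd

ufo = pd.DataFrame({'City': ['Ithaca', 'Willingboro'],
                    'State': ['NY', 'NJ'],
                    'Shape Reported': ['TRIANGLE', 'OTHER']})

city = ufo['City']                    # one name -> Series
city_state = ufo[['City', 'State']]   # list of names -> DataFrame
shape = ufo['Shape Reported']         # brackets also handle names with spaces

print(type(city).__name__)
print(type(city_state).__name__)
```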

 

Pandas data structure

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. Pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other 3rd party libraries.

Series:
A Series is a one-dimensional object similar to an array, list, or column in a table. It will assign a labeled index to each item in the Series. By default, each item will receive an index label from 0 to N, where N is the length of the Series minus one.

# create a Series with an arbitrary list
s = pd.Series([7, 'Dhaka', 1.16, -1526, 'Happy City!'])
s
0                7
1            Dhaka
2             1.16
3            -1526
4      Happy City!
dtype: object

We can also create a Series from a dictionary, in which case the keys of the dictionary become its index.

d = {'Rajshahi': 100, 'Dhaka': 130, 'Dinajpur': 90, 'Rangpur': 110,
     'Natore': 45, 'Panchagarh': None}
cities = pd.Series(d)
cities
Dinajpur          90
Dhaka            130
Natore           45
Rajshahi         100
Rangpur          110
Panchagarh       NaN 
dtype: float64

You can use the index to select specific items from the Series …
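Label-based selection can be sketched as follows, on a shortened version of the cities dictionary (`dhaka`, `subset`, and `large` are names chosen for the example):

```python
import pandas as pd

cities = pd.Series({'Rajshahi': 100, 'Dhaka': 130, 'Dinajpur': 90})

dhaka = cities['Dhaka']                     # one label -> a single value
subset = cities[['Rajshahi', 'Dinajpur']]   # list of labels -> a smaller Series
large = cities[cities > 95]                 # boolean mask -> matching items only

print(dhaka)
print(subset)
print(large)
```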

Four Python builtin functions we should know

We use these functions a lot when debugging.

type():
The built-in type() returns the type of the object passed to it. If the object is a dictionary, for example, it returns the dict type.

>>> type(1)           
<type 'int'>
>>> li = []
>>> type(li)          
<type 'list'>
>>> import fibo
>>> type(fibo)  
<type 'module'>
>>> import types     
>>> type(fibo) == types.ModuleType
True
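The session above is from Python 2. As a rough sketch of the same checks, assuming Python 3 (where `type(1)` prints `<class 'int'>`), the block below also contrasts an exact type comparison with `isinstance`, which accepts subclasses as well. The standard `math` module stands in for `fibo`:

```python
import math
import types

# Names of the types of a few sample values
names = [type(value).__name__ for value in (1, 'Dhaka', 1.16, [], {})]
print(names)

# A module object is an instance of types.ModuleType
is_module = isinstance(math, types.ModuleType)
print(is_module)

# bool is a subclass of int, so the two checks disagree for True
exact = type(True) is int        # exact type comparison
loose = isinstance(True, int)    # also accepts subclasses
print(exact, loose)
```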

dir():
You can use the built-in dir function to list the identifiers that a module defines. The identifiers are the functions, classes and variables defined in that module.

When you pass a module to the dir() function, it returns the list of names defined in that module. When no argument is supplied, it returns the list of names defined in the current module.

>>> li = []
>>> dir(li)           
['append', 'count', 'extend', 'index', 'insert',
'pop', 'remove', 'reverse', 'sort']
>>> d = {}
>>> dir(d)            
['clear', 'copy', 'get', 'has_key', 'items', 'keys', 'setdefault', 'update', 'values']
>>> import odbchelper
>>> dir(odbchelper)   
['__builtins__', '__doc__', '__file__', '__name__', 'buildConnectionString']
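In Python 3, dir() also lists the many double-underscore names of an object. One common way to focus on the public methods is to filter them out, as in this small sketch (`public_names` is an illustrative name):

```python
# Keep only the names that do not start with an underscore
public_names = [name for name in dir([]) if not name.startswith('_')]
print(public_names)

# With no argument, dir() lists the names defined in the current scope
current_names = dir()
print('public_names' in current_names)
```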

help():

As its name suggests, help() displays documentation. Called with a module, object, or method as its argument, it prints the help for that argument. Called without an argument, it starts the interactive help utility.

>>> help(str)

Help on class str in module __builtin__:

class str(basestring)
 |  str(object='') -> string
 |  
 |  Return a nice string representation of the object.
 |  If the argument is a string, the return value is the same object.
 |  
 |  Method resolution order:
 |      str
 |      basestring
 |      object
 |
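help() prints its text to the screen. If the text is needed as a string, for example to search it, one option is the standard pydoc module, which help() is built on. A minimal sketch, assuming Python 3:

```python
import pydoc

# Render the help text for str.upper as plain text instead of printing it
text = pydoc.render_doc(str.upper, renderer=pydoc.plaintext)

print(text)
print('upper' in text)
```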


__doc__ :

Python documentation strings (or docstrings) provide a convenient way of associating documentation with Python modules, functions, classes, and methods. An object’s docstring is defined by including a string constant as the first statement in the object’s definition.

>>> int
<type 'int'>
>>> print int.__doc__
int(x=0) -> int or long
int(x, base=10) -> int or long

Convert a number or string to an integer, or return 0 if no arguments
are given.  If x is floating point, the conversion truncates towards zero.
If x is outside the integer range, the function returns a long instead.

If x is not a number or if base is given, then x must be a string or
Unicode object representing an integer literal in the given base.  The
literal can be preceded by '+' or '-' and be surrounded by whitespace.
The base defaults to 10.  Valid bases are 0 and 2-36.  Base 0 means to
interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4
>>> list
<type 'list'>
>>> print list.__doc__
list() -> new empty list
list(iterable) -> new list initialized from iterable's items
>>> dict
<type 'dict'>
>>> print dict.__doc__
dict() -> new empty dictionary
dict(mapping) -> new dictionary initialized from a mapping object's
    (key, value) pairs
dict(iterable) -> new dictionary initialized as if via:
    d = {}
    for k, v in iterable:
        d[k] = v
dict(**kwargs) -> new dictionary initialized with the name=value pairs
    in the keyword argument list.  For example:  dict(one=1, two=2)
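The same attribute is available on our own functions. The sketch below, written for Python 3, defines a hypothetical `area` function with a docstring and reads it back, both directly and through `inspect.getdoc`, which also cleans up the indentation:

```python
import inspect


def area(width, height):
    """Return the area of a rectangle.

    Both arguments are expected to be numbers.
    """
    return width * height


# The raw docstring, exactly as stored on the function object
print(area.__doc__)

# The same text with indentation and surrounding blank lines cleaned up
first_line = inspect.getdoc(area).splitlines()[0]
print(first_line)
```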

SFTP Command to Transfer Files on Remote Servers

SFTP (SSH File Transfer Protocol) is a network protocol used for secure file transfer over Secure Shell (SSH).

SFTP runs over the SSH protocol, on standard port 22 by default, to establish a secure connection. It is integrated into many GUI tools (FileZilla, WinSCP, FireFTP, etc.).

Below are the most used commands for SFTP:

1. Connect to SFTP

To connect with SFTP we can use the command below.

[root@salayhin ~]# sftp salayhin@20.42.230.5

Connecting to 20.42.230.5...
salayhin@20.42.230.5's password:
sftp>

If we are using AWS, we can pass the location of the pem file with the -i option.

[root@salayhin ~]# sftp -i PemFile.pem salayhin@20.42.230.5
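When the connection is started from a script, the command line can be assembled in Python first. The helper below is hypothetical (`build_sftp_command` is not part of any library) and only builds the argument list. It assumes the OpenSSH `sftp` client, whose `-i` option takes the identity file and `-P` the port:

```python
import shlex


def build_sftp_command(user, host, identity_file=None, port=None):
    """Return the sftp argument list for the given connection details."""
    command = ['sftp']
    if identity_file:
        command += ['-i', identity_file]   # private key, e.g. an AWS pem file
    if port:
        command += ['-P', str(port)]       # only needed for a non-default port
    command.append(f'{user}@{host}')
    return command


command = build_sftp_command('salayhin', '20.42.230.5',
                             identity_file='PemFile.pem')
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run` to start the session.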

Python locale error: unsupported locale setting

Sometimes we get a locale error ("unsupported locale setting") and cannot install packages via pip.

I found a solution to get rid of it: set the LC_ALL environment variable to the C locale.

Run this command from your command line interface:

$ export LC_ALL=C

After that, the output of locale is:

$ locale
LANG=co404
LANGUAGE=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
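The same error can appear inside a Python program as `locale.Error`. A minimal sketch of a fallback, assuming it is acceptable to run with the "C" locale when the requested one is not installed (`set_locale_with_fallback` and the locale name `xx_XX.invalid` are made up for the example):

```python
import locale


def set_locale_with_fallback(name=''):
    """Try the requested locale and fall back to "C" if it is unsupported."""
    try:
        return locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        # Same situation as "unsupported locale setting" on the command line
        return locale.setlocale(locale.LC_ALL, 'C')


result = set_locale_with_fallback('xx_XX.invalid')
print(result)
```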