Now that we have covered the basics of Python, it is time to have a quick revision.
Friday, 30 November 2018
Sunday, 30 September 2018
Lambda, Map and Filter functions
Lambda functions
You might be familiar with the way functions are defined in Python. For example this is a sample function to square a number:
def pow2(a):
    return a**2
This function would return square of any input number.
pow2(10) #Will correctly return 100 as the output
We can strip away a lot of detail from the pow2 function. The most important part of the function is the expression 'a**2', which squares the number. The rest of the details, such as the function name and the def and return keywords, do not add much to such a small function.
Initially, the function can be written in a single line as follows:
def pow2(a):return a**2
Then we can strip off the name and the def and return keywords, replacing them with the single lambda keyword. The result looks like this:
lambda num: num**2
This lambda function does the same job as our 'pow2' function.
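As a quick check of that equivalence, here is a minimal sketch that calls both forms with the same input. The name `square` is only an illustrative label for the lambda and is not part of the original post.

```python
# A regular function and an equivalent lambda expression
def pow2(a):
    return a**2

# 'square' is a hypothetical name used only for this sketch
square = lambda num: num**2

print(pow2(10))    # 100
print(square(10))  # 100
```

In practice a lambda is usually passed straight to another function rather than given a name, which is what the map and filter examples do.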
To apply this lambda function to a sequence of items (a Python list), we are going to use the map function.
Map function
We can apply regular functions as well as lambda functions using the map function.
Using a regular function, squaring the individual items in a list goes like this:
list_a = [1,3,5,7,11] #define Python list
map(pow2,list_a) #Using map function, apply the earlier defined pow2 function to all the items of the list
The output is a map object, displayed with its location in memory:
<map at 0x10cb3e9f390>
To see the actual values, convert the map object into a list, like this:
list(map(pow2,list_a)) #Gives an output of [1,9,25,49,121]
Now let us map the lambda function as well:
list(map(lambda num:num*5,list_a)) #Gives an output of [5,15,25,35,55]
Filter function
list(filter(lambda num:num%3 != 0,list_a))
#Filters out elements that are divisible by 3.
#Output is [1,5,7,11] as 3 is filtered out.
A nested filter and map call would look like this:
list(filter(lambda num:num%3 != 0,map(lambda num:num*5,list_a)))
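To make the nested call easier to follow, the sketch below splits it into its two steps and compares the result with the one-liner. The variable names `times5`, `kept` and `nested` are assumptions introduced only for this illustration.

```python
list_a = [1, 3, 5, 7, 11]

# Inner step: multiply every item by 5
times5 = list(map(lambda num: num * 5, list_a))

# Outer step: keep only the results that are not divisible by 3
kept = list(filter(lambda num: num % 3 != 0, times5))

# The nested one-liner gives the same result
nested = list(filter(lambda num: num % 3 != 0,
                     map(lambda num: num * 5, list_a)))

print(times5)  # [5, 15, 25, 35, 55]
print(kept)    # [5, 25, 35, 55]
print(nested)  # [5, 25, 35, 55]
```

Note that map runs first, so 15 (which came from 3) is the value that gets filtered out.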
So far we have covered some of the basic functionality of Python to prepare you to get started in the world of data science. The next post will be a short one containing a set of Python exercises. If you are able to answer most of those questions, you are good to proceed further. Otherwise, I would recommend revising this introductory material in detail before continuing with advanced topics like NumPy and Pandas.
Sunday, 16 September 2018
Dictionaries, Tuples and Sets
Dictionaries
Dictionaries are another important feature in Python. A dictionary stores key-value pairs. In many statically typed languages, a dictionary (or map) usually holds values of only one declared type. In Python the story is different: a single dictionary can hold values of different types, including lists and other dictionaries:
d = {'a':123,'b':345} #Declaring a dictionary with 'a','b' as keys and 123, 345 as values
d['a'] #returns 123
d.items() #returns the key-value pairs in the dict
d.keys() #returns the keys in the dict: 'a', 'b'
#Explore other options of dictionary - pop, popitem, get, update, fromkeys, etc.
d2 = {'a':[1,2,3], 'b':'Vijay'} #Dictionary with different type of objects. 'a' has list and 'b' has string
type(d2['a']) #list
d2['a'][1] #returns number 2
nested_dict = {'key1':{'key2_in':['abc',100,'def',34]}} #nested dictionary
#accessing items within a nested dictionary
nested_dict['key1']['key2_in'][1] #returns 100
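A few of the dictionary methods mentioned above can be sketched as follows. This is only a small illustration using the same dictionary `d`. The key 'z' and the new values are made up for the example.

```python
d = {'a': 123, 'b': 345}

# get() returns a default value instead of raising KeyError for a missing key
missing = d.get('z', 0)

# update() adds new keys and overwrites existing ones
d.update({'b': 999, 'c': 567})

# pop() removes a key and returns its value
removed = d.pop('a')

print(missing)  # 0
print(removed)  # 123
print(d)        # {'b': 999, 'c': 567}
```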
Tuples
Tuples are similar to lists - they are a collection of Python objects, separated by commas. They are similar in indexing, nested objects, repetition, etc. However, one thing that distinguishes them from lists is the fact that they are immutable.
tupe = (1,2,3,4,3,1) #assigning tuple
tupe #displays (1,2,3,4,3,1)
tupe[0] #displays '1'
tupe.index(3) #returns 2
tupe.count(4) #Number of times the number 4 occurs in tuple - 1
tupe[2] = 100 #Errors out. Tuples are immutable
tupe_new = (100,200,['a','b'],400) #tuple with combination of items
tupe_new[2][1] #returns 'b'
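The immutability mentioned above can be observed without stopping the program by catching the error. The sketch below also shows one subtlety that is easy to miss: a list stored inside a tuple can still be modified in place. The variable `error_message` is an assumption used only for this illustration.

```python
tupe = (1, 2, 3, 4, 3, 1)

try:
    tupe[2] = 100  # item assignment is not allowed on a tuple
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# The tuple itself is unchanged
print(tupe)  # (1, 2, 3, 4, 3, 1)

# A list stored inside a tuple is still a mutable list
tupe_new = (100, 200, ['a', 'b'], 400)
tupe_new[2][1] = 'z'
print(tupe_new)  # (100, 200, ['a', 'z'], 400)
```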
Tuple unpacking
To extract and utilize individual elements in tuple, we have to do something called 'tuple unpacking'. It is a pretty straight-forward process:
#Declare new tuple containing even more tuples
unpack = ((123,345),(100,200),(3000,4000))
#Regular for loop, looping through the items of tuple
for (a,b) in unpack:
    print(a,b)
#This prints out
############
# 123 345
# 100 200
# 3000 4000
Another example to demonstrate extracting individual elements:
#Declare new tuple containing even more tuples
unpack = ((123,345),(100,200),(3000,4000))
#Regular for loop, looping through the items of tuple
#Print only one of the items
for (a,b) in unpack:
    print(b)
#This prints out
############
# 345
# 200
# 4000
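Unpacking is not limited to for loops. As a small sketch, the same idea works in a plain assignment, and a starred name can collect the remaining items. The names `pair`, `first` and `rest` are assumptions made up for this example.

```python
# Unpack a two-item tuple into two variables
pair = (123, 345)
a, b = pair
print(a, b)  # 123 345

# A starred name collects whatever is left over, as a list
first, *rest = (100, 200, 300)
print(first)  # 100
print(rest)   # [200, 300]
```

The number of names on the left must match the number of items, unless a starred name is used to absorb the extras.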
Sets
set_a = {1,2,3,3,1} #Assign and create new set
set_a #Gives you output of {1,2,3}. Removes duplicates
set_new = {1,3,7,2,1,2,2} #New set
set_new #Displays {1,2,3,7} - the elements are unique. Sets are unordered, so do not rely on the display order
set_b = {2,3,5}
set_a - set_b #{1} is the answer
set_a.add(100) #Adds new element
set_a.intersection(set_b) #{2,3}
set_a.union(set_b) #{1,2,3,5,100}
#Explore other set operations
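A couple of the other set operations can be sketched with operators instead of method calls. This is only an illustration on the same two sets. Wrapping the result in sorted() is a choice made here to get a predictable order, since sets themselves are unordered.

```python
set_a = {1, 2, 3}
set_b = {2, 3, 5}

# Operator forms of the common set operations
common = set_a & set_b           # intersection
combined = set_a | set_b         # union
either_not_both = set_a ^ set_b  # symmetric difference

print(sorted(common))           # [2, 3]
print(sorted(combined))         # [1, 2, 3, 5]
print(sorted(either_not_both))  # [1, 5]

# Membership tests work with 'in'
print(2 in set_a)  # True
print(5 in set_a)  # False
```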
Sunday, 9 September 2018
Slicing and dicing - Strings and List
String manipulation in Python may look quite different from what you are used to in other programming languages.
Here are a few examples of string manipulation in Python:
Assignment operation
As we saw in earlier posts, there is no type or length declaration for a string. It is a straightforward assignment operation.
pyStr = 'This is a Python test string'
String indexing starts from 0 in Python.
pyStr[0]
This will give you an output of 'T'.
The ':' symbol
pyStr[0:]
This tells Python to display the string from index 0 to the end of the string.
pyStr[0:6]
The slice stops just before index 6, so the output for this will be:
'This i'
A few important slicing and dicing options are given below. Please try all of these out yourself, and more:
pyStr[::2] #Takes every second character of the string starting from 0
#'Ti saPto etsrn' is the output
pyStr[::-1] #reverses the string
pyStr[1::3] #Starts from 1, takes every 3rd character
pyStr[::-2] #reverses string but only every alternate character is considered
#'git stnhy  ish' is the output
pyStr = pyStr + ' additional text' #adds more text
pyStr[20] = 'V' #does not work. Strings are immutable.
pyStr = pyStr - 'additional text' #Doesn't work either. Strings do not support the '-' operator.
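Since the last two operations above fail, here is a minimal sketch of one way to get the same effect by building new strings. Slicing plus concatenation and the built-in str.replace method are used here as a workaround. The variable names `changed`, `longer` and `shorter` are assumptions for illustration only.

```python
pyStr = 'This is a Python test string'

# "Change" the character at index 20 by building a new string around it
changed = pyStr[:20] + 'V' + pyStr[21:]
print(changed)  # This is a Python tesV string

# "Subtract" text by replacing it with an empty string
longer = pyStr + ' additional text'
shorter = longer.replace(' additional text', '')
print(shorter)  # This is a Python test string

# The original string is never modified
print(pyStr)    # This is a Python test string
```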
List
Lists also have similar slicing and dicing options. However, there are a few key differences between strings and lists:
- Items in the list can be modified by simple assignment operator.
- '.append(obj)' method should be used to add elements to the list.
- List object has methods like 'pop', 'clear', 'index', 'sort', etc. whereas string object has its own set of methods.
list_a = ['apple','banana','mango','jack fruit'] #List assignment.
list_a[::1] #Same as string.
list_a[::-1] #Returns a reversed copy but does not alter the list.
list_a[1] = 'pear' #Works in the list. Replaces 'banana' with 'pear'.
list_a[4] = 'fig' #Does not work. Index out of range, since the list has only 4 items.
list_a.append('fig') #Appends 'fig' to the end of the list.
As an additional activity, explore the different methods that are available in the Python String object and compare them with that of the list object.
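As a starting point for that comparison, the sketch below tries a few list methods next to a couple of string methods. It is only a small sample. The variable names `position`, `last` and `word` are assumptions used for this illustration.

```python
list_a = ['apple', 'banana', 'mango', 'jack fruit']

# List methods change the list in place
position = list_a.index('mango')  # position of the first match
last = list_a.pop()               # removes and returns the last item
list_a.sort(reverse=True)         # sorts in place, descending

print(position)  # 2
print(last)      # jack fruit
print(list_a)    # ['mango', 'banana', 'apple']

# String methods return a new string and leave the original untouched
word = 'apple'
print(word.upper())           # APPLE
print(word.replace('a', 'A')) # Apple
print(word)                   # apple
```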
Saturday, 1 September 2018
Introduction to Python
Python is a powerful yet easy to learn programming language. This post serves mostly as an introduction to Python as a programming language. In our next few posts, we will explore more about Data Science related aspects of Python. Let's get started from the basics.
I was pretty new to Python. I had never worked with Python before and was always wary of it. But it all changed after I got a book by Irv Kalb from the public library. It explained the basics of Python in a simple manner and gradually increased in complexity as you went through it. I will try to summarize the entire book, as well as my additional sources on basic Python programming, as much as possible.
Variables and assignment
In Python, you need not declare the data type of a variable while initializing it. It sort of 'knows' what you are trying to do.
Int
var_int = 123
print('The integer variable is:', var_int)
print('Type of variable is :', type(var_int))
Will get you an output of:
The integer variable is: 123
Type of variable is : <class 'int'>
Float
var2 = 123.0
print('The new variable is:', var2)
print('Type of var2 variable is :', type(var2))
Output:
The new variable is: 123.0
Type of var2 variable is : <class 'float'>
Notice how Python automatically identifies the new variable as a float and not an integer.
String
Same goes for strings as well:
cityName = 'Chennai'
print(f'The cityName variable is: {cityName}')
print('Type of cityName is :', type(cityName))
Output:
The cityName variable is: Chennai
Type of cityName is : <class 'str'>
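One consequence of not declaring types is that the same name can be rebound to values of different types. The sketch below illustrates this with a single variable. The name `value` is an assumption for the example. `isinstance` is the built-in function for checking a type.

```python
# The same name can point to values of different types over time
value = 123
print(type(value).__name__)  # int

value = 123.0
print(type(value).__name__)  # float

value = 'Chennai'
print(type(value).__name__)  # str

# isinstance() checks whether a value is of a given type
print(isinstance(value, str))  # True
print(isinstance(value, int))  # False
```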
Functions
As with most programming languages, Python has two types of functions: built-in functions and user-defined functions.
We already saw a few built-in functions in action in the earlier section.
In a Jupyter notebook, you can get function details by pressing the 'Shift + Tab' keys together. This shows a brief overview of the function: the arguments it takes, its return type, if any, and so on. The view can also be expanded to show more of the function's documentation.
Built-in functions
Some other examples of built-in functions include print() and conversion functions such as int(), float(), etc.
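The conversion functions can be sketched as follows. The input values are made up for illustration. Note that int() applied to a float drops the fractional part rather than rounding.

```python
# Converting strings to numbers
whole = int('42')
decimal = float('3.5')

# Converting between numeric types: int() truncates toward zero
truncated = int(7.9)

# Converting a number to a string
text = str(123)

print(whole)      # 42
print(decimal)    # 3.5
print(truncated)  # 7
print(text)       # 123
```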
User defined functions
In Python, user-defined functions (UDFs) work a bit differently from those in many other languages.
Functions are defined using the 'def' keyword. Indentation (tabs or spaces) denotes the function block. There are no opening or closing braces to mark its boundaries. Also, there are no type declarations for the function inputs or the return value. The function returns a value that depends on the input it receives.
For example:
def add2inputs(a,b):
    input1 = a
    input2 = b
    return input1 + input2
This does not have any input types. Based on the input given to the function, this will respond accordingly.
Function call
add2inputs(1,3)
Will give you an output of 4.
Whereas,
add2inputs('Python ','Code')
will give you 'Python Code' as output.
One thing to note about Python's functions is that you can also return multiple values
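Returning multiple values can be sketched like this. Python packs the returned values into a tuple, which the caller can unpack into separate names. The function `min_and_max` and its input are hypothetical, made up for this illustration.

```python
# Hypothetical example function: returns two values at once
def min_and_max(numbers):
    return min(numbers), max(numbers)

# The two values come back as a tuple...
result = min_and_max([4, 1, 9])
print(result)  # (1, 9)

# ...which can be unpacked directly into two variables
lowest, highest = min_and_max([4, 1, 9])
print(lowest)   # 1
print(highest)  # 9
```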
Here is where things get a little interesting.
We are defining a function that squares its input.
def square_fn(a): return a*a
Assigning a list of numbers to the variable list_a.
list_a = [1,2,3,4,5,10]
The following line
list_b = [square_fn(num) for num in list_a]
is equivalent to:
list_b = []
for num in list_a:
    sq_num = square_fn(num)
    list_b.append(sq_num)
In many places in Python code, you will encounter a for loop written in a single line like the one above. This form is known as a list comprehension. It is convenient to write and, once you get the hang of it, easy to read as well.
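The single-line form can also carry a condition. As a small sketch using the same function and list, the comprehension below squares only the even numbers. The name `even_squares` is an assumption for this example.

```python
def square_fn(a):
    return a * a

list_a = [1, 2, 3, 4, 5, 10]

# Square every item
list_b = [square_fn(num) for num in list_a]
print(list_b)  # [1, 4, 9, 16, 25, 100]

# Square only the even items by adding an 'if' at the end
even_squares = [square_fn(num) for num in list_a if num % 2 == 0]
print(even_squares)  # [4, 16, 100]
```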
Sunday, 19 August 2018
Python - Anaconda & Jupyter notebooks installation
The first step in this journey is to learn Python. As mentioned in my earlier post, Python is one of the easiest programming languages to learn. It has lots of features that make a programmer's life easy.
Conda install
We will be using the Anaconda distribution to install the latest version of Python and the associated ML / data science libraries. It is easy to set up on Windows, Mac and Linux, and easy to use as well.
Visit the Conda link, download the appropriate version of the Anaconda distribution and follow the on-screen instructions to set up and install Python:
Once the Conda distribution is installed, you can access Jupyter notebooks via the command prompt.
Go to the Start menu and type 'Anaconda prompt'. You will get a command prompt.
Within the command prompt, type 'jupyter notebook'.
You can open Jupyter notebook from any folder on your system. Once you open it from a particular folder, you will be able to access files from and save files to that location. A new browser window will open displaying the contents of the directory. You can also note the version of Python that is installed and create new notebooks accordingly.
Hello world program
Your first hello world program is quite straightforward and easy to execute:
print('Hello World')
You can rename the notebook, set up keyboard shortcuts, save, execute statements, etc. Feel free to explore the notebook interface to get a feel for the working environment.
Explore the documentation for more details.
In the next post, we will start exploring more about Python as a programming language and then further down the line, we will move towards exploring the data science aspects of Python.
Sunday, 12 August 2018
Tracing my steps
My journey to become a Data Scientist began when I started learning Python. Having been a database developer for almost 10 years, I have heard different folks talking about the use of different languages and the demand for them in the software market. I noticed that this 'Python' language was getting mentioned more and more often in those conversations, and it also kept coming up as a 'nice to have' skill during job searches. I had always had the notion that Python would be incredibly difficult to learn, since it is one of the fastest growing and most in-demand languages in the market. I could not have been further from the truth.
If you are also thinking along my lines, that Python is tough or that learning and mastering it is going to take a considerable amount of effort, then now is the time to wipe away all those fears! It is one of the easiest languages to learn. As with every programming language, Python is a vast ocean. However, you need not learn everything about it. There are certain parts that we will have to concentrate on more - specifically the parts that deal with Data Science / Machine Learning. After learning the parts of Python that are relevant to data science, we will move on to the next relevant topic - Statistics.
We might all have learnt the basics of statistics during our schooling - basics like mean, median, mode, etc. However, it is absolutely necessary to brush up our statistics knowledge to gain mastery of data science. Along with the basics, we will have to learn about different distributions, F-statistics, p-values and so on. To play around with the data, one must understand statistics well enough to know which features are important and how to derive meaningful results from the available data. It also helps to understand the mathematics behind a machine learning algorithm, since that makes it easier to tune the algorithm and reach the desired result.
The next step involves studying different machine learning models - regression, KNN, k-means, decision trees and random forest, PCA, recommender systems, etc. Python has numerous libraries associated with machine learning; mastering them will be a major milestone in your data science journey.
In addition to the things mentioned above, one has to know big data technologies like (Py)Spark, Sqoop and Scala to deploy the code onto clusters and perform data analysis, or to train and test the models appropriately.
Since I have worked on database related technologies to extract, transform and load data, I did not spend too much time in learning SQL, Big data or associated concepts. However, it is highly advisable to have at least basic knowledge of concepts related to Relational databases such as Oracle / MS SQL as well as Big data technologies such as Hadoop, Spark and Sqoop.
Last but not least - learning all the above-mentioned things will not automatically make you a data scientist. One has to practice these learnings on real-world datasets as well. There are lots of websites and forums that provide such impactful, huge, real-world datasets. These can be used to train our models and hone our skills. There are also companies that use these online platforms as a channel to host competitions and recruit high-performing talent.
Get ready to travel with me on this long and interesting road to becoming a data scientist!