Python is a dynamically typed language, which means we don't have to declare the type of a variable. That doesn't mean we can't declare types, though: we can add them as annotations that document our code. The Python docs refer to these as “Type Hints”. See the Python documentation for the typing module.
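To make “annotation, not enforcement” concrete, here is a minimal sketch using a hypothetical add function (not from the original post). The hints are recorded on the function object, but the interpreter does not check them when the function is called.

```python
def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
    return a + b


# The hints are stored on the function object...
print(add.__annotations__)  # → {'a': <class 'int'>, 'b': <class 'int'>, 'return': <class 'int'>}

# ...but they are not enforced at runtime: passing strings still "works"
print(add("type ", "hints"))  # → type hints
```

A static checker such as mypy would flag the second call, but plain Python runs it without complaint.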
Sometimes, when doing data analysis and building or evaluating machine learning models in a Jupyter notebook, we end up writing functions to save ourselves time and frustration. After all, why write
(var1 + var2 + var3 + var4 + var5) / 5
when we could just create a reusable function:
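The original function isn't reproduced here, so this is only a sketch of what such a reusable function might look like with type hints; the name mean and its single values parameter are assumptions.

```python
from typing import List


def mean(values: List[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    # Works for any number of values, unlike the hard-coded "/ 5" version
    return sum(values) / len(values)


print(mean([10, 15, 25, 30, 35, 40]))
```

Unlike the hard-coded expression, this works for any number of values, though an empty list would raise ZeroDivisionError.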
Of course, with the scientific computing package NumPy, we can simply write:
np.mean([10,15,25,30,35,40])
But what if we don’t know how NumPy’s mean function works, or what arguments we’re supposed to pass to it?
Well, many IDEs can give you a hint about how a function works. In Jupyter Notebook, you can press Shift+Tab with your cursor inside the parentheses of a function call and, as the image on the left shows, get a quick view of the function signature. Expanding that view gives a fairly detailed explanation of the function: what it does, its parameters, its return value, and so on. Suddenly we realize, “Hey, this NumPy mean function can do quite a bit more than ours.” Our function doesn’t take the axis or the data type into account at all.
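For comparison, here is a short sketch of two of those extra parameters, axis and dtype, both documented parameters of np.mean. The small 2×2 array is made up for illustration.

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])

print(np.mean(data))          # mean of all four values → 2.5
print(np.mean(data, axis=0))  # mean of each column → [2. 3.]
print(np.mean(data, axis=1))  # mean of each row → [1.5 3.5]

# dtype sets the type used for the computation (and of the result)
print(np.mean(data, dtype=np.float32).dtype)  # → float32
```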
What’s great is that we can get the same kind of hints for our own functions.
As a trivial example, suppose we need to iterate over several features of a dataset (columns of a DataFrame in this example) and see their unique value counts. For me, at least, it would be annoying to type multiple lines of code and inspect each result individually. So I’ll write some reusable code instead:
Example:
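The original code lives in the author’s Gist and isn’t reproduced here, so this is a minimal sketch under two assumptions: “unique value counts” means the number of distinct values per column, and the function name print_unique_counts and the toy DataFrame are hypothetical.

```python
from typing import List

import pandas as pd


def print_unique_counts(columns: List[str], df: pd.DataFrame) -> None:
    """Print the number of unique values in each of the given columns.

    Parameters:
        columns: names of the columns in df to inspect
        df: the DataFrame containing those columns

    Returns:
        None; the counts are printed to stdout.
    """
    for column in columns:
        # Series.nunique() counts distinct values (NaN excluded by default)
        print(f"{column}: {df[column].nunique()} unique values")


# Usage with a small, made-up dataset
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 2]})
print_unique_counts(["color", "size"], df)
```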
In the function definition above, we specify the type of the columns argument as List and the type of the df argument as pandas.DataFrame. The -> None after the parameter list annotates the return type: the function returns nothing (strictly speaking, it returns None) and instead simply prints a string to stdout.
So what’s the payoff here?
Well, for starters, the type hints themselves document our code explicitly. If someone else is working in our codebase or Jupyter notebook, there’s little ambiguity about what our function expects and what it returns.
Second, and potentially more useful, many IDEs and REPLs can display these hints and the documentation for our functions. As mentioned earlier, Jupyter Notebook (via IPython) can show us:
- the function signature
- the docstring: in our case, it describes the input parameters and the return value (or other output)
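Outside Jupyter, the built-in help() or the standard-library inspect module exposes the same information that Shift+Tab displays. Here is a minimal sketch on a hypothetical count_unique function (stdlib only, so it runs without pandas).

```python
import inspect
from typing import List


def count_unique(values: List[str]) -> int:
    """Return the number of distinct values.

    Parameters:
        values: the values to inspect

    Returns:
        How many distinct values appear in values.
    """
    return len(set(values))


# The signature, including the type hints
print(inspect.signature(count_unique))  # → (values: List[str]) -> int

# The docstring, with indentation cleaned up
print(inspect.getdoc(count_unique))
```

In IPython or Jupyter, typing count_unique? shows the same signature and docstring.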
The full code sample is available as a GitHub Gist.
That’s pretty cool; what else you got?
Well, since you asked: you can also define what’s known as a “type alias”.
What we’ve done above is define a new variable, Vector, as an alias for List[np.int64], and then use it to annotate our argument: columns: Vector. This tells the developer that columns is a List of NumPy int64 values. We could also just use int here.
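Here is a minimal sketch of a type alias along the lines of the Vector definition described above. The original function isn’t shown, so scale is a hypothetical example; NumPy is needed only for the np.int64 element type.

```python
from typing import List

import numpy as np

# A type alias: Vector is simply another name for List[np.int64]
Vector = List[np.int64]


def scale(factor: float, values: Vector) -> List[float]:
    """Multiply every value in values by factor."""
    return [factor * float(v) for v in values]


values: Vector = [np.int64(1), np.int64(3)]
print(scale(2.0, values))  # → [2.0, 6.0]
```

Because the alias is just another name, a type checker treats Vector and List[np.int64] identically. Swapping in List[int], as suggested above, works the same way.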
Sidenote:
- IDE = Integrated Development Environment
- REPL = Read Evaluate Print Loop (i.e. certain programming languages’ interactive shells)