Paul Barry's Website - Programming for Data Scientists - Semester 1

Module on the M.Sc. in Data Science

M.Sc. in Data Science: Programming for Data Scientists - Semester 1

Academic Year 2017/18 - Class Log.

Wed 27 Sep 2017: Welcome!

Thurs 28 Sep 2017: Started working with Python by discussing sequences (of statements), iteration (a.k.a. looping), and selection (using "if" to make decisions).
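As a reminder of how the three ideas fit together, here is a minimal sketch (the numbers are made up for illustration and are not from the class):

```python
# Hypothetical data, invented for this sketch.
numbers = [3, 8, 5, 12]

evens = []                  # sequence: statements run one after another
for n in numbers:           # iteration: visit each item in turn
    if n % 2 == 0:          # selection: decide what to do with this item
        evens.append(n)

print(evens)
```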

Fri 29 Sep 2017: Looked at Input-Process-Output by working with some live-births data.

Wed 4 Oct 2017: As a number of students were off at Grad Ireland, we ran a tutorial session to review some of the material from last week. See the Playing-With-Big-Words.ipynb notebook for more.

Thurs 5 Oct 2017: Created code to calculate the total_words, total_sentences, and total_syllables values for the "reading ease" formula. Defined our own function to calculate total_syllables. See today's transcripts for all the details.
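For anyone revising, the calculation might look something like the sketch below. It assumes the formula in question is the Flesch Reading Ease score, and the syllable counter (count_syllables) is a rough vowel-group estimate written for this sketch, not the function from the class transcripts.

```python
import re

def count_syllables(word):
    """Rough estimate: count runs of consecutive vowels, minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def reading_ease(text):
    """Flesch Reading Ease (assumed to be the formula used in class)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    total_sentences = len(sentences)
    total_words = len(words)
    total_syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (total_words / total_sentences)
            - 84.6 * (total_syllables / total_words))

# Hypothetical sample text, just to exercise the function.
sample = "The cat sat on the mat. It was happy."
score = reading_ease(sample)
print(round(score, 2))
```

Higher scores indicate easier text. A real syllable counter needs to handle silent letters and other edge cases that this estimate ignores.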

Fri 6 Oct 2017: Finished off the readability example by creating the functions we needed, then put them in a module, which we then imported as required.

Wed 11 Oct 2017: Started looking at Python data in detail, with a head-dive into lists.

Thurs 12 Oct 2017: Finished off our discussion of lists, then looked at (lookup-like, and very cool) dictionaries. Also (briefly) looked at tuples and sets. Understanding and using these four data structures is key to being productive with Python and data.
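A quick side-by-side of the four data structures, as a minimal sketch (all values are invented for illustration):

```python
# list: ordered and mutable
temps = [12.5, 14.1, 13.8]
temps.append(15.0)

# dictionary: look up a value by its key
capitals = {"Ireland": "Dublin", "France": "Paris"}
capitals["Spain"] = "Madrid"

# tuple: ordered but immutable, handy for fixed groupings
point = (53.35, -6.26)
lat, lon = point

# set: unordered, duplicates are dropped
tags = {"python", "data", "python"}

print(temps, capitals["Ireland"], lat, lon, sorted(tags))
```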

Fri 13 Oct 2017: Looked at namedtuple as well as defaultdict (both from collections), then watched Dave Beazley's PyData Chicago 2016 talk. Distributed the first assignment.
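To jog the memory on the two collections helpers, here is a small sketch. The Birth record type and its figures are hypothetical, not the data used in class.

```python
from collections import namedtuple, defaultdict

# namedtuple: a tuple whose fields can be read by name.
Birth = namedtuple("Birth", ["county", "year", "count"])

# Hypothetical records, invented for this sketch.
records = [
    Birth("Carlow", 2015, 800),
    Birth("Kilkenny", 2015, 1200),
    Birth("Carlow", 2016, 850),
]

# defaultdict(int): missing keys start at 0, so no "if key in dict" check.
totals = defaultdict(int)
for r in records:
    totals[r.county] += r.count

print(dict(totals))
```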

Wed 18 Oct 2017: Spent the first half of the class not only reading from a CSV file, but also writing to it (as well as an Excel xlsx file). The second half of the class was spent creating wordclouds (with some help from NLTK).
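The CSV half of the class can be sketched with the standard library's csv module, as below. The file name and rows are hypothetical, and the xlsx side is left out as it needs a third-party library.

```python
import csv
import os
import tempfile

# Hypothetical rows; csv reads every value back as a string.
rows = [
    {"name": "Alice", "score": "82"},
    {"name": "Bob", "score": "75"},
]

# Write to a throwaway directory so nothing real gets overwritten.
path = os.path.join(tempfile.mkdtemp(), "scores.csv")

with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "score"])
    writer.writeheader()
    writer.writerows(rows)

with open(path, newline="") as f:
    loaded = list(csv.DictReader(f))

print(loaded)
```

Passing newline="" when opening the file is what the csv documentation recommends, as it stops extra blank lines appearing on Windows.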

Thurs 19 Oct 2017: We finished off the WordCloud/NLTK/TextBlob example. A number of students are having problems running the examples on Windows, so we spent the second half of the class discussing and demonstrating Linux as a platform. A decision was made to recommend Linux as the deployment platform for this course (over Windows). Mac OS X users can continue as before (as everything "works fine" on that platform). In an effort to identify future problems earlier, Paul has moved to Ubuntu Linux for all future work related to this course.

Fri 20 Oct 2017: Today was all about XML and JSON data (and a bit more on CSV). These are the "big three" data formats (especially online). There is a fourth, and it's called HTML - which will be discussed in detail next week.
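As a rough illustration of how the same data looks in JSON and in XML, here is a minimal sketch using only the standard library. The station name and readings are made up.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical JSON document.
json_text = '{"station": "Carlow", "readings": [11.2, 12.8]}'
data = json.loads(json_text)          # becomes a dict containing a list

# The same (hypothetical) data expressed as XML.
xml_text = """
<station name="Carlow">
  <reading>11.2</reading>
  <reading>12.8</reading>
</station>
""".strip()
root = ET.fromstring(xml_text)

# XML text is always a string, so convert explicitly.
xml_readings = [float(r.text) for r in root.findall("reading")]

print(data["readings"], root.get("name"), xml_readings)
```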

Wed 25 Oct 2017: After some discussion about the assignment, we continued to look at XML and JSON data, and introduced the "requests" library.

Thurs 26 Oct 2017: Combined the "requests" library with BeautifulSoup to scrape some HTML data. Started with a simple example (The Infinite Monkey Cage downloads), then moved on to processing the James Bond webpage from Wikipedia (a much more involved example, which we'll finish off tomorrow).

Fri 27 Oct 2017: Concluded the Bond example, and inserted the scraped/extracted data into MariaDB.

Wed 1 Nov 2017: We worked through transferring some CSV data into a MySQL database table.
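The general shape of a CSV-to-table transfer is sketched below. Note the swap: the class used MySQL, whereas this sketch uses SQLite (from the standard library) so that it runs without a database server. The table name, columns, and data are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content, held in a string instead of a file.
csv_text = "name,score\nAlice,82\nBob,75\n"

# SQLite stands in for MySQL here (an assumption made for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")

# Convert each CSV row to a tuple, turning the score into an int.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [(r["name"], int(r["score"])) for r in reader]

# Parameterised insert: the driver handles quoting, not string formatting.
conn.executemany("INSERT INTO scores (name, score) VALUES (?, ?)", rows)
conn.commit()

stored = conn.execute(
    "SELECT name, score FROM scores ORDER BY name"
).fetchall()
print(stored)
```

With a MySQL driver the overall pattern is much the same, although the placeholder style is typically %s and not ?.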

Wed 8 Nov 2017: Took a breather, and talked about MatchDay, OPTA, XML, JSON, and other things (a real-world example of applying the techniques we're learning in class).

Thurs 9 Nov 2017: Introduced the wonder which is NumPy.

Fri 10 Nov 2017: All hail, pandas! A quick 35-minute introduction to pandas brought a smile or two to everyone's face.

Wed 15 Nov 2017: Creating pandas DataFrames from existing data (not CSV files). Using lists and dictionaries for same.

Fri 17 Nov 2017: More pandas: manipulating the weather, converting to other formats, and doing a simple plot.

Wed 22 Nov 2017: We quickly created a Flask app to deliver the weather data (and its graphic) to a webpage (via a browser), then - eventually - got it to run on PythonAnywhere (in the cloud).

Thurs 23 Nov 2017: Tutorial session: answering Question 1 from the current assignment. See the "Question 1.ipynb" notebook for all the code.

Fri 24 Nov 2017: Started looking at Flask web development from first principles.

Wed 29 Nov 2017: More Flask development with data input forms, CSV files on the server, and conversions to HTML (via a pandas dataframe).

Thurs 30 Nov 2017: Learned about Jinja2 templates, then reviewed the work so far on DataFrames.

Fri 1 Dec 2017: More pandas, more weather data, more plots.

Wed 6 Dec 2017: The process of taking Met Eireann's hourly weather reports and converting them into a "growing" CSV file was discussed. The data is on an Institute server, ready and waiting for the final piece of coursework.

Thurs 7 Dec 2017: Used pandas to do some initial processing with the weather data for the hourly reports, then discussed the next (and last) coursework assignment in detail.

Fri 8 Dec 2017: We reviewed (as a class) a list of possible edits/cleanups/etc. for the Food data from assignment 1.

Wed 13 Dec 2017: We discussed sleep apnea and the graphing requirements for the sample data shown. In the AM, we'll look at the Python/matplotlib code used to produce the graphs.

Thurs 14 Dec 2017: Played with some visualisations.

Fri 15 Dec 2017: Last class - final assignment due after Christmas (end of first week of Semester 2).
