Paul Barry's Website - Programming for Data Scientists - Semester 1

Module on the M.Sc. in Data Science


Academic Year 2017/18 - Class Log.


Wed 27 Sep 2017: Welcome!

Thurs 28 Sep 2017: Started working with Python by discussing sequences (of statements), iteration (a.k.a. looping), and selection (using "if" to make decisions).
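For anyone reviewing at home, those three building blocks look something like this (a small illustrative sketch, not the exact code from class):

```python
# Sequence: statements run top to bottom.
numbers = [3, 7, 2, 9, 4]
evens = []

for n in numbers:          # Iteration: visit each value in turn.
    if n % 2 == 0:         # Selection: use "if" to make a decision.
        evens.append(n)

print(evens)               # -> [2, 4]
```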

Fri 29 Sep 2017: Looked at Input-Process-Output by working with some live-births data.

Wed 4 Oct 2017: As a number of students were off at Grad Ireland, we ran a tutorial session to review some of the material from last week. See the Playing-With-Big-Words.ipynb notebook for more.

Thurs 5 Oct 2017: Created code to calculate the total_words, total_sentences, and total_syllables values for the "reading ease" formula. Defined our own function to calculate total_syllables. See today's transcripts for all the details.
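The constants below are the standard Flesch Reading Ease ones; the syllable counter is a simplistic vowel-group heuristic (a hypothetical stand-in, not the function written in class):

```python
def count_syllables(word):
    """Rough syllable count: count runs of consecutive vowels."""
    word = word.lower()
    count = 0
    prev_vowel = False
    for ch in word:
        is_vowel = ch in "aeiouy"
        if is_vowel and not prev_vowel:
            count += 1
        prev_vowel = is_vowel
    return max(count, 1)

def reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    words = text.split()
    total_words = len(words)
    total_sentences = len(sentences)
    total_syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (total_words / total_sentences)
            - 84.6 * (total_syllables / total_words))
```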

Fri 6 Oct 2017: Finished off the readability example by creating the functions we needed, then put them in a module called easy.py, which we then imported as required.

Wed 11 Oct 2017: Started looking at Python data in detail, with a head-dive into lists.

Thurs 12 Oct 2017: Finished off our discussion of lists, then looked at (lookup-like, and very cool) dictionaries. Also (briefly) looked at tuples and sets. Understanding and using these four data structures is key to being productive with Python and data.
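A whirlwind reminder of the big four, with made-up values:

```python
# List: ordered and mutable.
temps = [12, 15, 11, 18]
temps.append(16)

# Dictionary: fast key -> value lookup.
capitals = {"Ireland": "Dublin", "France": "Paris"}
capitals["Spain"] = "Madrid"

# Tuple: ordered and immutable (handy for fixed records).
point = (53.28, -6.98)

# Set: unique members only (duplicates silently dropped).
seen = {"red", "green", "red", "blue"}
```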

Fri 13 Oct 2017: Looked at namedtuple as well as defaultdict (both from collections), then watched Dave Beazley's PyData Chicago 2016 talk. Distributed the first assignment.
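In case the collections examples went by quickly, here's the gist of both (illustrative data):

```python
from collections import namedtuple, defaultdict

# namedtuple: a tuple whose fields have names.
Student = namedtuple("Student", ["name", "course"])
s = Student("Alice", "Data Science")

# defaultdict: supplies a default value for missing keys,
# which makes counting code much shorter.
counts = defaultdict(int)
for word in "the cat and the dog".split():
    counts[word] += 1
```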

Wed 18 Oct 2017: Spent the first half of the class not only reading from a CSV file, but also writing to it (as well as an EXCEL xlsx file). The second half of the class was spent creating wordclouds (with some help from NLTK).
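The CSV reading/writing pattern (standard library only; the xlsx and wordcloud parts need extra libraries and aren't shown here). File name and data are just for illustration:

```python
import csv
import os
import tempfile

rows = [["name", "score"], ["Alice", "72"], ["Bob", "65"]]

# Write the rows out, then read them back in.
path = os.path.join(tempfile.gettempdir(), "scores.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path, newline="") as f:
    back = list(csv.reader(f))
```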

Thurs 19 Oct 2017: We finished off the WordCloud/NLTK/TextBlob example. A number of students are having problems running the examples on Windows, so we spent the second half of the class discussing and demonstrating Linux as a platform. A decision was made to recommend Linux as the deployment platform for this course (over Windows). Mac OS X users can continue as before (as everything "works fine" on that platform). In an effort to identify future problems earlier, Paul has moved to Ubuntu Linux for all future work related to this course.

Fri 20 Oct 2017: Today was all about XML and JSON data (and a bit more on CSV). These are the "big three" data formats (especially online). There is a fourth, and it's called HTML - which will be discussed in detail next week.
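The same (made-up) record in JSON and XML, parsed with nothing but the standard library:

```python
import json
import xml.etree.ElementTree as ET

# JSON: loads() turns text into Python dicts/lists.
json_text = '{"title": "Casino Royale", "year": 1953}'
record = json.loads(json_text)

# XML: walk the element tree to pull out values.
xml_text = "<book><title>Casino Royale</title><year>1953</year></book>"
root = ET.fromstring(xml_text)
title = root.find("title").text
year = int(root.find("year").text)
```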

Wed 25 Oct 2017: After some discussion about the assignment, we continued to look at XML and JSON data, and introduced the "requests" library.

Thurs 26 Oct 2017: Combined the "requests" library with BeautifulSoup to scrape some HTML data. Started with a simple example (The Infinite Monkey Cage downloads), then moved on to processing the James Bond webpage from Wikipedia (a much more involved example, which we'll finish off tomorrow).
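The scraping pattern in miniature. To keep this runnable offline, the HTML is a made-up literal; in practice it would come from requests.get(url).text:

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h2>Episodes</h2>
  <ul>
    <li><a href="/ep1.mp3">Episode 1</a></li>
    <li><a href="/ep2.mp3">Episode 2</a></li>
  </ul>
</body></html>
"""

# Parse the HTML, then pull out every link's href attribute.
soup = BeautifulSoup(html, "html.parser")
links = [a["href"] for a in soup.find_all("a")]
```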

Fri 27 Oct 2017: Concluded the Bond example, and inserted the scraped/extracted data into MariaDB.

Wed 1 Nov 2017: We worked through transferring some CSV data into a MySQL database table.
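A sketch of the CSV-to-database transfer. The class used MySQL/MariaDB; this uses the standard library's sqlite3 module so it runs anywhere, but the executemany/placeholder pattern is the same (MySQL connectors use %s placeholders rather than ?):

```python
import csv
import io
import sqlite3

# Stand-in for a CSV file on disk.
csv_data = io.StringIO("name,score\nAlice,72\nBob,65\n")
reader = csv.DictReader(csv_data)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")

# Insert every CSV row into the table in one go.
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(row["name"], int(row["score"])) for row in reader],
)
conn.commit()
```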

Wed 8 Nov 2017: Took a breather, and talked about MatchDay, OPTA, XML, JSON, and other things (a real-world example of applying the techniques we're learning in class).

Thurs 9 Nov 2017: Introduced the wonder which is NumPy.
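A taste of why NumPy is a wonder: whole-array arithmetic with no explicit loops (made-up temperature data):

```python
import numpy as np

temps_c = np.array([12.0, 15.0, 11.0, 18.0])

# Vectorised: the expression applies to every element at once.
temps_f = temps_c * 9 / 5 + 32

# Aggregations are methods on the array itself.
mean_c = temps_c.mean()
```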

Fri 10 Nov 2017: All hail, pandas! A quick 35-minute introduction to pandas brought a smile or two to everyone's face.

Wed 15 Nov 2017: Creating pandas DataFrames from existing data (not CSV files). Using lists and dictionaries for same.
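Both in-memory routes into a DataFrame, with made-up rainfall figures:

```python
import pandas as pd

# A dict of lists: one column per key.
data = {"day": ["Mon", "Tue", "Wed"],
        "rainfall": [2.1, 0.0, 5.4]}
df = pd.DataFrame(data)

# A list of dicts: one row per dict.
rows = [{"day": "Mon", "rainfall": 2.1},
        {"day": "Tue", "rainfall": 0.0}]
df2 = pd.DataFrame(rows)
```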

Fri 17 Nov 2017: More pandas: manipulating the weather, converting to other formats, and doing a simple plot.


Return to the Courses page.