The Data Science Journey (so far...) of a Seasoned Geoscientist

10/08/2021

Ron Daniel CGeol FGS, Evaluation Team Lead, Exploration, at Staatsolie Maatschappij Suriname N.V

African proverb: (s)he who goes in search of honey must expect to get stung by bees.

The Backdrop

When my contract ended on 31st March 2020, with no immediate prospect of work because Covid coincided with another oil price crash, I did what I have always done at such times: I considered my options. After 35 years as a petroleum geoscientist, my curiosity and love of the subject were undimmed, but at age 56 my employment prospects were possibly diminishing. With a passion for coaching and mentoring, I quickly signed up with two outfits as a tutor in Spanish and geoscience. My interest in geothermal energy, sparked by having been born on an actively volcanic Caribbean island, also led me to attend the PIVOT conference. As well as making the most of the hour of daily outdoor exercise permitted by Covid regulations, I built several projects in Paleoscan, a global seismic interpretation software package that I had used previously and thought very highly of. In Paleoscan I saw a potential niche consultancy business for me and my limited company, Lions Denergy (www.lionsdenergy.com).

By early September, when I had filled my boots with Paleoscan projects, a friend mentioned that he had signed up for a machine learning course. I was curious, but felt it would be better to explore the wider canvas of data science (DS): I had used data in my geoscience work, and I could always focus on one particular aspect of DS later. DS could either complement my future geoscience analysis or lead to a complete change of direction out of the geoscience / hydrocarbon sector.

Course Selection

A Google search led me to the IBM/Coursera Data Science Professional Certificate, which uses Python as its coding language. The thought of having to code immediately struck fear into me, because my ‘coding’ experience was limited to writing basic formulas in Excel, and a computer aptitude test during my undergraduate degree had judged me to be unsuitable! However, I took heart from the fact that I had single-handedly built a decent website for Lions Denergy in April 2020, something I never thought myself capable of. I had only heard of Python because Eliis, the vendor of Paleoscan, had added a Python module to the most recent version of their software.

I chose the IBM/Coursera course because it looked well structured and was relatively cheap (£30/month), and with abundant time on my hands, I hoped to complete it in about three months.

To avoid being held up (in time and money) by my lack of coding skills, in October I worked through two courses on YouTube: Mike Daines’ excellent Beginner’s Introduction to Python (coding in PyCharm, 4½ hours, https://www.youtube.com/watch?v=rfscVS0vtbw) and Jovian ML’s Data Analysis with Python (70 minutes, https://www.youtube.com/watch?v=EsDFiZPljYo).

IBM/Coursera Data Science Professional Certificate

I signed up for the course on 3rd November and finished it on 8th April, having spent a total of 332 hours on it. By comparison, IBM’s guidance quotes 149 hours spread over 11 months (about 13 hours per month, or 3 hours per week). I generally did 3-4 hours per day, Monday to Friday, at first, but things slowed down when I started a full-time geoscience job on 1st March. To cement my learning, I read the transcripts, watched the videos, then rewatched them, pausing to make copious notes by hand.

The course comprises ten modules and moves sequentially from What is Data Science through Tools, Methodology, Python, Databases and SQL, Data Analysis, Data Visualization and Machine Learning to a Capstone project, in which you have to come up with a problem and write the code to solve it. I found the Capstone module very challenging and spent 73 hours on it. A useful course overview by Mehrnaz Siavoshi (Jun 18, 2019) is available online. As well as the videos, there are multiple-choice quizzes embedded in them, practice labs on coding, and labs that are assessed by your peers using the marking schema provided. You generally need to mark the work of one or two other students to get your own results. There is a message board where IBM tutors help with queries, but the replies from other students were often the most helpful. The coding is done in Jupyter Notebooks, which are shared via GitHub, and the course is heavy on IBM tools, with their Developer Skills Network, Watson Studio and Cloud also much used.

Overall, I found the course to be well structured, although some of the material was out of date, for example the versions of some of the tools, which sometimes caused code to crash without explanation. When I (thought I had) finished the course, I discovered that an additional module (Python Project for Data Science) had been added after I started. This module would have saved me much angst during the Capstone project. It was also difficult to find out how to reach a real person to get my final certificate without having to do the extra module (I eventually did it anyway, out of curiosity).

When you have passed the course, you become an alumnus and are eligible to access the Coursera Career Service, which includes many useful tools for writing and honing your CV, making job applications and general networking.

There were times when I felt stuck and despaired because I did not know enough to understand where I had gone wrong or why my code was not running. I am therefore very grateful to all those who helped me to get past those blockages. I was stung by plenty of bees but I found a lot of honey!

Where Next?

The IBM/Coursera course definitely gave me an excellent grounding in data science methodology and application, although I would shy away from calling myself a data scientist just yet. And I learnt to code for the first time in my life! Part of the reason the course took five months to complete was that, as well as starting a full-time geoscience job before I finished it, I attended the Data Science Night School run in February 2021 by the PESGB (Petroleum Exploration Society of Great Britain). This used more up-to-date tools than the Coursera course, along with open-source UKCS well data, so it was very relevant to my day job. The combination of the two courses has really opened my eyes to where and how data science is being used in geoscience applications and software, and where I might be able to use my skills to code and solve geoscience problems.

While I remain employed in geoscience, my plan is to keep my data science skills current by finding the time to use them to solve problems (not necessarily geoscience ones), similar to the concept of the Capstone project.

I am happy to discuss my experiences with anyone interested, so feel free to contact me.
