[01:44] *** jacketcha has quit IRC (Ping timeout: 252 seconds)
[02:20] *** trs80 has quit IRC (Ping timeout: 246 seconds)
[03:25] *** Mateon1 has quit IRC (Remote host closed the connection)
[03:26] *** Mateon1 has joined #internetarchive.bak
[06:56] *** trs80 has joined #internetarchive.bak
[07:45] *** trs80 has quit IRC (Ping timeout: 246 seconds)
[07:45] *** trs80 has joined #internetarchive.bak
[08:18] *** rrn has joined #internetarchive.bak
[08:24] *** atomotic has joined #internetarchive.bak
[10:11] *** atomotic has quit IRC (Quit: atomotic)
[10:32] *** Mateon1 has quit IRC (Read error: Operation timed out)
[10:32] *** Mateon1 has joined #internetarchive.bak
[10:58] *** atomotic has joined #internetarchive.bak
[12:34] *** atomotic has quit IRC (Quit: atomotic)
[13:02] *** atomotic has joined #internetarchive.bak
[13:13] *** AsmoB has joined #internetarchive.bak
[13:49] *** atomotic has quit IRC (Quit: atomotic)
[14:04] *** atomotic has joined #internetarchive.bak
[16:10] *** atomotic has quit IRC (Quit: atomotic)
[17:42] *** atomotic has joined #internetarchive.bak
[17:58] *** atomotic has quit IRC (Quit: atomotic)
[18:26] *** Pixi has quit IRC (Quit: Pixi)
[18:27] *** Pixi has joined #internetarchive.bak
[18:28] *** iabak-reg has joined #internetarchive.bak
[18:53] *** shenghac has joined #internetarchive.bak
[18:53] *** shenghac has quit IRC (Client Quit)
[18:55] *** hc has joined #internetarchive.bak
[18:58] *** shenghac has joined #internetarchive.bak
[18:59] https://www.irccloud.com/pastebin/mkn7oNEz/
[18:59] Let me introduce myself:
[18:59] I am Yu-Sheng Su, a computer science graduate student at National Chengchi University (Taiwan). I do research on network embedding and visual captioning in the Computational Linguistics and Information Processing Laboratory. In addition, I was an R&D intern at Microsoft and a machine learning intern at TradingValley (a startup company), so I am familiar with sklearn, tensorflow, and keras.
[19:00] Project Question:
[19:00] I am very interested in [Idea 3 Detect “soft 404s” and “parked” websites]. After studying it this week, I have a few questions:
[19:00] 1. Will the Internet Archive provide datasets of “soft 404s” and “parked” websites? Will the data be labeled or not?
[19:00] -If the data is labeled, I can follow this paper [Identifying "Soft 404" Error Pages: Analyzing the Lexical Signatures of Documents in Distributed Collections] to reach
[19:00] a precision of 99% and a recall of 92% (or higher)
[19:00] -If there is no labeled data, I may choose an unsupervised or NN model instead.
[19:00] 2. Final result:
[19:00] - After Idea 3 is finished, will this system be merged into wayback-machine-chrome, wayback-machine-firefox, or wayback-machine-safari? If so, I need to take more into account when I build this system.
[19:00] 3. Mentor:
[19:00] Who will be the mentor for this project? I found this project (Kenji Nagahashi) in internetarchive. Will Kenji Nagahashi be a mentor for this project? If so, how can I contact him to ask for more details?
[19:01] ======================
[19:01] I look forward to your reply; it will be a great help to me.
[19:01] Big thanks!!
[19:03] *** hc has quit IRC (Quit: Page closed)
[21:22] *** AsmoB has quit IRC ((null))