#internetarchive.bak 2018-02-22,Thu

Time Nickname Message
01:44 jacketcha has quit IRC (Ping timeout: 252 seconds)
02:20 trs80 has quit IRC (Ping timeout: 246 seconds)
03:25 Mateon1 has quit IRC (Remote host closed the connection)
03:26 Mateon1 has joined #internetarchive.bak
06:56 trs80 has joined #internetarchive.bak
07:45 trs80 has quit IRC (Ping timeout: 246 seconds)
07:45 trs80 has joined #internetarchive.bak
08:18 rrn has joined #internetarchive.bak
08:24 atomotic has joined #internetarchive.bak
10:11 atomotic has quit IRC (Quit: atomotic)
10:32 Mateon1 has quit IRC (Read error: Operation timed out)
10:32 Mateon1 has joined #internetarchive.bak
10:58 atomotic has joined #internetarchive.bak
12:34 atomotic has quit IRC (Quit: atomotic)
13:02 atomotic has joined #internetarchive.bak
13:13 AsmoB has joined #internetarchive.bak
13:49 atomotic has quit IRC (Quit: atomotic)
14:04 atomotic has joined #internetarchive.bak
16:10 atomotic has quit IRC (Quit: atomotic)
17:42 atomotic has joined #internetarchive.bak
17:58 atomotic has quit IRC (Quit: atomotic)
18:26 Pixi has quit IRC (Quit: Pixi)
18:27 Pixi has joined #internetarchive.bak
18:28 iabak-reg has joined #internetarchive.bak
18:53 shenghac has joined #internetarchive.bak
18:53 shenghac has quit IRC (Client Quit)
18:55 hc has joined #internetarchive.bak
18:58 shenghac has joined #internetarchive.bak
18:59 shenghac https://www.irccloud.com/pastebin/mkn7oNEz/
18:59 shenghac Introduce myself:
18:59 shenghac I am Yu-Sheng Su, a computer science graduate school student at National Chengchi University (Taiwan). I do my research about network embedding and visual caption in Computational Linguistics and Information Processing Laboratory. Besides, I was an R&D intern at Microsoft and a machine learning intern in TradingValley (a startup company). Therefore, I am familiar with sklearn, tensorflow, and keras.
19:00 shenghac Project Question:
19:00 shenghac I have great interests in [Idea 3 Detect “soft 404s” and “parked” websites]. After I studied on it this week, I have few questions below:
19:00 shenghac 1. Will Internet Archive offer “soft 404s” and “parked” websites datasets? label data? or not?
19:00 shenghac -If the data was labeled, I can follow this paper [Identifying "Soft 404" Error Pages: Analyzing the Lexical Signatures of Documents in Distributed Collections] to meet
19:00 shenghac precision: 99% and recall of 92% (or higher)
19:00 shenghac -If there is no labeled data, I may choose unsupervised or NN model to do it.
19:00 shenghac 2.Final result:
19:00 shenghac - After the Idea 3 is finished, this system will be merged to wayback-machine-chrome, wayback-machine-firefox, or wayback-machine-safari?? If it will, I need to consider more when I build this system.
19:00 shenghac 3.Mentor:
19:00 shenghac Who will be this project Mentor? I found this project (Kenji Nagahashi) in internetarchive. Will Kenji Nagahashi be a mentor in this project. If he will , how can I connect with him to ask more in detail?
19:01 shenghac ======================
19:01 shenghac Look forward to your reply, it will be a great help for me.
19:01 shenghac Big thanks!!
19:03 hc has quit IRC (Quit: Page closed)
21:22 AsmoB has quit IRC ((null))
