[00:38] *** Stilett0 has quit IRC ()
[00:45] *** Stiletto has joined #archiveteam-ot
[00:58] *** adinbied has quit IRC (Read error: Operation timed out)
[00:58] *** adinbied has joined #archiveteam-ot
[01:14] *** terorie has joined #archiveteam-ot
[01:20] *** terorie has quit IRC (Remote host closed the connection)
[01:21] *** terorie has joined #archiveteam-ot
[01:25] *** wp494 has quit IRC (Ping timeout: 255 seconds)
[01:25] *** wp494 has joined #archiveteam-ot
[01:26] *** svchfoo3 sets mode: +o wp494
[01:51] Just got into the IIPC Slack
[02:18] *** terorie_ has joined #archiveteam-ot
[02:19] *** terorie has quit IRC (Read error: Operation timed out)
[02:44] is there anyone that will run tubeup on random videos I find on my travels to the real obscure parts of the web
[02:44] Ivan has youtube covered but sometimes i find videos from other sites
[02:44] not all of them are up still
[03:20] *** terorie_ has quit IRC (Remote host closed the connection)
[03:24] Who of you guys is e30e? :-)
[03:24] ArchiveBot got a mention in https://linuxwit.ch/blog/2018/12/everything-that-lives-is-designed-to-end/
[03:24] JAA: i am
[03:24] well, my friend
[03:24] but i'm a maintainer
[03:26] Ah :-)
[03:29] *** Aoede has quit IRC (Ping timeout: 186 seconds)
[03:30] *** VoynichCr has quit IRC (Read error: Operation timed out)
[03:37] *** Aoede has joined #archiveteam-ot
[03:38] *** svchfoo3 sets mode: +o Aoede
[03:40] *** VoynichCr has joined #archiveteam-ot
[03:50] *** terorie has joined #archiveteam-ot
[03:54] *** terorie has quit IRC (Read error: Operation timed out)
[04:15] *** odemg has quit IRC (Ping timeout: 265 seconds)
[04:22] *** terorie has joined #archiveteam-ot
[04:27] *** odemg has joined #archiveteam-ot
[05:02] *** DarkWorld has quit IRC (Read error: Connection reset by peer)
[05:03] *** DarkWorld has joined #archiveteam-ot
[06:00] Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. Why did this happen?
[06:07] *** DarkWorld has quit IRC (Read error: Operation timed out)
[06:40] *** terorie has quit IRC (Remote host closed the connection)
[06:43] *** schbirid has joined #archiveteam-ot
[06:48] *** terorie has joined #archiveteam-ot
[06:58] So does anyone want to tubeup some random items?
[06:58] https://watch-learn.com/video-tutorials/basic-frontend-tools-ssh-scp-sftp-and-git-ftp
[06:58] that's an example of something I might find
[07:00] *** anarcat has quit IRC (Ping timeout: 265 seconds)
[07:09] *** terorie has quit IRC (Remote host closed the connection)
[07:10] *** terorie has joined #archiveteam-ot
[07:19] *** Jusque has quit IRC (Quit: ZNC - http://znc.in)
[07:20] *** Jusque has joined #archiveteam-ot
[08:04] *** DarkWorld has joined #archiveteam-ot
[08:07] *** terorie has quit IRC (Remote host closed the connection)
[08:07] *** terorie has joined #archiveteam-ot
[08:10] *** terorie has quit IRC (Read error: Operation timed out)
[08:12] *** terorie has joined #archiveteam-ot
[08:19] *** terorie has quit IRC (Remote host closed the connection)
[08:19] *** terorie has joined #archiveteam-ot
[08:24] *** terorie has quit IRC (Ping timeout: 268 seconds)
[08:39] *** terorie has joined #archiveteam-ot
[09:11] *** VerifiedJ has joined #archiveteam-ot
[09:13] *** Verified_ has quit IRC (Ping timeout: 252 seconds)
[09:24] *** DarkWorld has quit IRC (Read error: Connection reset by peer)
[09:24] *** DarkWorld has joined #archiveteam-ot
[09:48] *** BlueMax has quit IRC (Quit: Leaving)
[10:21] *** adinbied has quit IRC (Read error: Connection reset by peer)
[10:22] *** adinbied has joined #archiveteam-ot
[10:28] *** wp494 has quit IRC (Ping timeout: 506 seconds)
[10:28] *** wp494 has joined #archiveteam-ot
[10:29] *** svchfoo3 sets mode: +o wp494
[12:10] *** caff has quit IRC (Read error: Connection reset by peer)
[12:52] *** anarcat has joined #archiveteam-ot
[12:57] *** DarkWorld has quit IRC (Ping timeout: 600 seconds)
[12:57] *** DarkWorld has joined #archiveteam-ot
[13:37] *** terorie has quit IRC (Remote host closed the connection)
[13:37] *** terorie has joined #archiveteam-ot
[13:41] *** terorie has quit IRC (Read error: Operation timed out)
[13:50] *** terorie has joined #archiveteam-ot
[14:20] *** terorie has quit IRC (Remote host closed the connection)
[14:20] *** terorie has joined #archiveteam-ot
[14:21] *** DarkWorld has quit IRC (Ping timeout: 633 seconds)
[14:38] *** terorie has quit IRC (Remote host closed the connection)
[14:41] *** terorie has joined #archiveteam-ot
[15:18] IIPC is ... interesting.
[15:19] "Gopher is not mentioned by the standard and I'm not aware of any existing guidance or tools for archiving gopher as WARC files. If someone does come up with a concrete proposal it'd be great to file it as an issue against https://github.com/iipc/warc-specifications for consideration for inclusion in future revisions of the standard. Even if it's not mature enough or there's not enough support to get it into the ISO standard
[15:19] proper it'd still be good to document an approach on the warc specifications website/github so others interested in archiving gopher can follow suit."
[15:20] Interesting. I always thought their policy was "implementation first, please".
[15:24] Out of curiosity, what would happen if someone tried to just use a WARC writing proxy?
[15:26] I'm guessing probably nothing?
[15:26] Well, the proxy would have to support Gopher for that to work, in which case there'd be an implementation of a WARC-writing Gopher tool.
[15:27] Since WARCs don't store raw network dumps but slightly interpreted data (request/response pairs), there is no way to implement a generic WARC-writing proxy.
[15:28] You need to add support for each protocol individually.
[15:28] Which was one of the reasons why PurpleSym went with that high-level abstraction of the network data in crocoite/chromebot.
[15:29] The downside is that the WARCs don't contain the raw request/response data as sent over the network. The advantage is that it also supports HTTP/2, WebSocket, etc.
[15:30] ah
[15:37] Apparently ARC has support for GOPHER. https://usercontent.irccloud-cdn.com/file/PjVhJaaO/image.png
[15:39] I'm assuming that implies there was at some point something that crawled gopher sites and recorded them in an ARC file.
[15:51] *** terorie has quit IRC (Remote host closed the connection)
[16:19] *** terorie has joined #archiveteam-ot
[16:34] *** terorie has quit IRC (Remote host closed the connection)
[16:34] *** Verified_ has joined #archiveteam-ot
[16:36] *** VerifiedJ has quit IRC (Ping timeout: 252 seconds)
[16:40] *** terorie has joined #archiveteam-ot
[16:46] *** terorie has quit IRC (Remote host closed the connection)
[17:48] *** uhhh has joined #archiveteam-ot
[17:51] *** benjinsmi has quit IRC (Leaving)
[17:52] *** benjins has joined #archiveteam-ot
[18:11] *** uhhh has quit IRC (Ping timeout: 252 seconds)
[18:28] *** jesso_ has joined #archiveteam-ot
[18:36] *** nbneer has joined #archiveteam-ot
[18:40] *** nbneer has quit IRC (Ping timeout: 265 seconds)
[18:47] *** terorie has joined #archiveteam-ot
[18:50] *** uhhh has joined #archiveteam-ot
[18:50] *** uhhh has left
[18:51] *** terorie has quit IRC (Read error: Operation timed out)
[19:09] *** jut has quit IRC (Quit: Ram upgrade has arrived. Woot!!)
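[editor's note] The 15:27 point, that a WARC holds interpreted request/response records rather than a raw network dump, can be sketched in a few lines. This is a minimal illustration, not how any Archive Team tool writes WARCs: `build_warc_record` is a hypothetical helper, the URL and payload are made up, and only the basic WARC/1.0 record layout (version line, named fields, blank line, block) is shown. The writer has to know the protocol in order to label the block (`application/http; msgtype=response` here); what the equivalent label would be for Gopher is exactly what the quoted IIPC reply says the standard does not define.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(record_type, target_uri, content_type, block):
    """Serialize one WARC/1.0 record (hypothetical helper, for illustration only)."""
    headers = [
        ("WARC-Type", record_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        # The block's media type is protocol-specific: the writer must
        # understand the protocol well enough to label what it captured.
        ("Content-Type", content_type),
        ("Content-Length", str(len(block))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Each record ends with two CRLFs after the block.
    return head.encode("utf-8") + block + b"\r\n\r\n"


# A made-up HTTP response, stored as the record block.
http_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 6\r\n"
    b"\r\n"
    b"hello\n"
)

record = build_warc_record(
    "response",
    "http://example.com/",
    "application/http; msgtype=response",
    http_response,
)
print(record.decode("utf-8"))
```

A real crawler would also write a matching `request` record and usually gzip each record separately; both are left out here to keep the sketch short.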
[19:21] *** caff has joined #archiveteam-ot
[19:25] *** wp494 has quit IRC (Ping timeout: 255 seconds)
[19:26] *** wp494 has joined #archiveteam-ot
[19:26] *** svchfoo1 sets mode: +o wp494
[19:32] *** terorie has joined #archiveteam-ot
[20:07] *** jut has joined #archiveteam-ot
[20:14] *** keith20 has joined #archiveteam-ot
[20:16] *** keith20 has quit IRC (Remote host closed the connection)
[20:16] *** keith20 has joined #archiveteam-ot
[20:19] *** keith20 has quit IRC (Client Quit)
[20:33] *** terorie has quit IRC (Remote host closed the connection)
[20:56] *** Mateon1 has quit IRC (Ping timeout: 265 seconds)
[20:56] *** Mateon1 has joined #archiveteam-ot
[21:26] *** BlueMax has joined #archiveteam-ot
[21:27] *** VerifiedJ has joined #archiveteam-ot
[21:29] *** Verified_ has quit IRC (Ping timeout: 252 seconds)
[22:33] *** terorie has joined #archiveteam-ot
[22:37] *** terorie has quit IRC (Read error: Operation timed out)
[22:39] *** t2t2 has quit IRC (Read error: Operation timed out)
[22:39] *** t2t2 has joined #archiveteam-ot
[22:45] *** DarkWorld has joined #archiveteam-ot
[22:58] *** tuluu has quit IRC (Remote host closed the connection)
[23:02] *** tuluu has joined #archiveteam-ot
[23:16] *** DarkWorld has quit IRC (Ping timeout: 600 seconds)
[23:17] *** DarkWorld has joined #archiveteam-ot
[23:24] *** Fusl has joined #archiveteam-ot
[23:24] *** nyphryx has joined #archiveteam-ot
[23:25] *** diggan has joined #archiveteam-ot
[23:25] So...
[23:25] So.
[23:26] Do you think BOINC will be useful?
[23:26] BOINC? Protein folding?
[23:26] Let me Google..
[23:26] ...........
[23:26] BOINC as in distributed computing
[23:26] Yes.
[23:26] It's a software that allows easy distribution of computing.
[23:27] It's also simple to install and run.
[23:27] It also crashes my computer faster than the warrior
[23:27] The problem is storage. Concerning the Tumblr issue I did not scrollback, old CISO forgot IRC-tech related commands,
[23:28] In any case, the storage is a problem in a sense that do you really need to have porn unencrypted on your laptop?
[23:28] You encrypt your porn?
[23:28] What kind of weird fetishes do you have that you insist on encrypting it?
[23:28] Storage? Don't worry about that. That's not in our control. We're uploading it to the Archive Team servers and to the Internet Archive.
[23:29] We can handle the encryption in the script.
[23:30] We can have different types of jobs too. Some that need encryption, and others that don't.
[23:30] Tumblr NSFW storage. I'm not talking about Warrior WARC tech...and guys, take it easy on newcomers to the team, some bring value.
[23:30] The storage is handled by IA
[23:30] it's out of our hands
[23:30] That's what I said.
[23:30] We download it and upload it to FOS aka the Fortress of Solitude
[23:31] Alright.
[23:31] A staging server owned by our illustrious leader
[23:31] He then pushes the data to the archive
[23:31] The data doesn't stay on your computer for long
[23:32] So we need a better distribution and tracking method. And then we need optimized code.
[23:32] Thing is we are all volunteers
[23:32] nobody gets paid for this
[23:32] I really think BOINC will help.
[23:32] I disagree
[23:33] What I'm pointing out is that the data on your computer might be data that belongs to somebody else, and the PartyVan might show up while the data is for 2 seconds on your computer because the judge and prosecutor? Don't care.
[23:33] Flashfire: Why do you disagree?
[23:33] if you are worried then don't run the project
[23:33] It's the risk we all take
[23:33] BOINC has completely different ideologies
[23:34] I think I'll git branch the project at some point since geocities.yahoo.com is gone, now geocities.jp is gone, and now XX% of tumblr.com will be gone.
[23:34] nyphryx: Don't worry about the encryption part yet. That can easily be taken care of. The biggest issue is optimization and deployment of the code.
[23:34] BOINC sure, but how to crawl a topic that's being deleted from a website with blogs
[23:35] Too much stuff that needs to be done tracker side for it to work for BOINC
[23:35] As in, "NSFW blogs on Tumblr in a universe where Tumblr does not put the NSFW tag On"
[23:35] *** kpcyrd has joined #archiveteam-ot
[23:35] Plus BOINC has massive problems with deduplication
[23:35] The same job goes out to 5-10 different people
[23:36] Flashfire: You can change that.
[23:36] Then we have the issue of claim releasing
[23:36] I don't think you know enough about BOINC.
[23:36] I don't think I know enough to be commenting on this discussion at all but I still think it's a shit idea
[23:37] (personally do not know much about BOINC but i will take a look)
[23:37] BOINC isn't actually doing anything except making it easier to distribute jobs.
[23:37] Relative of Hadoop*?
[23:37] What problem does BOINC solve that we currently have?
[23:38] I.e. what would we gain by switching?
[23:38] Deployment to a larger audience.
[23:38] I have a feeling that it'd be a lot of work for probably very little gain.
[23:41] teej_, are you around the channel?
[23:41] If we add the project to a BOINC tracking site for people to join, it can allow us to get many more individuals involved. That can increase the net rate at which jobs are finished.
[23:41] nyphryx: What do you mean?
[23:42] JAA: ^
[23:45] Many universities use BOINC as well. And universities generally have good internet connections.
[23:46] Doesn't mean they're interested in downloading NSFW gifs from Tumblr though.
[23:47] As far as I know, BOINC is all about distributed computing, not distributed downloads.
[23:48] Good point.
[23:59] Okay. Another thing we need to look into is a more optimized way of downloading and creating WARCs.