#archiveteam-bs 2019-04-15,Mon


Time Nickname Message
00:00 🔗 marked kisspunch: 3x improvement in what? drive space needed?
00:41 🔗 kisspunch marked: right
00:57 🔗 marked are you using http, warc, and/or git? previously I would sort through warc from a crawl and make a pie chart of byte count per URL
00:58 🔗 marked I'll take a look if your code is somewhere I can run it, or have an output for DL I can comb through
01:11 🔗 kisspunch marked: I don't think I can reduce the byte count per repo further. I need to cut out repos now.
01:12 🔗 kisspunch I can PM some access stuff if you want to comb through things
01:16 🔗 ndiddy has quit IRC ()
01:42 🔗 t3 JAA: Can you check if co8mqvzufi1dsn7jnfbcn1o3o is stuck for me (on ArchiveBot)? I want the job to finish so that more jobs can be added.
01:43 🔗 JAA t3: Wrong channel.
01:44 🔗 t3 JAA: I am sorry. I'll just message you personally next time.
01:44 🔗 JAA No, the correct place is #archivebot.
02:13 🔗 ivan has quit IRC (Read error: Operation timed out)
02:13 🔗 JAA has quit IRC (Read error: Operation timed out)
02:13 🔗 cfarquhar has quit IRC (Read error: Operation timed out)
02:14 🔗 ivan has joined #archiveteam-bs
02:14 🔗 wabu has quit IRC (Read error: Operation timed out)
02:14 🔗 svchfoo1 has quit IRC (Read error: Operation timed out)
02:14 🔗 fuzy802 has joined #archiveteam-bs
02:15 🔗 fuzzy8021 has quit IRC (Read error: Operation timed out)
02:15 🔗 nightpoo- has quit IRC (Read error: Operation timed out)
02:16 🔗 c4rc4s has quit IRC (Read error: Operation timed out)
02:16 🔗 simon816 has quit IRC (Ping timeout: 246 seconds)
02:24 🔗 nightpool has joined #archiveteam-bs
02:24 🔗 fuzy802 is now known as fuzzy8021
02:27 🔗 dumbass_ has quit IRC (Ping timeout: 260 seconds)
02:53 🔗 Rome_Silv has quit IRC (Remote host closed the connection)
02:53 🔗 Rome_Silv has joined #archiveteam-bs
03:02 🔗 Mateon1 has quit IRC (Ping timeout: 265 seconds)
03:03 🔗 Mateon1 has joined #archiveteam-bs
03:14 🔗 cfarquhar has joined #archiveteam-bs
03:14 🔗 c4rc4s has joined #archiveteam-bs
03:15 🔗 simon816 has joined #archiveteam-bs
03:18 🔗 svchfoo1 has joined #archiveteam-bs
03:18 🔗 Fusl sets mode: +o svchfoo1
03:18 🔗 JAA has joined #archiveteam-bs
03:18 🔗 Fusl sets mode: +o JAA
03:18 🔗 bakJAA sets mode: +o JAA
03:18 🔗 wabu has joined #archiveteam-bs
03:26 🔗 qw3rty119 has joined #archiveteam-bs
03:30 🔗 Rome_Silv has quit IRC (Remote host closed the connection)
03:30 🔗 Rome_Silv has joined #archiveteam-bs
03:32 🔗 qw3rty118 has quit IRC (Read error: Operation timed out)
03:32 🔗 odemgi_ has joined #archiveteam-bs
03:35 🔗 odemgi has quit IRC (Read error: Operation timed out)
03:35 🔗 Rome has joined #archiveteam-bs
03:38 🔗 Rome_Silv has quit IRC (Ping timeout: 252 seconds)
03:41 🔗 odemg has quit IRC (Ping timeout: 615 seconds)
03:44 🔗 Lord_Nigh ftp sites are not going to archive themselves, and more and more sites from the ftp list are dead each day. something has to be done
03:47 🔗 odemg has joined #archiveteam-bs
03:49 🔗 Flashfire I completely agree
03:49 🔗 Flashfire I cant do anything about it with no coding knowledge but last year I built up a substantial list of the FTP sites
03:50 🔗 Flashfire Lord_Nigh as far as I know the scripts are outdated
03:52 🔗 Flashfire Lord_Nigh there is #effteepee but its been dead for months
03:55 🔗 Lord_Nigh nothing will get done until someone either learns how to script it, or has an alternative, better solution
03:55 🔗 Lord_Nigh we can gripe about the dead effteepee project all day but it achieves nothing
03:56 🔗 marked do FTP sites go into WBM somehow?
03:56 🔗 Flashfire Sketchcow was telling me once about a guy that was uploading FTP sites as zips but I dont remember what became of that
03:56 🔗 Lord_Nigh I don't have the skill to do it
03:56 🔗 Flashfire FTP sites dont go to the wayback machine it cant handle the protocol
03:56 🔗 Lord_Nigh beyond simple 'wget -m -np -p ftp://ftp.site.com/'
03:57 🔗 Flashfire I mean id happily donate time but dont have skill or money to do anything better
03:57 🔗 Flashfire We can check if the ftp site has a http site equivalent and archive that
03:58 🔗 Flashfire but that is a bandaid over a gunshot wound
03:58 🔗 omarroth has quit IRC (Remote host closed the connection)
04:02 🔗 astrid i've been uploading ftp sites as zips for a while
04:02 🔗 astrid i'm not very active on it though
04:03 🔗 marked ftp is easier than the http equivalent. what happened to the ftp grab that ran on the tracker?
04:04 🔗 astrid i didn't know there was such a thing on the tracker ... ?
04:05 🔗 marked all I know is what's in the wiki https://www.archiveteam.org/index.php?title=FTP
04:05 🔗 Flashfire There was but that was before my time
04:06 🔗 Flashfire All i know is the scripts are broken and the project was all but abandoned. I spent a lot of 2018 working through and adding stuff to the FTP/List
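
The 'wget -m -np -p' one-liner mentioned above could be applied across a list of FTP sites with something like the following. This is only a rough sketch, not the old project scripts: the function names build_mirror_command and mirror_all are made up for illustration, the URL list is assumed to come from somewhere like the FTP list discussed here, and the only wget flags used are the ones quoted in the conversation. It defaults to a dry run so nothing is downloaded unless asked.

```python
import subprocess
from urllib.parse import urlparse


def build_mirror_command(url):
    """Return the wget argument list for mirroring one FTP site.

    Uses the flags quoted in the log: -m (mirror), -np (no parent),
    -p (page requisites). Hypothetical helper, not an existing tool.
    """
    parsed = urlparse(url)
    if parsed.scheme != "ftp" or not parsed.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    return ["wget", "-m", "-np", "-p", url]


def mirror_all(urls, dry_run=True):
    """Build one command per URL; run them only when dry_run is False."""
    commands = [build_mirror_command(url) for url in urls]
    if not dry_run:
        for command in commands:
            # check=False: a dead site should not stop the remaining grabs.
            subprocess.run(command, check=False)
    return commands


if __name__ == "__main__":
    # Placeholder host taken from the example in the log.
    for command in mirror_all(["ftp://ftp.site.com/"]):
        print(" ".join(command))
```

A plain mirror like this produces a directory tree, not WARCs, so it would still need packaging (for example as the zips mentioned earlier) before upload.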
04:11 🔗 Despatche has quit IRC (Quit: Read error: Connection reset by deer)
05:05 🔗 stapler11 has joined #archiveteam-bs
05:32 🔗 enowaldo has joined #archiveteam-bs
05:37 🔗 Zerote has joined #archiveteam-bs
05:41 🔗 enowaldo has quit IRC (Read error: Operation timed out)
07:06 🔗 Zerote has quit IRC (Ping timeout: 260 seconds)
07:20 🔗 Zerote has joined #archiveteam-bs
09:09 🔗 JAA Sola CDN grab is nearly done, just a few 10k URLs remaining.
09:36 🔗 stapler11 has quit IRC (Read error: Connection reset by peer)
09:37 🔗 stapler11 has joined #archiveteam-bs
09:47 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
09:56 🔗 Verified_ has quit IRC (Ping timeout: 252 seconds)
09:59 🔗 Verified_ has joined #archiveteam-bs
10:19 🔗 Mateon1 has quit IRC (Quit: Mateon1)
10:23 🔗 Mateon1 has joined #archiveteam-bs
10:27 🔗 Verified_ has quit IRC (Ping timeout: 252 seconds)
10:31 🔗 Mateon1 has quit IRC (Remote host closed the connection)
10:39 🔗 Mateon1 has joined #archiveteam-bs
10:44 🔗 Verified_ has joined #archiveteam-bs
11:17 🔗 enowaldo has joined #archiveteam-bs
11:32 🔗 JAA Sola CDN complete. I don't have any numbers for the total size yet, but it's over a TiB.
11:35 🔗 enowaldo has quit IRC (Read error: Operation timed out)
11:54 🔗 JAA HCross: I just saw that https://archiveteam.org/index.php?title=ZAM_Network still mentions Storm Shield One (stormshield.one) as "In progress... by User:HCross" since September. I can't find a grab from around that time in the WBM. Any idea what happened to that?
12:05 🔗 enowaldo has joined #archiveteam-bs
12:06 🔗 icedice has joined #archiveteam-bs
12:23 🔗 enowaldo has quit IRC (Read error: Operation timed out)
13:11 🔗 enowaldo has joined #archiveteam-bs
13:19 🔗 chirlu has quit IRC (Ping timeout: 255 seconds)
13:32 🔗 Despatche has joined #archiveteam-bs
13:38 🔗 chirlu has joined #archiveteam-bs
13:44 🔗 omarroth has joined #archiveteam-bs
14:22 🔗 dumbass_ has joined #archiveteam-bs
14:25 🔗 Zerote has quit IRC (Ping timeout: 260 seconds)
14:34 🔗 enowaldo has quit IRC (Read error: Operation timed out)
14:52 🔗 balrog has quit IRC (Quit: Bye)
15:00 🔗 kiska1 has quit IRC (Read error: Operation timed out)
15:01 🔗 kiska1 has joined #archiveteam-bs
15:01 🔗 enowaldo has joined #archiveteam-bs
15:01 🔗 svchfoo3 sets mode: +o kiska1
15:03 🔗 enowaldo has quit IRC (Read error: Operation timed out)
15:09 🔗 balrog has joined #archiveteam-bs
15:10 🔗 dumbass_ has quit IRC (Ping timeout: 260 seconds)
15:16 🔗 omarroth has quit IRC (Quit: Konversation terminated!)
15:16 🔗 Zerote has joined #archiveteam-bs
15:17 🔗 omarroth has joined #archiveteam-bs
15:18 🔗 DogsRNice has joined #archiveteam-bs
15:22 🔗 DogsRNice Hi. I was wondering about the feasibility of archiving the steam community (profiles and associated pages, groups and their decisions and everything else there). The coverage of it is almost nonexistent on the wayback machine beyond the surface level. While the site is likely stable in the long term there will likely be a UI update at some point in the future that may make archiving difficult
15:23 🔗 JAA It already is difficult. Pagination of comments works through JS, for example.
15:23 🔗 JAA Or at least it did last time I checked.
15:24 🔗 DogsRNice The "view all comments" button leads to a normal list of pages that can be saved but not all types of pages have that
15:31 🔗 DogsRNice has quit IRC (Ping timeout: 263 seconds)
15:36 🔗 RomeSilva has joined #archiveteam-bs
15:37 🔗 Rome has quit IRC (Ping timeout: 252 seconds)
15:40 🔗 enowaldo has joined #archiveteam-bs
15:47 🔗 JAA Total size of my Sola CDN grab: 1.01 TiB. Technically "over a TiB". :-)
15:48 🔗 enowaldo has quit IRC (Ping timeout: 265 seconds)
15:48 🔗 JAA The data is in these items: https://archive.org/details/@justanotherarchivist?and%5B%5D=sola.ai_cdn_201904_
16:19 🔗 JAA kiska: I've added details about my part of the Sola crawl to https://archiveteam.org/index.php?title=Sola.ai but I'm not sure how much the warrior project managed, where the data is, etc.
16:20 🔗 kiska JAA: The data that we managed to grab is here: https://archive.org/details/archiveteam_sola_20190412062205_780e9c17
16:21 🔗 kiska As far as I know there is about 11.7G of data there
16:21 🔗 JAA Any idea how many users we managed to cover?
16:22 🔗 kiska I'll check the json and see how much we got, but I am going to estimate <=1k
16:22 🔗 enowaldo has joined #archiveteam-bs
16:24 🔗 kiska zcat sola_20190412062205_780e9c17.megawarc.json.gz | wc -l indicates 307 lines
16:26 🔗 JAA Thanks, adding that to the page.
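
For anyone without zcat at hand, the same line count can probably be had from Python's standard gzip module. A minimal sketch follows; count_lines is a made-up name, and the idea that one line of the megawarc JSON roughly corresponds to one grabbed item is an assumption carried over from the exchange above, not something checked here.

```python
import gzip
import sys


def count_lines(path):
    """Count lines in a gzip-compressed file.

    Note: a final line without a trailing newline is counted too,
    whereas `wc -l` counts newline characters only.
    """
    with gzip.open(path, "rb") as handle:
        return sum(1 for _ in handle)


if __name__ == "__main__":
    # Usage: python count_lines.py sola_20190412062205_780e9c17.megawarc.json.gz
    print(count_lines(sys.argv[1]))
```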
16:30 🔗 kiska I knew we wouldn't grab much since it was set up in a hurry and my tracker was the most unstable piece of software I have ever touched
16:36 🔗 t3 has quit IRC (Quit: Connection closed for inactivity)
16:36 🔗 bitBaron has quit IRC (Read error: Operation timed out)
16:37 🔗 enowaldo has quit IRC (Read error: Operation timed out)
16:43 🔗 jspiros__ has joined #archiveteam-bs
16:43 🔗 enowaldo has joined #archiveteam-bs
17:35 🔗 omarroth has quit IRC (Ping timeout: 506 seconds)
17:35 🔗 omarroth has joined #archiveteam-bs
17:56 🔗 tephra has quit IRC (Read error: Operation timed out)
18:00 🔗 tephra has joined #archiveteam-bs
18:11 🔗 Oddly has joined #archiveteam-bs
18:24 🔗 enowaldo has quit IRC (Ping timeout: 252 seconds)
18:32 🔗 enowaldo has joined #archiveteam-bs
19:31 🔗 Mateon1 has quit IRC (Remote host closed the connection)
19:32 🔗 Mateon1 has joined #archiveteam-bs
19:46 🔗 Stiletto has joined #archiveteam-bs
19:54 🔗 enowaldo has quit IRC (Read error: Operation timed out)
19:58 🔗 omarroth has quit IRC (Read error: Connection reset by peer)
19:59 🔗 omarroth has joined #archiveteam-bs
20:23 🔗 Stilett0- has joined #archiveteam-bs
20:23 🔗 VADemon_ has quit IRC (Read error: Connection reset by peer)
20:25 🔗 Stiletto has quit IRC (Ping timeout: 268 seconds)
20:25 🔗 enowaldo has joined #archiveteam-bs
20:29 🔗 fredgido has joined #archiveteam-bs
20:31 🔗 enowaldo has quit IRC (Read error: Operation timed out)
20:37 🔗 Stilett0- is now known as Stiletto
20:40 🔗 Stilett0- has joined #archiveteam-bs
20:43 🔗 Stilettoo has joined #archiveteam-bs
20:45 🔗 Stiletto has quit IRC (Read error: Operation timed out)
20:47 🔗 Stilett0- has quit IRC (Ping timeout: 265 seconds)
20:49 🔗 Stilettoo is now known as Stiletto
20:52 🔗 alex_ has joined #archiveteam-bs
21:06 🔗 schbirid has quit IRC (Remote host closed the connection)
21:47 🔗 VerifiedJ has joined #archiveteam-bs
21:51 🔗 Verified_ has quit IRC (Ping timeout: 252 seconds)
21:52 🔗 enowaldo has joined #archiveteam-bs
21:54 🔗 Verified_ has joined #archiveteam-bs
21:56 🔗 VerifiedJ has quit IRC (Ping timeout: 252 seconds)
21:58 🔗 Stilettoo has joined #archiveteam-bs
21:58 🔗 stapler11 has quit IRC (Read error: Connection reset by peer)
21:59 🔗 stapler11 has joined #archiveteam-bs
21:59 🔗 Stiletto has quit IRC (Ping timeout: 268 seconds)
22:02 🔗 icedice has quit IRC (Read error: Connection reset by peer)
22:02 🔗 icedice2 has joined #archiveteam-bs
22:03 🔗 enowaldo has quit IRC (Ping timeout: 252 seconds)
22:05 🔗 VerifiedJ has joined #archiveteam-bs
22:05 🔗 Verified_ has quit IRC (Ping timeout: 252 seconds)
22:18 🔗 VerifiedJ has quit IRC (Ping timeout: 252 seconds)
22:25 🔗 Verified_ has joined #archiveteam-bs
22:44 🔗 enowaldo has joined #archiveteam-bs
22:56 🔗 Stilettoo is now known as Stiletto
22:59 🔗 enowaldo has quit IRC (Read error: Operation timed out)
23:06 🔗 icedice2 has quit IRC (Quit: Leaving)
23:22 🔗 BlueMax has joined #archiveteam-bs
23:33 🔗 alex_ has quit IRC (Quit: take care ye all. Have fun!)
23:35 🔗 dashcloud has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
23:36 🔗 dashcloud has joined #archiveteam-bs
23:38 🔗 qw3rty119 has quit IRC (Read error: Connection reset by peer)
23:45 🔗 qw3rty119 has joined #archiveteam-bs
23:49 🔗 ndiddy has joined #archiveteam-bs
23:58 🔗 Ravenloft has joined #archiveteam-bs
