[00:13] *** Stil3tt0 is now known as Stiletto
[00:18] it happens SchroSct
[00:55] *** nickware has joined #archiveteam
[01:00] *** __sagitai has joined #archiveteam
[01:05] *** _sagitair has quit IRC (Ping timeout: 370 seconds)
[01:09] *** matthusby has quit IRC (ZNC 1.6.3+deb2 - http://znc.in)
[01:15] *** db420 is now known as dboard
[01:16] *** icedice2 has joined #archiveteam
[01:17] *** icedice has quit IRC (Read error: Connection reset by peer)
[01:20] *** rocode_ has joined #archiveteam
[01:27] *** rocode has quit IRC (Ping timeout: 246 seconds)
[01:27] *** rocode_ is now known as rocode
[01:30] *** kristian_ has quit IRC (Quit: Leaving)
[01:30] *** QBcrusher has quit IRC (Ping timeout: 506 seconds)
[01:30] *** QBcrusher has joined #archiveteam
[01:46] *** spiko has joined #archiveteam
[01:48] admins? FAQ said to notify you if I made weird stuff so you can "clear my claims" ?
[02:29] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[02:41] *** _sagitair has joined #archiveteam
[02:42] spiko: which project?
[02:43] SchroSct: yeah, it happens, and people are pretty much always willing to discuss the possibility if there's benevolent intent (ie. it's not just a stalling tactic to prevent archival)
[02:47] *** __sagitai has quit IRC (Ping timeout: 370 seconds)
[02:47] right, I would worry of that too >_>
[02:47] I've been hurt before ;_;
[02:48] SchroSct: worth a read: http://www.archiveteam.org/index.php?title=Posterous/Story
[02:49] what made me think about it was the huge difference between my downloads and uploads? so if that's from compression or etc. then makes me think it would be a lot easier if they just layed down and rsync'd
[02:49] SchroSct: well, there's a certain overhead that comes with scraping over HTTP, but also the benefit that it can go straight into the wayback with all metadata
[02:49] SchroSct: the difference you are seeing is likely compression
[02:50] probably a mostly text-based project, and text compresses well
[02:50] SchroSct: anyhow, more extensive discussion should go into #archiveteam-bs
[02:50] :P
[02:55] oh, k
[02:55] joepie91: archive warrior. was clicking in ui too fast and shutting down vm. Know I clicked on URLTeam 2 and ftp discovery and wiki team. But jusdging by vm output currently its working on IMDB. oh and i also clicked "archive team choice"
[02:56] UI messed up now, nocurrent project page is blank :)
[02:58] *** schbirid2 has joined #archiveteam
[02:59] spiko: ah, right. then you'll want to hop into #imdbgone and ask there :)
[02:59] actually
[02:59] nevermind that, hold on
[03:00] yeah, I have admin access to that tracker
[03:00] spiko: what username?
[03:01] joepie91: Spiko with capital S
[03:01] should i stop the project on my end too?
[03:01] spiko: hmm. I see no open claims on imdb -- you said it's still running?
[03:02] well in UI I see some UL/DL and the VM is still running tho there seems to be no new console output
[03:02] weird. you're *sure* that your nickname is Spiko in the warrior?
[03:02] that's what it says in the Settings page
[03:03] spiko: you'll probably want to restart the VM, I'm guessing something is broken
[03:03] *** schbirid has quit IRC (Read error: Operation timed out)
[03:03] might be i joined imdb through "archive team choice"
[03:03] even then it should show up
[03:03] ok
[03:03] anyhow, i see no outstanding claims on wikiteam or FTP disco - I don't know about urlteam, that's a separate tracker that I don't think I have access to
[03:03] you'll want to drop into #urlteam to make sure
[03:04] there are no registered claims on imdb for your name either, so I'm guessing something got lost along the way, and a restart will probably fix it
[03:04] it fails closed (ie. if a warrior breaks a job it will just never report it done) so there shouldn't be any issues caused by your previous switches
[03:05] can I turn my VM back on and select another project now?
[03:05] spiko: yeah
[03:05] but I'm not sure there's anything currently active besides imdb and urlteam, and imdb needs help the most
[03:06] spiko: for future questions concerning imdb it's probably best to ask in #imdbgone since I'll be gone in a bit and this is meant to be a low-traffic channel :P
[03:07] Restarted and seems it started IMDB by itself. Hopefully no problems now ;) Thanks
[03:08] spiko: hmm. you're still not showing up on the imdb claims...
[03:08] there's a claim from an `unknown` though
[03:08] single one, claimed just now
[03:08] aaand it's gone
[03:08] hm, I'll nuke the VM and restart :D
[03:09] yeah, just rebuild it
[03:09] this is really rather odd
[03:09] Sorry, I have a talent for messing up programs :)
[03:10] it's kind of impressive though, half the point of the warrior being a VM is that it should make it effectively impossible to break no matter where you run it
[03:10] :P
[03:11] spiko: but yeah, let's move to #imdbgone because this is getting quite noisy heh
[03:12] *** gui7 has joined #archiveteam
[03:23] *** maelstrom has quit IRC (Remote host closed the connection)
[03:26] joepie91, spiko, we're also grabbing yahoo answers and ipernity (rate limited) right now, and they're going much slower than IMDB.
[03:29] *** odemg has joined #archiveteam
[03:35] SN4T14: I'm about to go to sleep, can I stop IMDB and start "archive team choice" and it willl work on those?
[03:37] *** pizzaiolo has left
[03:43] spiko: yes, i believe the current choice is yahoo answers
[03:44] should I stop the imdb project first? I'm afraid I'll break something again ^^
[03:45] !ig d0tjmwa6sewa9j265b1sqgiew ^https?://www\.therebel\.media/forms/flags/
[03:45] oops
[03:47] that regex seems wrong :)
[03:48] your claims there should also age out, but if you let it take its time to stop then it shouldnt be an issue
[03:48] I started the "team choice" and it just started imdb aggain. that ok?
[03:49] yep
[03:49] ok, gtg sleep, have fun and thanks for help:)
[03:50] toodleoo ~
[03:50] ah sry, that regex is fine, ? means something else in windows i tthink
[03:51] *** spiko has quit IRC (Quit: Page closed)
[03:52] *** blabdude_ has joined #archiveteam
[04:21] *** tip has joined #archiveteam
[04:22] *** gui7 has quit IRC (Quit: Ex-Chat)
[05:08] *** VADemon has quit IRC (Quit: left4dead)
[05:18] *** icedice2 has quit IRC (Quit: Leaving)
[05:23] *** blabdude_ has quit IRC (Ping timeout: 268 seconds)
[05:28] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:34] *** Sk1d has joined #archiveteam
[05:42] *** User405 has joined #archiveteam
[05:43] *** User404 has quit IRC (Read error: Connection reset by peer)
[05:47] *** nickware has quit IRC (Quit: Leaving)
[06:24] *** QBcrusher has quit IRC (Ping timeout: 506 seconds)
[06:24] *** QBcrusher has joined #archiveteam
[06:33] *** unkn0wn_ has quit IRC ()
[07:05] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[07:32] *** odemg has quit IRC (Remote host closed the connection)
[07:33] *** odemg has joined #archiveteam
[07:48] *** topdownji has quit IRC (Remote host closed the connection)
[07:48] *** topdownji has joined #archiveteam
[07:53] *** redlob has quit IRC (Quit: ZNC - http://znc.in)
[08:01] *** redlob has joined #archiveteam
[08:07] *** paparus has joined #archiveteam
[08:08] I wanted to ask, is the group only interested in archiving stuff that faces imminent deletion or is the purpose general archiving also?
[08:09] for example there are governmental databases of court documents that have no text search which if crawled could be put to better use
[08:09] or a governmental site listing current property ownership but has no history of ownership changes.
[08:10] ArchiveTeam is mostly about stuff facing general deletion.
[08:11] But like, the code is all OSS. There's nothing really stopping you from setting up your own grabs for stuff like that.
[08:11] *facing imminent deletion
[08:12] yeah, I mean I can write my own crawler and run it via tor or something, the group is great because it has volunteers acting as multiple outgoing ips which helps get around rate limits and such
[08:13] and also storage/backup is taken care of the archive project, which is nice
[08:18] paparus: We do a lot of automated precautionary archiving in #archivebot
[08:18] Everything is uploaded to archive.org and available in the wayback machine
[08:19] dxrt, any specific interest in deep web sites? like where you have to do a search to get to the data?
[08:19] like in my two examples
[08:20] these require specialized crawlers, which looks similar to what you do
[08:20] deep web as in tor stuff?
[08:20] no
[08:21] https://en.wikipedia.org/wiki/Deep_web
[08:21] stuff that isn't crawled by regular crawlers
[08:22] ah I always thought darkweb was synonymous with tor stuff
[08:22] We've got a project related to government stuff atm
[08:22] yes, but deep web is different
[08:22] right
[08:22] *** tip has quit IRC (Ping timeout: 268 seconds)
[08:22] what specifically are you interested in?
[08:24] it's a general question, but for example: http://courtindex.sdcourt.ca.gov/CISPublic/namesearch
[08:25] this would probably be easier to enumerate : http://courtindex.sdcourt.ca.gov/CISPublic/casesearch
[08:27] take for example : http://courtindex.sdcourt.ca.gov/CISPublic/casedetail?casenum=SCA153865&casesite=SD&applcode=C
[08:27] searching in google for "HOWARD ADLER SCA153865" yields nothing
[08:29] there are thousands of such sites
[08:35] paparus: let's thake this to #archiveteam-bs
[08:35] ok
[08:43] Speaking of which.
[08:43] I've talked about this before.
[08:44] But ravearchive finally went down, as I predicted they would from their high bandwidth costs and zero funding.
[08:44] The author wants to put the site back up, so I'm sure he still has all the data.
[08:44] Maybe IA would be interested in having a copy?
[08:44] (Even if they don't feel they can publically host it.)
[08:44] http://ravearchive.com/
[09:14] *** paparus has left
[09:14] *** paparus has joined #archiveteam
[10:50] *** __sagitai has joined #archiveteam
[11:02] *** _sagitair has quit IRC (Read error: Operation timed out)
[11:05] *** i336_ has joined #archiveteam
[11:40] *** tip has joined #archiveteam
[11:41] There don't appear to be any items queued on the IMDB tracker?
[11:42] *** bRick5772 has joined #archiveteam
[11:50] Maybe we got it all?
[11:50] *** odemg has quit IRC (Remote host closed the connection)
[11:53] http://tracker.archiveteam.org/imdb/
[11:54] Nope
[11:55] Didn't start the names yet
[12:04] *** icedice has joined #archiveteam
[12:06] *** odemg has joined #archiveteam
[12:32] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[12:38] *** pizzaiolo has joined #archiveteam
[13:10] *** tip is now known as dip
[13:13] *** yan has quit IRC (Quit: leaving)
[13:39] *** BiggieJon has quit IRC (Quit: Page closed)
[14:38] *** nickware has joined #archiveteam
[15:06] *** topdownji has quit IRC (Remote host closed the connection)
[15:06] *** topdownji has joined #archiveteam
[15:06] *** VADemon has joined #archiveteam
[15:10] *** icedice has quit IRC (Quit: Leaving)
[15:12] *** maelstrom has joined #archiveteam
[15:17] *** SmileyG has quit IRC (Ping timeout: 250 seconds)
[15:19] *** VADemon has quit IRC (Quit: left4dead)
[15:34] *** i336_ has quit IRC (Ping timeout: 246 seconds)
[15:36] *** nickware has quit IRC (Quit: Leaving)
[15:37] *** nickware has joined #archiveteam
[15:50] *** Aranje has joined #archiveteam
[15:50] *** odemg has quit IRC (Remote host closed the connection)
[16:01] *** Morbus has quit IRC (Ping timeout: 255 seconds)
[16:05] *** Morbus has joined #archiveteam
[16:09] *** VADemon has joined #archiveteam
[16:09] *** odemg has joined #archiveteam
[16:14] *** maelstrom has quit IRC (Ping timeout: 250 seconds)
[16:15] *** odemg has quit IRC (Remote host closed the connection)
[16:17] *** Morbus has quit IRC (http://www.disobey.com/)
[16:22] *** odemg has joined #archiveteam
[16:28] *** maelstrom has joined #archiveteam
[16:35] *** odemg has quit IRC (Remote host closed the connection)
[16:36] *** odemg has joined #archiveteam
[16:37] *** nickware- has joined #archiveteam
[16:43] *** nickware has quit IRC (Read error: Operation timed out)
[16:44] *** icedice has joined #archiveteam
[16:45] *** nickware- has quit IRC (Quit: Leaving)
[16:47] *** pizzaiolo has quit IRC (Read error: Connection reset by peer)
[16:48] *** Morbus has joined #archiveteam
[16:48] *** pizzaiolo has joined #archiveteam
[16:48] *** pizzaiol1 has joined #archiveteam
[16:49] *** pizzaiolo has quit IRC (Remote host closed the connection)
[16:49] *** pizzaiol1 has quit IRC (Remote host closed the connection)
[16:49] *** pizzaiolo has joined #archiveteam
[16:51] *** nox_ has quit IRC (Ping timeout: 260 seconds)
[17:00] *** odemg has quit IRC (Remote host closed the connection)
[17:05] *** odemg has joined #archiveteam
[17:05] *** noww_ has joined #archiveteam
[17:07] *** noww_ has quit IRC (Client Quit)
[17:35] *** icedice2 has joined #archiveteam
[17:38] *** icedice has quit IRC (Ping timeout: 260 seconds)
[17:38] *** deetwelve has quit IRC (Ping timeout: 260 seconds)
[17:39] *** ItsYoda has quit IRC (Ping timeout: 260 seconds)
[17:44] *** ItsYoda has joined #archiveteam
[17:45] *** deetwelve has joined #archiveteam
[17:58] *** odemg has quit IRC (Remote host closed the connection)
[18:14] *** Smiley has joined #archiveteam
[18:31] *** ItsYoda has quit IRC (Ping timeout: 260 seconds)
[18:32] *** deetwelve has quit IRC (Ping timeout: 260 seconds)
[18:33] *** maelstrom has quit IRC (Quit: Leaving)
[18:35] *** deetwelve has joined #archiveteam
[18:38] *** ItsYoda has joined #archiveteam
[18:43] *** odemg has joined #archiveteam
[19:01] 178.62.61.231/ytglitch.mp4
[19:10] *** Muad-Dib has quit IRC (Ping timeout: 260 seconds)
[19:22] *** ItsYoda has quit IRC (Ping timeout: 260 seconds)
[19:22] *** deetwelve has quit IRC (Ping timeout: 260 seconds)
[19:25] *** deetwelve has joined #archiveteam
[19:25] *** ItsYoda has joined #archiveteam
[19:33] *** Muad-Dib has joined #archiveteam
[19:54] *** nickware has joined #archiveteam
[20:08] *** Stiletto has quit IRC (Ping timeout: 250 seconds)
[20:09] *** odemg has quit IRC (Remote host closed the connection)
[20:12] *** Guest7383 has joined #archiveteam
[20:28] *** nickware has quit IRC (Quit: Leaving)
[20:42] *** odemg has joined #archiveteam
[20:49] *** bsmith093 has quit IRC (Remote host closed the connection)
[20:50] *** odemg has quit IRC (Remote host closed the connection)
[20:52] *** bsmith093 has joined #archiveteam
[21:03] *** kristian_ has joined #archiveteam
[21:04] *** ndiddy has joined #archiveteam
[21:20] *** Guest7383 has quit IRC (Read error: Operation timed out)
[21:29] *** maelstrom has joined #archiveteam
[21:30] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:35] *** odemg has joined #archiveteam
[21:36] *** Stil3tt0 has joined #archiveteam
[21:46] *** pizzaiolo has quit IRC (Read error: Connection reset by peer)
[21:48] *** pizzaiolo has joined #archiveteam
[21:52] *** dashcloud has joined #archiveteam
[22:01] *** icedice2 has quit IRC (Quit: Leaving)
[22:26] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:47] *** pizzaiolo has quit IRC (Ping timeout: 506 seconds)
[22:53] *** BlueMaxim has joined #archiveteam
[23:21] *** BlueMaxim has quit IRC (Quit: Leaving)
[23:24] *** Stil3tt0 has quit IRC (Read error: Operation timed out)
[23:30] Steam Greenlight is being shut down. https://steamcommunity.com/games/593110/announcements/detail/558846854614253751
[23:34] *** kristian_ has quit IRC (Quit: Leaving)
[23:42] *** Morbus has quit IRC (Quit: http://www.disobey.com/)
[23:48] good