[00:23] *** Aranje has joined #archiveteam-bs
[00:30] i'm starting to upload retromash issuu docs
[00:30] it's Argos catalogs
[00:31] first one uploaded: https://archive.org/details/issuu_retromash_argoschristmas-1993
[00:31] there are issues that go back to the 70s
[00:34] *** pnJay has quit IRC (Read error: Connection reset by peer)
[01:04] *** Stilett0 has quit IRC (Ping timeout: 246 seconds)
[01:51] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:55] *** dashcloud has joined #archiveteam-bs
[01:56] *** Stilett0 has joined #archiveteam-bs
[02:26] *** Nyx has quit IRC (Ping timeout: 260 seconds)
[02:27] *** Nyx has joined #archiveteam-bs
[02:32] *** ndiddy has quit IRC ()
[03:13] *** pizzaiolo has left
[03:47] *** antiufo_ has quit IRC (Quit: Leaving)
[04:03] *** Stilett0 has quit IRC (Ping timeout: 246 seconds)
[04:06] HCross2: how much progress has the census gotten?
[04:48] *** Matt_Lock has joined #archiveteam-bs
[04:50] What am I supposed to do when one of my warrior jobs stops working? It's been stuck at the same item for over 10 minutes by now.
[05:14] *** Stilett0 has joined #archiveteam-bs
[05:44] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[06:12] Just stops as in freezes? Which project? And btw in our terms "item" is a big work unit for Warrior. Maybe you mean a URL?
[06:19] *** Matt_Lock has quit IRC (Read error: Connection reset by peer)
[06:20] *** Matt_Lock has joined #archiveteam-bs
[06:21] as in a single URL (888 out of 5000 or so). The whole item stopped on a single URL for I don't know how long
[06:21] Note: I gave up, waited until the other item finished, then turned the warrior VM off and on again
[06:21] if it's yahooanswers, then freezes have been reported already. _I_ don't know what the real issue is then
[06:22] It's just that this isn't the first time it's happened, and it won't be the last
[06:22] yeah, the item will be requeued later. it's fine
[06:22] Yeah, it's Yahoo Answers
[06:22] So, just turn it off and back on next time too?
[06:23] I took some time and manually updated the Warrior's distribution to the latest Debian... Haven't done much yet, but no hiccups so far
[06:24] I'd say yes
[06:25] So, should I download a new warrior VM or is my warrior on the latest version of Debian?
[06:27] there's no new version sadly and the warrior is outdated
[06:28] Lockups happen outside warriors as well. I'm running the bare Yahoo scripts on Ubuntu 16.04.
[06:28] And also on Raspbian latest.
[06:29] Debian 8 on an old Android kernel - all right
[06:31] I'm using this script to manually check stuck jobs.
[06:31] https://bpaste.net/show/058be2870c6f
[06:31] aren't there wget logs in warrior's /data/data... ?
[06:32] Using "kill " makes the pipeline restart the job, rather than abandon it.
[06:36] (as opposed to the shotgun-to-the-face method of Ctrl-C)
[06:40] have you thought of sending SIGINT twice and instantly removing the created STOP file?
[06:49] I don't know. It's some commands JAA came up with, and I adapted.
[07:57] *** icedice has joined #archiveteam-bs
[08:00] *** odemg has quit IRC (Remote host closed the connection)
[08:49] *** JAA has joined #archiveteam-bs
[08:53] My machines also hang from time to time on Yahoo Answers, but they usually resume eventually as far as I can tell. Unlike app.net, where they'd just hang forever until you kill them.
[08:53] I didn't have to kill any Yahoo Answers items yet.
[08:57] By the way, I also wrote a serial killer for stuck items. It finds items whose wget.log hasn't been updated in X minutes, then looks for the corresponding process, and if it finds exactly one matching process, that process gets killed. Shared it in #crapp.net last week.
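A minimal sketch of the stuck-item "serial killer" described at 08:57, not the script actually shared in #crapp.net. It assumes Linux `/proc`, one directory per item containing that item's `wget.log`, and that the stuck process's command line mentions a path inside the item directory. The function names and the 30-minute default are hypothetical. It sends SIGTERM, the signal a plain shell `kill <pid>` sends; per the 06:32 message that makes the pipeline restart the job rather than abandon it, but the sketch does not verify that behaviour.

```python
import os
import signal
import time


def find_stale_logs(root, max_age_seconds, now=None):
    """Return wget.log paths under `root` not modified in the last `max_age_seconds`."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "wget.log" in filenames:
            log_path = os.path.join(dirpath, "wget.log")
            try:
                mtime = os.path.getmtime(log_path)
            except OSError:
                continue  # item finished and was cleaned up while we were walking
            if now - mtime > max_age_seconds:
                stale.append(log_path)
    return stale


def find_matching_pids(needle, proc_root="/proc"):
    """Return PIDs whose command line contains `needle` (reads a Linux-style /proc)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        cmdline = raw.replace(b"\0", b" ").decode("utf-8", "replace")
        if needle in cmdline:
            pids.append(int(entry))
    return sorted(pids)


def kill_stuck_items(root, max_age_seconds=30 * 60, proc_root="/proc", dry_run=True):
    """Signal the single process tied to each stale item; skip ambiguous matches."""
    acted = []
    for log_path in find_stale_logs(root, max_age_seconds):
        # Assumption: the stuck process's command line mentions a path inside the item dir.
        # The trailing separator keeps "item_1/" from also matching "item_10/".
        item_dir = os.path.dirname(log_path).rstrip(os.sep) + os.sep
        pids = find_matching_pids(item_dir, proc_root)
        if len(pids) != 1:
            continue  # zero or several matches: leave it for a human
        if not dry_run:
            os.kill(pids[0], signal.SIGTERM)  # same signal a plain `kill <pid>` sends
        acted.append((pids[0], item_dir))
    return acted
```

Running it with `dry_run=True` first lists what would be signalled without touching anything; the "exactly one match" rule mirrors the description in the log and avoids killing unrelated processes when the match is ambiguous.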
[08:57] But as I said, doesn't really seem necessary for Yahoo Answers
[08:59] Mininova update: nearing 40k done in 12 hours since my blunder, no ban so far
[09:32] *** inittux has quit IRC (Read error: Operation timed out)
[09:36] *** inittux has joined #archiveteam-bs
[09:51] *** GE has joined #archiveteam-bs
[10:09] Do any of you have experience with OVH VPS in BHS (Beauharnois, Canada)? Specifically, I'm looking at the cheapest option, VPS SSD 1. Any information regarding uptime, performance, etc. would be appreciated.
[10:10] My guess is "they work in general, but if you ever run into issues, don't expect any help from support" as usual with OVH.
[10:11] Use case would be grab scripts (plus minor stuff), by the way.
[10:37] *** icedice has quit IRC (Ping timeout: 245 seconds)
[11:08] *** fie has quit IRC (Ping timeout: 244 seconds)
[11:17] *** zhongfu has quit IRC (Ping timeout: 260 seconds)
[11:19] *** fie has joined #archiveteam-bs
[11:22] *** zhongfu has joined #archiveteam-bs
[11:27] *** BartoCH has joined #archiveteam-bs
[11:29] *** GE has quit IRC (Quit: zzz)
[11:47] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:54] *** fie has quit IRC (Remote host closed the connection)
[11:59] *** fie has joined #archiveteam-bs
[12:12] *** Jonison has joined #archiveteam-bs
[12:18] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[12:20] *** BartoCH has joined #archiveteam-bs
[12:38] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[12:44] *** Jonison has quit IRC (Read error: Connection reset by peer)
[12:59] *** GE has joined #archiveteam-bs
[13:01] *** VADemon has quit IRC (Read error: Connection reset by peer)
[13:28] *** pnJay has joined #archiveteam-bs
[13:47] JAA: I've had Yahoo jobs stuck for many hours.
[13:49] Yes, me too
[13:50] I just let them run, and eventually they continued. At least I think that's what happened; I couldn't check since the files get deleted at the end of a job and my tmux scrollback's limited.
[14:03] *** bwn has quit IRC (Ping timeout: 960 seconds)
[14:35] *** passerby has quit IRC (Read error: Operation timed out)
[15:04] *** bwn has joined #archiveteam-bs
[15:19] *** passerby has joined #archiveteam-bs
[15:41] *** BartoCH has joined #archiveteam-bs
[15:46] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[15:57] *** JAA has quit IRC (Quit: Page closed)
[16:03] *** pnJay has quit IRC (Quit: Page closed)
[16:06] *** pnJay has joined #archiveteam-bs
[16:16] *** dashcloud has quit IRC (Read error: Operation timed out)
[16:19] *** dashcloud has joined #archiveteam-bs
[16:41] *** JAA has joined #archiveteam-bs
[16:43] *** bwn has quit IRC (Ping timeout: 960 seconds)
[16:56] *** odemg has joined #archiveteam-bs
[17:01] *** Dark_Star has quit IRC (Ping timeout: 506 seconds)
[17:13] *** Dark_Star has joined #archiveteam-bs
[17:30] *** Matt_Lock has quit IRC (Ping timeout: 260 seconds)
[17:30] *** Matt_Lock has joined #archiveteam-bs
[17:32] *** inittux has quit IRC (Quit: ZNC 1.6.4 - http://znc.in)
[17:58] *** pizzaiolo has joined #archiveteam-bs
[18:13] so i found this: https://archive.org/details/ModernMarvels
[18:13] i'm grabbing it for my archives
[18:13] nifty
[18:14] did they ... put the entire show in one item ...
[18:14] turns out google behind the screen is a vpro backlight video
[18:14] i don't know any of those words
[18:14] there is a show called 'Google Behind the Screen'
[18:15] and it's not a Modern Marvels episode
[18:15] ok
[18:15] For the Modern Marvels item, it looks like someone uploaded a torrent and it just downloaded them all
[18:15] yes this torrent: http://www.demonoid.pw/files/details/2118245/008619634785/
[18:16] from what i can tell
[18:16] welllll i guess that's one way to do your torrenting
[18:16] it is over 5 years old
[18:16] *** _desu___ has quit IRC (Ping timeout: 260 seconds)
[18:16] torrent was uploaded 2017-02-02
[18:17] this derive has been running for nearly two months :O
[18:17] someone was uploading modern marvels torrents on thegeeks
[18:18] cause i have low radio i decided to look for episodes online
[18:18] and found that collection
[18:19] this one also may not be Modern Marvels: Modern Marvels S00E70 - Independence Day, The History of the 4th of July.avi
[18:20] just based on its intro
[18:22] anyways i'm going on my xbox
[18:23] *** bwn has joined #archiveteam-bs
[18:32] xmc: I'm trying out a way of downloading IPB forums that involves going through every topic by ID (like https://foo.org/topic/XX--), and then creating a URL list from it
[18:32] makes sense
[18:32] the URL list consisting of https://foo.org/topic/XX-this-is-the-thread-title/?page=x
[18:33] right
[18:33] more efficient than throwing a recursive crawl at it I'd think, less scope for it to wander off
[18:33] yep!
[18:35] *** pizzaiolo has quit IRC (Read error: Operation timed out)
[18:35] on another note, would anyone happen to know what the purpose of this line is? https://github.com/ArchiveTeam/ArchiveBot/blob/master/pipeline/archivebot/seesaw/wpull.py#L24
[18:35] (passing "--html-parser libxml2-lxml" to wpull)
[18:35] is that more reliable than the default or something like that?
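The IPB topic-ID approach from 18:32 can be sketched as below. This is a hedged illustration, not the tool being discussed: it assumes that requesting `/topic/<id>--` redirects to the slugged canonical URL (`urlopen` follows redirects and `geturl()` reports where it ended up), and that pages are addressed as `?page=N` starting at 1. `resolve_topic`, `page_urls`, and `build_url_list` are hypothetical names, and discovering how many pages a topic has (the `count_pages` callable) is left to the caller, since that would require parsing the forum's HTML.

```python
from urllib.request import urlopen


def resolve_topic(base_url, topic_id, timeout=30):
    """Fetch /topic/<id>-- and return the final URL after redirects.

    Assumption: the forum redirects the bare ID to the slugged canonical URL.
    Returns None for missing/deleted topics or network errors.
    """
    try:
        with urlopen(f"{base_url}/topic/{topic_id}--", timeout=timeout) as resp:
            return resp.geturl()
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return None


def page_urls(canonical_url, page_count):
    """Expand a canonical topic URL into its ?page=N URLs."""
    canonical = canonical_url.split("?", 1)[0].rstrip("/") + "/"
    return [f"{canonical}?page={n}" for n in range(1, page_count + 1)]


def build_url_list(topic_ids, resolve, count_pages):
    """Walk topic IDs, skip ones that do not resolve, and collect every page URL."""
    urls = []
    for topic_id in topic_ids:
        canonical = resolve(topic_id)
        if canonical is None:
            continue  # deleted, private, or never-existing topic ID
        urls.extend(page_urls(canonical, count_pages(canonical)))
    return urls
```

A real run would pass something like `lambda tid: resolve_topic("https://foo.org", tid)` as `resolve`. Keeping the resolver and page counter as parameters means the URL list stays bounded to known topics, which is the "less scope for it to wander off" advantage over a recursive crawl mentioned at 18:33.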
[18:36] could be more robust, i'm not sure
[18:51] *** bwn has quit IRC (Ping timeout: 244 seconds)
[18:57] *** SmileyG has joined #archiveteam-bs
[18:57] *** Smiley has quit IRC (Read error: Connection reset by peer)
[18:59] *** bwn has joined #archiveteam-bs
[19:01] *** BartoCH has joined #archiveteam-bs
[19:06] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[19:28] *** GE has quit IRC (Quit: zzz)
[19:41] *** bwn has quit IRC (Ping timeout: 244 seconds)
[20:13] *** BartoCH has joined #archiveteam-bs
[20:24] *** _desu___ has joined #archiveteam-bs
[20:51] *** GE has joined #archiveteam-bs
[21:03] *** pnJay has quit IRC (Quit: Page closed)
[21:10] *** ndiddy has joined #archiveteam-bs
[21:14] *** BlueMaxim has joined #archiveteam-bs
[21:15] damn it google https://arstechnica.com/business/2017/03/google-reportedly-removing-sms-texting-from-hangouts-on-may-22/
[21:20] "The company is likely trying to remove any associations Hangouts has with traditional SMS messaging, so it can position Hangouts as more of a collaboration, sharing, and productivity app"
[21:21] I don't see how having a feature precludes it from use in a workplace context
[21:21] and if they're trying to turn it into something completely different, then maybe they should just make a new app
[21:22] they made this horrible thing already https://allo.google.com/
[21:38] still no sms integration
[21:38] plus everyone i talk to is on hangouts or sms
[21:38] *** icedice has joined #archiveteam-bs
[21:53] *** icedice2 has joined #archiveteam-bs
[21:55] *** Matt_Lock has quit IRC (Ping timeout: 260 seconds)
[22:00] *** icedice has quit IRC (Ping timeout: 506 seconds)
[22:03] *** odemg has quit IRC (Remote host closed the connection)
[22:05] *** bwn has joined #archiveteam-bs
[22:22] *** odemg has joined #archiveteam-bs
[22:24] *** Smiley has joined #archiveteam-bs
[22:24] *** SmileyG has quit IRC (Read error: Connection reset by peer)
[22:28] *** zino_ has joined #archiveteam-bs
[22:38] My Mininova grab is slowing down. I'm now at 0.4 URLs per second (over the past 6 hours, compared to over 1 last night). 68k done.
[22:38] *** zino_ has quit IRC (Quit: Leaving)
[22:38] *** zino has quit IRC (Read error: Connection reset by peer)
[22:39] *** zino has joined #archiveteam-bs
[22:56] *** bwn has quit IRC (Ping timeout: 244 seconds)
[22:59] *** JAA has quit IRC (Quit: Page closed)
[23:17] *** bwn has joined #archiveteam-bs
[23:24] *** GE_ has joined #archiveteam-bs
[23:26] *** GE has quit IRC (Ping timeout: 255 seconds)
[23:26] *** GE_ is now known as GE
[23:31] woohoo! I annoyed some copyright holder enough to get my item on IA disabled!
[23:36] *** RichardG has quit IRC (Ping timeout: 245 seconds)
[23:49] *** pizzaiolo has joined #archiveteam-bs
[23:51] *** GE has quit IRC (Remote host closed the connection)
[23:51] *** pizzaiolo has quit IRC (Remote host closed the connection)
[23:52] *** pnJay has joined #archiveteam-bs
[23:52] *** pizzaiolo has joined #archiveteam-bs
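The Mininova progress reports (about 40k URLs at 08:59 after 12 hours, 68k at 22:38) can be turned into average rates with simple arithmetic. This is a rough sketch: "nearing 40k" is treated as exactly 40,000, and `urls_per_second` is a hypothetical helper. Only the clock times matter, so the dates `strptime` fills in are irrelevant.

```python
from datetime import datetime


def urls_per_second(done_before, time_before, done_after, time_after):
    """Average throughput between two progress snapshots."""
    elapsed = (time_after - time_before).total_seconds()
    return (done_after - done_before) / elapsed


def clock(hhmm):
    """Parse an HH:MM log timestamp; both snapshots fall on the same day."""
    return datetime.strptime(hhmm, "%H:%M")


# First 12 hours: ~40,000 URLs in 12 * 3600 s.
overnight = 40_000 / (12 * 3600)
# 08:59 -> 22:38 is 13 h 39 min = 49,140 s for the next ~28,000 URLs.
daytime = urls_per_second(40_000, clock("08:59"), 68_000, clock("22:38"))

print(f"{overnight:.2f} {daytime:.2f}")  # prints "0.93 0.57"
```

The daytime average of roughly 0.57 URLs/s sits between the "over 1" overnight figure and the 0.4 URLs/s reported for the last 6 hours, which is consistent with a gradual slowdown rather than a sudden drop.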