[00:07] Yes, please check the filename!!!!
[00:07] Very very important
[00:07] Scripts are automatically synced, so a wrong filename will cause the whole grab to crash
[00:07] If you are not sure, make a pull request
[00:08] That's what just happened
[00:10] Start: I see you added some new websites, thanks!
[00:10] Little thing, if the rss contains all the news you don't have to add the frontpage of a website too
[00:11] ok
[00:14] *** asdf has joined #archiveteam
[00:15] *** asdf has quit IRC (Remote host closed the connection)
[00:16] *** REiN^ has quit IRC (Read error: Operation timed out)
[00:18] Everything grabbed by the NewsGrabber will go into newssites packs
[00:18] https://archive.org/search.php?query=newssites%20AND%20collection%3Aopensource&sort=-publicdate
[00:21] *** REiN^ has joined #archiveteam
[00:27] Atluxity: merged!
[00:27] Are you also going to create a file for the feeds you removed with a longer refreshtime?
[00:39] joepie91: theintercept is added, thanks!
[00:40] *** JesseW has joined #archiveteam
[00:50] *** RichardG_ is now known as RichardG
[01:13] *** systemn2 has joined #archiveteam
[01:15] *** systemn2 has quit IRC (Client Quit)
[01:15] *** rctbeast has joined #archiveteam
[01:22] *** err3 has quit IRC (Remote host closed the connection)
[01:44] *** asdf has joined #archiveteam
[01:59] *** redlob_ has joined #archiveteam
[02:01] *** redlob has quit IRC (Read error: Operation timed out)
[02:10] *** philpem has quit IRC (Remote host closed the connection)
[02:23] more sites have been added to newsgrabber!
[02:27] should we make a new channel for newsgrabber?
[02:34] Start: would it be safe to run newsgrabber in its current state?
[02:39] newsgrabber is already running at http://newsgrabber.harrycross.me:29000
[02:40] and we should probably move newsgrabber discussion to #newsgrabber
[02:44] alright, sweet
[02:50] *** VADemon has quit IRC (left4dead)
[02:54] *** maseck has quit IRC (Remote host closed the connection)
[03:51] chfoo-: could you get logchfoo to start logging #newsgrabber?
[03:53] *** RichardG has quit IRC (Read error: Connection reset by peer)
[03:57] *** RichardG has joined #archiveteam
[04:12] *** maseck has joined #archiveteam
[04:13] *** logchfoo starts logging #archiveteam at Mon Dec 21 04:13:30 2015
[04:13] *** logchfoo has joined #archiveteam
[04:34] *** BlueMaxim has joined #archiveteam
[04:54] *** redlob has joined #archiveteam
[04:54] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[04:55] *** dashcloud has joined #archiveteam
[04:55] *** redlob_ has quit IRC (Read error: Operation timed out)
[05:01] *** rctbeast has quit IRC (Ping timeout: 240 seconds)
[05:12] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:19] *** dashcloud has joined #archiveteam
[05:43] *** redlob has quit IRC (Read error: Operation timed out)
[05:46] *** redlob has joined #archiveteam
[06:32] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:35] *** dashcloud has joined #archiveteam
[06:51] *** Coderjoe has quit IRC (Read error: Operation timed out)
[06:58] *** Coderjoe has joined #archiveteam
[07:34] *** scyther has joined #archiveteam
[08:26] *** DFJustin has quit IRC (Read error: Operation timed out)
[08:36] *** DFJustin has joined #archiveteam
[08:36] *** swebb sets mode: +o DFJustin
[09:04] *** atomotic has joined #archiveteam
[09:04] *** atomotic has quit IRC (Connection closed)
[09:04] *** Elegance has quit IRC (Ping timeout: 369 seconds)
[09:05] *** Elegance has joined #archiveteam
[09:07] *** JesseW has quit IRC (Leaving.)
[09:17] *** dashcloud has quit IRC (Read error: Operation timed out)
[09:31] *** dashcloud has joined #archiveteam
[09:33] *** alberto has joined #archiveteam
[09:34] *** schbirid has joined #archiveteam
[09:36] *** dashcloud has quit IRC (Read error: Operation timed out)
[09:42] *** dashcloud has joined #archiveteam
[09:44] *** vitzli has joined #archiveteam
[09:47] *** mistym has quit IRC (Remote host closed the connection)
[09:57] *** primus104 has joined #archiveteam
[09:58] *** primus104 has left
[10:00] *** bauruine has quit IRC (Ping timeout: 260 seconds)
[10:02] *** alberto has quit IRC (Read error: Operation timed out)
[10:25] I would love to know if anyone grabbed the telethon stream.
[10:25] Youtube only keeps the last 12 hours.
[10:25] hit me up
[10:29] *** atomotic has joined #archiveteam
[10:39] Wait, you did not record yourself, SketchCo1 ?
[10:39] arkiver: nah, those are such specialist feeds...
[10:43] that sucks, they really only save the last 12 hours?
[10:43] how silly
[10:46] it is not about complaining. it's about seeing what people might have.
[10:46] godane might have it
[10:46] SketchCo1: we started a NewsGrabber
[10:46] #newsgrabber
[10:47] *** SketchCo1 is now known as Sketchcow
[10:47] We're going to get every new article from every news website saved!
[10:47] ^ We just need a list of all the newssites in the world!
[10:47] SketchCow: i didn't capture it
[10:48] i only grabbed the first 3 hours+
[10:49] btw 12 hours is a lot better than i feared it would be
[10:49] yeah. we had some special stuff in the missing footage.
[10:49] but lots of stuff in the stuff we have too
[10:50] We are going to reach inside google
[10:50] see if we can pull a favor
[10:50] grab the last 12 hours first
[10:51] will
[10:51] has to finish processing
[10:51] where's the link to the last 12 hours anyways
[10:53] SketchCow: anyways i found medium.com sitemaps again
[10:54] so i'm grabbing the 3 months back log
[11:00] *** vitzli has quit IRC (Quit: Leaving)
[11:37] *** afics has quit IRC (Quit: Quit.)
[11:38] *** afics has joined #archiveteam
[11:48] *** atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[12:19] *** alberto has joined #archiveteam
[12:34] *** vtyl has quit IRC (Read error: Operation timed out)
[12:43] Sketchcow: Just saw your tweet. I grabbed the stream and I'm actually in the middle of uploading it to the Archive.
[12:43] *** lytv has joined #archiveteam
[12:44] and I do have Saturday 4pm-12am
[12:45] you have the whole stream then?
[12:45] 100%?
[12:51] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[13:14] I believe so
[13:14] https://archive.org/details/iatelethon
[13:14] found it
[13:15] Yep, that's it. I'm trying to add videos one at a time, but the page has been on this for the past half hour or so.
https://i.imgur.com/kGQIBLp.png
[13:32] *** WinterFox has quit IRC (Remote host closed the connection)
[13:43] *** slyphic|a is now known as slyphic
[14:05] asdf: Just ran a script over my recordings to see what times they cover
[14:05] 2015-12-19T11:56:01-0800 to 2015-12-19T11:56:51-0800
[14:05] 2015-12-19T11:56:55-0800 to 2015-12-19T11:57:02-0800
[14:05] 2015-12-19T11:57:00-0800 to 2015-12-19T12:45:52-0800
[14:05] 2015-12-19T12:49:04-0800 to 2015-12-19T14:15:27-0800
[14:05] 2015-12-19T14:17:16-0800 to 2015-12-19T14:17:23-0800
[14:05] 2015-12-19T14:18:19-0800 to 2015-12-19T14:18:26-0800
[14:05] 2015-12-19T14:19:22-0800 to 2015-12-19T14:19:29-0800
[14:05] 2015-12-19T14:20:25-0800 to 2015-12-19T14:20:32-0800
[14:05] 2015-12-19T14:21:27-0800 to 2015-12-19T14:21:34-0800
[14:05] 2015-12-19T14:22:30-0800 to 2015-12-19T14:22:37-0800
[14:05] 2015-12-19T15:38:46-0800 to 2015-12-19T21:38:47-0800
[14:05] 2015-12-19T21:40:02-0800 to 2015-12-20T03:39:25-0800
[14:05] 2015-12-20T03:41:21-0800 to 2015-12-20T09:41:22-0800
[14:05] 2015-12-20T09:42:34-0800 to 2015-12-20T12:13:23-0800
[14:05] Looks like there's only a couple minute long gaps when the connection dropped for a bit
[14:07] Never mind. There's an hour gap around 2PM on the 19th. Is that when the stream went down on your end, Sketchcow?
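The gap check described above can be reproduced from the posted ranges alone. What follows is a minimal sketch, not the script mentioned in the log: the names `parse_ranges` and `find_gaps` and the 5-minute threshold are assumptions made for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S%z"

# Coverage ranges copied from the [14:05] messages above.
RANGES = """\
2015-12-19T11:56:01-0800 to 2015-12-19T11:56:51-0800
2015-12-19T11:56:55-0800 to 2015-12-19T11:57:02-0800
2015-12-19T11:57:00-0800 to 2015-12-19T12:45:52-0800
2015-12-19T12:49:04-0800 to 2015-12-19T14:15:27-0800
2015-12-19T14:17:16-0800 to 2015-12-19T14:17:23-0800
2015-12-19T14:18:19-0800 to 2015-12-19T14:18:26-0800
2015-12-19T14:19:22-0800 to 2015-12-19T14:19:29-0800
2015-12-19T14:20:25-0800 to 2015-12-19T14:20:32-0800
2015-12-19T14:21:27-0800 to 2015-12-19T14:21:34-0800
2015-12-19T14:22:30-0800 to 2015-12-19T14:22:37-0800
2015-12-19T15:38:46-0800 to 2015-12-19T21:38:47-0800
2015-12-19T21:40:02-0800 to 2015-12-20T03:39:25-0800
2015-12-20T03:41:21-0800 to 2015-12-20T09:41:22-0800
2015-12-20T09:42:34-0800 to 2015-12-20T12:13:23-0800
"""


def parse_ranges(text):
    """Parse 'START to END' lines into a sorted list of (start, end) datetimes."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        start, end = line.split(" to ")
        ranges.append((datetime.strptime(start.strip(), FMT),
                       datetime.strptime(end.strip(), FMT)))
    return sorted(ranges)


def find_gaps(ranges, min_gap=timedelta(minutes=5)):
    """Return (gap_start, gap_end) pairs where nothing was recorded for at least min_gap."""
    gaps = []
    covered_until = ranges[0][1]
    for start, end in ranges[1:]:
        if start - covered_until >= min_gap:
            gaps.append((covered_until, start))
        # Overlapping recordings must not move the covered point backwards.
        covered_until = max(covered_until, end)
    return gaps


GAPS = find_gaps(parse_ranges(RANGES))
for gap_start, gap_end in GAPS:
    print(f"gap of {gap_end - gap_start} from {gap_start.isoformat()} to {gap_end.isoformat()}")
```

With the 5-minute threshold only the gap after 14:22:37 on the 19th is reported; the other breaks are each under four minutes. Lowering `min_gap` lists those as well.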
[14:08] i think it was around that time
[14:09] the first 3 and a half hours are saved
[14:10] so if it dropped then we still have it
[14:21] *** Froggypwn has quit IRC (Ping timeout: 483 seconds)
[14:28] *** Froggypwn has joined #archiveteam
[14:38] *** Ghost_of_ has joined #archiveteam
[14:53] a book in NL has been banned
[14:53] http://nos.nl/artikel/2076565-srebrenica-boek-de-doofpotgeneraal-alsnog-verboden.html
[14:53] cc arkiver
[14:53] it may be worth trying to obtain a copy of this somewhere, before it slips into unavailability too much
[14:54] first two chapters are at http://speakeasy.nl/wp-content/uploads/2015/05/DDG-hoofdstuk-1-en-2.pdf anyway
[14:56] Everyone say hello to NewsBuddy, our new news archive bot in #newsgrabberbot (well all it does is say what the bot is saving)
[15:03] *** FAMAS has joined #archiveteam
[15:03] greetings all
[15:16] *** scyther has quit IRC (Quit: Leaving)
[15:16] yeah it was down for around an hour so that sounds complete
[15:17] *** nertzy has joined #archiveteam
[15:19] looks like the item is just taking a while to encode the video (derive.php) https://archive.org/history/iatelethon - should finish at some point
[15:26] *** godane has quit IRC (Ping timeout: 250 seconds)
[15:30] HCross: Is it cool to just start throwing PRs at NewsBuddy? I've got a few sites I'd like to see archived.
[15:31] it's not meant to do one site once, it's designed to go back over the sites
[15:31] Right, these are news sources that put out multiple news articles per day.
[15:31] Throw it in!
[15:32] Cool, I'll get started.
[15:39] *** VADemon has joined #archiveteam
[15:39] *** godane has joined #archiveteam
[15:40] looks like this guy recorded the telethon as well https://archive.org/details/@samizdat
[15:42] *** asdf has quit IRC (Ping timeout: 252 seconds)
[15:43] HCross: PR created. Let me know if I did things right.
[15:43] *** atomotic has joined #archiveteam
[15:43] this guy is uploading tons of cd images: https://archive.org/details/@denzquix
[15:44] gotta grab this one for sure https://archive.org/details/mastering-internet-development-with-activex
[15:50] ActiveX - please end me
[15:51] SketchCow: these should be moved to the computer shopping collection: https://archive.org/search.php?query=computer%20shopper%20collection%3Amagazine_rack
[15:51] what is required for this user to gain the ability to issue commands to archivebot?
[15:57] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[16:00] *** atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[16:05] godane: SimTunes!
[16:06] some good stuff there
[16:07] i know
[16:07] i found a random computer shopper uk cd from july 1998
[16:07] *** atomotic has joined #archiveteam
[16:07] thats how i found this guy
[16:09] looks like stuff uploaded a month ago is in cdrom collection
[16:09] so it will get there at some point
[16:10] *** atomotic has quit IRC (Client Quit)
[16:11] request to channel operators to gain voice
[16:14] *** atomotic has joined #archiveteam
[16:15] *** atomotic has quit IRC (Client Quit)
[16:21] *** FAMAS has quit IRC (Read error: Connection reset by peer)
[16:21] *** FAMAS has joined #archiveteam
[16:21] samizdat got 8pm onward
[16:25] looking for 4pm to 8pm
[16:27] ah looks like mutoso's upload will have that once the item processes some more
[16:40] *** dashcloud has quit IRC (Ping timeout: 250 seconds)
[16:49] HCross: Any chance you can take a look at that PR I sent? I've got a few others I'd like to write, and want to make sure I'm doing things at least somewhat right
[16:49] *** dashcloud has joined #archiveteam
[16:52] *** scyther has joined #archiveteam
[16:53] phuzion: commented
[16:54] arkiver: would adding just that rss URL work, or should I add each individual RSS link on that page?
[16:55] how to obtain a copy of a completed site archive for personal use in archive formats?
[16:55] phuzion: it looks like http://blog.cleveland.com/realtimenews/atom.xml would cover all articles
[16:55] so that only should be enough
[16:57] *** sivoais has joined #archiveteam
[16:57] Though maybe not local news, you might need to add more rss feeds for that too
[16:57] Is the bot smart enough to not grab the same article twice if it shows up in the same RSS feed twice?
[16:57] Yes
[16:57] I mean, if it shows up in two different RSS feeds
[16:58] So, it might just be smartest to throw all the RSS feeds in there then?
[16:58] No, because that would mean all URLs of those RSS feeds will be grabbed over and over again
[16:59] Oh, ok. I'll start with the realtimenews url for now
[16:59] ok, let's do that
[17:00] *** nertzy has joined #archiveteam
[17:00] phuzion: but if you need to add all RSS feeds to grab all articles, then please do that
[17:01] If one or two of the RSS feeds update frequently and the others don't you can add two python files
[17:01] with two different refresh times, one file for the short refresh time and one for the longer refresh time
[17:15] *** JesseW has joined #archiveteam
[17:28] *** FAMAS has quit IRC (Ping timeout: 369 seconds)
[17:31] *** Jogie has quit IRC (Ping timeout: 506 seconds)
[17:33] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[17:38] *** JesseW has quit IRC (Leaving.)
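The two ideas in the exchange above, skipping articles that were already grabbed and polling feeds on different refresh times, can be sketched as follows. This is an illustration only and not NewsGrabber's actual code or file format: `new_urls`, `feeds_due`, the refresh intervals and the `example.com` URLs are all hypothetical; only the realtimenews feed URL comes from the log.

```python
# Hypothetical refresh intervals in seconds: a busy feed is polled often,
# a quiet one rarely, so its URL is not re-fetched over and over.
FEEDS = {
    "http://blog.cleveland.com/realtimenews/atom.xml": 300,  # URL from the log
    "http://example.com/local/atom.xml": 3600,               # made-up slow feed
}


def new_urls(seen, article_urls):
    """Return the article URLs not grabbed before and record them in `seen`."""
    fresh = []
    for url in article_urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


def feeds_due(feeds, last_checked, now):
    """Return the feed URLs whose refresh interval has elapsed at time `now`."""
    return [
        feed
        for feed, interval in feeds.items()
        if now - last_checked.get(feed, float("-inf")) >= interval
    ]


seen = set()
# The same article listed by two different feeds is only queued once.
first = new_urls(seen, ["http://example.com/a", "http://example.com/b"])
second = new_urls(seen, ["http://example.com/b", "http://example.com/c"])
print(first, second)
```

Ten minutes after both feeds were last checked, `feeds_due` returns only the fast feed, which is the effect of keeping two files with different refresh times.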
[17:54] *** VADemon has quit IRC (Read error: Operation timed out)
[17:57] *** atomotic has joined #archiveteam
[18:04] *** human39 has joined #archiveteam
[18:17] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[18:17] *** Lord_Nigh has quit IRC (Read error: Connection reset by peer)
[18:22] *** Lord_Nigh has joined #archiveteam
[18:25] *** alberto has quit IRC (Read error: Operation timed out)
[19:05] *** superkuh has quit IRC (Read error: Operation timed out)
[19:11] *** rctbeast has joined #archiveteam
[19:20] *** superkuh has joined #archiveteam
[19:34] *** bauruine has joined #archiveteam
[19:36] *** mutoso has quit IRC (Read error: Operation timed out)
[19:37] *** mutoso has joined #archiveteam
[19:40] *** nertzy has joined #archiveteam
[19:45] Sketchcow: first 12 hours are up now at https://archive.org/details/iatelethon
[19:48] *** human39 has quit IRC (Leaving)
[19:49] *** brayden has quit IRC (Read error: Connection reset by peer)
[19:49] *** brayden has joined #archiveteam
[19:49] *** swebb sets mode: +o brayden
[20:24] *** JW_work1 has joined #archiveteam
[20:26] *** JW_work has quit IRC (Ping timeout: 255 seconds)
[20:28] *** bauruine has quit IRC (Ping timeout: 250 seconds)
[20:29] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[20:33] *** JW_work1 has quit IRC (Ping timeout: 260 seconds)
[20:33] *** bauruine has joined #archiveteam
[20:42] *** JW_work has joined #archiveteam
[21:01] *** no2penci1 is now known as no2pencil
[21:02] *** rctbeast has quit IRC (Ping timeout: 240 seconds)
[21:11] *** Ghost_of_ has quit IRC (Quit: Leaving)
[21:15] *** bobpoeker has joined #archiveteam
[21:16] *** bobpoeker has quit IRC (Client Quit)
[21:16] *** bobpoeker has joined #archiveteam
[21:17] *** bobpoeker is now known as beeper
[21:17] *** www2 has joined #archiveteam
[21:19] arkiver: Sketchcow: somebody I know has a copy of that banned book, but no reasonable means to scan it (they have a camera, but no budget for a glass plate, etc.) -- any suggestions on how to resolve this? ebook versions are apparently poof
[21:20] (this is in NL, btw)
[21:20] *** bauruine has quit IRC (Ping timeout: 260 seconds)
[21:20] *** nertzy has joined #archiveteam
[21:25] *** bauruine has joined #archiveteam
[21:37] *** sep332_ has joined #archiveteam
[21:37] *** sep332 has quit IRC (Read error: Connection reset by peer)
[21:43] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[21:44] mutoso: Thank you very much.
[21:44] Why is the book banned.
[21:45] Let's.... idly ask that.
[21:51] Sketchcow: ex-employee of MIVD (called MID back then, basically the military arm of the Dutch GCHQ) sued over "defamation"
[21:52] it's all a very sketchy story
[21:54] *** Ghost_of_ has joined #archiveteam
[21:56] it's also been going on since August (although apparently a new ruling came out recently): https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=thrillerlezers.blogspot.com%2F2015%2F08%2Fde-doofpotgeneraal-uit-de-handel.html&edit-text=&act=url
[22:02] *** nickname has joined #archiveteam
[22:02] what is the progress on notepad.cc
[22:02] ?
[22:03] *** nertzy has joined #archiveteam
[22:07] ul
[22:07] *** nickname has quit IRC (Quit: Page closed)
[22:08] *** slyphic is now known as slyphic|a
[22:12] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[22:17] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[22:17] *** dashcloud has joined #archiveteam
[22:24] Sketchcow: No problem. Glad to help.
[22:26] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:30] *** scyther has quit IRC (Read error: Connection reset by peer)
[22:30] *** dashcloud has joined #archiveteam
[22:44] http://ascii.textfiles.com/archives/3086 . Sorry.
[22:49] Somebody was talking about archiving muds -- relevant link: http://ascii.textfiles.com/archives/3039
[22:50] (fine, fine, I'll stop trawling through Jason's blog now.)
[22:53] joepie91: you could check the diybookscanner.org forums to see if there's someone in the Netherlands with one they could send it to
[22:53] Vito`: I suspect they'd be wary of sending it to random people
[22:54] (the person who reported about it and has a copy, happens to be somebody who somewhat knows me)
[22:57] if they're looking to just preserve the raw content, handheld photos of the pages can be enough to run through something like Scan Tailor or Book Scan Wizard to clean up and process with OCR
[22:58] http://www.worldcat.org/title/doofpotgeneraal-overheidsspionage-op-de-bedrijfsvloer/oclc/898360755 says it is ~200 pages
[22:59] Vito`: right. I'd prefer a proper scan (whether using a camera or otherwise), hence the question :P presently trying to negotiate a glass sheet
[22:59] I have a camera that would be suitable for DIY book scanning
[22:59] just no cover plate
[22:59] the easiest cheapest DIY book scanner is just a cardboard box and an acrylic picture frame you can get at any shop
[23:00] the acrylic will slowly get scratched but it'll be fine for a few hundred pages
[23:00] let me find an example
[23:00] Vito`: "any shop" is not a very wide collection of shops here, certainly not for a reasonable price, and my budget is currently 0
[23:00] :P
[23:01] if you have framed photos you already have one
[23:03] http://www.diybookscanner.org/forum/viewtopic.php?f=14&t=2436&p=12787&hilit=cardboard#p12781
[23:03] I don't
[23:03] they have a wood frame but you can use more cardboard
[23:04] moment...
[23:04] Vito`: "21cm" refers to what?
[23:04] on worldcat
[23:05] probably the diagonal
[23:07] Vito`: I could maybe obtain something like this http://www.action.nl/decoratie/fotolijsten-muurdecoratie/fotocollagelijst-hout-42x42x3cm-wit-natural-zwart
[23:08] this is cheaper
[23:08] http://www.action.nl/decoratie/fotolijsten-muurdecoratie/fotolijst-wit-zilver-kunststof-21x29-7cm
[23:08] idk whether it'll be enough
[23:08] or how easy it'll be to remove the non-glass parts
[23:10] there's a chance the latter one won't be big enough, since you don't know the dimensions, but either one will probably suffice. they're probably acrylic, not glass, but they'll get you through a whole book as long as you're careful about scratches
[23:11] but yes, that, and a couple cardboard boxes, are generally all you need
[23:11] and a black magic marker I guess, and overhead light
[23:12] (photo frames are usually glass here)
[23:12] will have a go at a setup tomorrow
[23:12] thanks
[23:13] good luck
[23:15] *** gibigian1 has quit IRC (Ping timeout: 252 seconds)
[23:21] OK, my mind is starting to clear up
[23:23] Recovered from the Telethon?
[23:23] *** gibigiana has joined #archiveteam
[23:36] *** WinterFox has joined #archiveteam
[23:37] *** Stilett0 has quit IRC ()