[00:02] *** Stilett0 has quit IRC (Ping timeout: 246 seconds)
[00:04] *** ats_ has joined #archiveteam
[00:07] *** atlogbot has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** yipdw has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** WinterFox has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** ravetcofx has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** ats has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** edsu has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** robink has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** swebb has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** Laverne has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** Flierp has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** ZizzyDizz has quit IRC (ny.us.hub irc.servercentral.net)
[00:14] *** Start has joined #archiveteam
[00:21] *** PovAddict has joined #archiveteam
[00:21] *** WinterFox has joined #archiveteam
[00:21] *** ravetcofx has joined #archiveteam
[00:21] *** Laverne has joined #archiveteam
[00:21] *** chazchaz has joined #archiveteam
[00:21] *** dserodio has joined #archiveteam
[00:21] *** atlogbot has joined #archiveteam
[00:21] *** Cameron_D has joined #archiveteam
[00:21] *** MrRadar has joined #archiveteam
[00:21] *** Flierp has joined #archiveteam
[00:21] *** ZizzyDizz has joined #archiveteam
[00:22] *** swebb_ is now known as swebb
[00:25] *** brayden has joined #archiveteam
[00:53] *** tfgbd_znc has joined #archiveteam
[01:11] https://codebender.cc is shutting down
[01:15] *** Somebody has joined #archiveteam
[01:17] *** Start has quit IRC (Quit: Disconnected.)
[01:20] *** Start has joined #archiveteam
[01:23] *** Stiletto has joined #archiveteam
[01:37] *** VADemon has quit IRC (Quit: left4dead)
[01:59] balrog: thanks for the reminder
[02:11] Can someone help me out a bit with an ignore set for w3bin.com?
[02:11] *** nekomune has quit IRC (Ping timeout: 244 seconds)
[02:11] I think it's best if all https://w3bin.com/domain/ links are skipped, otherwise ArchiveBot will probably archive the domain record for every domain name on the Internet
[02:13] *** nekomune has joined #archiveteam
[02:21] *** tfgbd_znc has quit IRC (Read error: Connection reset by peer)
[02:26] *** tfgbd_znc has joined #archiveteam
[02:40] never mind, figured it out
[03:05] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[03:13] Is there any ignore set that can be applied to a running archivation job that restricts archivation to a specfic domain?
[03:18] *** jrwr has quit IRC (Leaving)
[03:21] *** Stiletto has quit IRC (Ping timeout: 246 seconds)
[03:43] *** Somebody has joined #archiveteam
[03:50] *** Stiletto has joined #archiveteam
[03:56] *** dashcloud has quit IRC (Read error: Operation timed out)
[03:57] *** dashcloud has joined #archiveteam
[04:26] *** vitzli has joined #archiveteam
[04:29] *** BlueMaxim has joined #archiveteam
[04:37] Pebble app store torrent https://www.reddit.com/r/pebble/comments/5g0gmx/in_light_of_recent_news_i_archived_the_app_store/
[04:47] can't download
[04:47] it complains about some weirdly-named file, even though I'm on Linux
[04:55] *** Yoshimura has quit IRC (Ping timeout: 255 seconds)
[04:59] make sure you're using the second link
[05:00] apparently the first one is broken
[05:01] which first one?
[05:02] when I clicked it already said "I've created a torrent UPDATED LINK"
[05:02] oh, hm. I just used the magnet link
[05:07] *** icedice has quit IRC (Quit: Leaving)
[05:08] me too
[05:08] and it failed
[05:09] that's odd
[05:24] *** no2penci1 is now known as no2pencil
[05:29] *** PovAddict has quit IRC (Quit: zzz)
[05:45] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:48] *** Aranje has joined #archiveteam
[05:52] *** Sk1d has joined #archiveteam
[05:54] *** Aranje has quit IRC (Read error: Connection timed out)
[05:54] *** Aranje has joined #archiveteam
[06:02] *** ravetcofx has quit IRC (Read error: Operation timed out)
[06:10] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[06:14] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[06:17] *** Froggypwn has quit IRC (Ping timeout: 244 seconds)
[06:18] *** ravetcofx has joined #archiveteam
[06:26] *** krazedkat has quit IRC (Ping timeout: 244 seconds)
[06:29] *** krazedkat has joined #archiveteam
[06:33] *** jsp12345 has quit IRC (Ping timeout: 492 seconds)
[06:41] *** krazedkat has quit IRC (Ping timeout: 244 seconds)
[06:42] *** krazedkat has joined #archiveteam
[06:55] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[07:04] *** alembic has joined #archiveteam
[07:07] *** alembic has quit IRC (Client Quit)
[07:10] *** alembic has joined #archiveteam
[07:11] *** Somebody has joined #archiveteam
[07:11] *** alembic has quit IRC (Client Quit)
[07:11] *** alembic has joined #archiveteam
[07:12] *** krazedkat has quit IRC (Quit: Leaving)
[07:15] *** REiN^ has quit IRC (Max SendQ exceeded)
[07:15] *** REiN^ has joined #archiveteam
[07:31] *** maelstrom has quit IRC (Quit: Leaving)
[07:39] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[07:40] *** Stiletto has joined #archiveteam
[07:47] *** db48x has joined #archiveteam
[08:23] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[09:02] *** sHATNER_ is now known as sHATNER
[09:12] *** yipdw_ is now known as yipdw
[09:19] *** hawc145 is now known as HCross
[09:22] *** vitzli has quit IRC (Quit: Leaving)
[09:28] *** vitzli has joined #archiveteam
[09:34] *** HCross has quit IRC (Read error: Connection reset by peer)
[09:35] *** HCross has joined #archiveteam
[09:36] *** xx343 has quit IRC (Read error: Connection reset by peer)
[09:37] *** xx343 has joined #archiveteam
[09:43] *** Sanqui sets mode: -b *!*webchat@*.res.bhn.net
[09:43] *** ArchiveAL has joined #archiveteam
[09:43] Ayy
[09:43] According to the account creation page - "WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD"
[09:43] note, #archiveteam-bs is for non priority conversation
[09:44] ah, sure. 'yahoosucks'
[09:44] Lol
[09:45] literally just wanted to update the Desura artical to add that Desura has been purchased, although the site is still technically in danger.
[09:46] no worries. cheers!
[09:48] also has any part of desuras games library been saved, since i assume thats the only thing that is worth saving unless they have a forums
[09:49] Desura's a storefront; there's nothing there to get that won't run afoul of morals
[09:50] i mean their freeware games
[09:50] that have no other host
[09:50] also @Sanqui why was i banned to begin with, was it literally just a range ban or did the previous owner of my (isp owned) router do some stupid stuff
[09:50] the ban was for *!*webchat@*.res.bhn.net
[09:50] for good reasons
[09:50] i grepped logs and haven't found it
[09:51] so someone banned all brighthouse users? jeez.
[09:52] tl:dr someone from your ISP was being a knob and kept rebooting their router to get around bans
[09:53] .
[09:53] as far as the library goes, I don't know if anyone has a copy
[09:53] Hm.
[09:53] oh, my grep was insufficient
[09:53] Oh btw why does the wiki homepage under Proposed projects say "Google Drive web hosting to be discontinued on August 31, 2016." i just checked my google drive to be sure, its still running
[09:53] honestly i'm on the verge of vouching for that person because they've helped with nifty but i'm too kind when in power
[09:54] google drive *web hosting*
[09:54] there was a freehost service
[09:54] Oh.
[09:54] hm, rip
[09:55] also Radio shack is still going aswell, new CEO as of Jan 2016
[09:57] the main page says they are closing*
[10:04] *** ArchiveAL has quit IRC (Quit: Page closed)
[10:37] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[10:38] *** BlueMaxim has joined #archiveteam
[10:39] *** WinterFox has quit IRC (Read error: Operation timed out)
[10:39] *** ravetcofx has quit IRC (Read error: Operation timed out)
[10:49] *** WinterFox has joined #archiveteam
[11:40] *** BlueMaxim has quit IRC (Ping timeout: 370 seconds)
[12:16] *** signius_ has joined #archiveteam
[12:54] *** VADemon has joined #archiveteam
[13:42] *** WinterFox has quit IRC (Read error: Operation timed out)
[14:27] *** vitzli has quit IRC (Quit: Leaving)
[15:28] *** tomcat has joined #archiveteam
[15:30] *** RichardG_ has joined #archiveteam
[15:34] *** RichardG has quit IRC (Ping timeout: 364 seconds)
[15:42] *** RichardG_ is now known as RichardG
[15:56] https://www.thelayoff.com/t/KBEVoB1 only a rumor so far, but Solaris may be at threat due to Oracle layoffs
[15:59] *** fie has joined #archiveteam
[16:05] *** tomcat has quit IRC (Ping timeout: 194 seconds)
[16:07] ruh roh
[16:17] HCross: it's Oracle. it's extremely likely that any horrible rumour in existence is true, and then some :)
[16:19] I like that site.
[16:19] Orrible
[16:19] It's like a nicer version of Fuckedcompany
[16:21] I flooded the deriver and OCR queue!
[16:26] *** tomcat has joined #archiveteam
[16:34] same here
[16:34] everything is waiting to deriver
[16:35] ok its not that bad now
[16:35] only 14 waiting to be derive
[16:35] joepie91, yea. Ill start DOWNLOADING ALL THE SOLARIS THINGS
[16:40] *** tomcat has quit IRC (Remote host closed the connection)
[16:42] might have been mentioned already but DoomRL has been hit with a legal letter
[16:42] https://doom.chaosforge.org/
[16:43] It's been archived a few times by us now.
[16:44] cool, cool. ta. I notice it's not open source (apparently it's pascal too, for whatever that's worth)
[16:44] I imagine it's going to go poof shortly
[16:44] *** RichardG has quit IRC (Ping timeout: 250 seconds)
[16:45] ok back to iabak :) later all
[16:51] *** atomotic has joined #archiveteam
[16:59] *** RichardG has joined #archiveteam
[16:59] Pebble is being acquired by Fitbit... https://techcrunch.com/2016/11/30/fitbit-pebble/
[17:00] *** icedice has joined #archiveteam
[17:01] What would a wildcard for sub-domains look like?
[17:02] like if I want to exclude es.hostadvice.com, de.hostadvice.com, fr..hostadvice.com, it.hostadvice.com, and so on?
[17:02] *** nwf has joined #archiveteam
[17:02] (All of the sub-domains just give 403s and are a waste of time)
[17:03] *.foo.com
[17:07] *** nwf__ has quit IRC (Read error: Operation timed out)
[17:13] see #archivebot
[17:14] *** icedice has quit IRC (Quit: Leaving)
[17:36] *** Somebody has joined #archiveteam
[17:41] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[18:09] *** owl has joined #archiveteam
[18:41] *** owl has quit IRC (Read error: Operation timed out)
[18:43] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[19:05] *** cadbury_ has quit IRC (Ping timeout: 250 seconds)
[19:07] *** cadbury_ has joined #archiveteam
[19:07] *** icedice has joined #archiveteam
[19:12] *** owl has joined #archiveteam
[19:12] *** owl has quit IRC (Client Quit)
[19:29] *** nicolas17 has joined #archiveteam
[19:31] *** ravetcofx has joined #archiveteam
[19:34] *** drunksci has quit IRC (Remote host closed the connection)
[19:38] *** jsp12345 has joined #archiveteam
[19:42] *** VonGuard has joined #archiveteam
[19:53] *** icedice has quit IRC (Read error: Operation timed out)
[19:59] Grab all
[20:26] *** drunksci has joined #archiveteam
[20:27] *** BlueMaxim has joined #archiveteam
[20:31] *** Coderjoe has joined #archiveteam
[20:32] *** jrwr has joined #archiveteam
[21:45] *** kcaj has quit IRC (Ping timeout: 506 seconds)
[21:56] *** drunksci has quit IRC ()
[22:11] *** kcaj has joined #archiveteam
[23:09] *** WinterFox has joined #archiveteam
[23:09] *** kcaj has quit IRC (Ping timeout: 506 seconds)
[23:13] *** kcaj has joined #archiveteam
[23:17] *** ColdIce has joined #archiveteam
[23:18] So, recieved news that a big site in my country is going to die at beginning of new year - now, how do I archive it? And how would I continue to update archive up until new year? Bandwidth not an issue, neither is storage.
[23:18] what's the site url?
[23:18] klara-klok.no
[23:18] You wouldn't understand much, but must save!
[23:19] what is it? a government health website?
[23:20] Sponsoered by government, operated by people who are professionals (their line-of-work, trusted people)
[23:20] * xmc nods
[23:20] how long has it been around?
[23:20] And it's going to die, been there for 8 years now. So time to save it, before 1st of january
[23:20] oh gosh, okay
[23:21] whops, 16 years ****
[23:21] mostly i ask these questions to help triage and figure out what of our tools is best to use for it
[23:21] not questioning whether it deserves to live
[23:21] I understand, it's a site where you submit question, and recieve professional answer back that can be trusted or an opinion with links to more help within my country
[23:21] sounds like an archivebot job might be the right tool, does anyone here want to monitor such a thing?
[23:22] hmmmmm
[23:22] that sounds useful though maybe hard to archive in a useful form
[23:22] (unless the questions are sorted into categories)
[23:22] So questionably, can we index the questions? It's sorted into categories :)
[23:23] do you know about how many questions there are?
[23:23] do the questions have urls with numbers in them, or is it more complex
[23:23] Seems that there was a reset on the site in 2008 sadly, so 8 years of data is lost for us sadly :(
[23:24] ass
[23:24] if you can generate a list of urls of all the questions you want to snag, that'll make it a lot easier and more reliable
[23:24] anyway, i have to get back to work
[23:24] Yes, all the question has a number and within each question it's assigned categories that relate to the question
[23:24] should probably do *something* useful today
[23:24] oh that sounds delightful
[23:25] Yep, should be easy to index, but I don't know the correct tool
[23:26] which for I seek for help
[23:26] what's the lowest and highest question number you can find?
[23:26] and why don't you give some question urls here
[23:27] xmc: i'm currently trying to find the easiest way to do that
[23:27] ah cool
[23:27] they have rss feeds, but their IDs are jumping around
[23:27] ae_g_i_s: would you like to take point on this?
[23:27] http://www.klara-klok.no/spoersmaal/644543
[23:27] xmc: i've never uploaded anything and don't have my boxes set up yet, so i can only help this time, sry :/
[23:27] oh, no worries
[23:28] wonder if they rate limit
[23:28] if you can make it suitable for archivebot then it'll be easy peasy
[23:28] let's waste some IP reputation
[23:28] is that a recent question? because less than a million is good
[23:28] i was expecting like maybe ten million
[23:28] which is an awkward number
[23:29] recent question would be id 644740
[23:29] yeah, they're all of that general order
[23:30] if they don't ip-ban:
[23:31] i would suggest making a list of urls with all those numbers, from 1 to 645,000, and submitting it to archivebot with "!ao <"
[23:31] looks like they don't
[23:31] i'm currently on RSS feed ~150 (starting at 0) with no slowdown
[23:31] and then add a second job to archivebot with !a http://www.klara-klok.no and add an ignore for /spoersmaal/
[23:32] and then keep an eye on those two jobs to make sure they don't go off the rails
[23:32] First post ever that is available 70367 - but then again, next on is 471748 which dated the same date - wierd
[23:32] hmmmmmmmmm
[23:32] so maybe 470,000 to 645,000 ?
[23:32] but the date and categories are available on the page that would be archive for us to date it
[23:33] * xmc away
[23:34] *** ndiddy has joined #archiveteam
[23:34] Seem to be random, first entry ever 01.01.2008 that is available (8 years missing of data) has ID 70367, but we also have ID 60000 with post from 14.03.2013
[23:36] ColdIce: yeah, same for the RSS feeds, they wildly jump from around 250 to 4800
[23:36] or we do an archivejob with "!a <", with the list of URLs and the main site, so it will also follow links from the list of questions
[23:36] currently grabbing 0-5000 just to see which ones exist
[23:37] arkiver: that will likely take forever
[23:37] depending on how archivebot orders found links relative to given urls
[23:38] er shit i'm supposed to be away
[23:38] I believe it first grabs the given URLs, then the found URLs
[23:41] good news: judging from the first ~2500 RSS feeds, the visible subcategories seem to be the only accessible ones; there are 7 feeds with actual content in those 2.5k
[23:43] 4 main categories on the site, 21 sub-categories. Please note, a question which has assigned sub-cateogories, *might* also have a search-url attached in where the rest of categories are, indicated by "sok" in URL
[23:43] also questions do have sub-categories assigned tho
[23:45] ae_g_i_s: where do you see the RSS feed? I only see RSS feed for last questions...
[23:47] ColdIce: yeah, but there's one feed for each subcategory
[23:47] and that feed returns the latest questions
[23:48] yeah, it's not useful to grab all questions or even oldest ones
[23:48] as in, the RSS is not useful for that ;)
[23:51] wouldn't be easier to iterate each page within a sub-category?
[23:51] Until we hit a specific HTML-text?
[23:52] i was just trying to find out which categories there are and the highest ID
[23:52] there seem to be 23 categories, latest 644740 (as you said), but they jump quite a bit :/
[23:53] categories: http://lpaste.net/5455619423912067072
[23:54] Nice
[23:54] ColdIce: how did you grab the oldest ID?
[23:58] Used last answer page without searching in specific category, oddly enough, laste page as of now is http://www.klara-klok.no/siste-svar?page=23500
[23:58] but the oldest id, is unknown due to randomness
[23:58] like I said, I found id 60000 of post in 2013 and id 70367 of the oldest post ever
[23:58] so ID is random and can't be used
[23:59] could always iterate 1 to 23500 and fetch each html item
[23:59] let's create a channel for this
[23:59] and discuss in that channel