#archiveteam 2016-12-02,Fri

Time Nickname Message
00:02 🔗 Stilett0 has quit IRC (Ping timeout: 246 seconds)
00:04 🔗 ats_ has joined #archiveteam
00:07 🔗 atlogbot has quit IRC (ny.us.hub irc.servercentral.net)
00:07 🔗 yipdw has quit IRC (ny.us.hub irc.servercentral.net)
00:07 🔗 WinterFox has quit IRC (ny.us.hub irc.servercentral.net)
00:07 🔗 ravetcofx has quit IRC (ny.us.hub irc.servercentral.net)
00:07 🔗 ats has quit IRC (ny.us.hub irc.servercentral.net)
00:07 🔗 edsu has quit IRC (ny.us.hub irc.servercentral.net)
00:07 🔗 robink has quit IRC (ny.us.hub irc.servercentral.net)
00:07 🔗 swebb has quit IRC (ny.us.hub irc.servercentral.net)
00:07 🔗 Laverne has quit IRC (ny.us.hub irc.servercentral.net)
00:07 🔗 Flierp has quit IRC (ny.us.hub irc.servercentral.net)
00:07 🔗 ZizzyDizz has quit IRC (ny.us.hub irc.servercentral.net)
00:14 🔗 Start has joined #archiveteam
00:21 🔗 PovAddict has joined #archiveteam
00:21 🔗 WinterFox has joined #archiveteam
00:21 🔗 ravetcofx has joined #archiveteam
00:21 🔗 Laverne has joined #archiveteam
00:21 🔗 chazchaz has joined #archiveteam
00:21 🔗 dserodio has joined #archiveteam
00:21 🔗 atlogbot has joined #archiveteam
00:21 🔗 Cameron_D has joined #archiveteam
00:21 🔗 MrRadar has joined #archiveteam
00:21 🔗 Flierp has joined #archiveteam
00:21 🔗 ZizzyDizz has joined #archiveteam
00:22 🔗 swebb_ is now known as swebb
00:25 🔗 brayden has joined #archiveteam
00:53 🔗 tfgbd_znc has joined #archiveteam
01:11 🔗 balrog https://codebender.cc is shutting down
01:15 🔗 Somebody has joined #archiveteam
01:17 🔗 Start has quit IRC (Quit: Disconnected.)
01:20 🔗 Start has joined #archiveteam
01:23 🔗 Stiletto has joined #archiveteam
01:37 🔗 VADemon has quit IRC (Quit: left4dead)
01:59 🔗 arkiver balrog: thanks for the reminder
02:11 🔗 icedice Can someone help me out a bit with an ignore set for w3bin.com?
02:11 🔗 nekomune has quit IRC (Ping timeout: 244 seconds)
02:11 🔗 icedice I think it's best if all https://w3bin.com/domain/ links are skipped, otherwise ArchiveBot will probably archive the domain record for every domain name on the Internet
02:13 🔗 nekomune has joined #archiveteam
02:21 🔗 tfgbd_znc has quit IRC (Read error: Connection reset by peer)
02:26 🔗 tfgbd_znc has joined #archiveteam
02:40 🔗 icedice never mind, figured it out
03:05 🔗 Somebody has quit IRC (Ping timeout: 370 seconds)
03:13 🔗 icedice Is there any ignore set that can be applied to a running archiving job that restricts it to a specific domain?
03:18 🔗 jrwr has quit IRC (Leaving)
03:21 🔗 Stiletto has quit IRC (Ping timeout: 246 seconds)
03:43 🔗 Somebody has joined #archiveteam
03:50 🔗 Stiletto has joined #archiveteam
03:56 🔗 dashcloud has quit IRC (Read error: Operation timed out)
03:57 🔗 dashcloud has joined #archiveteam
04:26 🔗 vitzli has joined #archiveteam
04:29 🔗 BlueMaxim has joined #archiveteam
04:37 🔗 Frogging Pebble app store torrent https://www.reddit.com/r/pebble/comments/5g0gmx/in_light_of_recent_news_i_archived_the_app_store/
04:47 🔗 PovAddict can't download
04:47 🔗 PovAddict it complains about some weirdly-named file, even though I'm on Linux
04:55 🔗 Yoshimura has quit IRC (Ping timeout: 255 seconds)
04:59 🔗 Frogging make sure you're using the second link
05:00 🔗 Frogging apparently the first one is broken
05:01 🔗 PovAddict which first one?
05:02 🔗 PovAddict when I clicked it already said "I've created a torrent UPDATED LINK"
05:02 🔗 Frogging oh, hm. I just used the magnet link
05:07 🔗 icedice has quit IRC (Quit: Leaving)
05:08 🔗 PovAddict me too
05:08 🔗 PovAddict and it failed
05:09 🔗 Frogging that's odd
05:24 🔗 no2penci1 is now known as no2pencil
05:29 🔗 PovAddict has quit IRC (Quit: zzz)
05:45 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
05:48 🔗 Aranje has joined #archiveteam
05:52 🔗 Sk1d has joined #archiveteam
05:54 🔗 Aranje has quit IRC (Read error: Connection timed out)
05:54 🔗 Aranje has joined #archiveteam
06:02 🔗 ravetcofx has quit IRC (Read error: Operation timed out)
06:10 🔗 ndiddy has quit IRC (Read error: Connection reset by peer)
06:14 🔗 Somebody has quit IRC (Ping timeout: 370 seconds)
06:17 🔗 Froggypwn has quit IRC (Ping timeout: 244 seconds)
06:18 🔗 ravetcofx has joined #archiveteam
06:26 🔗 krazedkat has quit IRC (Ping timeout: 244 seconds)
06:29 🔗 krazedkat has joined #archiveteam
06:33 🔗 jsp12345 has quit IRC (Ping timeout: 492 seconds)
06:41 🔗 krazedkat has quit IRC (Ping timeout: 244 seconds)
06:42 🔗 krazedkat has joined #archiveteam
06:55 🔗 Aranje has quit IRC (Quit: Three sheets to the wind)
07:04 🔗 alembic has joined #archiveteam
07:07 🔗 alembic has quit IRC (Client Quit)
07:10 🔗 alembic has joined #archiveteam
07:11 🔗 Somebody has joined #archiveteam
07:11 🔗 alembic has quit IRC (Client Quit)
07:11 🔗 alembic has joined #archiveteam
07:12 🔗 krazedkat has quit IRC (Quit: Leaving)
07:15 🔗 REiN^ has quit IRC (Max SendQ exceeded)
07:15 🔗 REiN^ has joined #archiveteam
07:31 🔗 maelstrom has quit IRC (Quit: Leaving)
07:39 🔗 Stiletto has quit IRC (Read error: Connection reset by peer)
07:40 🔗 Stiletto has joined #archiveteam
07:47 🔗 db48x has joined #archiveteam
08:23 🔗 Somebody has quit IRC (Ping timeout: 370 seconds)
09:02 🔗 sHATNER_ is now known as sHATNER
09:12 🔗 yipdw_ is now known as yipdw
09:19 🔗 hawc145 is now known as HCross
09:22 🔗 vitzli has quit IRC (Quit: Leaving)
09:28 🔗 vitzli has joined #archiveteam
09:34 🔗 HCross has quit IRC (Read error: Connection reset by peer)
09:35 🔗 HCross has joined #archiveteam
09:36 🔗 xx343 has quit IRC (Read error: Connection reset by peer)
09:37 🔗 xx343 has joined #archiveteam
09:43 🔗 Sanqui sets mode: -b *!*webchat@*.res.bhn.net
09:43 🔗 ArchiveAL has joined #archiveteam
09:43 🔗 ArchiveAL Ayy
09:43 🔗 ArchiveAL According to the account creation page - "WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD"
09:43 🔗 Sanqui note, #archiveteam-bs is for non priority conversation
09:44 🔗 Sanqui ah, sure. 'yahoosucks'
09:44 🔗 ArchiveAL Lol
09:45 🔗 ArchiveAL literally just wanted to update the Desura article to add that Desura has been purchased, although the site is still technically in danger.
09:46 🔗 Sanqui no worries. cheers!
09:48 🔗 ArchiveAL also has any part of Desura's games library been saved, since i assume that's the only thing that is worth saving unless they have a forum
09:49 🔗 yipdw Desura's a storefront; there's nothing there to get that won't run afoul of morals
09:50 🔗 ArchiveAL i mean their freeware games
09:50 🔗 ArchiveAL that have no other host
09:50 🔗 ArchiveAL also @Sanqui why was i banned to begin with, was it literally just a range ban or did the previous owner of my (isp owned) router do some stupid stuff
09:50 🔗 Sanqui the ban was for *!*webchat@*.res.bhn.net
09:50 🔗 yipdw for good reasons
09:50 🔗 Sanqui i grepped logs and haven't found it
09:51 🔗 ArchiveAL so someone banned all brighthouse users? jeez.
09:52 🔗 HCross tl;dr someone from your ISP was being a knob and kept rebooting their router to get around bans
09:53 🔗 ArchiveAL .
09:53 🔗 yipdw as far as the library goes, I don't know if anyone has a copy
09:53 🔗 ArchiveAL Hm.
09:53 🔗 Sanqui oh, my grep was insufficient
09:53 🔗 ArchiveAL Oh btw why does the wiki homepage under Proposed projects say "Google Drive web hosting to be discontinued on August 31, 2016." i just checked my google drive to be sure, it's still running
09:53 🔗 Sanqui honestly i'm on the verge of vouching for that person because they've helped with nifty but i'm too kind when in power
09:54 🔗 Sanqui google drive *web hosting*
09:54 🔗 Sanqui there was a freehost service
09:54 🔗 ArchiveAL Oh.
09:54 🔗 ArchiveAL hm, rip
09:55 🔗 ArchiveAL also Radio Shack is still going as well, new CEO as of Jan 2016
09:57 🔗 ArchiveAL the main page says they are closing*
10:04 🔗 ArchiveAL has quit IRC (Quit: Page closed)
10:37 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
10:38 🔗 BlueMaxim has joined #archiveteam
10:39 🔗 WinterFox has quit IRC (Read error: Operation timed out)
10:39 🔗 ravetcofx has quit IRC (Read error: Operation timed out)
10:49 🔗 WinterFox has joined #archiveteam
11:40 🔗 BlueMaxim has quit IRC (Ping timeout: 370 seconds)
12:16 🔗 signius_ has joined #archiveteam
12:54 🔗 VADemon has joined #archiveteam
13:42 🔗 WinterFox has quit IRC (Read error: Operation timed out)
14:27 🔗 vitzli has quit IRC (Quit: Leaving)
15:28 🔗 tomcat has joined #archiveteam
15:30 🔗 RichardG_ has joined #archiveteam
15:34 🔗 RichardG has quit IRC (Ping timeout: 364 seconds)
15:42 🔗 RichardG_ is now known as RichardG
15:56 🔗 HCross https://www.thelayoff.com/t/KBEVoB1 only a rumor so far, but Solaris may be at threat due to Oracle layoffs
15:59 🔗 fie has joined #archiveteam
16:05 🔗 tomcat has quit IRC (Ping timeout: 194 seconds)
16:07 🔗 Frogging ruh roh
16:17 🔗 joepie91 HCross: it's Oracle. it's extremely likely that any horrible rumour in existence is true, and then some :)
16:19 🔗 SketchCow I like that site.
16:19 🔗 Frogging Orrible
16:19 🔗 SketchCow It's like a nicer version of Fuckedcompany
16:21 🔗 SketchCow I flooded the deriver and OCR queue!
16:26 🔗 tomcat has joined #archiveteam
16:34 🔗 godane same here
16:34 🔗 godane everything is waiting to derive
16:35 🔗 godane ok it's not that bad now
16:35 🔗 godane only 14 waiting to be derived
16:35 🔗 HCross joepie91, yea. Ill start DOWNLOADING ALL THE SOLARIS THINGS
16:40 🔗 tomcat has quit IRC (Remote host closed the connection)
16:42 🔗 Jon might have been mentioned already but DoomRL has been hit with a legal letter
16:42 🔗 Jon https://doom.chaosforge.org/
16:43 🔗 SketchCow It's been archived a few times by us now.
16:44 🔗 Jon cool, cool. ta. I notice it's not open source (apparently it's pascal too, for whatever that's worth)
16:44 🔗 Jon I imagine it's going to go poof shortly
16:44 🔗 RichardG has quit IRC (Ping timeout: 250 seconds)
16:45 🔗 Jon ok back to iabak :) later all
16:51 🔗 atomotic has joined #archiveteam
16:59 🔗 RichardG has joined #archiveteam
16:59 🔗 alembic Pebble is being acquired by Fitbit... https://techcrunch.com/2016/11/30/fitbit-pebble/
17:00 🔗 icedice has joined #archiveteam
17:01 🔗 icedice What would a wildcard for sub-domains look like?
17:02 🔗 icedice like if I want to exclude es.hostadvice.com, de.hostadvice.com, fr.hostadvice.com, it.hostadvice.com, and so on?
17:02 🔗 nwf has joined #archiveteam
17:02 🔗 icedice (All of the sub-domains just give 403s and are a waste of time)
17:03 🔗 HCross *.foo.com
17:07 🔗 nwf__ has quit IRC (Read error: Operation timed out)
17:13 🔗 DFJustin see #archivebot
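A minimal sketch of what a subdomain wildcard could look like, assuming the ignore pattern is interpreted as a regular expression matched against the full URL (in which case a shell-style *.foo.com would not act as a wildcard). The www exclusion and the helper name is_ignored are illustrative assumptions, not ArchiveBot's API; confirm the exact pattern syntax in #archivebot before applying it to a running job.

```python
import re

# Assumption: the pattern is a regex searched against the full URL.
# It matches any subdomain of hostadvice.com (es., de., fr., it., ...)
# but not the bare domain. Excluding "www." is a guess that the main
# site lives there and should still be crawled.
SUBDOMAIN_PATTERN = re.compile(
    r"^https?://(?!www\.)[^/]+\.hostadvice\.com(?:[/:?#]|$)"
)


def is_ignored(url: str) -> bool:
    """Hypothetical helper: True if the URL would be skipped by the pattern."""
    return SUBDOMAIN_PATTERN.search(url) is not None


if __name__ == "__main__":
    for candidate in (
        "https://es.hostadvice.com/hosting/",
        "https://hostadvice.com/hosting/",
        "https://www.hostadvice.com/hosting/",
    ):
        print(candidate, is_ignored(candidate))
```

Anchoring at the scheme and stopping at the first slash keeps the pattern from matching a subdomain name that merely appears in a path or query string.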
17:14 🔗 icedice has quit IRC (Quit: Leaving)
17:36 🔗 Somebody has joined #archiveteam
17:41 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
18:09 🔗 owl has joined #archiveteam
18:41 🔗 owl has quit IRC (Read error: Operation timed out)
18:43 🔗 Somebody has quit IRC (Ping timeout: 370 seconds)
19:05 🔗 cadbury_ has quit IRC (Ping timeout: 250 seconds)
19:07 🔗 cadbury_ has joined #archiveteam
19:07 🔗 icedice has joined #archiveteam
19:12 🔗 owl has joined #archiveteam
19:12 🔗 owl has quit IRC (Client Quit)
19:29 🔗 nicolas17 has joined #archiveteam
19:31 🔗 ravetcofx has joined #archiveteam
19:34 🔗 drunksci has quit IRC (Remote host closed the connection)
19:38 🔗 jsp12345 has joined #archiveteam
19:42 🔗 VonGuard has joined #archiveteam
19:53 🔗 icedice has quit IRC (Read error: Operation timed out)
19:59 🔗 SketchCow Grab all
20:26 🔗 drunksci has joined #archiveteam
20:27 🔗 BlueMaxim has joined #archiveteam
20:31 🔗 Coderjoe has joined #archiveteam
20:32 🔗 jrwr has joined #archiveteam
21:45 🔗 kcaj has quit IRC (Ping timeout: 506 seconds)
21:56 🔗 drunksci has quit IRC ()
22:11 🔗 kcaj has joined #archiveteam
23:09 🔗 WinterFox has joined #archiveteam
23:09 🔗 kcaj has quit IRC (Ping timeout: 506 seconds)
23:13 🔗 kcaj has joined #archiveteam
23:17 🔗 ColdIce has joined #archiveteam
23:18 🔗 ColdIce So, received news that a big site in my country is going to die at the beginning of the new year - now, how do I archive it? And how would I continue to update the archive up until the new year? Bandwidth is not an issue, neither is storage.
23:18 🔗 xmc what's the site url?
23:18 🔗 ColdIce klara-klok.no
23:18 🔗 ColdIce You wouldn't understand much, but must save!
23:19 🔗 xmc what is it? a government health website?
23:20 🔗 ColdIce Sponsored by the government, operated by professionals (it's their line of work, trusted people)
23:20 🔗 * xmc nods
23:20 🔗 xmc how long has it been around?
23:20 🔗 ColdIce And it's going to die, been there for 8 years now. So time to save it, before 1st of january
23:20 🔗 xmc oh gosh, okay
23:21 🔗 ColdIce whoops, 16 years ****
23:21 🔗 xmc mostly i ask these questions to help triage and figure out what of our tools is best to use for it
23:21 🔗 xmc not questioning whether it deserves to live
23:21 🔗 ColdIce I understand, it's a site where you submit a question and receive a professional answer back that can be trusted, or an opinion with links to more help within my country
23:21 🔗 xmc sounds like an archivebot job might be the right tool, does anyone here want to monitor such a thing?
23:22 🔗 xmc hmmmmm
23:22 🔗 xmc that sounds useful though maybe hard to archive in a useful form
23:22 🔗 xmc (unless the questions are sorted into categories)
23:22 🔗 ColdIce So questionably, can we index the questions? It's sorted into categories :)
23:23 🔗 xmc do you know about how many questions there are?
23:23 🔗 xmc do the questions have urls with numbers in them, or is it more complex
23:23 🔗 ColdIce Seems that there was a reset on the site in 2008, so 8 years of data is lost for us sadly :(
23:24 🔗 xmc ass
23:24 🔗 xmc if you can generate a list of urls of all the questions you want to snag, that'll make it a lot easier and more reliable
23:24 🔗 xmc anyway, i have to get back to work
23:24 🔗 ColdIce Yes, every question has a number, and each question is assigned categories that relate to it
23:24 🔗 xmc should probably do *something* useful today
23:24 🔗 xmc oh that sounds delightful
23:25 🔗 ColdIce Yep, should be easy to index, but I don't know the correct tool
23:26 🔗 ColdIce which is why I'm asking for help
23:26 🔗 xmc what's the lowest and highest question number you can find?
23:26 🔗 xmc and why don't you give some question urls here
23:27 🔗 ae_g_i_s xmc: i'm currently trying to find the easiest way to do that
23:27 🔗 xmc ah cool
23:27 🔗 ae_g_i_s they have rss feeds, but their IDs are jumping around
23:27 🔗 xmc ae_g_i_s: would you like to take point on this?
23:27 🔗 ae_g_i_s http://www.klara-klok.no/spoersmaal/644543
23:27 🔗 ae_g_i_s xmc: i've never uploaded anything and don't have my boxes set up yet, so i can only help this time, sry :/
23:27 🔗 xmc oh, no worries
23:28 🔗 ae_g_i_s wonder if they rate limit
23:28 🔗 xmc if you can make it suitable for archivebot then it'll be easy peasy
23:28 🔗 ae_g_i_s let's waste some IP reputation
23:28 🔗 xmc is that a recent question? because less than a million is good
23:28 🔗 xmc i was expecting like maybe ten million
23:28 🔗 xmc which is an awkward number
23:29 🔗 ColdIce recent question would be id 644740
23:29 🔗 ae_g_i_s yeah, they're all of that general order
23:30 🔗 xmc if they don't ip-ban:
23:31 🔗 xmc i would suggest making a list of urls with all those numbers, from 1 to 645,000, and submitting it to archivebot with "!ao <"
23:31 🔗 ae_g_i_s looks like they don't
23:31 🔗 ae_g_i_s i'm currently on RSS feed ~150 (starting at 0) with no slowdown
23:31 🔗 xmc and then add a second job to archivebot with !a http://www.klara-klok.no and add an ignore for /spoersmaal/
23:32 🔗 xmc and then keep an eye on those two jobs to make sure they don't go off the rails
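A minimal sketch of the URL list described above, assuming every question lives at /spoersmaal/<id> (the pattern of the one example URL in the log) and that IDs run from 1 to 645,000 as suggested. The function names and the output file name are hypothetical; many IDs in that range may not exist, and the resulting file would still need to be hosted somewhere reachable before being handed to "!ao <".

```python
from typing import Iterator

# URL shape taken from the example question link in the log.
BASE_URL = "http://www.klara-klok.no/spoersmaal/{}"


def question_urls(first_id: int = 1, last_id: int = 645_000) -> Iterator[str]:
    """Yield one candidate question URL per ID, inclusive on both ends."""
    for question_id in range(first_id, last_id + 1):
        yield BASE_URL.format(question_id)


def write_url_list(path: str, first_id: int = 1, last_id: int = 645_000) -> int:
    """Write the URLs one per line and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for url in question_urls(first_id, last_id):
            handle.write(url + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # "klara-klok-urls.txt" is an illustrative file name.
    print(write_url_list("klara-klok-urls.txt"))
```

Narrowing the range later (for example to start at 470,000) only means passing a different first_id.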
23:32 🔗 ColdIce The first post ever that is available is 70367 - but then again, the next one is 471748, which has the same date - weird
23:32 🔗 xmc hmmmmmmmmm
23:32 🔗 xmc so maybe 470,000 to 645,000 ?
23:32 🔗 ColdIce but the date and categories are available on the page that would be archived, so we can date it
23:33 🔗 * xmc away
23:34 🔗 ndiddy has joined #archiveteam
23:34 🔗 ColdIce Seems to be random: the first available entry ever, from 01.01.2008 (8 years of data missing), has ID 70367, but we also have ID 60000 with a post from 14.03.2013
23:36 🔗 ae_g_i_s ColdIce: yeah, same for the RSS feeds, they wildly jump from around 250 to 4800
23:36 🔗 arkiver or we do an archivejob with "!a <", with the list of URLs and the main site, so it will also follow links from the list of questions
23:36 🔗 ae_g_i_s currently grabbing 0-5000 just to see which ones exist
23:37 🔗 xmc arkiver: that will likely take forever
23:37 🔗 xmc depending on how archivebot orders found links relative to given urls
23:38 🔗 xmc er shit i'm supposed to be away
23:38 🔗 arkiver I believe it first grabs the given URLs, then the found URLs
23:41 🔗 ae_g_i_s good news: judging from the first ~2500 RSS feeds, the visible subcategories seem to be the only accessible ones; there are 7 feeds with actual content in those 2.5k
23:43 🔗 ColdIce 4 main categories on the site, 21 sub-categories. Please note, a question which has sub-categories assigned *might* also have a search URL attached where the rest of the categories are, indicated by "sok" in the URL
23:43 🔗 ColdIce also questions do have sub-categories assigned tho
23:45 🔗 ColdIce ae_g_i_s: where do you see the RSS feed? I only see RSS feed for last questions...
23:47 🔗 ae_g_i_s ColdIce: yeah, but there's one feed for each subcategory
23:47 🔗 ColdIce and that feed returns the latest questions
23:48 🔗 ae_g_i_s yeah, it's not useful to grab all questions or even oldest ones
23:48 🔗 ae_g_i_s as in, the RSS is not useful for that ;)
23:51 🔗 ColdIce wouldn't it be easier to iterate over each page within a sub-category?
23:51 🔗 ColdIce Until we hit a specific HTML-text?
23:52 🔗 ae_g_i_s i was just trying to find out which categories there are and the highest ID
23:52 🔗 ae_g_i_s there seem to be 23 categories, latest 644740 (as you said), but they jump quite a bit :/
23:53 🔗 ae_g_i_s categories: http://lpaste.net/5455619423912067072
23:54 🔗 ColdIce Nice
23:54 🔗 ae_g_i_s ColdIce: how did you grab the oldest ID?
23:58 🔗 ColdIce Used the last-answers page without searching in a specific category; oddly enough, the last page as of now is http://www.klara-klok.no/siste-svar?page=23500
23:58 🔗 ColdIce but the oldest id, is unknown due to randomness
23:58 🔗 ColdIce like I said, I found id 60000 of post in 2013 and id 70367 of the oldest post ever
23:58 🔗 ColdIce so ID is random and can't be used
23:59 🔗 ColdIce could always iterate 1 to 23500 and fetch each html item
23:59 🔗 arkiver let's create a channel for this
23:59 🔗 arkiver and discuss in that channel
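A rough sketch of the "iterate 1 to 23500" idea, assuming the listing pages follow the siste-svar?page=N pattern quoted in the log and that each one links to questions with hrefs ending in /spoersmaal/<id>. The link markup is a guess, not something confirmed from the site, and all names below are hypothetical. Fetching is left out on purpose; the sketch only builds the page URLs and pulls IDs out of HTML that has already been downloaded.

```python
import re
from typing import Iterator, List

# Listing URL shape taken from the log; page numbering from 1 is an assumption.
LISTING_URL = "http://www.klara-klok.no/siste-svar?page={}"

# Assumed link markup: relative or absolute hrefs to /spoersmaal/<digits>.
QUESTION_LINK = re.compile(
    r'href="(?:https?://www\.klara-klok\.no)?/spoersmaal/(\d+)"'
)


def listing_urls(last_page: int = 23_500, first_page: int = 1) -> Iterator[str]:
    """Yield the URL of every listing page in the given inclusive range."""
    for page in range(first_page, last_page + 1):
        yield LISTING_URL.format(page)


def question_ids(html: str) -> List[int]:
    """Return the question IDs linked from one listing page, without duplicates."""
    seen = set()
    ids = []
    for match in QUESTION_LINK.finditer(html):
        question_id = int(match.group(1))
        if question_id not in seen:
            seen.add(question_id)
            ids.append(question_id)
    return ids


if __name__ == "__main__":
    sample = '<a href="/spoersmaal/644740">x</a> <a href="/spoersmaal/60000">y</a>'
    print(question_ids(sample))
```

Collecting IDs this way sidesteps the gaps in the numbering, since only questions that are actually linked end up in the list.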
