[01:06] kisspunch: There are at least a couple IA staff here, I'm not sure if Somebody2 works there, but oftentimes someone will know the answer to a question even if they don't work at IA, or they'll redirect you to someone who would be more likely to know the answer.
[01:27] *** Stilett0 is now known as Stiletto
[01:50] *** espes__ has quit IRC (Ping timeout: 250 seconds)
[02:00] *** drumstick has joined #archiveteam-bs
[02:00] kisspunch: I do not, but I hang around with people who do...
[02:07] *** espes__ has joined #archiveteam-bs
[03:10] kisspunch: I guess to answer that question, you have to first answer another one: what do you want the person downloading the files to be able to do? just see the content? track changes between time frames? recreate the exact experience a person would've had at a point in time? something else?
[03:12] kisspunch: I saw your earlier thing talking about what kind of thing you have- since it's code, this talk is probably along the lines of what you want: https://www.youtube.com/watch?v=Xx6Bb2sY4zo
[03:13] it's basically an archive of everything on GitHub that has 10 stars or more, without using endless space
[03:37] *** Stilett0 has joined #archiveteam-bs
[03:42] *** Stiletto has quit IRC (Read error: Operation timed out)
[03:47] *** marvinw is now known as ivan
[03:49] dashcloud: nice
[03:51] *** drumstick has quit IRC (Read error: Operation timed out)
[04:01] I've moved to a new apartment.
[04:01] Massive connection, and actual heat, air conditioning and working bathroom. And drinkable water!
[04:01] Will be more productive
[04:02] MORE productive :O
[04:02] Lot to do
[04:02] Lot to make up
[04:04] gigabit?
[04:12] Let's not go crazy.
[04:12] 300mb. Quite good.
[04:17] *** kyounko has joined #archiveteam-bs
[04:21] cool
[04:23] "How is internet in your area? I pay $27 for this crap. Supposed to be 500mbit. In Kyiv you can have 1gbit for 6 euro."
Russians complaining about their 380/500
[04:23] fuckin
[04:26] Right now I'd be jealous for having breathable air.
[04:32] *** Stilett0 is now known as Stiletto
[04:34] China? or Burbank?
[04:40] Colorado Springs, actually.
[04:46] * Asparagir google
[04:46] * Asparagir googles
[04:46] * Asparagir can't spell
[04:46] oh
[05:00] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:07] *** Sk1d has joined #archiveteam-bs
[05:22] *** Asparagir has quit IRC (Asparagir)
[05:35] *** BlueMaxim has joined #archiveteam-bs
[06:27] *** drumstick has joined #archiveteam-bs
[07:30] *** HCross2 has joined #archiveteam-bs
[07:45] Question: If North Korea fixed this issue, then why do some domains still work? https://github.com/mandatoryprogrammer/NorthKoreaDNSLeak
[08:07] *** drumstick has quit IRC (Remote host closed the connection)
[08:08] *** drumstick has joined #archiveteam-bs
[08:38] *** drumstick has quit IRC (Read error: Operation timed out)
[08:50] *** Honno has joined #archiveteam-bs
[08:51] *** drumstick has joined #archiveteam-bs
[09:20] *** BlueMaxim has quit IRC (Quit: Leaving)
[09:44] *** kristian_ has joined #archiveteam-bs
[10:11] hook54321: They fixed the leak, i.e. you can't get a list of domains through AXFR anymore.
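For anyone following the AXFR point above: an AXFR request asks a nameserver for its whole zone, which is separate from ordinary lookups of individual names, so a server can refuse transfers while single domains keep resolving. A minimal Python sketch of probing whether a server still allows transfers, using only the standard library. The helper names (encode_name, build_axfr_query, axfr_allowed) are made up for illustration, the zone and nameserver in the usage comment are placeholders, and this only inspects the first reply message rather than reading the full transfer.

```python
import socket
import struct

AXFR = 252      # QTYPE for a full zone transfer (RFC 5936)
CLASS_IN = 1    # Internet class


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed DNS labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_axfr_query(zone: str, query_id: int = 0x1234) -> bytes:
    """Build an AXFR query, including the 2-byte length prefix used for DNS over TCP."""
    # Header: ID, flags=0 (standard query), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    question = encode_name(zone) + struct.pack("!HH", AXFR, CLASS_IN)
    message = header + question
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the server hangs up first."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("server closed the connection")
        data += chunk
    return data


def axfr_allowed(zone: str, nameserver: str, timeout: float = 5.0) -> bool:
    """Return True if the first reply has RCODE 0 and at least one answer record."""
    try:
        with socket.create_connection((nameserver, 53), timeout=timeout) as sock:
            sock.sendall(build_axfr_query(zone))
            (length,) = struct.unpack("!H", recv_exact(sock, 2))
            reply = recv_exact(sock, length)
    except OSError:
        # Refused connection, timeout, or an early hang-up all count as "not allowed"
        return False
    if len(reply) < 12:
        return False
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    rcode = flags & 0x000F
    return rcode == 0 and ancount > 0


# Hypothetical usage (placeholder zone and nameserver, not taken from the log):
# print(axfr_allowed("example.org", "ns1.example.org"))
```

A server that has closed the leak would typically answer with a non-zero RCODE such as REFUSED or simply drop the connection, and the sketch reports False in both cases.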
[11:36] *** drumstick has quit IRC (Ping timeout: 600 seconds)
[12:26] *** odemg has quit IRC (Read error: Operation timed out)
[12:28] *** Kalroth has quit IRC (Ping timeout: 250 seconds)
[12:32] *** Mateon1 has quit IRC (Ping timeout: 250 seconds)
[12:33] *** Mateon1 has joined #archiveteam-bs
[12:35] *** kristian_ has quit IRC (Quit: Leaving)
[12:38] *** Kalroth has joined #archiveteam-bs
[12:40] *** slackpi has joined #archiveteam-bs
[12:41] hey everyone
[12:41] i'm on my slackware rpi distro i just built this morning
[12:41] turned out part of my problem was glibc-solibs script
[12:42] it was not making the links
[12:43] that was what was crashing berryboot kernel
[12:49] *** odemg has joined #archiveteam-bs
[12:50] hey odemg
[12:52] hey
[12:52] i got slackware arm working
[13:17] odemg: my plan is to make a librarybox+kiwix hybrid on slackware arm
[13:27] *** Honno has quit IRC (Read error: Operation timed out)
[14:44] hey the other day I asked about scanning/submitting some old UK SF magazines (Interzone) and was advised to do 600dpi/TIFF. No problem. Any other advice or tips or URLs to read on scanning projects in general? Should I chop up the resulting TIFFs into sub-pages (each side is two separate pages from the publication)
[14:44] etc
[14:56] *** Honno has joined #archiveteam-bs
[16:17] *** ld1 has quit IRC (Ping timeout: 260 seconds)
[16:17] *** ld1 has joined #archiveteam-bs
[16:25] *** schbirid has joined #archiveteam-bs
[16:51] *** ld1 has quit IRC (Ping timeout: 260 seconds)
[16:53] *** ld1 has joined #archiveteam-bs
[16:56] slackpi, are you confusing me with someone else, this is the first I'm hearing of it?
[16:59] http://radio.garden/ - a nice distraction, at least
[17:03] i have talked about it before on archiveteam-bs
[17:03] at least i think i talked about it here
[17:06] slackpi: o wait are you godane
[17:06] yes
[17:06] kool
[17:06] i'm on my raspberry pi 2
[17:06] same room
[17:07] yeah, i seem to recall having seen you mention radio.garden a while ago
[17:07] thats the one with radio stations around the world
[17:08] yeup
[17:22] Jon: you might be interested in writing to the internet archive directly, as they routinely do book scanning, or just look at how they handle recently scanned books that are handled internally by them.
[17:25] i'm now back on my main system
[17:25] for the moment :P
[17:26] i'm at 3231 items for this month so far
[17:27] i'm getting close to half of the items i had last month
[17:30] Jon: yes each tiff should be a left or a right hand, not both
[17:31] name them 0001.tif 0002.tif etc, and put them in an archive named (whatever)_images.tar
[17:31] you don't have to name them anything in particular so long as they sort correctly
[17:44] *** MartinThe has joined #archiveteam-bs
[17:45] Hey guys, I have a question about archiving something
[17:45] ask away
[17:45] I want to archive a couple of Disney website games, but they seem to be some sort of horrible multi-part / multi-file SWF files
[17:45] godane: http://www.oldradioworld.com/media/ (via /r/opendirectories)
[17:46] So not sure how to proceed
[17:46] MartinThe: ah, the type that loads new files on demand as you click through the game?
[17:46] I'm trying WarcMITMProxy, but last commit is from 4 years ago and it looks like dependencies broke big time. I'm running Ubuntu 16.04 LTS
[17:46] joepie91_, Correct
[17:46] ah yes, those are a pain, I don't think there's a bulletproof solution for those yet
[17:46] MartinThe: link to this software?
[17:47] https://github.com/odie5533/WarcMITMProxy
[17:47] astrid, ^^ was linked to on the a-t.org wiki
[17:47] MartinThe: afaik, your options are indeed either a warc proxy of some sort, or using a decompiler/converter that can take apart the SWFs and scripting your way around it
[17:47] former being theoretically easiest
[17:47] joepie91_, Augh, decompiling is something I'd rather not do. WARC proxy looks like the best option
[17:47] Would webrecorder work?
[17:48] https://webrecorder.io/
[17:48] hook54321, Not sure, the new downloads are triggered from the running SWF
[17:48] hook54321, I presume webrecorder is basically a wget-type deal?
[17:48] Have you tried warcprox? https://github.com/internetarchive/warcprox
[17:49] joepie91: i'm going to be lazy and give it to archivebot
[17:49] MrRadar, Looks cool, will check it out in a minute. Thanks a lot!
[17:49] MartinThe: You enter a starting URL and then you browse stuff manually and it puts it all into a WARC
[17:51] godane: heh. just figured you might be interested in it given that you seem to do a lot of podcast/radio stuff :)
[17:57] Oh heck, not just SWF. This thing's doing XML requests too. Whoa. Yup, WARC looks like the only way. MrRadar: Warcprox works fine
[18:01] Glad I could help
[18:02] *** cf has quit IRC (Read error: Operation timed out)
[18:03] *** cf has joined #archiveteam-bs
[19:01] arkiver: Did the imgh.us person reply? Also, did you contact them through the email address listed in whois or through the form on their site?
[19:06] *** MartinThe has quit IRC (Remote host closed the connection)
[19:07] *** Sanqui has quit IRC (Ping timeout: 260 seconds)
[19:15] *** Sanqui has joined #archiveteam-bs
[20:26] *** Aranje has joined #archiveteam-bs
[20:27] Anyone in here comfortable parsing XML?
[20:39] SketchCow: what for?
[20:41] I would get my skills in perl6 honed a bit, if the task seems like I could handle it. I think the motivation is enough to make me do it. Would take like at least 12h though...
[20:41] I'm going to do it a stupid way
[20:41] Hold my avocado
[20:41] Ha
[20:41] Ok, just thought you might have some time to get it done.
[20:45] Parse it with regex! :-)
[20:46] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[20:46] I think he does, i guess that is the only stupid way.
[20:58] *** sun_shine has joined #archiveteam-bs
[20:58] I have a question about the wayback machine I'm not sure where else to pose
[20:59] sun_shine: shoot
[20:59] An historically important website I need for research purposes has been maliciously excluded
[20:59] the domain is now owned by spammers who aren't interested in selling it. I'm not sure that the creators of the site can be contacted
[20:59] is there anything I can do?
[21:14] nope
[21:14] *** schbirid has quit IRC (Quit: Leaving)
[21:15] * JAA is now listening to: Metallica - Sad But True
[21:38] sun_shine: Did you check if the creators of the site had an email listed in the whois for the domain?
[21:39] this was back in 2009. is there anywhere i can look up historical whois stuff like that?
[21:39] what's the site?
[21:40] isaccorp.org
[21:41] Seems to work fine for me. https://web.archive.org/web/*/http://isaccorp.com/
[21:41] the site was at isaccorp.com until 2005, when it moved to isaccorp.org
[21:41] oh
[21:42] The site had enemies. I can't say for certain that the original owners weren't the ones who asked for it to be excluded, but it would be out of character.
[21:42] And it seems like it was manually excluded rather than by robots.txt
[21:45] There's a mirror of the wayback machine, it isn't up to date though. http://web.archive.bibalex.org/web/*/http://isaccorp.org
[21:45] I'm gonna try to find a way to contact the previous owners.
[21:47] wait, so does this show the captures that exist but currently aren't available?
[21:47] Right now, someone in Ukraine named Andrey Ahiezer owns the domain.
[21:47] Have you ever heard of a site being unexcluded?
I know if the issue is robots.txt, then whoever controls the domain effectively controls its past availability as well
[21:48] It shows the captures that existed at the time they mirrored the wayback machine.
[21:48] But since it was manually excluded I'm not sure that someone could override that even if, say, the present owner decided to
[21:49] Could someone from Ukraine or someone named Andrey Ahiezer have been an enemy of the site?
[21:49] Really unlikely.
[21:50] If the archive cuts off at 2007, though, that seems to suggest when the request for removal was sent
[21:50] Or when they last updated the mirror
[21:50] When a site is excluded manually they will still crawl it
[21:51] oh, nevermind they last updated the mirror in 2007
[21:51] http://web.archive.bibalex.org/web/*/http://example.org
[22:02] sun_shine: domaintools has whois history, it's not free though. https://whois.domaintools.com/isaccorp.org
[22:03] you know, I have very rarely encountered 'domain excluded' errors when using wayback
[22:03] and I'm a really heavy user
[22:04] I just checked on two other defunct advocacy websites in the same area. Both excluded - and I know that the first one was purchased by the corporation it published exposes on after the owner died.
[22:04] I think they bought the domains after they expired, had them excluded, and then dumped them
[22:06] What are the other two domains?
[22:06] and the corporation
[22:07] *** drumstick has joined #archiveteam-bs
[22:07] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:10] *** dashcloud has joined #archiveteam-bs
[22:14] intrepidnetreporter.com and caica.org . The corporation that bought intrepidnetreporter is called WWASP and has a documented history of suing online critics. All three of these websites reported critically on them. https://en.wikipedia.org/wiki/World_Wide_Association_of_Specialty_Programs_and_Schools
[22:15] I think I'm just going to write info@archive.org and ask nicely.
I'm not sure there's any other option.
[22:15] probably yeah
[22:21] *** namibj_ has quit IRC (Ping timeout: 260 seconds)
[22:33] *** namibj_ has joined #archiveteam-bs
[22:34] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:34] *** dashcloud has joined #archiveteam-bs
[22:35] *** Soni has quit IRC (Read error: Operation timed out)
[22:36] *** Soni has joined #archiveteam-bs
[23:19] *** Asparagir has joined #archiveteam-bs
[23:41] *** Honno has quit IRC (Read error: Operation timed out)
[23:56] *** slackpi has quit IRC (Read error: Connection reset by peer)