[00:26] *** Stilett0 has quit IRC (Read error: Operation timed out)
[00:26] *** Stiletto has joined #archiveteam-bs
[00:43] *** BlueMaxim has joined #archiveteam-bs
[00:45] *** kristian_ has joined #archiveteam-bs
[01:14] *** JesseW has joined #archiveteam-bs
[01:14] *** kristian_ has quit IRC (Leaving)
[01:23] *** xXx_ndidd has joined #archiveteam-bs
[01:25] *** Mayonaise has quit IRC (Read error: Operation timed out)
[01:25] *** beardicus has quit IRC (Read error: Operation timed out)
[01:25] *** phuzion has quit IRC (Read error: Operation timed out)
[01:25] *** ring has quit IRC (Read error: Operation timed out)
[01:25] *** SmileyG has quit IRC (Read error: Operation timed out)
[01:25] *** goekesmi has quit IRC (Read error: Operation timed out)
[01:25] *** arkiver has quit IRC (Read error: Operation timed out)
[01:26] *** botpie91 has quit IRC (Read error: Operation timed out)
[01:26] i'm starting to upload 2004 episodes of MBC Newsdesk
[01:27] *** GLaDOS has quit IRC (Read error: Operation timed out)
[01:28] *** ndiddy has quit IRC (Read error: Operation timed out)
[01:28] *** ring has joined #archiveteam-bs
[01:29] *** tfgbd_znc has quit IRC (Read error: Operation timed out)
[01:30] *** GLaDOS has joined #archiveteam-bs
[01:30] *** arkiver has joined #archiveteam-bs
[01:38] *** Smiley has joined #archiveteam-bs
[01:41] *** goekesmi has joined #archiveteam-bs
[01:45] *** itsjustme has joined #archiveteam-bs
[02:09] *** Stiletto has quit IRC (Read error: Operation timed out)
[02:09] *** Stiletto has joined #archiveteam-bs
[02:17] *** botpie91 has joined #archiveteam-bs
[02:18] *** beardicus has joined #archiveteam-bs
[02:18] *** phuzion has joined #archiveteam-bs
[02:18] *** tfgbd_znc has joined #archiveteam-bs
[02:43] *** Mayonaise has joined #archiveteam-bs
[03:09] *** Stiletto has quit IRC (Read error: Operation timed out)
[03:09] *** Stiletto has joined #archiveteam-bs
[03:51] *** Stiletto has quit IRC (Read error: Operation timed out)
[03:51] *** Stiletto has joined #archiveteam-bs
[04:15] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:15] *** Stilett0 has joined #archiveteam-bs
[04:16] *** Stiletto has quit IRC (Read error: Operation timed out)
[04:24] *** Sk1d has joined #archiveteam-bs
[04:24] *** Sk1d has quit IRC (Connection closed)
[04:45] *** Stilett0 has quit IRC (Read error: Operation timed out)
[04:46] *** Stiletto has joined #archiveteam-bs
[05:21] *** itsjustme has quit IRC (Read error: Connection reset by peer)
[05:30] *** Stiletto has quit IRC (Read error: Operation timed out)
[05:30] *** Stiletto has joined #archiveteam-bs
[05:46] *** Stiletto has quit IRC (Read error: Operation timed out)
[05:47] *** Stiletto has joined #archiveteam-bs
[05:57] *** Aranje has quit IRC (Remote host closed the connection)
[06:07] MrRadar: Not much in there, https://6xq.net/paste/reddit.warc.gz
[06:19] *** Stiletto has quit IRC (Read error: Operation timed out)
[06:19] *** Stiletto has joined #archiveteam-bs
[06:41] *** Stiletto has quit IRC (Read error: Operation timed out)
[06:41] *** Stiletto has joined #archiveteam-bs
[06:42] *** Honno has joined #archiveteam-bs
[07:05] *** Stiletto has quit IRC (Read error: Operation timed out)
[07:05] *** Stiletto has joined #archiveteam-bs
[07:16] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[07:26] *** Stiletto has quit IRC (Read error: Operation timed out)
[07:26] *** Stiletto has joined #archiveteam-bs
[07:37] *** Stiletto has quit IRC (Read error: Operation timed out)
[07:37] *** Stiletto has joined #archiveteam-bs
[07:45] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[08:11] *** Stiletto has quit IRC (Read error: Operation timed out)
[08:11] *** Stiletto has joined #archiveteam-bs
[08:36] *** Stiletto has quit IRC (Read error: Operation timed out)
[08:36] *** Stiletto has joined #archiveteam-bs
[08:43] *** Stilett0 has joined #archiveteam-bs
[08:45] *** Stiletto has quit IRC (Read error: Operation timed out)
[09:16] *** Stilett0 has quit IRC (Read error: Operation timed out)
[09:17] *** Stiletto has joined #archiveteam-bs
[09:24] *** Stilett0 has joined #archiveteam-bs
[09:31] *** antomati_ has joined #archiveteam-bs
[09:31] *** ItsYoda has quit IRC (Read error: Connection reset by peer)
[09:31] *** swebb sets mode: +o antomati_
[09:31] *** DopefishJ has joined #archiveteam-bs
[09:31] *** swebb sets mode: +o DopefishJ
[09:31] *** ItsYoda has joined #archiveteam-bs
[09:31] *** HCross2 has quit IRC (Ping timeout: 246 seconds)
[09:33] *** DFJustin has quit IRC (Ping timeout: 263 seconds)
[09:33] *** _desu___ has quit IRC (Ping timeout: 263 seconds)
[09:34] *** antomatic has quit IRC (Ping timeout: 260 seconds)
[09:34] *** Stiletto has quit IRC (Read error: Operation timed out)
[10:13] *** Stilett0 has quit IRC (Read error: Operation timed out)
[10:14] *** Stiletto has joined #archiveteam-bs
[10:19] *** schbirid has joined #archiveteam-bs
[10:37] *** kristian_ has joined #archiveteam-bs
[10:58] *** ring has quit IRC (Remote host closed the connection)
[11:10] *** kristian_ has quit IRC (Leaving)
[11:14] *** VADemon has joined #archiveteam-bs
[11:47] https://archive.org/details/jezebel.com-sitemap-2011-01-20160610
[12:16] *** ring has joined #archiveteam-bs
[12:40] *** Stiletto has quit IRC (Read error: Operation timed out)
[12:40] *** Stiletto has joined #archiveteam-bs
[12:51] *** ndiddy has joined #archiveteam-bs
[12:53] *** ndizzle has joined #archiveteam-bs
[12:53] *** ndizzle has quit IRC (Read error: Connection reset by peer)
[12:54] *** xXx_ndidd has quit IRC (Read error: Operation timed out)
[12:57] *** ndiddy has quit IRC (Ping timeout: 244 seconds)
[13:02] *** kristian_ has joined #archiveteam-bs
[13:19] *** Famicoman has quit IRC (Ping timeout: 260 seconds)
[13:34] *** dashcloud has quit IRC (Ping timeout: 244 seconds)
[13:34] *** Famicoman has joined #archiveteam-bs
[13:38] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[13:40] *** r3c0d3x has joined #archiveteam-bs
[13:40] *** dashcloud has joined #archiveteam-bs
[13:51] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[13:52] *** r3c0d3x has joined #archiveteam-bs
[14:00] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[14:01] *** r3c0d3x has joined #archiveteam-bs
[14:08] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[14:10] *** r3c0d3x has joined #archiveteam-bs
[14:18] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[14:19] *** r3c0d3x has joined #archiveteam-bs
[14:24] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[14:25] *** r3c0d3x has joined #archiveteam-bs
[14:32] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[14:32] *** dashcloud has quit IRC (Read error: Operation timed out)
[14:33] *** r3c0d3x has joined #archiveteam-bs
[14:36] *** dashcloud has joined #archiveteam-bs
[14:38] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[14:39] *** r3c0d3x has joined #archiveteam-bs
[14:46] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[14:47] *** r3c0d3x has joined #archiveteam-bs
[14:48] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[14:52] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[14:53] *** r3c0d3x has joined #archiveteam-bs
[14:56] *** Coderjoe has quit IRC (Ping timeout: 260 seconds)
[14:57] *** Coderjoe has joined #archiveteam-bs
[15:01] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[15:02] *** r3c0d3x has joined #archiveteam-bs
[15:06] *** Coderjoe has quit IRC (Read error: Operation timed out)
[15:07] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[15:07] *** r3c0d3x has joined #archiveteam-bs
[15:11] *** Coderjoe has joined #archiveteam-bs
[15:14] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[15:15] *** r3c0d3x has joined #archiveteam-bs
[15:15] *** kristian_ has quit IRC (Leaving)
[15:17] kotaku.com sitemap collection so far: https://archive.org/search.php?query=subject%3A%22kotaku.com%22&sort=-publicdate
[15:20] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[15:22] *** r3c0d3x has joined #archiveteam-bs
[15:27] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[15:28] *** r3c0d3x has joined #archiveteam-bs
[15:33] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[15:34] *** r3c0d3x has joined #archiveteam-bs
[15:41] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[15:41] *** r3c0d3x has joined #archiveteam-bs
[15:45] *** r3c0d3x_ has joined #archiveteam-bs
[15:46] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[15:50] *** r3c0d3x_ has quit IRC (Ping timeout: 260 seconds)
[15:51] *** r3c0d3x has joined #archiveteam-bs
[15:51] Thanks PurpleSym
[15:56] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[15:57] *** r3c0d3x has joined #archiveteam-bs
[16:02] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[16:02] *** r3c0d3x has joined #archiveteam-bs
[16:09] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[16:10] *** r3c0d3x has joined #archiveteam-bs
[16:12] *** dashcloud has quit IRC (Read error: Operation timed out)
[16:15] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[16:15] *** r3c0d3x has joined #archiveteam-bs
[16:16] *** JesseW has joined #archiveteam-bs
[16:16] *** dashcloud has joined #archiveteam-bs
[16:19] *** r3c0d3x_ has joined #archiveteam-bs
[16:20] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[16:24] *** r3c0d3x_ has quit IRC (Ping timeout: 260 seconds)
[16:28] *** r3c0d3x has joined #archiveteam-bs
[16:33] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[16:38] *** r3c0d3x has joined #archiveteam-bs
[16:44] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[16:45] *** r3c0d3x has joined #archiveteam-bs
[16:45] *** Aranje has joined #archiveteam-bs
[16:49] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[16:52] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:54] *** r3c0d3x has joined #archiveteam-bs
[16:59] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[17:00] *** Kazzy sets mode: +b r3c0d3x!*@h255.8.117.75.dynamic.ip.windstream.net##fix_your_connection
[17:04] *** r3c0d3x has joined #archiveteam-bs
[17:04] well, at least we learned something.
[17:04] *** Kazzy sets mode: -b r3c0d3x!*@h255.8.117.75.dynamic.ip.windstream.net##fix_your_connection
[17:05] lol
[17:05] *** Kazzy sets mode: +b r3c0d3x!*@*
[17:05] xD
[17:06] didn't realise he was changing host every time too
[17:09] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[17:10] *** jut has joined #archiveteam-bs
[17:17] *** ndiddy has joined #archiveteam-bs
[17:58] Finally. My flickr metadata grab finished. Discovered 1958119891 unique images.
[18:03] Wow, that's a lot
[18:06] And it’s only a tiny fraction of the images available, MrRadar
[18:06] *** Stiletto has quit IRC (Read error: Operation timed out)
[18:06] Yeah
[18:06] *** Stiletto has joined #archiveteam-bs
[18:06] I wonder how many there are on Facebook
[18:07] A lot more, I guess.
[18:17] *** Stiletto has quit IRC (Read error: Operation timed out)
[18:17] *** Stiletto has joined #archiveteam-bs
[18:23] *** JesseW has joined #archiveteam-bs
[18:26] *** schbirid has quit IRC (Quit: Leaving)
[18:28] *** kristian_ has joined #archiveteam-bs
[18:42] *** dashcloud has quit IRC (Read error: Operation timed out)
[18:46] *** dashcloud has joined #archiveteam-bs
[19:00] *** kristian_ has quit IRC (Quit: Leaving)
[19:11] *** Stiletto has quit IRC (Read error: Operation timed out)
[19:11] *** Stiletto has joined #archiveteam-bs
[19:22] *** bzc6p has joined #archiveteam-bs
[19:22] *** swebb sets mode: +o bzc6p
[19:23] * bzc6p has found a website blocked by his government for the very first time
[19:24] Which country and site?
[19:25] In fact, it's not strongly archiving related, but my country (Hungary) seems not to use it too often.
[19:26] That's why the surprise. It's an international gambling site, and the block is due to some tax reasons (illegal operation).
[19:27] *** BlueMaxim has joined #archiveteam-bs
[19:33] Interesting that I've never seen any site ISP-blocked by government, including warez sites (mass copyright infringement), bullying sites (privacy and human rights infringement) etc.
[19:33] How are they blocking, out of interest
[19:34] http://kepfeltoltes.hu/160611/allsports_en_www.kepfeltoltes.hu_.png
[19:34] http://kepfeltoltes.hu/160611/allsports_hu_www.kepfeltoltes.hu_.png
[19:35] I don't know their technique; it returns 200 but shows that page in my country.
[19:36] I don't care, was just surprised, we're not accustomed to such things. Like you can download stuff quite safely, you won't get a fine like in Germany.
[19:36] returning 200 is bad
[19:36] but yeah
[19:36] interesting, they are a british company too
[19:37] also I just noticed I have op
[19:37] is this supposed to be the case :p
[19:37] *** bzc6p sets mode: -o Frogging
[19:37] just kidding
[19:37] *** bzc6p sets mode: +o Frogging
[19:37] heh
[19:37] *** bzc6p sets mode: -o bzc6p
[19:38] So, I've heard about the regulation, by the way. It was like a few years ago. If you provide a gambling service in Hungarian, but you don't pay tax to the gov, you'll be banned.
[19:38] (There is a Hungarian version under /hu/)
[19:39] the only online gambling allowed in my province (Ontario) is the service provided by the Ontario government
[19:39] I wonder if that's a federal rule or just provincial
[19:40] That's even more strict.
[19:41] *** hook54321 has quit IRC (Quit: Connection closed for inactivity)
[19:41] (I looked it up, it's up to the province)
[19:41] And we've had cyberbullying sites for a decade and those are not blocked. Money is more important
[19:42] (In fact, I don't say they should be blocked too, and not even that the one about should be unblocked, but still a bit strange)
[19:42] *above
[19:43] I guess it's easier to point at this betting site and say it breaks the law, than it is to open the censorship can of worms
[19:43] And ISP blocking is quite effective.
[19:43] yeah
[19:43] that's a sensitive topic
[19:44] it becomes political, really. one group of people finds the other group of people's opinions offensive. should either of them be censored? probably not
[19:45] certainly not, in fact. it's just something that nations valuing free speech have to deal with
[19:45] In fact, my opinion is that anyone should be left to say anything. The problem is not if someone says horrible things, but that there are thousands of stupid people following it.
[19:45] * bzc6p idealistic
[19:45] "I disapprove of what you say, but I will defend to the death your right to say it"
[19:46] We need a smarter society. Otherwise we'll nuke each other anyway.
[19:47] we came close 30 years ago, I think people generally wised up since then. but then I look at Trump
[19:47] ..
[19:47] Frogging idealistic
[19:48] Not just Trump
[19:48] woop woop off-topic siren... wait it's already -bs
[19:48] -bs-bs
[19:49] When I start speaking... So I stop it now. Rather, I'll spread some ops.
[19:49] *** bzc6p has left
[19:49] *** bzc6p has joined #archiveteam-bs
[19:49] *** swebb sets mode: +o bzc6p
[19:50] *** bzc6p sets mode: +o arkiver
[19:51] *** whydomain has joined #archiveteam-bs
[19:51] *** bzc6p sets mode: +oooo Atluxity chfoo dashcloud Fletcher
[19:51] *** bzc6p sets mode: +oooo Fletcher_ GLaDOS HCross JesseW
[19:52] *** bzc6p sets mode: +oooo joepie91 JW_work1 midas pluesch
[19:52] *** bzc6p sets mode: +oooo Simpbrain Smiley Start VADemon
[19:52] *** bzc6p sets mode: +o wp494
[19:55] If I upload a WARC to IA, will it automatically get added to the wayback machine?
[19:56] No
[19:56] If you've got a bunch, you need to request it from SketchCow
[19:57] Generally WARCs with unknown provenance don't get added to the Wayback Machine
[19:57] If the pages are still available, it'd be better to add them via web.archive.org/save or #archivebot
[19:57] *** Coderjoe has quit IRC (Read error: Connection reset by peer)
[19:58] If not, just upload the WARC yourself, and maybe also extract any of the particularly interesting content (images, text, etc) and upload those to separate items so they can be found more easily (make sure to put plenty of metadata).
[19:59] Thanks, is there etiquette for WARCs? (e.g. one WARC per section of a site) or do people not mind what web pages are contained in a WARC?
[20:02] generally it doesn't matter much -- it's best not to make them *too* big (i.e. somewhere in the range of a few GB, IIRC) but if people want particular pieces from them, there are tools to extract that. I wouldn't worry overmuch about adjusting them before uploading.
[20:04] *** Honno has quit IRC (Read error: Operation timed out)
[20:04] The main point is to have as clear a chain of provenance as you can get -- i.e. don't repackage them if you can avoid it, include whatever information you are willing/able to share about the network/ISP connection they were grabbed from, etc.
[20:04] can / should multiple WARCs be combined together by simply joining the text files? (I've got a few thousand WARCs, one for each page)
[20:04] I *think* so, but I'm not sure.
[20:05] No
[20:06] Long answer:
[20:06] One important bit is that if you compress them, make sure you compress each WARC record separately, and concatenate the results
[20:06] Try the megawarc tool, that is designed for this – works well and is simple
[20:06] * JesseW agrees with bzc6p
[20:07] bzc6p agrees with JesseW
[20:07] I wanted to say that, yes, warc.gz-s are compressed per record. Not worth struggling with that, and bad repackaging makes the warc unusable. megawarc is your tool.
[20:08] Thanks! megawarc is just what I needed
[20:08] :)
[20:09] bad repackaging requires it to be undone and done correctly -- it (generally) doesn't permanently *corrupt* the data, though (unlike, say, lossy encoding)
[20:09] (again, as-I-understand-it)
[20:09] straightforward
[20:16] Well, just ungzipping a warc.gz gives you a single big textfile. By-record recompressing would need intelligent tools. However, I've come across a tool that unpacks it by-record, and later gzipping it with the proper option (there is a per-file compression option) it MIGHT work. (Unnecessary prattle, use megawarc, that's it.)
[20:17] *** Coderjoe has joined #archiveteam-bs
[20:18] *** RichardG has quit IRC (Ping timeout: 258 seconds)
[20:48] *** whydomain has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[21:08] *** bzc6p has left
[21:18] https://pbs.twimg.com/media/CkNzM8tVEAE08XM.jpg
[21:28] *** jut has quit IRC (Leaving)
[21:41] *** Stiletto has quit IRC (Read error: Operation timed out)
[21:41] *** Stiletto has joined #archiveteam-bs
[21:44] *** closure has joined #archiveteam-bs
[21:44] *** midas sets mode: +o closure
[21:47] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:49] *** yipdw has quit IRC (Quit: No Ping reply in 180 seconds.)
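The point made in the WARC discussion (compress each record as its own gzip member and concatenate, because a plain ungzip flattens everything into one stream) can be illustrated with Python's standard `gzip` and `zlib` modules. This is a minimal sketch, not megawarc: the helper names `compress_per_record` and `split_members` are hypothetical, and the byte strings in the demo are placeholders rather than valid WARC records.

```python
import gzip
import zlib


def compress_per_record(records):
    """Gzip each record as its own member and concatenate the members.

    Hypothetical helper: this mirrors the per-record layout of a .warc.gz,
    where every record can later be decompressed on its own.
    """
    return b"".join(gzip.compress(record) for record in records)


def split_members(data):
    """Decompress a multi-member gzip stream one member at a time.

    Hypothetical helper: returns one bytes object per gzip member, so record
    boundaries survive (a plain gunzip yields one long stream instead).
    """
    members = []
    while data:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        members.append(decomp.decompress(data) + decomp.flush())
        # Bytes after the end of this member belong to the next member.
        data = decomp.unused_data
    return members


if __name__ == "__main__":
    # Placeholder byte strings standing in for real WARC records.
    part1 = compress_per_record([b"record A\n", b"record B\n"])
    part2 = compress_per_record([b"record C\n"])
    combined = part1 + part2  # byte-level concatenation of two per-record files
    print(split_members(combined))  # [b'record A\n', b'record B\n', b'record C\n']
    print(gzip.decompress(combined))  # b'record A\nrecord B\nrecord C\n'
```

Because each member carries its own gzip header and trailer, appending one per-record file to another keeps every record independently decompressible, while gzipping a whole uncompressed WARC in one pass produces a single member that cannot be split back into records without decompressing all of it. For real collections, megawarc, as recommended in the log, remains the tool to use.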
[21:50] *** yipdw has joined #archiveteam-bs
[21:50] *** dashcloud has joined #archiveteam-bs
[21:58] *** tomwsmf-a has joined #archiveteam-bs
[22:15] *** yipdw has quit IRC (Read error: Operation timed out)
[22:22] *** yipdw has joined #archiveteam-bs
[22:32] *** ralphdnak has joined #archiveteam-bs
[22:34] *** godane has quit IRC (Read error: Operation timed out)
[22:44] *** godane has joined #archiveteam-bs
[22:45] so my old hard drive may have fried a usb port
[22:45] i'm not 100% but nothing pops up in thunar vs the other one now
[22:58] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:02] *** dashcloud has joined #archiveteam-bs
[23:26] *** Stiletto has quit IRC (Read error: Operation timed out)
[23:26] *** Stiletto has joined #archiveteam-bs
[23:40] *** godane has quit IRC (Leaving.)
[23:47] *** godane has joined #archiveteam-bs
[23:47] godane: Some USB controllers use polyfuses on the power lines which self-reset after a while; give it a day or so then try it again
[23:48] i restarted and it's working again
[23:49] i put my usb boot stick into it to see if it even detected the drive
[23:55] the port in question was ss with lightning bolt symbol