[00:05] *** mistym has quit IRC (Quit: Leaving)
[00:08] joepie91_: my best guess is it's an exact copy of the one for sale on the ISO site, and so that's why the claim exists - not for the content, but because duplicates of final ISO standards are likely copies of the paid one
[00:09] as far as I know, drafts are fine to circulate, but final versions tend to be treated as more off-limits
[00:20] dashcloud: it's a "final draft"
[00:20] so.. still a draft
[00:22] no idea then, unless standards people do something silly like consider that the finished version
[00:34] *** primus105 has joined #archiveteam-bs
[00:37] *** primus104 has quit IRC (Read error: Operation timed out)
[00:54] *** primus has joined #archiveteam-bs
[00:55] What the... https://web.archive.org/web/20150402005504/http://textfiles.com/jason/ "Page cannot be crawled or displayed due to robots.txt."
[00:58] works here
[01:00] *** SN4T14_ is now known as SN4T14
[01:03] DFJustin: You had an interesting effect
[01:03] Turns out that the archive.org crawlers, once given a robots.txt block, never checked that block again.
[01:03] That is about to change
[01:04] Ah, it's working now. Weirdness :P
[01:09] *** mistym has joined #archiveteam-bs
[01:40] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[01:46] *** dashcloud has joined #archiveteam-bs
[02:32] *** primus105 has quit IRC (Leaving.)
[03:07] *** mistym has quit IRC (Remote host closed the connection)
[03:11] *** useretail has quit IRC (Dreaming in digital. Living in real-time. Thinking in binary. Talking in IP.)
[03:22] *** Ara_ has quit IRC (Read error: Operation timed out)
[03:25] *** Ara_ has joined #archiveteam-bs
[03:32] *** Ara_ has quit IRC (Read error: Operation timed out)
[03:34] *** Ara_ has joined #archiveteam-bs
[03:35] *** mistym has joined #archiveteam-bs
[03:36] *** dashcloud has quit IRC (Read error: Operation timed out)
[03:41] *** Ara_ has quit IRC (Read error: Operation timed out)
[03:43] *** Ara_ has joined #archiveteam-bs
[03:43] *** dashcloud has joined #archiveteam-bs
[03:59] *** Ara_ has quit IRC (Read error: Operation timed out)
[04:04] *** Ara_ has joined #archiveteam-bs
[04:09] *** Ara_ has quit IRC (Read error: Operation timed out)
[04:15] *** Ara_ has joined #archiveteam-bs
[04:15] *** aaaaaaaaa has quit IRC (Leaving)
[04:43] hmm that didn't use to be the case
[04:53] *** Ara_ has quit IRC (Read error: Operation timed out)
[04:55] *** Ara_ has joined #archiveteam-bs
[05:08] *** Ara_ has quit IRC (Read error: Operation timed out)
[05:14] *** Ara_ has joined #archiveteam-bs
[05:23] *** Ara_ has quit IRC (Read error: Operation timed out)
[05:27] *** Ara_ has joined #archiveteam-bs
[05:36] *** mistym has quit IRC (Remote host closed the connection)
[05:38] *** mistym has joined #archiveteam-bs
[05:45] *** Ara_ has quit IRC (Read error: Operation timed out)
[05:47] *** Ara_ has joined #archiveteam-bs
[06:06] *** Ara_ has quit IRC (Read error: Connection reset by peer)
[06:08] *** Ara_ has joined #archiveteam-bs
[06:14] *** arkhive has joined #archiveteam-bs
[06:16] SketchCow: Does IA do free prepaid postage to their hq? I have those floppy disks. like two thousand and it will cost a lot to ship. Do they do that? Apologies if idiotic lol
[06:18] *** Ara_ has quit IRC (Read error: Operation timed out)
[06:18] Or is there someone heading to San Francisco that is passing through Colorado that can take them there for me? Let me know please. Thanks. :) and i gotta go to bed. right now it is 12:18am. goodnight. but i'll check mIRC when i wake up
[06:19] *** Ara_ has joined #archiveteam-bs
[06:21] I don't know if this is helpful to anyone, but this is the solution I came up with to the problem of keeping a frequently updated directory tree in IA. :) Although, I haven't really tested it much, so it remains to be seen how it holds up under use. Anyway, hopefully this answers my question for anyone else who has it :) https://fracture-active.googlecode.com/svn/trunk/More/Patche/patche
[06:28] *** mistym has quit IRC (Remote host closed the connection)
[06:29] *** mistym has joined #archiveteam-bs
[06:31] kyan: goddamn
[06:31] that could use some more newlines :)
[06:36] *** Ara_ has quit IRC (Read error: Operation timed out)
[06:38] *** Ara_ has joined #archiveteam-bs
[06:44] arkhive: You should be sending them to me.
[06:44] In NY
[06:48] *** mistym has quit IRC (Remote host closed the connection)
[07:13] *** schbirid has joined #archiveteam-bs
[08:01] *** primus104 has joined #archiveteam-bs
[08:08] *** Ara_ has quit IRC (Read error: Operation timed out)
[08:09] *** Ara_ has joined #archiveteam-bs
[08:20] *** Ara_ has quit IRC (Read error: Operation timed out)
[08:21] *** Ara_ has joined #archiveteam-bs
[08:32] *** Ara_ has quit IRC (Read error: Operation timed out)
[08:33] *** Ara_ has joined #archiveteam-bs
[08:45] *** primus104 has quit IRC (Leaving.)
[08:53] *** rejon has quit IRC (Remote host closed the connection)
[08:53] *** rejon has joined #archiveteam-bs
[09:09] *** Ara_ has quit IRC (Read error: Connection reset by peer)
[09:09] *** Ara_ has joined #archiveteam-bs
[09:27] *** Ara_ has quit IRC (Read error: Operation timed out)
[09:28] *** Ara_ has joined #archiveteam-bs
[09:30] *** brayden has quit IRC (Quit: Leaving)
[09:37] *** primus104 has joined #archiveteam-bs
[09:57] *** Ara_ has quit IRC (Read error: Operation timed out)
[09:59] *** Ara_ has joined #archiveteam-bs
[10:07] *** Ara_ has quit IRC (Read error: Operation timed out)
[10:08] *** Ara_ has joined #archiveteam-bs
[10:32] *** Ara_ has quit IRC (Read error: Operation timed out)
[10:36] *** Ara_ has joined #archiveteam-bs
[10:49] *** Ara_ has quit IRC (Read error: Connection reset by peer)
[10:52] *** Ara_ has joined #archiveteam-bs
[11:31] *** Ara__ has joined #archiveteam-bs
[11:38] *** Ara_ has quit IRC (Ping timeout: 492 seconds)
[11:44] *** Ara__ has quit IRC (Read error: Operation timed out)
[11:46] *** Ara__ has joined #archiveteam-bs
[11:48] *** brayden has joined #archiveteam-bs
[12:01] *** brayden has quit IRC (Read error: Connection reset by peer)
[12:06] *** brayden has joined #archiveteam-bs
[12:09] *** Ara__ has quit IRC (Read error: Connection reset by peer)
[12:10] *** Ara__ has joined #archiveteam-bs
[12:19] *** Ara__ has quit IRC (Read error: Operation timed out)
[12:20] *** Ara__ has joined #archiveteam-bs
[12:22] SketchCow: your morning commute must be horrible if you drive from to SF
[12:23] from NY*
[12:29] *** Ara__ has quit IRC (Read error: Operation timed out)
[12:30] *** Ara__ has joined #archiveteam-bs
[12:41] *** primus104 has quit IRC (Leaving.)
[13:07] *** Ara__ has quit IRC (Read error: Operation timed out)
[13:09] *** Ara__ has joined #archiveteam-bs
[13:11] *** Ara__ has quit IRC (Read error: Connection reset by peer)
[13:12] *** Ara__ has joined #archiveteam-bs
[13:23] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:26] *** Ara__ has quit IRC (Read error: Operation timed out)
[13:27] *** Ara__ has joined #archiveteam-bs
[13:42] *** Ara__ has quit IRC (Read error: Operation timed out)
[13:46] https://twitter.com/digitalocean/status/583625072319004673
[13:47] *** Ara__ has joined #archiveteam-bs
[13:47] aaand the account is protected..
[13:57] *** Ara__ has quit IRC (Read error: Operation timed out)
[13:58] *** Ara__ has joined #archiveteam-bs
[14:10] *** Ara__ has quit IRC (Read error: Operation timed out)
[14:12] *** Ara__ has joined #archiveteam-bs
[14:21] *** Ara__ has quit IRC (Read error: Operation timed out)
[14:24] *** Ara_ has joined #archiveteam-bs
[14:32] *** mistym has joined #archiveteam-bs
[14:34] *** mistym has quit IRC (Remote host closed the connection)
[14:45] Kazzy: huh?
[14:45] *** Ara_ has quit IRC (Read error: Operation timed out)
[14:46] *** Ara_ has joined #archiveteam-bs
[14:47] *** primus104 has joined #archiveteam-bs
[14:48] My commute is nightmarish but there's this awesome diner in Illinois I stop at on the way
[14:51] *** mistym has joined #archiveteam-bs
[14:53] lol, just a 40 hour drive ;)
[14:57] *** Ara_ has quit IRC (Read error: Operation timed out)
[14:59] *** primus104 has quit IRC (Leaving.)
[14:59] *** Ara_ has joined #archiveteam-bs
[15:10] Rotab: looked like someone had got into their acc.. turns out it was an employee having some fun
[15:18] *** Ara_ has quit IRC (Read error: Operation timed out)
[15:19] *** Ara_ has joined #archiveteam-bs
[15:30] *** Ara_ has quit IRC (Read error: Operation timed out)
[15:31] *** Ara_ has joined #archiveteam-bs
[15:39] *** Ara_ has quit IRC (Read error: Operation timed out)
[15:40] *** Ara_ has joined #archiveteam-bs
[15:44] *** mistym has quit IRC (Remote host closed the connection)
[15:52] *** Ara_ has quit IRC (Read error: Operation timed out)
[15:53] *** Ara_ has joined #archiveteam-bs
[15:58] *** mistym has joined #archiveteam-bs
[16:28] *** aaaaaaaaa has joined #archiveteam-bs
[16:53] *** bzc6p has joined #archiveteam-bs
[16:53] Hello
[16:53] I have a dilemma.
[16:53] I'm scraping an image sharing service. There are so-called private pictures, which means they don't appear in the image browser, but anyone who knows the id and types it into the url can access them.
[16:53] I've seen a couple of image sharing services, and normally this id is complex enough that it's impossible to find out with brute force.
[16:53] But in this particular case the id is an incremental number. So if I go from 1 to infinity, I get everything, including "private" pictures.
[16:53] I'd like to know what you'd do if you were me. Shall I bother doing a discovery of the browser pages (not that difficult, just time and some work), or shall I "preserve" everything?
[16:54] Thanks for your input.
[16:54] *** primus104 has joined #archiveteam-bs
[16:54] 1. I grab them all.
[16:54] 2. If you feel bad, you should inform the vendor as it's a security bug.
[16:54] 3. Don't feel bad
[16:55] 4. Damn, picture ID 304A3B - that's not how you use a toothbrush
[16:56] bzc6p: maybe there is also metadata available that says if a pic is "meant to be private" or not
[16:56] schbirid: I know which are private and which are not. If I do a discovery, the ones listed on the browser pages are public, and only those are saved if I care.
[16:57] Question is, should I care?
[16:57] dunno
[16:57] your own decision :)
[16:57] i'd only grab the public stuff myself or keep the private stuff for myself, advertising that i can provide them with reasonable proof (how i try to with Fileplanet)
[16:58] http://archive.org/details/www.yelp.com-biz-memories-pizza-walkerton-20150402
[16:58] SketchCow: the system is built on a bad concept.
[16:58] i grabbed all the comments pages and images from there
[16:59] I didn't say the problem could be easily fixed.
[16:59] SketchCow: okay. I'll have to get some money together. that is a lot of money for me lol. :)
[16:59] There was another service, which listed every picture, including private ones, on a page left rw-r--r--, and Google listed it. In that case I informed the admin and he hid the page. Well, this couldn't be done here.
[16:59] You're basically asking the standard "disclosure" question.
[17:00] bzc6p: is it google+ photos?
[17:00] If you can figure out the sequence, you can get all of those too.
[17:01] Well, the links must be somewhere in deep chatlogs. And one day in 2025, there will be some which the poster would like to see, others not. But it's the same with public pictures.
[17:02] lol
[17:02] chatlogs are hardly private
[17:02] posters are stupid
[17:02] grab everything \o/
[17:02] On the other hand, they still remain private, except for those people who search for and find the warc containing them
[17:02] which I suppose is almost nobody. And, not knowing the person, almost no pictures can be embarrassing.
[17:03] Smiley: no, it's not G+
[17:03] shame
[17:03] have you found any interesting pictures?
[17:04] /interesting/
[17:04] I don't spend much time doing that, but the few I saw were not embarrassing.
[17:04] sorry if i asked before, but what can i use to unpack a warc.gz into the individual files again?
[17:04] male and female naked pictures, there are a lot public. and not only porn frames, but homemade pictures too.
[17:06] So 3-1 in favor of grab everything, including me preferring that
[17:06] so far
[17:08] * bzc6p comes back soon to read more opinions if any
[17:09] schbirid: warcat extract, https://pypi.python.org/pypi/Warcat/
[17:11] bzc6p: in this particular case I'd be in favor of not grabbing the private ones
[17:11] thx
[17:11] the "lol stupid users don't know an auto-incrementing ID is guessable" excuse is not something I enjoy
[17:11] that said another possibility is to keep a record of which are private and dark them in IA
[17:36] Well, there is another point that supports yipdw.
[17:36] Do we save content for the uploader, or for the internet audience? I think we work for the latter.
[17:37] And for them, only public things matter. For those reviewing their own chatlogs and not finding their stuff in 2025 - well, why did you rely on the clown?
[17:38] Putting things on free internet services and backing up things is two entirely different concepts.
[17:39] (Using correct grammar is a third one.)
[18:05] Well, considering that I didn't save private pictures in other cases, and also considering the things above, and in lack of consensus, I think I won't save private ones.
[18:05] I'd rather spend my resources on other sources' public things.
[18:06] i'd suggest iterating over all the urls and saving them all, but if you can easily determine which is which, sort the private ones out into a different item
[18:06] (although a lot of "public" pictures are meant to be private, but one can't differentiate them, everything must be saved)
[18:09] xmc: I see your concept. But I can hardly imagine a situation when that's necessary. We talk about random people's random pictures, possibly no important ones.
[18:09] I mean, a photo of the plane landing on the Hudson river wouldn't be stored ONLY at these places.
[18:10] And I can't imagine Samantha Doe writing to info@archive.org that "Dear Sir or Madam, I see some pictures of example.com are stored, but don't you eventually have picture ID 234567 of my selfie with duckface from 2010?"
[18:11] (Sorry, selfies with duckfaces were probably not common in 2010, but you'll get it.)
[18:13] we've had people come into the channel in a blind panic asking for their otherwise-deleted pictures of their grandson
[18:13] it's not inconceivable
[18:13] and it has the possibility to make that person very happy
[18:14] but if you want to not save it, i guess that's your decision?
[18:14] what's the numbers on these anyway
[18:14] hm.
[18:16] Well, once I make a discovery, it's me who needs to make the least effort to sort the pictures into public and not public.
[18:16] So it makes sense.
[18:17] i don't understand what you just said, could you use more words please
[18:18] Once I have the list of public picture IDs, it's easy to get the private ones. So I *should*, at least, keep a record of them.
[18:18] That's what I meant, in support of your last argument.
[18:18] oh
[18:18] yeah
[18:18] i'm not arguing. i just told you what i would do.
[18:19] I think I used the wrong word "argument", I meant...
[18:19] if you do a thing that i think is wrong, i will have a sad and then we will talk
[18:19] * bzc6p takes the dictionary
[18:19] otherwise what's good is good
[18:20] what site are you working on anyway
[18:22] partition isPrivate would have been easy to do by now :P
[18:22] grab it all, sort into separate private and public megawarcs, dark as appropriate
[18:22] done
[18:22] argument (noun) [...] 4. a statement, reason, or fact for or against a point
[18:22] yours was a fact and a reason at least.
[18:22] also if someone can tell me why the hell krunner in KDE 4 keeps locking up that'd be awesome
[18:22] ok
[18:23] xmc: I just thought I'll do so.
[18:23] anyway bzc6p i've told you what i prefer and why, now it's your turn to make a decision
[18:23] I appreciate your input.
[18:23] if you want more dictionary fun, look up "analysis paralysis", and then press pageup a few times in irc :)
[18:23] cool
[18:24] I didn't know anyone ever came into here in blind panic for a photo
[18:25] not here specifically, but Tabblo etc. have shown that people do look for their things
[18:31] Well, then, I'll make a discovery, sort items to private and public, and upload them to items accordingly, and after that I may tell SketchCow to darken them.
[18:33] And now I'll abort the just started "universal" process. No problem, as I found a more important project to do, now I can deal with that.
[18:33] Thank you everyone for the debate.
[18:34] *** primus104 has quit IRC (Leaving.)
[18:41] yipdw: does warcat extract to the dir my shell is currently in or relative to the warc?
[18:43] nvm, i decided to live dangerous
[18:43] it's the current dir :)
[18:59] *** mistym has quit IRC (Quit: Leaving)
[19:00] *** rolf has joined #archiveteam-bs
[19:12] i'm grabbing pypi.python.org sources
[19:12] *** mistym has joined #archiveteam-bs
[19:14] hngg, the files wget saved vs what comes out of the warc seem quite different
[19:14] just looking at the list of files atm and trying to remember what i did
[19:14] * ersi pours schbirid some liquor
[19:15] nono, i am currently hacking into my new kobo and must not screw up
[19:15] wtf http://www.mobileread.com/forums/showthread.php?t=162713 !
[19:16] * joepie91_ adds Kobo to no-buy list
[19:18] nah it's kinda fine, it runs linux
[19:21] 'it runs linux' seems to be the safeword for tracking. it shouldn't be there from the start, running linux or not
[19:23] well, i have had my share of openpandora so i am not that keen about open hardware anymore
[19:24] although that wasn't even open
[19:24] ok this is funky
[19:24] i just rebooted my domotica system.
[19:24] and my fileserver
[19:24] did not realize it was running on the same box...
[19:24] with root credentials
[19:27] so looks like i can grab UN videos with youtube-dl
[19:27] *** primus104 has joined #archiveteam-bs
[19:30] domotica sounds kinky
[19:30] *** bzc6p has quit IRC (Read error: Operation timed out)
[19:32] *** bzc6p has joined #archiveteam-bs
[19:37] *** Smiley has quit IRC (http://www.milkme.co.uk - You'll never understand.)
[19:38] schbirid: it kinda is
[19:38] all my lights/heating/power is controlled from that box
[19:39] *** bzc6p has left
[19:47] *** mistym has quit IRC (Remote host closed the connection)
[19:47] fuck me, i cannot figure out how to actually add books to it
[19:48] copy pasta?
[19:48] *** SN4T14_ has joined #archiveteam-bs
[19:49] i dont wanna use fucking calibre :(
[19:57] *** SN4T14 has quit IRC (Ping timeout: 512 seconds)
[19:58] so i found old UN real media streams
[19:59] old i mean 2010 i think
[19:59] *** useretail has joined #archiveteam-bs
[20:01] i just had to replug usb
[20:01] it's simply mass storage now
[20:01] also good news is i think i'm downloading it a lot faster than the korea stuff
[20:02] *** mistym has joined #archiveteam-bs
[20:15] so i'm getting 10 to 15 min videos in under 60 seconds
[20:15] from the UN
[20:17] *** Smiley has joined #archiveteam-bs
[20:34] *** rolf has quit IRC (Leaving...)
[20:35] https://www.youtube.com/watch?v=n4Bcl1EeenM
[21:10] *** schbirid has quit IRC (Leaving)
[21:23] *** Smiley has quit IRC (HE'S BACCCCCK)
[21:24] *** Smiley has joined #archiveteam-bs
[21:35] *** jk[SVP] has quit IRC (Ping timeout: 240 seconds)
[21:36] *** jk[SVP] has joined #archiveteam-bs
[21:38] *** NotGLaDOS has quit IRC (Ping timeout: 240 seconds)
[21:38] *** twrist has joined #archiveteam-bs
[21:57] *** BlueMaxim has joined #archiveteam-bs
[22:21] *** wtron has joined #archiveteam-bs
[23:04] *** wtron has left
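
Note on the 18:18-18:31 plan (discover the public picture IDs from the browser pages, then treat every other ID in the sequence as unlisted): a minimal sketch of that partition step, assuming IDs are plain integers starting at 1. The function name partition_ids and its parameters are hypothetical and do not come from any tool mentioned in the log. An ID missing from the browser pages may also belong to a deleted picture, so the second list holds only candidates.

```python
def partition_ids(max_id, public_ids):
    """Split the ID range 1..max_id into (listed, unlisted) sorted lists.

    max_id     -- highest ID to consider (hypothetical: taken from the newest listed picture)
    public_ids -- IDs found during discovery of the browser pages
    """
    public = set(public_ids)
    # Keep only discovered IDs that fall inside the range being scanned.
    listed = sorted(i for i in public if 1 <= i <= max_id)
    # Everything else in the sequence is unlisted: "private" or possibly deleted.
    unlisted = [i for i in range(1, max_id + 1) if i not in public]
    return listed, unlisted


if __name__ == "__main__":
    listed, unlisted = partition_ids(10, [2, 3, 5, 7])
    print("listed:", listed)
    print("unlisted:", unlisted)
```

Keeping the two lists separate is what lets the grabs go into separate items, so the unlisted one can be darkened on its own.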