#archiveteam-bs 2017-08-24,Thu


Time	Nickname	Message
00:26 ^🔗		Geekonoci has joined #archiveteam-bs
00:34 ^🔗		Geekonoci has quit IRC (Quit: Page closed)
00:46 ^🔗	hook54321	How should I organize that? A table? or just a plain list?
00:48 ^🔗	JAA	What information do you have?
00:49 ^🔗	JAA	If it's really just a list of account names plus maybe some comments/notes, then I'd say plain list (alphabetically sorted, I guess). But if you have additional data for many of the accounts, a table might be better.
00:58 ^🔗		BlueMaxim has joined #archiveteam-bs
01:02 ^🔗	hook54321	A lot of the time I discover it because people ask for it to be excluded, publicly on twitter...
01:11 ^🔗	hook54321	balrog, JAA
01:14 ^🔗		qw3rty116 has joined #archiveteam-bs
01:17 ^🔗		qw3rty115 has quit IRC (Read error: Operation timed out)
01:30 ^🔗	godane	i'm at 1083k items
01:31 ^🔗	arkiver	hook54321: nice
01:33 ^🔗	hook54321	arkiver: I'm supposed to work with you on OpenNIC stuff
01:33 ^🔗	arkiver	I just read that yeah :P
01:34 ^🔗	Fusl	o/
01:34 ^🔗		Aranje has quit IRC (Remote host closed the connection)
01:34 ^🔗	Fusl	hook54321: thanks for taking care of and getting AT involved in this
01:34 ^🔗	hook54321	np
01:37 ^🔗	hook54321	I'm not sure what we're going to do about the .free tld. At this point it makes the most sense to just grab the .libre sites, since the .free sites were moved over there. However, I'm not sure if we should be worried about other webpages still using .free URLs.
01:38 ^🔗	hook54321	I also have no idea what we'll do if ICANN decides to create a TLD that's already an OpenNIC tld.
01:38 ^🔗	hook54321	again
01:39 ^🔗	Frogging	There's nothing to do about it. The DNS has no provisions for conflicting roots
01:39 ^🔗	Frogging	(does it?)
01:40 ^🔗	hook54321	In the case of .free, OpenNIC is basically just moving all of the .free domains over to .libre
01:42 ^🔗	hook54321	Personally, I'm hoping that's what they're planning to do if this happens again.
01:42 ^🔗	Frogging	what else can be done?
01:44 ^🔗	hook54321	This is their position on it:
01:44 ^🔗	Somebody2	a subdomain? .free.opennic?
01:44 ^🔗	hook54321	"What is OpenNIC's relationship with the other alternative roots and ICANN?
01:44 ^🔗	hook54321	OpenNIC currently recognizes and peers all of the existing ICANN TLDs (.com, .uk, etc.). Therefore, if you configure your computer to resolve OpenNIC domains, you'll also be able to resolve all of the ICANN TLDs automatically.
01:44 ^🔗	hook54321	OpenNIC has not yet evaluated nor does it hold a formal position on the current/future ICANN TLDs."
01:45 ^🔗	hook54321	Somebody2: I don't think they would like that
01:45 ^🔗	Somebody2	hook54321: who?
01:45 ^🔗	Somebody2	ICANN or OpenNIC, or someone else?
01:45 ^🔗	hook54321	OpenNIC
01:46 ^🔗	Somebody2	why not? It would make it clear the source of the domains...?
01:46 ^🔗	joepie91	today in "wtf Google": https://twitter.com/joepie91/status/900534232296161284
01:46 ^🔗	Somebody2	they could even do the same with ICANN domains: .com.icann
01:46 ^🔗	hook54321	That would break ssl certificates though
01:47 ^🔗	Frogging	joepie91: http://i.imgur.com/By95Lva.jpg
01:47 ^🔗	hook54321	and it kinda defeats the purpose of what they're trying to do
01:47 ^🔗	joepie91	lol
01:56 ^🔗	Somebody2	hook54321: between converting domains previously at .free into domains at .libre vs converting them into domains at .free.opennic -- I'm not sure why .libre is better... can you clarify?
01:57 ^🔗	Somebody2	and how does adding .opennic or .icann at the end of domains defeat what they are trying to do?
01:59 ^🔗	hook54321	because then it's a subdomain
02:00 ^🔗	Frogging	example.com is a subdomain of .com
02:00 ^🔗	hook54321	.com is the TLD though
02:00 ^🔗	Frogging	DNS doesn't distinguish
02:01 ^🔗	Frogging	http://com/ is valid
02:01 ^🔗	Somebody2	DNS doesn't, but various protocols on top do in various subtle ways
02:01 ^🔗	Frogging	and as a side note, I am now immensely confused at the result of visiting that URL
02:01 ^🔗	Somebody2	like various browsers do different things depending on whether a DNS segment is a toplevel or not
02:02 ^🔗	Frogging	oh I see. someone has a sense of humour. ".XYZ is the next .COM. .XYZ is the #1 new domain in the world"
02:03 ^🔗	Somebody2	ok, so rather than .free.opennic, you could perform horrible things by using a different delimiter, e.g. .free-opennic, or even .free;opennic
02:05 ^🔗	Somebody2	Frogging: oddly, neither dig nor curl are able to follow that redirect.
02:08 ^🔗	Frogging	indeed, I think that one does not actually work, it was my browser adding .com to it automatically
02:08 ^🔗	Somebody2	Ah, this is because the browser automatically converts "http://com/" into a request for http://www.com.com/
02:09 ^🔗	Frogging	try this one http://dk/
02:10 ^🔗	Somebody2	yes, that one works in dig and curl
02:10 ^🔗	Somebody2	301 redirecting to https://www.dk-hostmaster.dk/
02:10 ^🔗	Frogging	yup
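
The point made above, that DNS itself treats "dk", "dk-hostmaster.dk" and "www.dk-hostmaster.dk" as the same kind of thing (a sequence of labels), can be sketched in a few lines. This is only an illustration using Python's standard library; the helper names `labels`, `ancestor_names` and `is_dotless` are made up for this example, and nothing here performs a real DNS lookup.

```python
from typing import List
from urllib.parse import urlsplit


def labels(hostname: str) -> List[str]:
    """Split a hostname into DNS labels, ignoring a trailing root dot."""
    return [part for part in hostname.rstrip(".").split(".") if part]


def ancestor_names(hostname: str) -> List[str]:
    """Return the name itself followed by each parent name, ending at the TLD."""
    parts = labels(hostname)
    return [".".join(parts[i:]) for i in range(len(parts))]


def is_dotless(url: str) -> bool:
    """True if the URL's host is a single label, e.g. a bare TLD such as 'dk'."""
    host = urlsplit(url).hostname or ""
    return len(labels(host)) == 1


if __name__ == "__main__":
    print(ancestor_names("www.dk-hostmaster.dk"))
    print(is_dotless("http://dk/"), is_dotless("http://example.com/"))
```

Whether a single-label name like "dk" actually resolves depends on the TLD operator publishing address records for it, and, as noted in the conversation, browsers may rewrite such a URL before any lookup happens.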
02:34 ^🔗	Frogging	I was thinking about that line ".XYZ is the next .COM" and it occurred to me that I've never in my life seen a legitimate .xyz website
02:34 ^🔗	Frogging	so I went to one of the ones linked on the page
02:34 ^🔗	Frogging	https://www.goinnovate.xyz/
02:35 ^🔗	Frogging	this is... frightful.
02:35 ^🔗	Frogging	"fog computing"
02:36 ^🔗	Frogging	http://www.exponentials.xyz/posts/the-roles-of-cloud-computing-and-fog-computing-in-the-internet-of-things-revolut-6205388
02:53 ^🔗		drumstick has quit IRC (Ping timeout: 268 seconds)
03:00 ^🔗		drumstick has joined #archiveteam-bs
03:14 ^🔗		qw3rty117 has joined #archiveteam-bs
03:18 ^🔗	hook54321	I'm surprised that IA actually complied with this guy's request: https://twitter.com/darthodius/status/658731881626783745
03:20 ^🔗		qw3rty116 has quit IRC (Read error: Operation timed out)
03:41 ^🔗		Fletcher has quit IRC (Remote host closed the connection)
03:56 ^🔗		Fletcher has joined #archiveteam-bs
04:15 ^🔗		Sk1d has quit IRC (Ping timeout: 250 seconds)
04:16 ^🔗	hook54321	public.resource.org refers to the Internet Archive building as "the Church of the Internet Archive"
04:20 ^🔗	Asparagir	Well, the bulding really was an old church once.
04:20 ^🔗	Asparagir	*building
04:20 ^🔗	Asparagir	It still has pews and all that.
04:22 ^🔗		Sk1d has joined #archiveteam-bs
04:38 ^🔗	hook54321	Asparagir: yeah, i know.
04:39 ^🔗	hook54321	I wonder if anyone has ever gone there in person and angrily demanded their site get removed from the wayback machine
04:58 ^🔗		Asparagir has quit IRC (Asparagir)
05:38 ^🔗	jrwr	this guy is my history hero https://www.youtube.com/watch?v=ZqUm1YXTxNc
05:38 ^🔗	jrwr	Steve1989MREInfo
06:12 ^🔗		Honno has joined #archiveteam-bs
06:15 ^🔗		RichardG has quit IRC (Read error: Connection reset by peer)
06:17 ^🔗		RichardG has joined #archiveteam-bs
06:31 ^🔗		soja92 has joined #archiveteam-bs
07:08 ^🔗		kristian_ has joined #archiveteam-bs
07:44 ^🔗		drumstick has quit IRC (Read error: Operation timed out)
07:44 ^🔗		drumstick has joined #archiveteam-bs
08:07 ^🔗		tuluu has quit IRC (Ping timeout: 245 seconds)
08:17 ^🔗		tuluu has joined #archiveteam-bs
08:52 ^🔗		Boppen has quit IRC (Ping timeout: 194 seconds)
08:53 ^🔗		drumstick has quit IRC (Ping timeout: 268 seconds)
08:54 ^🔗		kristian_ has quit IRC (Read error: Operation timed out)
09:00 ^🔗		etudier has joined #archiveteam-bs
09:05 ^🔗		drumstick has joined #archiveteam-bs
09:15 ^🔗		Boppen has joined #archiveteam-bs
09:40 ^🔗		dashcloud has quit IRC (Read error: Connection reset by peer)
09:40 ^🔗		dashcloud has joined #archiveteam-bs
11:11 ^🔗		BlueMaxim has quit IRC (Read error: Connection reset by peer)
11:26 ^🔗		drumstick has quit IRC (Read error: Operation timed out)
12:15 ^🔗		brayden has quit IRC (Read error: Connection reset by peer)
12:17 ^🔗		brayden has joined #archiveteam-bs
12:17 ^🔗		swebb sets mode: +o brayden
12:49 ^🔗		kurt has quit IRC (Read error: Operation timed out)
13:44 ^🔗	SketchCow	The Archive has had to deal with a lot of crazy walking right in and demanding things, yes.
13:44 ^🔗	SketchCow	Two levels: The one you think of, someone coming in and demanding something related to content.
13:44 ^🔗	SketchCow	The other: Since it looks like a church, church-related things, like homeless or people asking for sanctuary, etc.
13:44 ^🔗	SketchCow	There are between 3-5 people who sleep at the Archive at night, since it looks like a church
13:45 ^🔗	SketchCow	There was one who was across the street for years, Thomas. He died last October, and even the night he died, we'd brought over extra food from the celebration going on, which he had.
13:47 ^🔗	SketchCow	http://richmondsfblog.com/2016/10/27/thomas-resident-homeless-man-at-funston-clement-passed-away-wednesday-night/
13:54 ^🔗		BartoCH has quit IRC (Quit: WeeChat 1.9)
14:08 ^🔗		vitzli has joined #archiveteam-bs
14:21 ^🔗		TheLovina has quit IRC (Quit: Leaving)
14:30 ^🔗	closure_	til
14:31 ^🔗		Mateon1 has quit IRC (Read error: Operation timed out)
14:31 ^🔗		Mateon1 has joined #archiveteam-bs
14:36 ^🔗		REiN^ has quit IRC (Max SendQ exceeded)
14:36 ^🔗		REiN^ has joined #archiveteam-bs
14:36 ^🔗	JAA	hook54321: That's for pages which are explicitly blocked due to a direct request to IA, I assume?
14:37 ^🔗	hook54321	I'm not exactly sure yet.
14:38 ^🔗	hook54321	Whatever we make it out to be I guess
14:38 ^🔗	hook54321	Maybe both, idk
14:39 ^🔗	JAA	I just think that a list of pages blocked by robots.txt is probably massive.
14:40 ^🔗	hook54321	yeah. on the other hand, it could be used for us to have some sort of database of what IA might not be crawling.
14:40 ^🔗	hook54321	*could be useful
16:48 ^🔗		vitzli has quit IRC (Quit: Leaving)
17:25 ^🔗	godane	another page is missing for 01-01-2014
17:26 ^🔗	godane	*2014-01-01
17:26 ^🔗	godane	anyways it's page 24
17:35 ^🔗		Asparagir has joined #archiveteam-bs
18:05 ^🔗		namespace has joined #archiveteam-bs
18:06 ^🔗	namespace	So I'm trying to OCR/type up/something this PDF so it can be put on a public website as an actually readable/searchable/etc historical archive: https://archive.org/details/8BBSArchiveP1V1
18:06 ^🔗	namespace	(Also to generate more interest in it through sex appeal, because it's actually a fairly important bit of phreaker history.)
18:07 ^🔗	namespace	And it is just the nastiest PDF out there (blurry monospace font, holepunched so that sometimes entire bits of text are missing, scuff marks, etc).
18:08 ^🔗	namespace	The automatic archive.org OCR is actually better than what I could get using OCRFeeder: https://archive.org/stream/8BBSArchiveP1V1/8BBS_Archive_P1V1_djvu.txt
18:08 ^🔗	namespace	Is there any way to do better, or?
18:08 ^🔗		schbirid has joined #archiveteam-bs
18:29 ^🔗	astrid	wow that is nasty yeah
18:29 ^🔗	astrid	i have a very flakey ocr program that works well on single-font fixed-width material
18:29 ^🔗	astrid	https://github.com/chronomex/ess-ocr if you want to poke at it
18:30 ^🔗	astrid	all the magic bits are hardcoded, so youd have to hardcode new constants to describe the pages you want to ocr
18:30 ^🔗	astrid	and that sort of thing
18:34 ^🔗	namespace	Yeah here's the strategies I've thought up so far:
18:35 ^🔗	namespace	- Retype the whole thing (would prefer not to, takes TON of time, solitary)
18:35 ^🔗	astrid	i mean, this is the sort of thing that i wrote my ocr program for
18:35 ^🔗	astrid	this will ocr quite well
18:35 ^🔗	namespace	- Highlight all the actual posts and then extract them as images, put up on a wiki and let other people help.
18:35 ^🔗	namespace	- OCR it, which yeah I'll try that thing you just posted thanks.
18:35 ^🔗	astrid	it's extremely fragile but this is fixed width enough to work super well
18:36 ^🔗	astrid	i am planning to do a total rewrite of the tool to make it useful, but for now you will be able to get it to work
18:36 ^🔗	astrid	watch out, tool is designed to work with pages that have a black frame around them, so you might need to remove some code about that
18:36 ^🔗	astrid	(it used the black frame for fiducial registration)
18:37 ^🔗	namespace	Eheheh.
18:37 ^🔗	astrid	crop_to_rect is the function to comment out calls to
18:37 ^🔗	atrocity	where's the PDF you're trying to OCR?
18:37 ^🔗	astrid	https://archive.org/stream/8BBSArchiveP1V1/8BBS_Archive_P1V1#page/n3/mode/1up
18:39 ^🔗	atrocity	i have pdf ocr software at work, i'll try running it on it, lol
18:41 ^🔗	atrocity	downloading at like 300KiB/s, ugh
18:41 ^🔗	namespace	wow lol
18:44 ^🔗	namespace	astrid: So stupid question, how do I compile this?
18:44 ^🔗	namespace	There's two files, no makefile, do I just compile them both separately and then execute one?
18:46 ^🔗	astrid	they both have a comment at the beginning saying the magic compiler spells to cast
18:46 ^🔗	astrid	deskew comes from the leptonica library
18:46 ^🔗	namespace	Ah, k.
18:46 ^🔗	namespace	Thanks.
18:46 ^🔗	astrid	so you might need to set that up
18:46 ^🔗	astrid	also you need to feed it 'pgm' images
18:47 ^🔗	namespace	Any dependencies? ^^;;
18:47 ^🔗	astrid	ess-ocr needs you to create a directory 'training' or it'll core
18:47 ^🔗	astrid	yeah libnetpnm among others
18:47 ^🔗	astrid	as i said
18:47 ^🔗	astrid	very rough :)
18:49 ^🔗	namespace	Yeah I'm smelling a goose chase, thanks anyway. :p
18:49 ^🔗	astrid	once you get it running, call it over one of your images
18:49 ^🔗	namespace	Yes see that first bit.
18:49 ^🔗	namespace	Is the bit that is very unlikely to happen.
18:50 ^🔗	astrid	it'll write out 'crop.pgm' which should be cropped to include the content and include the grid it's using for segmentation
18:50 ^🔗	astrid	oh :(
18:50 ^🔗	astrid	ok
18:50 ^🔗	namespace	This needs like, a readme.txt
18:50 ^🔗	astrid	well my plan for it involves a rewrite, because proper monospace ocr is a thing the world needs
18:50 ^🔗	astrid	desperately
18:50 ^🔗	namespace	Otherwise I'll just be asking you questions all day.
18:50 ^🔗	astrid	as we all know
18:50 ^🔗	namespace	Yes, yes it does.
18:50 ^🔗	astrid	yeah :|
18:50 ^🔗	namespace	Here's my advice.
18:50 ^🔗	namespace	If you want to make this usable to others.
18:50 ^🔗	namespace	Make a debian/ubuntu/etc vm.
18:51 ^🔗	namespace	And set the thing up from scratch, and write down each step as you do.
18:51 ^🔗	namespace	That's your readme.txt
18:51 ^🔗	astrid	it's not currently intended to be useful, it's a quick hack
18:51 ^🔗	astrid	yeah
18:51 ^🔗	namespace	It's how I do all my setup readme files and they always come out excellent, whereas people who just write them from memory always end up skipping steps and stuff.
18:51 ^🔗	astrid	"if you want to do monospace ocr, here's a very fiddly thing that gets good results"
18:52 ^🔗	astrid	i usually wind up with 2 or 3 misrecognized characters per page at most
18:52 ^🔗	namespace	Yeah that would be incredible.
18:52 ^🔗	astrid	it doesn't assume it knows what letters look like
18:52 ^🔗	astrid	instead it's like "hm, idk what this is, hey user, what is it?" and you say "that's a B, and you can use it as an example of other Bs"
18:53 ^🔗	astrid	so it builds training data as you go
18:53 ^🔗	astrid	but i'm planning a rewrite with maintainability and also smartness
18:53 ^🔗	astrid	next version will lay down a grid over the page that can be skewed and bent, so that photographs of monospace text can be recognized f.ex
18:54 ^🔗	astrid	and it'll be much much less manual fiddly process
18:54 ^🔗	namespace	Yes well, the next version is in the indeterminate future and I'd like this to be on the net now. :P
18:54 ^🔗	astrid	because i have about 500 pages of photographs of typewritten text that i'd like to get into readable form
18:54 ^🔗	astrid	yes i know :
18:54 ^🔗	astrid	:)
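
The approach astrid describes (lay a fixed grid over a monospace page, ask the user about any cell that is not recognized, and reuse each answer as training data) could look roughly like the sketch below. This is not the ess-ocr code. The bitmap representation (strings of '#' and '.'), the function names, the Hamming-distance matching and the `max_distance` parameter are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[str, ...]  # rows of one glyph bitmap, e.g. ("#.", ".#")


def split_into_cells(page: List[str], cell_w: int, cell_h: int) -> List[List[Cell]]:
    """Cut a bitmap page into a grid of fixed-size character cells."""
    grid = []
    for top in range(0, len(page) - cell_h + 1, cell_h):
        row = []
        for left in range(0, len(page[top]) - cell_w + 1, cell_w):
            row.append(tuple(line[left:left + cell_w] for line in page[top:top + cell_h]))
        grid.append(row)
    return grid


def hamming(a: Cell, b: Cell) -> int:
    """Count the pixels that differ between two equally sized cells."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(page: List[str], cell_w: int, cell_h: int,
              training: Dict[Cell, str], ask: Callable[[Cell], str],
              max_distance: int = 0) -> List[str]:
    """Label every cell, asking the user whenever no known example is close enough."""
    lines = []
    for row in split_into_cells(page, cell_w, cell_h):
        text = ""
        for cell in row:
            # Closest known example, or None while the training set is empty.
            best = min(training, key=lambda known: hamming(cell, known), default=None)
            if best is None or hamming(cell, best) > max_distance:
                training[cell] = ask(cell)  # the answer becomes a new example
                best = cell
            text += training[best]
        lines.append(text)
    return lines
```

In interactive use, `ask` would display the cell and read a character from the user; the `training` dict grows as pages are processed, so later pages need fewer questions.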
18:58 ^🔗	atrocity	i have my work OCR running, it's going VERY slowly, lol
18:59 ^🔗	namespace	Unsurprising. XD
19:01 ^🔗	atrocity	i feel like i should've just ran it on the first 10 pages or something instead of all 700, lol
19:02 ^🔗	godane	so now page 36 and 37 of 2014-04-30 of east bay express is giving me 404
19:02 ^🔗		HarryCros has quit IRC (Read error: Connection reset by peer)
19:03 ^🔗		HarryCros has joined #archiveteam-bs
19:06 ^🔗	godane	ok looks like there was a middle booklet for 'Bike to work day'
19:06 ^🔗	godane	on may 8 that year
19:06 ^🔗	godane	after that it continues from page 28
19:07 ^🔗	godane	so its really page 8 and 9 of the 'bike to work day' booklet
19:09 ^🔗		HarryCros has quit IRC (Remote host closed the connection)
19:09 ^🔗		HarryCros has joined #archiveteam-bs
19:10 ^🔗	godane	so there are pages 31 to 33 missing from 2014-06-04
19:11 ^🔗	atrocity	http://meddl.com/temp/ocrtest.txt
19:11 ^🔗	atrocity	that's what my work OCR app did, lol
19:11 ^🔗	atrocity	the first 10 pages at least
19:11 ^🔗		HarryCros has quit IRC (Remote host closed the connection)
19:12 ^🔗	namespace	Not that bad, tbh. But only of about comparable quality to the archive.org OCR scan.
19:13 ^🔗	atrocity	yeah, lol
19:13 ^🔗	atrocity	doesn't help that there's holes all through it
19:13 ^🔗	namespace	It really really doesn't.
19:13 ^🔗	namespace	Like I said, nasty.
19:15 ^🔗	atrocity	lol
19:26 ^🔗		C4K3_ has quit IRC (leaving)
19:30 ^🔗		C4K3 has joined #archiveteam-bs
19:36 ^🔗		kristian_ has joined #archiveteam-bs
19:40 ^🔗		odemg has quit IRC (Read error: Operation timed out)
19:55 ^🔗		odemg has joined #archiveteam-bs
20:41 ^🔗		bitBaron has joined #archiveteam-bs
20:41 ^🔗		bitBaron has quit IRC (Client Quit)
20:48 ^🔗	schbirid	yeah, why not move the "WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD" bs to a specific channel?
20:48 ^🔗	schbirid	if the main channel is too holy to even discuss itself
20:51 ^🔗	Sanqui	I would not give password to this person >> <coltaine_> and I give you sucky sucky
20:52 ^🔗	schbirid	Sanqui Sanqui?
21:02 ^🔗	atrocity	$5
21:07 ^🔗	astrid	i'm with Sanqui here
21:09 ^🔗		Honno has quit IRC (Read error: Operation timed out)
21:12 ^🔗		HarryCros has joined #archiveteam-bs
21:20 ^🔗		Pudsey has joined #archiveteam-bs
21:21 ^🔗		HarryCros has quit IRC (Remote host closed the connection)
21:21 ^🔗		kristian_ has quit IRC (Ping timeout: 370 seconds)
21:24 ^🔗		HarryCros has joined #archiveteam-bs
21:25 ^🔗		HarryCros has quit IRC (Read error: Connection reset by peer)
21:25 ^🔗		HarryCros has joined #archiveteam-bs
21:38 ^🔗		Pudsey has quit IRC (Remote host closed the connection)
21:40 ^🔗		ZexaronS has joined #archiveteam-bs
21:42 ^🔗		kristian_ has joined #archiveteam-bs
21:50 ^🔗		REiN^ has quit IRC (Max SendQ exceeded)
21:51 ^🔗		REiN^ has joined #archiveteam-bs
21:51 ^🔗		REiN^ has quit IRC (Max SendQ exceeded)
21:52 ^🔗		REiN^ has joined #archiveteam-bs
21:55 ^🔗		drumstick has joined #archiveteam-bs
21:56 ^🔗	joepie91	I'm not sure that this is what was meant with "steal from work": https://twitter.com/SeamusHughes/status/900790149017219073
21:56 ^🔗	joepie91	:p
21:59 ^🔗	godane	a guy put this up on r/opendirectories : http://95.211.186.214/Incoming/
22:00 ^🔗	joepie91	ohh, lots of oldish stuff there
22:00 ^🔗	joepie91	ha
22:00 ^🔗	joepie91	yeah I'm pretty sure this guy went to... 33C3?
22:00 ^🔗	godane	i can't grab it but someone here would want it
22:00 ^🔗	joepie91	fairly certain that the Leaks directory came off one of the FTPs there
22:01 ^🔗	joepie91	godane: definitely, thanks :P
22:01 ^🔗	godane	i got from here: https://www.reddit.com/r/opendirectories/comments/6vri47/large_dj_sets_directory/
22:01 ^🔗	godane	i know that SketchCow is looking for older dj sets
22:02 ^🔗	joepie91	godane: oh, is he? any particular type?
22:02 ^🔗	joepie91	I may have a pile of my own
22:02 ^🔗	joepie91	that is, sets from a specific internet radio channel and some of their live events
22:02 ^🔗	joepie91	(afterhoursdjs.org, but it's stuff that isn't yet in the collection on IA)
22:02 ^🔗	godane	he is doing the hip hop mixtapes collection
22:03 ^🔗	joepie91	ah yeah, this is def not hip hop :p
22:03 ^🔗	godane	he will take it
22:05 ^🔗	joepie91	this is what I have laying around: https://gist.githubusercontent.com/joepie91/1afb987f86a2b417c33a61a35a6c0f29/raw/84a200d991cf846e79133b78e046558b8104a822/gistfile1.txt (cc SketchCow - let me know if you want me to ship them to FOS or such)
22:05 ^🔗	joepie91	haven't gotten around to sorting out the metadata yet
22:06 ^🔗	joepie91	goes back to 2002 in some places :P
22:28 ^🔗	godane	uploaded : https://archive.org/details/forum.kingsnake.com-1997-to-2003-archives-20161203:
22:28 ^🔗	godane	*uploaded : https://archive.org/details/forum.kingsnake.com-1997-to-2003-archives-20161203
22:52 ^🔗		Stiletto has quit IRC (Read error: Operation timed out)
23:28 ^🔗		kristian_ has quit IRC (Quit: Leaving)
23:38 ^🔗	SketchCow	That DJ directory looks like crap.
23:40 ^🔗		BlueMaxim has joined #archiveteam-bs
23:52 ^🔗		Stilett0 has joined #archiveteam-bs
23:59 ^🔗		TC01 has quit IRC (Remote host closed the connection)
